Methods Report: Natural Language Processing in Evaluation – Reflections, lessons learned, and further analysis

In 2023, Norad’s Department of Evaluation commissioned an evaluation of cross-cutting issues in Norwegian development cooperation. The purpose of the evaluation was to provide evidence on how cross-cutting issues were implemented in the Norwegian aid administration and whether taking them into account ultimately contributes to better results.

The evaluation used a variety of methods to generate evidence on four evaluation questions. One requirement set out in the evaluation’s Terms of Reference was to develop and implement a machine learning approach to answer the second question: “How is the Norwegian development administration implementing the four cross-cutting issues (Human rights, Women’s rights and gender equality, Climate change and environment, Anti-corruption) into the management of its programmes and projects? And to what extent is this implementation successful?”

To answer this question, the evaluation team was given a copy of Norad’s digital archives in the form of a data dump of approximately 400,000 files dating back to 2003. The team’s task was to develop a machine learning approach to extract relevant project and programme documentation from this data dump, examine it for evidence of how the cross-cutting issues were implemented, and use the results to answer the evaluation’s second question.

This learning-focused report reflects on the process the team followed to develop a machine learning approach to the second evaluation question. It describes the phases of that process, the challenges encountered and how they were addressed. It also identifies lessons learned along the way and offers practical advice to evaluators and commissioners on the potential use of similar methods in future evaluations. Finally, the report presents additional analysis of the results for this evaluation question that could not be included in the main evaluation because of time and space constraints.