Part 3 of 3
Thursday July 6, 2023

Implementation of evaluation in humanitarian assistance

  • Host
    Eliza Avgeropoulou
About this session

This is the third session of the series “Evaluation in humanitarian assistance”. It is a one-hour session ideal for Monitoring and Evaluation (M&E) and other professionals who are interested in the implementation phase of an evaluation.

In summary, we explore:

Implementing an evaluation:

  • What are the main evaluation designs?
  • What are the main sampling methods?
  • What are the main field methods?
  • What type of analysis corresponds best to the implementation?
  • Case study: Using ActivityInfo for survey implementation

View the presentation slides of the webinar.

Is this Webinar for me?

  • Are you an entry- or intermediate-level M&E or IM practitioner who wants to better understand the steps involved in an evaluation for humanitarian assistance, and more specifically the implementation phase?
  • Do you assist with evaluations in your organization, or would you like to take on that role and gain a deeper understanding that can facilitate your work?

Then, watch our webinar!

Other parts of this series

The Monitoring and Evaluation webinar series “Evaluation in humanitarian assistance” is a series of three live sessions addressed to M&E professionals working in humanitarian operations. These webinars comprise a course that will help you gain a comprehensive understanding of all the steps involved in evaluation in humanitarian assistance: introduction, planning and design, and implementation.

The series is addressed to entry- and intermediate-level professionals. It is highly recommended that you join the webinars, or watch their recordings, in consecutive order so as to benefit from the complete course.

About the Trainer

Ms Eliza Avgeropoulou earned her BSc from Athens University of Economics and Business, and her MSc degree in Economic Development and Growth from Lund University and Carlos III University, Madrid. She brings eight years of experience in M&E in international NGOs, including CARE, Innovations for Poverty Action and Catholic Relief Services (CRS). For the past five years, she has led the MEAL system design for various multi-stakeholder projects focusing on education, livelihoods, protection and cash. She believes that evidence-based decision making is at the core of high-quality program implementation. She now joins us as our M&E Implementation Specialist, bringing together her experience on the ground and passion for data-driven decision making to help our customers achieve success with ActivityInfo.

Transcript

00:00:01 Introduction

Welcome to the third webinar of our three-webinar series. Today we are going to go a step further and build upon the second webinar, which focused on evaluation planning and design. We are going to go into more detail on how we can implement an evaluation.

We will start with a brief recap of some points mentioned in previous sessions that we consider important reminders for today's session. Then we will go into evaluation designs, specifically the different designs and how to select the most appropriate ones. We will discuss important considerations regarding bias, which starts when we plan an evaluation and continues into implementation, data collection, method selection, analysis, and recommendations.

We will cover sampling, looking at the most common sampling techniques and the growing challenges we face in sampling in humanitarian action. We will discuss field methods, qualitative and quantitative analysis, and recommended analysis models based on the type of evaluation research. Finally, we will present a case study using an example of an evaluation of an education program in Western Tanzania to see how we can use ActivityInfo to professionalize an evaluation in this context, followed by a Q&A session.

Key information to remember includes that evaluation questions always determine the evaluation design, affecting data collection, analysis methods, and sampling approaches. Evaluation has two main purposes: accountability and learning. Learning is the process through which experience and reflection lead to changes in behavior, while accountability involves taking into account the views of different stakeholders, primarily the people affected by the exercise of authority.

We can categorize questions into five different categories: descriptive, normative, causal, evaluative, and action-oriented. We also need to keep in mind the distinction between qualitative and quantitative methods. Quantitative methods collect numerical data to measure specific amounts, while qualitative methods describe judgments, opinions, perceptions, and attitudes. Lately, evaluators have advocated for the use of mixed methods, as each type has its strengths. Findings are usually based on multiple sources and on information gathered by different field methods, a practice known as triangulation.

00:06:43 Evaluation designs

In a step-by-step approach, we first choose the appropriate evaluation design based on our strategy. This goes hand in hand with the sample size: identifying how many units or people we need for consistent results. We then identify the appropriate method for collecting information and determine the type of analysis needed.

The first design is experimental design. It is the least frequently used in humanitarian action. It involves the use of a treatment and a control group before and after the intervention, with random assignment. Random assignment means each unit has an equal chance of being assigned to the treatment or control group. This design raises ethical questions but is considered rigorous because random assignment ensures that differences in outcomes can be attributed to the intervention, not to other factors like skills or financial status. The most common experimental design is the Randomized Controlled Trial (RCT). In the humanitarian sector, we often modify this to compare groups receiving different types of assistance, such as cash versus food, rather than comparing assistance against no assistance.
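
To illustrate random assignment concretely, here is a minimal sketch in Python (our illustration, not part of the webinar), assuming a flat list of hypothetical household IDs; real evaluations often stratify or cluster the assignment:

```python
import random

def randomly_assign(units, seed=42):
    """Split units into treatment and control groups, giving every
    unit an equal chance of ending up in either group."""
    rng = random.Random(seed)   # fixed seed keeps the assignment reproducible
    shuffled = list(units)      # copy so the original list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]  # (treatment, control)

households = [f"HH-{i:03d}" for i in range(1, 101)]  # hypothetical household IDs
treatment, control = randomly_assign(households)
print(len(treatment), len(control))  # 50 50
```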

The second type is non-experimental design, which is very commonly used in the humanitarian sector. We do not use comparison or control groups. A common example is the case study. This design is used because it is less demanding, low-cost, and flexible. However, we should be ready to move beyond non-experimental designs and consider designs better suited to specific questions, especially as donors exert pressure for more rigorous evaluations.

The third type is quasi-experimental design. Here we have comparison groups, but without random assignment. A common example is comparing "doers" and "non-doers": those who participated in an intervention versus those who did not. However, using comparison groups in humanitarian action comes with a high risk of contamination due to the number of actors and support networks. We cannot be 100% sure the difference in outcome is due to the intervention, as confounding factors may exist.

To identify which evaluation design to use, ask whether you have a comparison group and whether units are randomly assigned to it. If you have random assignment, it is an experimental design. If you have a comparison group but no random assignment, it is a quasi-experimental design. If there is no comparison group, it is a non-experimental design. For example, causal questions usually require experimental designs to attribute causality. If a question asks whether participation led to a higher probability of a behavior compared to non-participants, it drives us toward a comparison group. No evaluation design is perfect; we are always constrained by time and money. The chosen options and limitations should always be noted in the inception and final reports.
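
This screening logic is simple enough to express as a tiny helper function; the sketch below is our illustration, not part of the webinar material:

```python
def classify_design(has_comparison_group: bool, random_assignment: bool) -> str:
    """Map the two screening questions to an evaluation design type."""
    if has_comparison_group and random_assignment:
        return "experimental"        # e.g. a Randomized Controlled Trial
    if has_comparison_group:
        return "quasi-experimental"  # e.g. comparing "doers" and "non-doers"
    return "non-experimental"        # e.g. a case study

print(classify_design(True, True))    # experimental
print(classify_design(True, False))   # quasi-experimental
print(classify_design(False, False))  # non-experimental
```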

00:16:09 Bias and engagement

Bias is a cross-cutting issue. Bias refers to threats to causal relationships. Sources of bias include the choice of design, methods, and sampling approaches. Selection bias occurs, for example, if we use a beneficiary list for sampling but the most vulnerable people were not included in the project. Sampling bias happens when the selection method targets a specific population. Bias in collection can occur if, for instance, we use only male enumerators to ask sensitive questions to female respondents.

Evaluator bias is another issue, where an evaluator may unconsciously lean towards positive results for the organization paying them. Reports state that over 50% of evaluations rate performance as good. This is often referred to as the "elephant in the room," where personal biases prevent us from seeing what is apparent to others.

The second cross-cutting consideration is engagement with the affected population. While we have improved, we still have a long way to go. We are moving away from seeing beneficiaries purely as recipients and towards seeking their feedback in order to design interventions centered on their needs. Evaluation can be seen as top-down or bottom-up. In reality, we often position beneficiaries as primary data sources, consulting them via interviews and focus group discussions. Engagement should start at the planning phase, considering which questions to ask the affected population and which groups need to be consulted.

00:23:32 Sampling

Sampling is the selection of a population subset because we cannot interview everyone. The sampling strategy goes hand in hand with deciding the sample size. Sampling ranges from simple cases, like convenience sampling, to complex cases, like random sampling from social strata. A mixed method approach is often preferred, combining non-random sampling for qualitative information and random sampling for quantitative data.

Non-random (non-probability) sampling selects the sample based on specific properties, for example, purposively selecting female-headed households. Non-random samples are not representative of the population as a whole, so results cannot be generalized. This approach is appropriate for qualitative methods, exploratory research, limited access, small samples, or case studies.
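
As a small illustration (ours, not from the webinar), purposive selection can be as simple as filtering a beneficiary list on the property of interest; the records and field names below are hypothetical:

```python
# Hypothetical beneficiary records; in practice these would come from a registration list.
beneficiaries = [
    {"id": "HH-001", "head_gender": "female", "district": "North"},
    {"id": "HH-002", "head_gender": "male",   "district": "North"},
    {"id": "HH-003", "head_gender": "female", "district": "South"},
]

# Purposive selection: keep only female-headed households.
purposive_sample = [b for b in beneficiaries if b["head_gender"] == "female"]
print([b["id"] for b in purposive_sample])  # ['HH-001', 'HH-003']
```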

Random (probability) sampling draws samples randomly, ensuring all units have an equal chance of selection. This allows differences in outcomes to be attributed to the program rather than personal characteristics. These samples are representative. True random sampling requires a list of the whole population (sampling frame). In humanitarian action, we often lack this and use "pseudo-random" sampling based on beneficiary lists.
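
Given a sampling frame, drawing a simple random sample takes only a few lines. A minimal sketch (ours, not from the webinar), using a hypothetical beneficiary list as the frame:

```python
import random

def simple_random_sample(frame, n, seed=1):
    """Draw n units from the frame, each with an equal chance of selection."""
    return random.Random(seed).sample(frame, n)

frame = [f"HH-{i:03d}" for i in range(1, 501)]  # hypothetical beneficiary list
sample = simple_random_sample(frame, 50)
print(len(sample), sample[:3])
```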

We must avoid over-reliance on convenience sampling, as it is a weak basis for inferring patterns. Small sample sizes combined with random sampling can also lead to misleading estimates. We must be transparent about the limitations of our sampling strategy. Generally, purposive sampling is recommended for small-scale qualitative studies, while random sampling is recommended for large populations where generalization is needed.

00:36:06 Field methods

Evaluation questions determine the methods, though practical considerations of budget, time, and ethics also play a role. Design refers to the structure, while methods refer to how data is gathered.

Broadly, field methods include:

Focus Group Discussions (FGDs) are frequently used and effective with affected populations. They require a space, an experienced facilitator, and a note-taker. Participants can be selected randomly or purposively (e.g., specific genders or ages). A good practice is to have a team including a translator, note-taker, observer, and facilitator to maintain balance and ensure accurate data capture.

Surveys are ideal for gathering quantitative information. They use a structured protocol and often digital devices to reduce data entry errors. Surveys usually require random or non-random sampling and a large sample size, making them expensive. Essential steps include pre-testing the survey, back-translating tools, and thoroughly training enumerators to ensure valid responses.

00:43:09 Analysis

Analysis looks at relationships within the data. We analyze primary data (collected for the evaluation) and secondary data. We can categorize analysis by sample size: big samples (quantitative) and small samples (qualitative).

For small samples, we use coding. Coding is a process of assigning keywords or themes to sentences or paragraphs to identify emerging themes across responses. It can be deductive (determining codes beforehand) or inductive (letting themes emerge).
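
A toy sketch of deductive coding (our illustration, not from the webinar): the codebook is fixed beforehand as keyword lists, and each response is tagged with the themes it touches. Real qualitative coding is done by human analysts or dedicated software; this only shows the mechanics:

```python
# Deductive coding: the codebook is decided before reading the responses.
codebook = {
    "access":  ["distance", "transport", "far"],
    "quality": ["teacher", "materials", "classroom"],
    "safety":  ["afraid", "violence", "unsafe"],
}

responses = [
    "The school is too far and there is no transport.",
    "Teachers are motivated but classroom materials are missing.",
]

for response in responses:
    text = response.lower()
    themes = [code for code, keywords in codebook.items()
              if any(word in text for word in keywords)]
    print(themes, "->", response)
# ['access'] -> The school is too far and there is no transport.
# ['quality'] -> Teachers are motivated but classroom materials are missing.
```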

For big samples, we use statistical analysis. This includes descriptive statistics (summarizing the population) and inferential statistics (making inferences about the population or hypotheses).
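
For instance (a minimal sketch with made-up scores, not from the webinar), descriptive statistics summarize each group, while a two-sample t-statistic supports an inference about whether the groups genuinely differ:

```python
import math
import statistics

# Hypothetical post-test scores for treatment and comparison groups.
treatment = [72, 68, 75, 80, 77, 74, 71, 79]
comparison = [65, 70, 66, 69, 72, 64, 68, 67]

# Descriptive statistics: summarize each sample.
for name, scores in [("treatment", treatment), ("comparison", comparison)]:
    print(name, "mean:", statistics.mean(scores),
          "sd:", round(statistics.stdev(scores), 2))

# Inferential statistics: Welch's two-sample t-statistic for the difference in means.
se = math.sqrt(statistics.variance(treatment) / len(treatment)
               + statistics.variance(comparison) / len(comparison))
t = (statistics.mean(treatment) - statistics.mean(comparison)) / se
print("t-statistic:", round(t, 2))  # compare against a t distribution for a p-value
```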

The recommended analysis depends on the question type: descriptive, normative, causal, evaluative, or action-oriented.

00:50:12 Case study: ActivityInfo

We used a case study based on an article evaluating an education program in a refugee camp in Western Tanzania to demonstrate how ActivityInfo can professionalize the process. The evaluation covered formal and non-formal educational activities, looking at process, quality, impact, and efficiency.

The methodology involved a non-experimental approach with mixed methods. Quantitative evaluation focused on core indicators like enrollment and performance, while qualitative approaches explored processes. Field methods included desk reviews, questionnaires for heads of schools and pupils, and interviews.

We recreated this setup in ActivityInfo. We created forms for data collection during project implementation (monitoring), real-time monitoring reports to facilitate targeting and sampling, and specific forms for evaluation data collection.

Using a system like ActivityInfo aligns monitoring and evaluation. It facilitates the analysis of secondary data for sampling, allows real-time tracking of fieldwork, enables the use of collection links for respondents, and facilitates quantitative analysis through integration with statistical software like R.

01:01:52 Q&A

What are the common challenges with control groups in RCT evaluation setups? The first challenge is ethical considerations; it is difficult to assign people to a control group that receives no assistance. Alternatives include comparing different types of assistance (e.g., cash vs. vouchers) or using "doers" vs. "non-doers." However, humanitarian environments are unstable and risky. Challenges include contamination (people receiving services from multiple actors), attrition (losing contact with populations due to movement), and difficulty comparing pre- and post-intervention states.

What is the difference between purposive and convenience sampling? Purposive sampling involves selecting specific people from a list because you want their specific opinion (e.g., female-headed households). Convenience sampling involves interviewing whoever is available or picks up the phone from a list. Convenience sampling is the least credible approach.

What is the difference between simple and cluster sampling? In simple random sampling, you have a list of the entire population (e.g., a census) and choose randomly from it. In cluster sampling, you randomly select groups or locations (like villages) and then perform the survey within those clusters.
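
A minimal two-stage sketch (ours, not from the webinar), with hypothetical village and household names: clusters are drawn at random first, then households are sampled within each selected cluster:

```python
import random

rng = random.Random(7)

# Hypothetical frame: villages (clusters) and the households inside each.
villages = {f"village-{v}": [f"V{v}-HH-{h:02d}" for h in range(1, 41)]
            for v in range(1, 11)}

# Stage 1: randomly select three clusters.
selected_villages = rng.sample(list(villages), 3)

# Stage 2: randomly sample ten households within each selected cluster.
cluster_sample = {v: rng.sample(villages[v], 10) for v in selected_villages}
for village, households in sorted(cluster_sample.items()):
    print(village, households[:2], "...")
```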

What can be done if there is contamination in the field? First, you must monitor to understand the extent of the contamination. If the groups cannot be compared anymore (e.g., they interacted extensively), you may need to rely on statistical controls if the sample is large enough. If contamination is excessive, you must acknowledge the limitation and be careful with interpretation. Triangulation with other data sources is a good practice to validate results in these cases.

Is an analysis plan necessary? Yes, you always need an analysis plan, even if it sits within the Monitoring and Evaluation plan, and it should be prepared prior to data collection to identify the types of analysis you will perform.
