Measuring impact quantitatively - statistics for impact measurement
Host: Alexander Bertram
About the webinar
This Webinar is the second of two parts on the topic "Measuring impact quantitatively". It is a one-hour session ideal for Monitoring and Evaluation professionals who are interested in learning more about measuring impact and using statistics. The first session discussed the 'design and collect' process; this session will discuss statistics for impact analysis.
Some of the points we will cover are:
- Designing impact evaluations
- Modelling randomness in the measurement process
- Moving from statistical significance (p-values) to measures of effect size and confidence intervals
You can also view the presentation slides.
Is this Webinar for me?
- Are you an M&E practitioner responsible for designing surveys and data collection tools for your programmes?
- Do you wish to learn more about working with quantitative data and statistics?
- Do you wish to better understand surveys and how you can demonstrate the impact of your programmes?
- Are you interested in constantly improving your systems?
Then, watch our Webinar!
Other parts of this series
This Webinar is the second of two parts on the topic "Measuring impact quantitatively".
About the Speaker
Mr. Alexander Bertram is the Technical Director of BeDataDriven and the founder of ActivityInfo. A graduate of American University's School of International Service, he started his career in international assistance fifteen years ago with IOM in Kunduz, Afghanistan, and later worked as an Information Management Officer with UNICEF in DR Congo. At UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring and evaluation.
Transcript
00:00:00
Introduction and recap
Welcome to our part two follow-up session on measuring impact quantitatively. I'm Alex Bertram, the Technical Director here at BeDataDriven. I work on ActivityInfo, which is software for monitoring and evaluation. It is a database that helps you track activities and outcomes, manage beneficiaries, run surveys, and work offline or online. And, of course, it helps you measure your impact quantitatively.
This is the second session in a series covering techniques and tools for improving the quality of your quantitative impact evaluations. Last week, we covered quite a lot. We talked about what a quantitative impact evaluation is, some of the challenges associated with quantitative measurement, statistics you can use for testing your measurements, and a technique for improving your questionnaires called cognitive interviewing.
Today, I had originally planned to cover two topics: one on causal inference and the second on effect size. However, we had so many good questions last time that I want to reserve more time for discussion. So, today we're actually going to focus just on causal inference. We will schedule a third part sometime in April where we can talk about understanding statistical significance versus effect size.
00:03:15
The fundamental problem of causal inference
Let's start talking about causal inference. This is the simple equation that we had from last week regarding a quantitative impact evaluation. We have basically three parts. First is the thing that we're measuring—that could be income, a latent psychological measurement like resilience, or a reading score on a test. We want to see the outcome conditioned on our program taking place. We want to compare that by subtracting the outcome that we would have had without the program. This gives us a delta, which is the impact.
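In symbols, that equation can be written as follows (the notation here is mine, not taken from the slides):

$$\text{Impact} = \Delta = Y_{\text{with program}} - Y_{\text{without program}}$$

where $Y$ is the outcome we are measuring, such as income, a resilience score, or a reading score, for the same people over the same period.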
What we are trying to achieve with this formula is to measure the impact that we can attribute to our program specifically and nothing else. We want to know what we caused to change in the world. This is what is referred to as the counterfactual problem, or the fundamental problem of causal inference. What we really want to know is the outcome for the same person both with our program and without our program. But the problem is that you can't measure both; you can't run the same people through two different universes.
In an ideal world, we'd have different universes. In one universe, we'd give Peter Parker vocational training. At the same time, in another universe, we'd give another Peter Parker a cash transfer, and a third Peter Parker no intervention at all. Then we could compare outcomes. Of course, we can't do that because once we've done the program, we can't go back and undo it to measure what would have happened.
That means we have to find other ways of comparing. We need a substitute, a stand-in for this counterfactual. Whatever comparison group we pick, it is important that it has three qualities. First, the groups have to be the same on average; the only difference should be the fact that they participate in our program. Second, the program should only affect the treatment group. Third, the program, if offered, should affect both groups in the same way. Today, I am going to focus on the first one: ensuring the only difference between the two groups is participation.
00:05:45
Strategy 1: Before and after comparisons
Let's look at the first strategy for coming up with a counterfactual: looking at a before and after, or a baseline and a follow-up. Typically, you select a group of people to participate in the program, measure them at the start (baseline), and then measure them after the program to see if there is a change.
The question here is: is participation the only difference? Are we sure that this is really the only thing that's different between these two groups? Let's take a concrete example. Say we have an agricultural intervention to increase the yields of a specific crop, like maize, in the Netherlands. In 2019, production was 10,000 kilos per hectare. We run a three-year program teaching farmers better soil management practices. At the end of the program, in 2021, we find that maize production is up to 12,500 kilos per hectare.
On the face of it, this looks like a win. But we have to ask about the differences between these two groups. Certainly, the 2021 group has been through our program. However, there might be other differences. For example, rainfall in 2019 might have been significantly lower than in 2021. Without a sophisticated model to show the effect of rainfall, we wouldn't be able to tell if the impact is due to our intervention or the weather.
This is an example of an impact evaluation that is not very reliable because it doesn't take into account external factors. While we are doing our three-year program, other things are happening. The price of pesticides might fall, or another organization might bring a new strain of maize to market. Without controlling for all those factors, you can't confidently conclude that your program caused the change you observed.
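To make the point concrete, the before-and-after estimate is nothing more than a subtraction, which is exactly why it is so fragile. A minimal sketch in R, using only the hypothetical figures from the example above:

```r
# Hypothetical figures from the maize example (kilos per hectare)
yield_baseline <- 10000  # 2019 baseline
yield_followup <- 12500  # follow-up at the end of the program

# The naive before-and-after "impact" is just a subtraction
naive_delta <- yield_followup - yield_baseline
naive_delta  # 2500

# The problem: these 2,500 kilos mix the program effect with everything
# else that changed between the two measurements (rainfall, input prices,
# new maize varieties), and nothing in these two numbers separates them.
```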
00:10:30
Strategy 2: Enrolled vs. non-enrolled
Another commonly used technique is to compare the people who enrolled in your program with those who did not. You might think this is a strategy for finding a better comparison group, because non-participants are affected by the same external factors, like weather or market prices. However, this approach also has issues.
Let's take an example of providing vocational training in a refugee camp. We want to measure the impact by comparing the incomes of people who enrolled in the training versus those who did not. We survey people who didn't enroll and find they earn an average of $10 a month. We survey our beneficiaries and find they earn $60. It looks like a delta of $50.
But is participation the only difference? There is a very important difference: motivation. The people who chose to enroll might be different from those who did not. You might find that the people who chose to participate are already more educated, for example more likely to have completed secondary education. Literacy might be a factor. Or a large percentage of the people who opted in might have been business owners in their home country.
These people had so much going for them that even if we hadn't done a training, their incomes might be higher to begin with. This is called self-selection bias. While enrolled versus non-enrolled controls for changes over time, it introduces significant bias regarding the background and motivation of the participants.
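To see how self-selection can distort a naive comparison, here is a small simulation sketch in R. The numbers are entirely made up for illustration and are not data from the webinar: former business owners are both more likely to enroll and earn more regardless of the training, so the enrolled-versus-non-enrolled gap overstates the true effect.

```r
set.seed(42)
n <- 5000

# Hypothetical population: 30% were business owners in their home country
business_owner <- rbinom(n, 1, 0.3)

# Former business owners are much more likely to enroll in the training
enrolled <- rbinom(n, 1, ifelse(business_owner == 1, 0.7, 0.2))

# Monthly income: a true training effect of $20, plus a $40 premium for
# former business owners, plus random noise
income <- 10 + 20 * enrolled + 40 * business_owner + rnorm(n, 0, 10)

# Naive comparison of enrolled versus non-enrolled
mean(income[enrolled == 1]) - mean(income[enrolled == 0])
# Comes out well above the true $20 effect, because the enrolled group
# contains a much higher share of former business owners.
```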
00:14:45
Multiple regression analysis
I want to introduce a statistical technique called multiple regression that you can use to manage some of these external factors. Multiple regression allows you to examine the effect of several independent variables simultaneously, so we can see what relative effect each of the variables we have measured has on the outcome.
I've created a vocational training survey in ActivityInfo. We ask for the person's name, whether they participated in the training, and then questions about external factors: "Before leaving your home country, what formal education did you complete?" and "Did you own a business?" Finally, we measure the outcome variable: "How much did you earn in the last month?"
I can use an R script to connect to this data and run a linear model. If we look at just participation, the estimated impact might be $31. But if I make another model taking into account business ownership and education, we might see different results. For example, we might find that owning a business actually has a specific effect distinct from the training. This kind of insight is what multiple regression analysis can help you find.
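Here is a sketch of what that analysis could look like in R. The file name and column names (participated, owned_business, education, income_last_month) are assumptions for illustration; your actual ActivityInfo export will look different.

```r
# Read the exported vocational training survey data (hypothetical file
# and column names; adjust to match your own export)
survey <- read.csv("vocational_training_survey.csv")

# Model 1: participation only, i.e. the naive comparison
m1 <- lm(income_last_month ~ participated, data = survey)
summary(m1)

# Model 2: also take business ownership and education into account
m2 <- lm(income_last_month ~ participated + owned_business + education,
         data = survey)
summary(m2)

# The coefficient on 'participated' in m2 is the estimated effect of the
# training holding business ownership and education constant; comparing
# it with m1 shows how much of the naive difference those factors explain.
```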
However, this is not a perfect fix. You need to be able to identify and measure all of the external factors ahead of time. There is always a risk that there is an external effect you didn't know about. Secondly, you have to have enough data and enough variation. For example, in the agricultural study, if everyone in the country is affected by the same rainfall, you can't use regression to control for it because you don't have data on farmers with different rainfall levels.
00:21:30
Strategy 3: Randomized assignment
The next strategy is randomized assignment. The idea is to eliminate the bias involved in self-selection. Instead of opening the program to whoever comes first, we randomly select people. This way, we have two groups: the people randomly selected to participate and the people who were not selected.
We would expect these groups to look much more similar on average. For example, we might find that 20% of the treatment group has secondary education compared to 21% in the control group. Because we are measuring them at the same time and have removed the self-selection bias, the comparison is much more valid. If this is possible in your program, this is really the ideal.
However, it is not always possible. Some interventions are not delivered at the individual level, like a radio campaign. There might be ethical problems; you cannot randomly exclude people from life-saving HIV treatment. Or the unit of intervention may simply be too large: if you are building roads nationwide, you can't randomly select nations into treatment and comparison groups.
Practically, you start with an initial enumeration to identify potential beneficiaries with well-defined eligibility criteria. Then, you randomly select people to participate and randomly select a comparison group. Finally, you conduct an evaluation of both groups. In ActivityInfo, you can manage this by having an initial survey, defining eligibility, exporting to Excel to assign groups using a random function, and then importing that selection back to link with your evaluation surveys.
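If you prefer to do the random assignment in R rather than Excel, a minimal sketch could look like the following. The file name, the 'secondary_education' column, and the 50/50 split are assumptions for illustration.

```r
set.seed(123)  # fixed seed so the assignment can be reproduced and audited

# Enumeration of eligible beneficiaries exported from the initial survey
eligible <- read.csv("eligible_beneficiaries.csv")

# Randomly assign roughly half to treatment and half to comparison
n <- nrow(eligible)
eligible$group <- sample(rep(c("treatment", "comparison"), length.out = n))

# Quick balance check on a background variable
prop.table(table(eligible$group, eligible$secondary_education), margin = 1)

# Save the assignment so it can be imported back and linked to the
# follow-up evaluation surveys
write.csv(eligible, "random_assignment.csv", row.names = FALSE)
```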
00:27:45
Strategy 4: Differences in differences
The last technique is differences in differences. This is for cases where a randomized trial is not possible, for example a media campaign against domestic violence: you can't randomly assign who hears a radio ad.
The idea is to pick two different districts that are similar socio-economically. We measure both of them before and after the campaign. In our selected district, we do a baseline, run the campaign, and measure afterwards. Suppose we find that domestic violence rates actually went up by 1%. This looks like a failure.
However, if we look at our comparison district, we might find that during the same period there was a huge increase in domestic violence, perhaps due to COVID-19 lockdowns, of 3.5%. By comparing the change in our district (+1%) to the change in the comparison district (+3.5%), we can see that the increase in our district was actually lower than what likely would have happened without the program. This demonstrates impact by using differences in differences.
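The core of the calculation is just two subtractions. A quick sketch in R, using only the illustrative percentages from this example:

```r
# Change in the campaign district between baseline and follow-up
change_program    <- 1.0   # percentage points
# Change in the comparison district over the same period
change_comparison <- 3.5   # percentage points

# Difference in differences: the estimated program effect
did_estimate <- change_program - change_comparison
did_estimate
# -2.5, i.e. an estimated 2.5 percentage-point reduction relative to what
# would likely have happened without the campaign
```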
00:31:30
Q&A and discussion
Question: Can you further explain the issue with before and after comparisons? Answer: The main issue is that you cannot be sure that the change you see is due to your program. External factors, like weather in an agricultural project or economic changes, could be the real cause of the change. Without a control group, you cannot rule these factors out.
Question: What type of statistical analysis should one use with the difference in difference analysis? Answer: The basic analysis is subtraction. You are subtracting the change in the control group from the change in the treatment group. However, it is important to take into account sampling error. You need to calculate confidence intervals to determine if the difference is statistically significant or just due to random noise. We will cover this in the next webinar on effect size.
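In practice, the same subtraction is usually estimated with a regression, because that gives you standard errors and confidence intervals along with the estimate. A hedged sketch in R, assuming individual-level records with indicators for district and survey round (the file and column names are mine, for illustration only):

```r
# did_data is assumed to have one row per respondent with:
#   outcome - the measured outcome for that respondent
#   treated - 1 if in the campaign district, 0 if in the comparison district
#   post    - 1 if from the follow-up round, 0 if from the baseline
did_data <- read.csv("did_survey_rounds.csv")

# The coefficient on treated:post is the difference-in-differences estimate
did_model <- lm(outcome ~ treated * post, data = did_data)
summary(did_model)

# 95% confidence interval for the estimated program effect
confint(did_model, "treated:post", level = 0.95)
```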
Question: Is the process of random assignment and random selection technically different? Answer: In this context, I am using them to mean the same thing. You are randomly choosing people into your treatment group and your comparison group. The important distinction is to define your eligibility criteria first to ensure both groups are drawn from the same eligible population.
Question: Regarding the difference in difference example, can we say the media campaign didn't have much impact if the rate still went up? Answer: Ideally, we would want to go back in time and see what happened in that specific district without the campaign. Since we can't, we use the comparison district. If the comparison district went up by 3.5% due to COVID, and ours only went up by 1%, we can argue the campaign prevented a larger spike. However, this relies on the assumption that the two districts are truly comparable and affected by external factors in the same way.
Question: We often face funding and time constraints that prevent randomized assignment. How do we handle that? Answer: Cost is always a factor. Quantitative impact assessments with randomized groups are expensive. It is useful to think about when not to do one. If a method is already well-proven (like antenatal care reducing mortality), you might focus on monitoring implementation rather than proving impact again. If you don't have the resources to do a valid impact assessment, it might be better to focus on outcome evaluation or monitoring rather than producing unreliable numbers.
Question: What if two implementing partners target the same communities, but one implementation is incomplete? Answer: This relates to monitoring implementation fidelity. If a program isn't implemented as designed (e.g., people didn't receive the tools they were supposed to), measuring the impact is difficult. You could use multiple regression to tease out the impact of specific components (tools vs. training), but you first need to understand why the implementation differed.
Question: We can randomly select participants, but they still have to choose to join. How do we handle that bias? Answer: Even with random selection, if people can refuse to participate, self-selection bias returns. However, because you did the random selection from an initial survey, you hopefully have data on the people who were selected but chose not to join. You can use that data to identify variables correlated with participation (e.g., education, wealth) and control for them in your analysis.
Thank you all for joining. We will schedule another session to cover effect sizes and confidence intervals. I hope to see you there.