Part 1 of 3
Thursday March 10, 2022

Measuring impact quantitatively - designing and collecting data

  • Host
    Alexander Bertram
About the webinar

This webinar is the first of two parts on the topic "Measuring impact quantitatively". It is a one-hour session ideal for Monitoring and Evaluation professionals who are interested in learning more about measuring impact. The first session focuses on the 'design and collect' process, and the second session, on March 17th, will discuss statistics for impact analysis.

Some of the points we covered were:

  • What is a quantitative impact evaluation?
  • Sources of measurement error
  • Statistics for reliability
  • Using cognitive interviewing to improve survey instruments
  • Designing experiments

You can also view the presentation slides.

Is this Webinar for me?

  • Are you an M&E practitioner responsible for designing surveys and data collection tools for your programmes?
  • Do you wish to learn more about working with quantitative data?
  • Do you wish to better understand surveys and how you can demonstrate the impact of your programmes?
  • Are you interested in constantly improving your systems?

Then, watch our Webinar!

About the Speaker

Mr. Alexander Bertram, Technical Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service. He started his career in international assistance fifteen years ago with IOM in Kunduz, Afghanistan, and later worked as an Information Management Officer with UNICEF in DR Congo. At UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring & evaluation.

Further reading

Some of the resources we mentioned in the webinar are:

Transcript

00:00:00 Introduction and learning objectives

Thank you so much, Jane. I'm really excited about these two sessions that we have planned this week and next week looking at some quantitative methods. Of course, quantitative methods are not the only way to do evaluation; there are some great qualitative methods out there. But for those of you who are working with quantitative methods, I wanted to share some tools and approaches that we've encountered while working with some of our users of ActivityInfo over the last couple of years. These are things that can be quite useful in conducting quantitative impact evaluations, and some methods that I would love to see more widely used in this field.

Because there's so much to talk about in this domain, we split it up into two weeks. Today we're going to start from the beginning, talking about what is a quantitative impact evaluation, why would you do one, and why would you not do one. We will look at some measurement challenges. I think that's one of the things that we really want to communicate: there are lots of ways that this can go wrong, so it's worth paying attention to these potential pitfalls. We're going to look at statistics for reliability, some tools for checking your measurements before you start analysis, and another tool that I think is really important: using cognitive interviews to improve survey instruments.

The last thing that we'll do today is introduce the problem of designing experiments. How do you design a quantitative evaluation? This will lay the groundwork for what we're going to talk about next week, where we'll go more into some of the statistics and some of the methods for teasing out causal relationships—really cause and effect—from quantitative data. But today is going to be a lot about measurement.

Some of the learning objectives I hope you'll get from these two hours include an understanding of what a quantitative impact assessment is and when to do one. Most importantly, I want you to understand the challenges and potential pitfalls around such an evaluation. The audience here is M&E professionals, and when you're working on M&E, you have to be a jack of all trades. I hope that this webinar will at least make you aware of some of the tools that are at your disposal, so that you know where to go and what kind of tools are out there when you need them.

00:04:15 Understanding quantitative impact evaluation

So, what is a quantitative impact evaluation and where does this fit in with monitoring and evaluation in general? I suspect that most of you are familiar with the results chain diagram, or theory of change. It involves understanding how you go from an idea or problem statement to an intervention that brings change to the world. You go from inputs (time, money, resources) to outputs (trainings, distributing goods, vaccines), to outcomes, and finally impact.

To keep this concrete, let's look at an example from when I worked in eastern Congo with communities returning from the conflict. One of the main problems was that children would fall behind in school because there was a high rate of waterborne diseases. Diarrhea kept kids home and they lost a lot of learning time. Based on this problem statement, you might come up with a results chain that looks first at outputs. If the issue is clean water, let's dig some protected wells by the schools where kids are spending most of their day. This, we believe, is going to lead to the outcome of lower rates of waterborne diseases, which in turn will lead to lower absenteeism, and the ultimate impact we're looking to achieve is that reading scores or educational attainment improves.

If you think about where impact evaluation sits in this, many of these stages are under our control. For example, digging wells at the schools is a question for monitoring. We want to make sure if we plan to dig 100 wells that we actually dig them, that they're done on time, and that our team is doing them correctly. That falls under monitoring. Once we've done our part, we have this theory that the work we do is going to lead to improved educational outcomes. That is the impact, and that's what we're trying to evaluate with an impact evaluation.

It is worth thinking about this in two general groups. In the beginning, through monitoring, we're concerned about the quality of implementation. With impact evaluations, we're concerned about the quality of program design. Did we pick the right thing to do? Did we understand the problem correctly? Maybe absenteeism isn't the main problem; if there are no textbooks or qualified teachers, then this intervention isn't going to have the impact we hope for. An impact evaluation is a bit like asking: if we execute perfectly, is that still going to have the impact that we expect?

00:08:30 When to conduct an impact evaluation

We want to evaluate the changes directly attributable to the program, and we're interested in those ultimate outcomes. The classic impact evaluation formula involves comparing two numbers. We look at the outcome with our program present and subtract what the outcome would have been if we hadn't done our program. The result, we hope, is positive. That is the impact that we can attribute to our program.
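
Written out as a simple formula (a sketch of the comparison described above, using P to indicate whether the program is present), this is:

```latex
% Impact is the difference between the outcome observed with the
% program in place (P = 1) and the outcome that would have occurred
% without it (P = 0)
\text{Impact} = Y_{P=1} - Y_{P=0}
```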

However, quantitative evaluations can be very expensive. They require significant technical expertise, significant attention from the program team, and data collection costs can be high. You want to think about when it is worth investing in this kind of quantitative evaluation. Typically, there are a couple of situations where quantitative impacts are relevant.

First, if you're doing a program for the first time, you might not be confident that the theory of change is correct. In an emergency situation, like providing clean water to reduce cholera, we have a firm handle on that link, so spending resources on an impact assessment might not be appropriate. But if you're doing something like a market intervention that has never been done before, you might want to test that theory of change.

Second, when you are looking at replicating a program. Maybe you have a new program that worked very well in one context and you want to test if it could be applied more widely. For example, when cash-based assistance in emergencies was relatively new, people were interested in testing the impact in several contexts. Finally, if a program is strategically relevant but requires a lot of resources to expand, having a quantitative impact assessment could be a powerful tool for fundraising and making the argument to donors.

00:11:45 Challenges in quantitative measurement

We've talked about why a quantitative impact evaluation can be useful, but that first word is really important: "Quantitative." You need to assign a number to an outcome or an impact. That process of measurement is a very difficult and fraught process. I want to spend time thinking about how to come up with numbers and what to be aware of.

We can think about different categories of measurements. Some impacts or outcomes we can measure through direct observation. For example, in health interventions, measuring morbidity or mortality is something that can be counted and verified. It is visible to the naked eye and quite objective. Programs designed to increase income might be harder to collect data for, but it's at least an objective figure.

A second category relies on self-reporting. These aren't things we can directly observe, so we rely on surveys to ask people about school attendance, family planning, purchase histories, or safe sex practices. This becomes more complicated because we rely on people to understand the questions and decide how to answer.

A third category, even more complicated, is indirect measurements. There might be things we can't actually measure or even ask people to give us a measurement of directly. For example, psychological resilience, attitudes, opinions, or religious tolerance. How do you quantify that? You might decide that you don't have a good way to do it, which might be a good reason not to do a quantitative measurement.

00:15:20 Sources of measurement error

Generally, when we're talking about data that comes from surveys, I want to highlight a few categories of error. Last year we talked about sampling error (not being able to talk to everybody) and non-response bias. A third category to add is response error. This happens when we talk to somebody, but the answer we get is not entirely correct.

This can happen because we phrased the question badly, or the person might misunderstand the question. If we're asking about past behavior, a person may not be able to recall that correctly. For example, if you ask me how many times I exercised in the last two weeks, it’s not easy to just recall immediately. You have to take into account people's ability to recall.

Another source is motivation. If it takes somebody five minutes to come up with a good answer, they might just give you the first thing that comes to mind. Finally, there is social desirability bias. If you ask somebody about things that society frowns upon, you may not get the right answer. This effect gets more pronounced as we deal with sensitive subjects like family planning or safe sex practices.

00:17:50 Measuring latent variables and statistical reliability

In this next section, I want to zoom in on the class of indirect measurements—measuring things where we can't get a self-reported answer and have to find other ways to measure them. These are often called latent variables. For example, a paper on knowledge and attitudes towards family planning in Ethiopia used a series of questions asking people to agree or disagree with statements. They scored these on a scale and took an average to get a single number to measure the attitude.
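
As a tiny illustration of that scoring step in R (hypothetical item names, assuming agree/disagree responses already coded from 1 to 5):

```r
# Hypothetical Likert responses, coded 1 (strongly disagree) to 5 (strongly agree)
responses <- data.frame(
  q1 = c(4, 5, 2),
  q2 = c(5, 4, 1),
  q3 = c(4, 4, 2)
)

# Average the item scores for each respondent to get a single attitude score
responses$attitude_score <- rowMeans(responses[, c("q1", "q2", "q3")])
responses
```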

When doing these kinds of measurements, we can talk about statistical reliability or internal consistency. Think of it like a panel of judges on a talent show. If four judges work well together, they're going to agree more often than not. If you have a set of questions that are internally consistent, the scores should move together.

There is a statistic for this called Cronbach's Alpha. It gives a score from zero to one indicating how internally consistent a set of questions is. If the questions work well together to measure the same hidden thing, you will get an alpha score closer to one. If the scores are all over the place, the Cronbach's Alpha will go down, telling you that the questions are not a good fit. Before undertaking research, you can look at past studies to see what kind of alpha they achieved for their questions.
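
For reference, the standard formula behind the statistic (not shown in the webinar) is:

```latex
% Cronbach's Alpha for k items, where \sigma^2_i is the variance of item i
% and \sigma^2_X is the variance of the total score summed across all items
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right)
```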

00:22:10 Practical demonstration: calculating Cronbach's Alpha

I want to take a quick break from theory and look at something in practice using ActivityInfo and R to calculate Cronbach's Alpha. I've set up a dummy database in ActivityInfo with a survey containing six questions on attitudes toward women in policing. I added calculations for scoring each of these questions.

After collecting data using the mobile app, I can connect ActivityInfo to R to get a quick calculation. Using the ltm package in R, I can query the data from ActivityInfo. I subset the data to just the six scores and use the Cronbach's Alpha function. In this example, I have an alpha of 0.954 across 143 samples, which is pretty good. I can be confident that these six items seem to be measuring the same thing. If you get a low number, you might wonder if a specific question is misunderstood or if you need to increase the number of items.
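
A minimal sketch of this calculation is shown below. The cronbach.alpha() function comes from the ltm package mentioned in the demo; the column names and the data source are illustrative, and the commented-out queryTable() call from the activityinfo R package is an assumption. Any data frame holding the six score columns will work.

```r
library(ltm)            # provides cronbach.alpha()
# library(activityinfo) # optional: query records directly from ActivityInfo

# Illustrative data source: replace with your own export or query
# records <- queryTable("YOUR_FORM_ID")   # assumed activityinfo call
records <- read.csv("attitude_scores.csv")

# Keep just the six numeric score columns (names are hypothetical)
scores <- records[, c("score1", "score2", "score3",
                      "score4", "score5", "score6")]

# CI = TRUE adds a bootstrap confidence interval around alpha
cronbach.alpha(scores, CI = TRUE, na.rm = TRUE)
```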

00:26:45 Using cognitive interviewing to improve surveys

Next, I want to turn to another tool that's very useful for identifying and reducing response error: cognitive interviewing. This tool has been around since the 1980s. It is a tool for evaluating sources of response error in survey questionnaires based on the cognitive theory of how people respond to questions.

The process begins with comprehension—understanding the language and terms. Then comes recall—retrieving information from memory. You have to look at how feasible that is. Then there is the decision process—deciding whether to give an accurate answer, especially regarding sensitive information. Finally, there is the response process—mapping the answer in their head to the choices given.

Cognitive interviewing helps figure out what is going wrong at each step. A common technique is verbal probing, where you ask follow-up questions. You ask the survey question, get the answer, and then ask probes like, "Can you tell me in your own words what that question means to you?" or "How did you go about deciding which answer to pick?"

For example, a study on Spanish speakers living with HIV found that a translation of "emotionally exhausted" was understood by participants as referring to a physical act. Through cognitive interviewing, researchers were able to rephrase and improve the question. This is especially important in humanitarian work where designers are often far removed from the people answering the questions. It doesn't take a large sample; talking to 20 to 50 people, or even just a few, can help find translation or comprehension problems.

00:33:15 The counterfactual problem

We're going to wrap up and set the stage for next week. We'll come back to the causal impact formula where we are looking for a delta, a change. We are comparing what happens when we do the program versus what happens if we don't. This is the counterfactual problem. We cannot do both our intervention and not do our intervention for the same participant. We have to search for counterfeit counterfactuals—substitutes for what we really want to know but can't measure. Options include randomized controlled trials, before and after comparisons, and comparing similar groups. That is what we will look at in next week's webinar.
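
In standard potential-outcomes notation (my framing, not the speaker's slides), the problem is that for any single participant i we would need both of the quantities below, but can only ever observe one of them:

```latex
% The effect for participant i is the difference between their outcome
% with the program, Y_i(1), and their outcome without it, Y_i(0);
% only one of the two can be observed for the same participant.
\tau_i = Y_i(1) - Y_i(0)
```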

00:35:00 Q&A and conclusion

I want to do a quick learning check. Thinking about the program you are working on now, would investing in a quantitative impact evaluation be appropriate? What are some sources of measurement error relevant to your work? And what are two tools that can help improve quantitative measurement?

Looking at the questions, someone asked about the dataset needed to measure impact at P=0 or P=1. P=0 refers to the counterfactual—what would have happened if we hadn't done the program. You can't collect this directly from the same people who received the program, so doing a baseline is one way, but it's not foolproof because other things change over time.

Regarding the difference between cognitive interviewing and normal data collection: normal data collection is asking the question to get the data for your program. Cognitive interviewing is done before you start collecting data to see if people understand your questions. For example, checking if people understand what "ORS" stands for.

Sandra mentioned recall bias and social desirability bias in her work with MSME owners. She noted that they had issues with owners recalling the topics of technical support because the survey wasn't conducted immediately after the support was provided. They solved this by prompting the respondents with general topics to help them remember.

Ali, working on a school feeding program in Iraq, mentioned that questions need to be perceived as meaningful and to align with tribal traditions and the local language. They are training their staff to work with the meanings commonly used in that community.

Thank you all so much for joining us. Please check out ActivityInfo.org and sign up for a free trial. Be sure to sign up for next week's webinar, which will be a continuation of these topics, looking at designing experiments, baselines, endlines, and randomized controlled trials. Have a great week!
