Part 1 of 2
Thursday May 27, 2021

A guide to choosing sample sizes for M&E practitioners

  • Host
    Fay Candiliari
  • Panelist
    Alexander Bertram
About the webinar

This webinar is a one-hour session for Monitoring and Evaluation (M&E) professionals who want to learn more about choosing samples for their surveys. During the webinar we discuss key points to keep in mind when choosing a sample for a survey, including:

  • Baseline surveys
  • Mid-term/final evaluations
  • Oversampling for groups of specific interest
  • Taking into consideration the costs of stratified and cluster sampling
  • Understanding sampling and non-sampling error
  • Introduction to using confidence intervals in the analysis

We also reply to a series of questions during a 30-minute Q&A session.

View the presentation slides of the webinar

Calculators

Is this webinar for me?

  • Are you an M&E practitioner responsible for designing surveys and data collection tools for your programmes?
  • Do you wish to understand better how to choose a sample for your surveys and what to keep in mind while doing that?
  • Do you want to ask questions regarding sample sizes or work with a sample calculator?

Then watch our webinar!

About the Speakers

Mr. Alexander Bertram, Technical Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service. He started his career in international assistance fifteen years ago with IOM in Kunduz, Afghanistan, and later worked as an Information Management Officer with UNICEF in DR Congo. At UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring and evaluation.

Transcript

00:00:00 Introduction

Thank you so much, Fay, and thanks to all of you who have joined us here. We are really overwhelmed by the response. I hope you find it useful and worth coming. Today we have a lot of information to cover. I went through this yesterday and had to trim a few things to fit it within an hour, but we do have two really concrete learning objectives.

The goal is that after this hour, you will have some tools to choose sample sizes for two different kinds of surveys: snapshots, like a KAP survey or a needs assessment survey, and sample sizes for measuring change over time. The second objective is that you are aware of and have some tools to account for the costs of stratified and clustered samples. Due to time constraints, we are going to focus on clustered samples, though some of it will also be applicable to stratified samples.

This is a webinar for practitioners, so we are not going to go into the math or dive too much into statistics or probabilities, which is where all of this comes from. It is focused on using these tools. However, it is important that you have a good understanding of a couple of key concepts. I have listed those here on these slides. We are going to introduce all of them today, but fair warning: if you are seeing these for the first time, it might be worth coming back to the video and watching it again because we are going to go quite quickly through these basic sampling ideas.

00:02:15 Sample sizes for snapshots

Let's get started. We are going to start with some concrete examples and then work our way down through theory and the tools. We will begin with the idea of looking for sample sizes for KAP surveys and needs assessments. These are both surveys where we want a snapshot, a moment in time where we want to provide a quantitative snapshot of knowledge, attitudes, and practices of a population, for example, and then use these results to design interventions.

Let's look at an example of a KAP survey and how the results would be used. For example, maybe we have a goal of increasing COVID-19 vaccination rates in a rural population. We might run a KAP survey to find out what the barriers are to becoming vaccinated. Quantitative surveys can give you the idea of the scale of the problem. Qualitative might identify different problems to start, but a quantitative survey can help identify if this is a small problem or a big problem.

If we run a survey and find that only 15% of respondents knew that the vaccine was available for free in this area, and that 5% of respondents said they could not afford to travel to the vaccination site, we might conclude that the big problem here is knowledge. People just don't know that it is available for no cost. If we focus on awareness raising, we can increase vaccination rates. On the other hand, if you did a survey and found that 95% of respondents knew that the vaccine was available for free, but 50% of the respondents said that they could not afford to travel to the vaccination site, then our conclusions would be very different. We would conclude that the bigger problem seems to be transportation, so we should focus on mobile vaccination sites.

In a very similar way, needs assessment surveys provide a snapshot and can be used to design interventions to address the most pressing vulnerabilities. For example, in a humanitarian context, if we are seeking to reduce morbidity and mortality of a large group of newly displaced IDPs, we might not know where to put our resources. A needs assessment survey can help identify the scale of different problems, such as what percentage have adequate shelter or the scale of waterborne diseases.

00:05:30 Key concepts in sampling

The first concept we want to introduce begins with the idea of a population. When we talk about a population in sampling, we are talking about the entire group of people about whom we want to draw conclusions. In our IDP example, that was a group of 10,000 IDPs. When we say sample, we are talking about the specific group that we will collect data from. The sampling method is the process by which we choose which members of the population are included in the sample.

The million-dollar question is: how can we draw conclusions about a population of 20,000 households based on a sample of only 200? To tackle that question, we need two more concepts: sample estimate and error. A sample estimate is the result that we get from our survey. The error is the difference between our sample estimate and the population parameter, or the true value.

Broadly speaking, we think about sampling error and non-sampling error. Sampling error is the error we get that is just the difference between our population and our sample; it is a result of not being able to talk to everybody. Non-sampling error is basically everything else, such as non-response bias, interviewer error, translation problems, or social desirability bias. The big difference is that we can estimate sampling error using mathematics, whereas non-sampling error is difficult or impossible to estimate.

To estimate sampling error, we turn to probability theory. This is based on chance, like flipping a coin. Probability theory gives us tools to answer questions like, "What are the chances that I flip a coin three times and get three heads in a row?" We use this math to determine the likely error we get. I am going to rephrase the question: What is the probability of getting a sample estimate that has an error of 10% or more compared to the population value?
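The coin-flip question has a simple worked answer: the flips are independent events, so their probabilities multiply.

```python
# Probability of a fair coin landing heads three times in a row:
# independent events, so the individual probabilities multiply.
p_heads = 0.5
p_three_heads = p_heads ** 3  # 0.5 * 0.5 * 0.5
print(p_three_heads)  # 0.125, a 1-in-8 chance
```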

The sampling error depends essentially on five factors, which we will walk through one by one in the sections that follow.

00:10:45 Understanding standard deviation

Let's look at standard deviation. This has an impact on how large of a sample you need. Standard deviation is a measure of how much diversity or variance there is in the population with respect to what we are measuring. Every different question or variable will have its own standard deviation.

If we are looking at percentages, a population with low diversity is a population where the thing we are measuring has a very low prevalence. For example, if the percentage is 10%, the standard deviation is low, and we will need a smaller sample. If we have a population that is evenly split right down the middle (50%), we get a much higher standard deviation, meaning we need to collect more data to get the same level of precision.

If you have no estimate or idea about the thing you are measuring yet, you can plan for the largest standard deviation to be safe. In our calculator, you will pick 50%, and then you are covered. However, if you have information available from previous surveys or a census, you might be able to use that information to save resources and plan for a smaller sample.
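For a yes/no question, the standard deviation described above can be computed directly as sqrt(p × (1 − p)), which is largest when p = 50%. A minimal sketch in Python:

```python
import math

def proportion_sd(p):
    """Standard deviation of a yes/no (binary) variable with prevalence p."""
    return math.sqrt(p * (1 - p))

print(proportion_sd(0.10))  # ~0.3 -- low prevalence, low diversity
print(proportion_sd(0.50))  # 0.5  -- evenly split, maximum diversity
```

This is why 50% is the "safe" planning value: no other prevalence produces a larger standard deviation.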

00:14:20 Confidence levels and margins of error

The next input is choosing our confidence level and how much sampling error we are willing to allow. You will often see sampling error communicated as a margin of error, for example, "25% plus or minus 5%." You can also write this as a confidence interval, such as "between 20% and 30%." It is important to think about survey results not as precise numbers, but as a range.

We are going to use a 95% confidence level, which is standard in social science. This means that if you repeated the sampling hundreds of times, 95% of the time the resulting interval would contain the true population value. The choice we have to make is how much margin of error we want. The smaller the margin of error, the more data you have to collect.

Let's put this in context with our KAP survey. If we had a plus or minus 5% margin of error, we would have data showing between 10% and 20% of the population knows the vaccine is free. Is our conclusion still valid? I would argue yes, because even at 20%, that leaves 80% who don't know. If we had a 10% error, we are talking about between 5% and 25%. As the sampling error goes up, we become less and less certain. You have to reason whether a survey with a 20% margin of error is going to be useful or lead to more questions than answers.
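To see where a reading like "15% plus or minus 5%" comes from: the standard 95% margin of error for a sample proportion is z × sqrt(p̂(1 − p̂)/n). The sample size of 196 below is a hypothetical number chosen for illustration:

```python
import math

Z95 = 1.96  # z-score for a 95% confidence level

def margin_of_error(p_hat, n):
    """95% margin of error for a sample proportion p_hat from n respondents."""
    return Z95 * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 15% of 196 respondents knew the vaccine was free.
moe = margin_of_error(0.15, 196)
print(round(moe, 2))             # 0.05 -> report "15% plus or minus 5%"
print(0.15 - moe, 0.15 + moe)    # roughly the interval 10% to 20%
```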

Using the calculator on our website, you can input the population size, the estimated true value (standard deviation), and the margin of error you can live with. If we want a 5% margin of error, the sample size goes up significantly compared to a 10% margin. If the calculated sample size exceeds your budget, consider using other research methods like key informant interviews rather than doing a really small sample that yields a very wide confidence interval.
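The arithmetic behind a calculator like this can be sketched as follows. This is not the ActivityInfo calculator itself, just the standard formula for estimating a proportion at a 95% confidence level, with a finite population correction:

```python
import math

Z95 = 1.96  # z-score for a 95% confidence level

def sample_size(p, margin, population):
    """Sample size for estimating a proportion p to +/- margin at 95% confidence,
    corrected for a finite population of the given size."""
    n0 = (Z95 ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return math.ceil(n)

# Conservative planning value p = 0.5, population of 20,000 households:
print(sample_size(0.5, 0.05, 20_000))  # 377 for a +/- 5% margin
print(sample_size(0.5, 0.10, 20_000))  # 96 for a +/- 10% margin
```

Note how halving the margin of error roughly quadruples the required sample, which is why very precise snapshots get expensive quickly.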

00:20:10 Sample sizes for measuring change

Now we will look at calculating sample sizes for measuring change. We are often interested in comparisons, such as changing health outcomes or agricultural yields. Let's look at a concrete example: a three-year program to increase women's meaningful participation in the security sector. We want to measure the percentage of security sector staff holding a positive perception of women's leadership.

We have a baseline survey and a final survey. How large of a sample do we need? To answer this, we need to look at confidence intervals again. If we have a baseline sample estimate of 10% and an endline estimate of 12%, and our sample size was only 100, the confidence intervals might overlap significantly. We might not have enough information to know if the change was positive or negative.

We need to introduce the concept of effect size, which is the magnitude of difference between populations. It is not enough to say there is a difference; we need to know how big the difference is. We must define what success looks like for the program. Do we consider a 10% increase a success, or do we need a 20% increase? This determines the sample size. If you want to detect a very small change, you need a large sample.

In statistics, when testing for change, we have two kinds of errors: a Type I error, where we conclude there was a change when in fact there was none, and a Type II error, where we fail to detect a change that really happened.

We want a large enough sample to minimize the risk of both errors. Using the calculator for comparisons, you input the base estimate and the smallest change you want to detect. If you are only interested in large changes, you can use a smaller sample size. If you need to detect a small change, like 5%, the sample size goes up significantly.
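The comparison calculation can be sketched with the standard two-sample proportion formula. This is an illustration, not necessarily the exact formula the webinar's calculator uses; it assumes a 95% confidence level and 80% power:

```python
import math

Z_ALPHA = 1.96   # 95% confidence (two-sided)
Z_BETA = 0.8416  # 80% power

def size_per_survey(p1, p2):
    """Sample size per survey round to detect a change from proportion p1 to p2."""
    p_bar = (p1 + p2) / 2
    a = Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
    b = Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(((a + b) / (p2 - p1)) ** 2)

# Baseline 10%: the smaller the change we must detect, the larger the sample.
print(size_per_survey(0.10, 0.20))  # 199 per round for a 10-point change
print(size_per_survey(0.10, 0.15))  # 686 per round for a 5-point change
```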

00:30:00 Accounting for sample design

We will now focus on clustered samples. The sampling method is the process of choosing members. Broadly, we have random sampling (without bias, well-defined probability) and convenience sampling (based on convenience, unknown probability). Within random sampling, it is important to distinguish between a simple random sample and a complex sample.

In a simple random sample, every member has an equal and independent chance of being selected. A complex sample might include stratification or clustering. Why use a complex sample? One of the biggest reasons is cost, specifically transportation. If you do a simple random sample of a large area, you might have to travel to many villages just to interview one person.

Cluster sampling is a way to reduce those costs. We do sampling in two stages: first, we select clusters (e.g., villages) randomly, and then we select individuals within those clusters. However, the selection is no longer independent. People are selected in groups, and that has an impact on the sample size we need.

00:34:45 Cluster sampling and design effect

To navigate this, we introduce the Intra-Cluster Correlation Coefficient (ICC). This is the degree to which members of the same cluster resemble each other. This allows us to compute the effective sample size. The ratio between the actual sample size and the effective sample size is called the design effect.

If the ICC is low (e.g., percentage of population under 25), every cluster looks like a miniature version of the whole, and clustering doesn't have a big impact. However, if the ICC is high (e.g., access to an elementary school, which is geographically segregated), clustering can be very problematic. If you have a high ICC, your effective sample size might be close to the number of clusters, not the number of people interviewed.

The design effect summarizes the impact of your sample design on your sampling error. If you have a design effect of 2.0, you need to double the sample size calculated for a simple random sample. You can use our online calculator to estimate the ICC from previous surveys and calculate the design effect. If the design effect is very high, you might need to reconsider using a clustered survey for that specific variable.
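The design effect arithmetic can be sketched with the common approximation deff = 1 + (m − 1) × ICC, where m is the number of interviews per cluster. The cluster counts and ICC below are hypothetical:

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for a cluster sample: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical survey: 20 clusters of 15 interviews each (n = 300), ICC = 0.1
deff = design_effect(15, 0.1)
print(deff)              # ~2.4
print(300 / deff)        # effective sample size: ~125 interviews
print(round(96 * deff))  # a simple-random-sample size of 96 grows to ~230
```

With a high ICC the same 300 interviews can carry far less information, which is why the speaker suggests reconsidering a clustered design when the design effect gets large.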

00:42:15 Q&A session

Carol asks: Is there specific guidance for establishing the value used for standard deviation? Answer: Previous surveys and census information are useful. You can also look at key informant interviews. If you have the budget, you can use 50% to be conservative, which covers you for the largest standard deviation.

Sarah asks: What is the true value percentage mentioned? Answer: The true value or population parameter is the actual number in the entire population. We might not ever know this value unless we do a census. The sample estimate is what we get from the survey, and the error is the difference between the true value and the sample estimate.

Arnold asks: What is the maximum threshold for an acceptable margin of error? Answer: It depends on how you are using the information. If you are prioritizing between interventions, a 10-15% margin might be useful. If you are budgeting or ordering supplies (like tents), you probably want a lower margin of error to avoid overspending.

Prince asks: Is the calculator applicable in every situation? Answer: The first two calculators are for simple random surveys. You can use them for cluster surveys if you know your design effect, by multiplying the result by the design effect.

Anonymous asks: Will the error still apply if we go for a group within the sample? Answer: The error calculated is for the whole sample. If you want a specific margin of error for a subgroup (e.g., women), you need to calculate the sample size specifically for that subgroup.

Sahar asks (Live): I am developing a national impact evaluation of gender-based violence (GBV). Our data architecture is weak, and we don't know the true number of cases. I used different data sources to estimate the population. Is this the best way? Answer: A key element is defining your population. If your population of concern is GBV survivors, and you don't have a list, you are doing your best to identify a population using available resources. You will have sampling error, but you also have to grapple with non-sampling error because your population definition might be imperfect or exclude certain groups (e.g., those not reporting to NGOs). It is important to address these non-sampling biases in your research.

Jessica asks: If you have six geographic regions, would your sample size be different for each region? Answer: You have a choice. You could ignore regions and select randomly, which is a simple random sample. Or, you could do stratified sampling, where you sample a specific number from each region. This is useful if you want precise estimates for each region, but you may need to weight the results for the overall population.

Victor asks: How can wrong sample size calculation affect the conclusion of a study negatively? Answer: If you don't choose a large enough sample, you can end up with very large confidence intervals. If you report a specific number (e.g., 55%) without the confidence interval (e.g., +/- 20%), it can be misleading. You might make decisions based on data that is not precise enough.

Muhammad asks: How do we decide the design effects to be considered? Answer: You don't really decide it; you try to minimize it. If you have a high design effect, you might need to increase your sample size significantly. It depends on what you are measuring and how linked it is to geography.

We will organize a follow-up webinar to go more into complex sampling, clustered, and stratified sampling, and reserve more time for Q&A. Thank you for joining.
