# Calculate design effect from cluster surveys

## Clustered sampling

The other calculators in this library are based on a simple random sample (SRS), a sampling design in which everyone has an *equal* and *independent* chance of being selected for the survey. In many cases, particularly in humanitarian and development contexts, this may not be feasible.

For example, if you are surveying a population in person across a large geographic area, an SRS may be completely impractical. Selecting 1,000 individuals completely at random might mean travelling to 1,000 different villages, which may be too costly or impossible within a certain time frame.

As an alternative, you might consider two stages of random selection: first, randomly choose 25 villages, and then randomly choose 40 individuals to interview in each village. If the villages are the same size, each respondent still has the same chance of being selected, but the chances are no longer *independent*: if one person is selected in a village, their neighbor has a greater chance of being selected.
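The two-stage selection described above can be sketched in Python. This is only an illustration: the sampling frame (200 villages of 500 residents each), the function names, and the identifiers are all hypothetical and not part of this library.

```python
import random


def two_stage_sample(villages, n_clusters=25, n_per_cluster=40, seed=None):
    """Draw a two-stage sample.

    villages: dict mapping a village name to a list of resident IDs
    (a hypothetical sampling frame).
    """
    rng = random.Random(seed)
    # Stage 1: choose villages at random, without replacement.
    chosen = rng.sample(sorted(villages), n_clusters)
    # Stage 2: choose individuals at random within each chosen village.
    return {name: rng.sample(villages[name], n_per_cluster) for name in chosen}


def inclusion_probability(n_villages, village_size, n_clusters=25, n_per_cluster=40):
    """Chance that any one person is selected, assuming equal-sized villages."""
    return (n_clusters / n_villages) * (n_per_cluster / village_size)


# Hypothetical frame: 200 villages with 500 residents each.
frame = {
    f"village_{i}": [f"v{i}_p{j}" for j in range(500)]
    for i in range(200)
}
sample = two_stage_sample(frame, seed=1)
total_interviews = sum(len(people) for people in sample.values())
p_selected = inclusion_probability(n_villages=200, village_size=500)
```

Every person in this hypothetical frame has the same chance of selection (25/200 × 40/500), but all 1,000 interviews come from only 25 villages, which is why the selections are no longer independent.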

Whether this affects the precision of your sample depends on something called the *intra-cluster correlation coefficient* (ICC), which is a measure of how similar people within a cluster are to each other, at least with regard to what you're trying to measure. An ICC value of 1.0 means that all responses within a cluster are the same, while an ICC of 0.0 means that people within clusters are just as diverse as the general population.
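To make the link between ICC and precision concrete, a common approximation for equal-sized clusters is the design effect, DEFF = 1 + (m − 1) × ICC, where m is the number of respondents per cluster. The sketch below uses that standard formula; the function names are illustrative, and it is not necessarily the exact calculation performed by the calculator on this page.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that would give similar precision."""
    return n / design_effect(cluster_size, icc)


# The example above: 25 villages x 40 respondents = 1,000 interviews.
for icc in (0.0, 0.05, 1.0):
    deff = design_effect(40, icc)
    n_eff = effective_sample_size(1000, 40, icc)
    print(f"ICC={icc:.2f}  DEFF={deff:.2f}  effective n={n_eff:.0f}")
```

With an ICC of 0.0, the clustered sample is as precise as an SRS of the same size. With an ICC of 1.0, the 1,000 interviews carry no more information than 25, one per village.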

Each variable that you're measuring will have its own ICC that depends on your context. For example, if your survey includes a question about the respondent's sex, in most contexts you wouldn't expect sex to be clustered by village: each village is likely to have more or less the same proportion of males and females.

On the other hand, if your survey includes a question about access to water, the answer is likely to be *highly* dependent on *where* you are asking. In other words, it is highly correlated with the cluster.

Again, this depends on your context: even for sex, some regions have a high degree of gendered labor-force migration, which leaves some areas with a very high ratio of men.

So when planning your survey, you will need to review previous research and survey data to find likely ICC values and set your sample size accordingly. For some variables, like access to water, you might find the intra-cluster correlation to be too high for a clustered population survey, requiring you to find another way to collect this data.

You can use R or other statistical software to estimate ICCs from previously collected survey data. The calculator below gives a simplified estimate of the ICC for equal-sized clusters, which can serve as a quick indication of the ICC from survey results.
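As a rough illustration of estimating an ICC from existing data, the sketch below applies the one-way ANOVA estimator to equal-sized clusters, with yes/no answers coded as 1/0. This is one common estimator and is an assumption here; the calculator on this page may use a different method, and the function name `anova_icc` is hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    clusters: list of lists, one inner list of numeric responses per cluster
    (for yes/no questions, code the answers as 1 and 0).
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2:
        raise ValueError("need at least 2 clusters with at least 2 responses each")
    if any(len(c) != m for c in clusters):
        raise ValueError("all clusters must be the same size")

    grand_mean = sum(sum(c) for c in clusters) / (k * m)
    cluster_means = [sum(c) / m for c in clusters]

    # Mean square between clusters and mean square within clusters.
    msb = m * sum((mu - grand_mean) ** 2 for mu in cluster_means) / (k - 1)
    msw = sum(
        (x - mu) ** 2 for c, mu in zip(clusters, cluster_means) for x in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("responses show no variation; the ICC is undefined")
    return (msb - msw) / denominator


# Made-up responses: 4 clusters of 5 people, 1 = has access to water.
responses = [
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
icc_estimate = anova_icc(responses)
```

With small samples this estimator can return values slightly below zero, which is usually read as "no detectable clustering" rather than as a true negative correlation.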

## References

^{1} Wang, H. and Chow, S.-C. 2007. Sample Size Calculation for Comparing Proportions. Wiley Encyclopedia of Clinical Trials.

^{2} Mak, T.K., 1988. Analysing intraclass correlation for dichotomous variables. Applied Statistics, pp.344-352.