(e.g. get a list of all the people and draw a random sample without replacement)
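A minimal Python sketch of that idea. It assumes the sampling frame is just a list of hypothetical identifiers; `srs_without_replacement` is an illustrative helper, not part of any survey package:

```python
import random

def srs_without_replacement(frame, n, seed=None):
    """Draw a simple random sample of n units, without replacement, from a frame."""
    rng = random.Random(seed)
    # random.Random.sample never picks the same position twice: without replacement
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of all 100 people in the population
people = [f"person_{i}" for i in range(1, 101)]
sample = srs_without_replacement(people, 10, seed=1)
print(sample)
```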
As for Agencies and also things like ...
All that is new here is that we use special programs designed for survey data, and the program needs to be told how the sampling was done.
Apart from that, it is pretty much business as usual.
What is cluster sampling?
- Think of all units in the entire population as being subdivided into non-overlapping groups called clusters, usually on the basis of physical proximity (units that are close together)
  - (e.g. if units are households, we could treat all houses in the same street as forming a cluster; or all pupils in the same school could form a cluster)
- A cluster sample selects a sample of clusters from a list of all the clusters and then takes all of the units in the selected clusters
  - (e.g. sample streets from a list of streets, then take all houses in the sampled streets)
- Multistage cluster sampling applies the clustering idea at several levels
  - (e.g. sample schools from a list of schools; for each selected school, sample classes from the list of classes in that school; then take either all or a sample of the students in each selected class. Or: select towns, then census blocks within towns, then households within census blocks and, finally, people within households)
- Note: Cluster sampling tends to use a relatively large number of groups (then called clusters), whereas stratified sampling tends to involve a small number of groups (then called strata). The two differ in how the groups are then used in the sampling plan. If we think in terms of strata, it is because we plan to collect data from every group.
If we think in terms of clusters, it is because we plan to collect data only from a sampled subset of the groups.
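A minimal Python sketch of one-stage and two-stage cluster sampling. It assumes a hypothetical population of 50 streets (clusters), each holding 5 to 15 houses (units). The helper names are illustrative only:

```python
import random

# Hypothetical population: 50 streets (clusters), each with 5 to 15 houses (units)
rng = random.Random(42)
streets = {
    f"street_{s}": [f"street_{s}_house_{h}" for h in range(rng.randint(5, 15))]
    for s in range(50)
}

def one_stage_cluster_sample(clusters, n_clusters, rng):
    """Sample clusters without replacement, then take every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

def two_stage_cluster_sample(clusters, n_clusters, units_per_cluster, rng):
    """Sample clusters, then draw a simple random sample of units within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(units_per_cluster, len(clusters[c])))
        for c in chosen
    }

one_stage = one_stage_cluster_sample(streets, 5, rng)    # 5 streets, all their houses
two_stage = two_stage_cluster_sample(streets, 5, 3, rng) # 5 streets, 3 houses from each
```

Deeper multistage designs, such as schools, then classes, then students, repeat the within-cluster sampling step once per level. Only the list of clusters is needed up front. Unit lists are consulted only for the clusters actually chosen.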
Why do cluster sampling?
- It can be much cheaper than simple random sampling
- Units in a cluster are closer together (e.g. reducing travelling time)
- We can obtain information from a single source (which also reduces costs)
- So we can often get more accuracy for the same cost (or the same accuracy for a reduced cost)
- We don’t need a complete sampling frame of all individuals in the population, only lists of clusters and then lists of units (or sub-clusters) for the selected clusters only
- If we want to do interventions, we can often only apply them at the level of the cluster
  - (e.g. use different teaching methods on different classes)
What are the consequences of ignoring cluster sampling?
- Cluster sampling generally leads to
  - positive correlations between units in the same cluster
  - an effective sample size that is smaller than the total number of units observed
    - we have “less information” than we would from a simple random sample with the same number of units
    - the effective sample size can be closer to the number of clusters sampled than to the number of units finally obtained
- Design effects (more precisely, 1/deff) indicate the loss of efficiency (described in later lectures)
- Standard errors reported by standard (non-survey) programs tend to be too small
- As a result, nominal 95% confidence intervals from standard programs cover the true value less than 95% of the time
- Estimates relating to the whole population from standard programs are often wrong
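The Python simulation below sketches these consequences under stated assumptions. It uses a hypothetical random-effects model y = mu + u_cluster + e_unit with total variance 1, so the intra-cluster correlation is ρ = `icc`. All clusters have the same size m, and fresh clusters are generated in every replication, so no finite population correction is applied. For equal-sized clusters a standard approximation (due to Kish) is deff ≈ 1 + (m − 1)ρ, giving an effective sample size of roughly n / deff. The code compares naive 95% intervals, which treat all units as a simple random sample, with intervals whose standard error comes from the variation between cluster means. Under these settings the naive intervals should cover the true mean well below 95% of the time. All names and parameter values are illustrative:

```python
import math
import random
import statistics

def simulate(n_clusters=30, cluster_size=20, icc=0.1, reps=1000, seed=2024):
    """Compare naive (SRS) and cluster-aware 95% CIs for a mean under cluster sampling.

    Assumed (hypothetical) model: y = mu + u_cluster + e_unit, total variance 1,
    so the intra-cluster correlation equals `icc`; equal cluster sizes; clusters are
    generated afresh each replication (no finite population correction).
    """
    rng = random.Random(seed)
    mu = 10.0
    sd_between = math.sqrt(icc)        # SD of the shared cluster effect u
    sd_within = math.sqrt(1.0 - icc)   # SD of the unit-level effect e
    z = 1.96                           # normal critical value for a 95% interval
    naive_hits = cluster_hits = 0
    deff_sum = 0.0
    for _ in range(reps):
        cluster_means, all_y = [], []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, sd_between)  # one draw shared by the whole cluster
            ys = [mu + u + rng.gauss(0.0, sd_within) for _ in range(cluster_size)]
            all_y.extend(ys)
            cluster_means.append(statistics.fmean(ys))
        est = statistics.fmean(all_y)
        # Naive SE: pretends all n_clusters * cluster_size units are a simple random sample
        se_naive = statistics.stdev(all_y) / math.sqrt(len(all_y))
        # Cluster-aware SE: the clusters are the independent units (equal sizes assumed)
        se_cluster = statistics.stdev(cluster_means) / math.sqrt(n_clusters)
        naive_hits += abs(est - mu) <= z * se_naive
        cluster_hits += abs(est - mu) <= z * se_cluster
        deff_sum += (se_cluster / se_naive) ** 2  # estimated design effect
    deff_hat = deff_sum / reps
    return {
        "naive_coverage": naive_hits / reps,
        "cluster_coverage": cluster_hits / reps,
        "deff_estimated": deff_hat,
        "deff_kish": 1 + (cluster_size - 1) * icc,
        "effective_n": n_clusters * cluster_size / deff_hat,
    }

results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Raising `icc` or `cluster_size` pushes the effective sample size further below the number of units observed and closer to the number of clusters. It also makes the naive intervals even more over-optimistic.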