Brief introduction to survey data analysis ideas

This is a short set of bullet-point notes by Chris Wild.

What is special about analysing survey data?

In almost all of the data analysis you have learned to do, the computer programs essentially assume that your observations come from a random sample from some process or infinite population. Technically, for a random sample all observations are "independent and identically distributed", or in practice in the survey context:

Survey data

Why not just do a simple random sample?

(e.g. get a list of all the people and draw a random sample without replacement)
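To make the idea of "a list of all the people" concrete, here is a minimal Python sketch of drawing a simple random sample without replacement. The sampling frame (ID numbers 1 to 1000), the sample size and the function name `draw_srs` are made up for illustration and are not part of the original notes.

```python
import random

def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n units, without replacement, from a list."""
    rng = random.Random(seed)      # separate generator so the draw is reproducible
    return rng.sample(frame, n)    # random.sample never picks the same unit twice

# Hypothetical sampling frame: ID numbers for a population of 1000 people.
frame = list(range(1, 1001))
sample = draw_srs(frame, n=50, seed=1)
```

Every person on the list has the same chance of being selected, and no one can be selected twice. Note that the whole approach depends on having a complete list to draw from.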

What do Agencies (e.g. Stats NZ) want to estimate from their data?

What do medical and social science researchers want to estimate from their data?

As for Agencies and also things like ...

All that is new here is that we use special programs designed for survey data, and the program needs to be told how the sampling was done. Apart from that, it is pretty much business as usual.
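As a rough illustration of what "telling the program how the sampling was done" can involve, the sketch below attaches a sampling weight to each observation (the number of population units that observation represents) and uses it to estimate a population mean. This is a hand-rolled toy, not the interface of any real survey package, and the strata, population sizes and values are invented.

```python
def weighted_mean(values, weights):
    """Design-weighted mean: each value counts in proportion to its sampling weight."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# Hypothetical design: 2 people sampled from 800 urban residents (weight 400 each)
# and 2 people sampled from 200 rural residents (weight 100 each).
records = [
    {"stratum": "urban", "weight": 400.0, "y": 10.0},
    {"stratum": "urban", "weight": 400.0, "y": 12.0},
    {"stratum": "rural", "weight": 100.0, "y": 20.0},
    {"stratum": "rural", "weight": 100.0, "y": 22.0},
]

values = [r["y"] for r in records]
weights = [r["weight"] for r in records]

naive_estimate = sum(values) / len(values)        # ignores how the sample was drawn
design_estimate = weighted_mean(values, weights)  # uses the design information
```

With these made-up numbers the naive mean is 16.0 while the design-weighted mean is 13.0, because rural people were sampled at a higher rate than urban people and so are over-represented in the raw data.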

Simple random sampling (srs)

What is it?

Strengths

Weaknesses

Elements of most survey sampling designs used in practice

Sampling without replacement from a finite population

Why do it?

What are the consequences of ignoring sampling without replacement in the analysis?
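One standard result from sampling theory (stated here as general background rather than as the author's own list of points) is that, for a simple random sample of n units drawn without replacement from a population of N units, the variance of the sample mean is (1 - n/N) times the usual s²/n. The factor (1 - n/N) is the finite population correction. The sketch below, with a made-up sample and population size, compares the standard error with and without it.

```python
import math
import statistics

def se_mean(sample, population_size=None):
    """Standard error of the sample mean, optionally with the finite population correction."""
    n = len(sample)
    variance_of_mean = statistics.variance(sample) / n   # the usual s^2 / n
    if population_size is not None:
        variance_of_mean *= 1 - n / population_size      # finite population correction
    return math.sqrt(variance_of_mean)

# Hypothetical data: 5 units sampled from a population of only 20.
sample_values = [4.0, 7.0, 5.0, 9.0, 5.0]
se_ignoring = se_mean(sample_values)                      # treats the population as infinite
se_with_fpc = se_mean(sample_values, population_size=20)
```

Ignoring the correction never makes the standard error smaller, so an analysis that ignores it errs on the conservative side. The difference is negligible when the sample is a tiny fraction of the population and grows as n/N grows.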

Stratified sampling

What is it?

Why do stratified sampling?

What are the consequences of ignoring stratified sampling in the analysis?

Cluster sampling

What is it?

If we are thinking in terms of clusters, it is because we plan to collect data from only a sampled subset of the groups.
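The idea of collecting data from only a sampled subset of the groups can be sketched as follows. The population of 10 schools with 5 students each, and the function name, are hypothetical. The sketch takes every unit in each selected cluster, which corresponds to a one-stage design.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit in the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names, not people
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 10 schools (the clusters), each with 5 students.
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(5)]
    for i in range(10)
}

sampled_schools = one_stage_cluster_sample(schools, n_clusters=3, seed=2)
sampled_students = [s for members in sampled_schools.values() for s in members]
```

Only 3 of the 10 schools are ever visited, and the 15 sampled students are all drawn from those 3 schools rather than being spread across the whole population.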

Why do cluster sampling?

What are the consequences of ignoring cluster sampling?

One-stage versus multistage cluster sampling