Sampling

Sampling strategy
- Non-probabilistic approaches
- Probabilistic approaches
Sample Weight
- How are the oversampled/ undersampled areas corrected in data analysis?
- What does it mean to normalize the weights?
Pilot Sampling

Sampling strategies are constrained by available budget, field accessibility and time.

Thus, the chosen approach for a defined context often reflects a trade-off between representativeness of the results, rapid delivery and cost effectiveness.

Sampling strategy

Sampling strategy can be either probabilistic or non-probabilistic. A good introduction can be found here

Non-probabilistic approaches

Non-probabilistic approaches are usually favored during the emergency phase where both time and field access represent the main challenge.

Convenience sampling

A frequently used method in emergency situations, it relies on sampling those respondents who are easiest to access.

Practically speaking, those could be either:

Key informants willing to be interviewed.
Individuals or households among those who have settled along roadsides, or who present themselves at the administrative center of the returnee settlement, the assistance desk, etc.
Advantages: Easy and quick to implement, especially when time and access are the main constraints.
Disadvantage: This type of data collection often leads to biased results because the sample may not be representative of the majority. For example, those with the most resources or power are often the ones who settle in the most easily accessible areas.

Snowball sampling

Snowball sampling (or chain sampling, chain-referral sampling, referral sampling) is a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances. This technique is subject to numerous biases. For example, people who have many friends are more likely to be recruited into the sample.

Advantages: Useful when targeting specific groups that might be difficult to reach (hidden population).
Disadvantage: This approach might underweight the most vulnerable individuals.

Purposive sampling

It is based on previous knowledge about who might be able to provide valuable or specific information. It uses the judgement of community representatives, project staff or assessors to select typical locations and/or informants. The sampling of children or women, for example, is a type of purposive sampling.

Purposive sampling can also be done through key informants.

Advantages: Moderately rigorous if clear, well-defined criteria for sampling are followed. Useful when targeting specific groups of the affected population or specific affected areas. Less time consuming and less expensive than representative sampling.
Disadvantage: Generalisations are biased and not recommended. Samples are not representative of the population due to the subjectivity of respondents.

The risk of losing certain components of the population can be addressed by defining strata within the purposive sample.

In the case of desk interviews or key informants, the more observations the better. Some kind of credibility score can be obtained for each location based on a review of the key informants.

Quota sample

A quota sample might be representative of the population (if quotas actually do work, which is not always the case). But a quota sample will never satisfy the strict randomness requirements that statistics require. Only if we are working with a random sample can we make inferences from the sample to the population. In quota samples, there is not sufficient randomness, as the interviewer selects the interviewees actively. Therefore, quota samples cannot be used to reason about the general population.

Probabilistic approaches

Whenever the situation becomes more protracted, probabilistic approaches should be favored, as they generate more reliable results.

Respondent-driven sampling -RDS

A variant of snowball sampling is the Respondent-driven sampling (RDS) approach. It combines “snowball sampling” with a mathematical model that weights the sample to compensate for the fact that the sample was collected in a non-random way. As such, it can be classified as a probabilistic approach. The advantage is that the selection of seeds is specific and does not require a sampling frame.

While data requirements for RDS analysis are minimal, there are three pieces of information which are essential for analysis (RDS analysis CANNOT BE PERFORMED without these fields for each respondent):

Personal Network Size (Degree) - Number of people the respondent knows within the target population.
Respondent’s Serial Number - Serial number of the coupon the respondent was recruited with.
Respondent’s Recruiting Serial Numbers - Serial numbers from the coupons the respondent is given to recruit others.
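
To illustrate how the personal network size enters the weighting, here is a minimal sketch of one commonly used estimator, the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their degree. It is only one possible model and not necessarily the one used by a given RDS analysis tool; the function name and the example data are hypothetical. The coupon serial numbers are not used in this particular estimator; they serve to reconstruct recruitment chains.

```python
def rds_ii_proportion(degrees, has_trait):
    """Estimate the share of the target population with a trait (RDS-II).

    degrees   -- personal network size reported by each respondent
    has_trait -- True/False per respondent for the trait of interest
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent needs a positive network size")

    # Well-connected people are more likely to be recruited,
    # so each respondent is weighted by 1 / degree.
    inverse_degrees = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(inverse_degrees, has_trait) if t)
    return weighted_with_trait / sum(inverse_degrees)


# Hypothetical data: four respondents, two of whom have the trait.
degrees = [2, 10, 5, 4]
has_trait = [True, False, True, False]
estimate = rds_ii_proportion(degrees, has_trait)
print(round(estimate, 3))  # the unweighted share would be 0.5
```

In this example the respondents with the trait report smaller networks, so the weighted estimate is higher than the simple share in the sample.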

A good introduction to the organisation of RDS is in this presentation.

Time-Location Sampling

The Time-Location Sampling (TLS) approach can be used when the goal is to obtain a representation of a population in movement. The idea, and the underlying assumption, is to sample persons at the locations and times at which they can be found.

Time-location sampling is used to sample a population for which a sampling frame cannot be constructed but locations are known at which the population of interest can be found, or for which it is more efficient to sample at these locations. As such the approach is likely appropriate when the survey is taking place at a border.

More practical guidelines for TLS are available in a dedicated Resource Guide TLS, as well as in some applications to border monitoring for tourism or illegal migrants.

Random sampling

If you need a purely random sample, the size of the sample is a calculation that takes 3 variables:

Size of the full population. In a refugee context, the data comes from proGres, while in an IDP context, it comes from a Displacement Tracking System.
Confidence level: how confident you want to be that the true value lies within the error margin around your estimate (usually 90%, 95% or 99%).
Error margin (or confidence interval): how much error you are willing to tolerate for each question, i.e. plus or minus around the estimated ratio (usually 5%, 2% or 1%).

There are online calculators for this. Alternatively, one can use the Excel formula from this example.

For 400,000 Syrians	5% error margin	2% error margin	1% error margin
90% Confidence level	272	1694	6692
95% Confidence level	384	2387	9379
99% Confidence level	662	4105	15929

For 150,000 Afghans	5% error margin	2% error margin	1% error margin
90% Confidence level	272	1682	6511
95% Confidence level	383	2363	9026
99% Confidence level	661	4036	14937
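
The figures in the tables above can be approximately reproduced with the standard formula for estimating a proportion, combined with a finite population correction. The sketch below assumes the most conservative prevalence (p = 0.5) and rounds up; the function name and defaults are illustrative and are not taken from the Excel example. Small differences from the tables (one or two units) may come from rounded z-scores or a different rounding rule.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size for a simple random sample estimating a proportion."""
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinitely large population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for margin in (0.05, 0.02, 0.01):
    print(margin, sample_size(400_000, confidence=0.95, margin=margin))
```

For a population of 400,000 and a 95% confidence level this gives 384, 2387 and 9379, matching the corresponding row of the first table.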

The choice of confidence level and error margin is usually also influenced by the cost implications and by the intended use of the figures.

Stratified sampling

You can refer to this introduction video, or to this presentation and this one from the WFP VAM.

A stratified random sample can only be carried out if a complete list of the population is available. In stratified sampling the population is partitioned into groups, called strata, and sampling is performed separately within each stratum.

This can be done for the following reasons:

Population groups may have different values for the responses of interest.
If we want to improve our estimation for each group separately.
To ensure adequate sample size for each group.

In stratified sampling designs, it is assumed that:

the strata are mutually exclusive (non-overlapping), e.g., urban/rural areas, economic categories, geographic regions, race, sex, etc.
the population (elements) should be homogeneous within each stratum, and
the population (elements) should be heterogeneous between the strata.

The major task of stratified sampling design is the appropriate allocation of samples to the different strata. The different types of allocation methods include:

Equal allocation: Divide the number of sample units n equally among the k strata. This requires a weighted analysis (disproportionate selection).
Proportional to stratum size: Make the share of each stratum in the sample identical to its share of the population. A major disadvantage of proportional allocation is that the sample size in a small stratum may be too low to provide reliable stratum-specific results. In terms of analysis, the data will be self-weighted (equal proportion from each stratum).
Optimal allocation: Allocation based on differences in variance and in survey cost among the strata. Optimal allocation minimizes the overall variance for a specified cost, or equivalently minimizes the overall cost for a specified variance. This implies sampling more heavily from a stratum when the cost to sample an element from the stratum is low, the population size of the stratum is large or the variability within the stratum is large. Suppose that we had stratum A and stratum B, and we know that the individuals in stratum A are more varied with respect to their opinions than those in stratum B: more respondents are then assigned to stratum A, which reduces the standard error of the estimated mean. Stratum variances are usually taken from previous surveys. This approach also requires a weighted analysis (disproportionate selection).
Neyman allocation: A special case of optimal allocation where the cost per unit is the same for all strata. In this case, the allocation maximizes precision for a stratified sample with a fixed total sample size, by sampling more heavily from strata that are large or highly variable. This approach also requires a weighted analysis (disproportionate selection).
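
The allocation methods above can be compared with a short sketch. It assumes the textbook rule that, for a fixed total sample size n, the share of each stratum is proportional to N_h × S_h / sqrt(c_h), where N_h is the stratum population, S_h its standard deviation and c_h its cost per interview. Setting all costs equal gives Neyman allocation, and setting all standard deviations equal as well gives proportional allocation. The function name, strata and figures are hypothetical, and the results are left unrounded.

```python
import math


def allocate(n, sizes, sds=None, costs=None):
    """Split a total sample size n across strata.

    sizes -- dict stratum -> population size N_h
    sds   -- dict stratum -> standard deviation S_h (None: assumed equal)
    costs -- dict stratum -> cost per interview c_h (None: assumed equal)
    """
    scores = {}
    for stratum, size in sizes.items():
        sd = 1.0 if sds is None else sds[stratum]
        cost = 1.0 if costs is None else costs[stratum]
        scores[stratum] = size * sd / math.sqrt(cost)
    total = sum(scores.values())
    return {stratum: n * score / total for stratum, score in scores.items()}


def equal_allocation(n, sizes):
    """Same sample size in every stratum."""
    return {stratum: n / len(sizes) for stratum in sizes}


# Hypothetical strata: A is larger and more variable than B.
sizes = {"A": 6_000, "B": 4_000}
sds = {"A": 10.0, "B": 5.0}
costs = {"A": 1.0, "B": 4.0}

print("equal       ", equal_allocation(300, sizes))
print("proportional", allocate(300, sizes))
print("Neyman      ", allocate(300, sizes, sds=sds))
print("optimal     ", allocate(300, sizes, sds=sds, costs=costs))
```

In practice the unrounded results would be rounded to whole interviews, and a minimum sample size per stratum is often imposed on top of the formula.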

Typically, when defining the strata under optimal or Neyman allocation, i.e. when the stratum variances are already known from a previous survey, the following objectives can be considered:

Find minimum sample size, given a fixed error
Find minimum error, given a fixed sample size
Find minimum error, given a fixed budget
Find minimum cost to achieve a fixed error

Typical workflow to define sample size in case of stratified sampling:

Choose the stratification (e.g. regions, districts…)
Define the population (N) of each stratum
Decide on key indicator(s)
Estimate the mean & variance or the prevalence of the key indicator
Decide on precision and confidence level
Calculate the initial total sample size (n) according to the budget/time
Use a simple random sample within each stratum to select your representative sample
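
The last step of the workflow above, drawing a simple random sample within each stratum, can be sketched as follows. The sampling frame format (a list of unit and stratum pairs), the function name and the example allocation are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample without replacement in each stratum.

    frame      -- iterable of (unit_id, stratum) pairs, i.e. the sampling frame
    allocation -- dict stratum -> number of units to draw in that stratum
    seed       -- optional seed so that the draw can be reproduced
    """
    rng = random.Random(seed)

    # Group the frame by stratum.
    units_by_stratum = defaultdict(list)
    for unit_id, stratum in frame:
        units_by_stratum[stratum].append(unit_id)

    sample = {}
    for stratum, n_h in allocation.items():
        units = units_by_stratum[stratum]
        if n_h > len(units):
            raise ValueError(f"stratum {stratum!r} has only {len(units)} units")
        sample[stratum] = rng.sample(units, n_h)
    return sample


# Hypothetical frame: 60 households in the north, 40 in the south.
frame = [(f"HH{i:03d}", "north" if i < 60 else "south") for i in range(100)]
sample = stratified_sample(frame, {"north": 6, "south": 4}, seed=1)
print(sample)
```

Fixing the seed makes the selection reproducible, which helps when the draw has to be documented or audited.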

To estimate sample size, you need to know:

Estimate of the prevalence or mean & STDev of the key indicator (e.g. 30% return intention). Prevalence is the number of cases of a (typically binary) variable of interest within a population, divided by the total population. Mean is the expected value of a (typically continuous) variable of interest for a given population (e.g. expenditure per case).
Precision desired (for example: ± 5%). Precision is the margin of error tolerated around the estimate.
Level of confidence (for example: 95%). It represents how often, if the sampling were repeated under the same conditions, the resulting interval would contain the true value.
Population (as a rule of thumb, only if below 10,000; for larger populations it has little influence on the required sample size)
Expected response rate (for example: 90%)
Number of eligible individuals per household (if applicable)
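
As an illustration of how the inputs listed above combine, the sketch below computes the number of completed interviews needed for a prevalence indicator, inflates it for the expected response rate, and converts it into a number of households to visit. It assumes a large population (no finite population correction) and ignores any similarity between members of the same household, which is a simplification. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def households_to_visit(prevalence, precision, confidence=0.95,
                        response_rate=1.0, eligible_per_household=1.0):
    """Rough number of households to approach for a prevalence indicator."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Completed interviews needed for the desired precision.
    completed = z ** 2 * prevalence * (1 - prevalence) / precision ** 2
    # Inflate for expected non-response.
    approached = completed / response_rate
    # Convert individuals into households.
    return math.ceil(approached / eligible_per_household)


# 30% return intention, ±5% precision, 95% confidence, 90% response rate.
print(households_to_visit(0.30, 0.05, 0.95, response_rate=0.90))
```

With these inputs the result is 359 households when one individual is interviewed per household.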

Stratified sampling can be performed with R. Tutorial scripts are available here.

Post stratification

One can also use weights, computed through a post-stratification process, to get potentially biased surveys, like online surveys, to better fit the underlying population. The only thing that weights can do, is ensure that your sample composition better mimics the general population’s characteristics. Weights will never help you if the process governing non-response is part of the puzzle you want to solve.

In a random sample, we define a population, draw from that population at random and then compute and apply weights to align the sample with the population. This weighting is necessary because some people originally sampled might be e.g. harder to reach than others, thereby biasing the sample. Once the post-stratification weights have been applied, the random sample is representative of the population it was drawn from. Statistics gives us a method to tell just how accurately the findings from the sample can be generalized.
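
A minimal sketch of post-stratification weights on a single variable follows. Each respondent receives the ratio between the share of their group in the population and the share of that group in the sample. The group labels and population shares are hypothetical, and real applications usually combine several variables.

```python
def post_stratification_weights(sample_groups, population_shares):
    """Weight per respondent = population share / sample share of their group.

    sample_groups     -- group label of each respondent, in sample order
    population_shares -- dict group -> share of that group in the population
    """
    n = len(sample_groups)
    counts = {}
    for group in sample_groups:
        counts[group] = counts.get(group, 0) + 1

    missing = set(counts) - set(population_shares)
    if missing:
        raise ValueError(f"no population share for groups: {sorted(missing)}")

    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


# Hypothetical online survey: 8 men and 2 women, population split 50/50.
sample_groups = ["men"] * 8 + ["women"] * 2
weights = post_stratification_weights(sample_groups, {"men": 0.5, "women": 0.5})
print(weights[0], weights[-1])  # men are weighted down, women are weighted up
```

The weights only realign the sample composition with the population; they do not correct for differences between respondents and non-respondents within the same group.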

Cluster sampling

Cluster sampling is a technique that reduces the surveying budget when travel costs are significant. Instead of covering a whole territory, cluster sampling divides the population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population.

Cluster sampling is therefore not relevant when techniques such as phone interviews are used, as there is no marginal surveying cost associated with the location of the interview.

Given equal sample sizes, cluster sampling usually provides less precision than either simple random sampling or stratified sampling.

Different approaches can be used for cluster sampling

One-stage sampling. All of the elements within selected clusters are included in the sample.
Two-stage sampling. A subset of elements within each selected cluster is randomly selected for inclusion in the sample.
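
Both approaches can be sketched in a few lines. The example below assumes clusters are selected with equal probability by simple random sampling, as described above; the data structure (a dictionary mapping each cluster to its list of units) and the names are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Select clusters at random, then units within the selected clusters.

    clusters      -- dict cluster_id -> list of unit ids
    n_clusters    -- number of clusters to select (first stage)
    n_per_cluster -- units to draw in each selected cluster (second stage);
                     None keeps every unit, i.e. one-stage sampling
    """
    rng = random.Random(seed)
    selected = rng.sample(list(clusters), n_clusters)

    sample = {}
    for cluster_id in selected:
        units = clusters[cluster_id]
        if n_per_cluster is None:
            sample[cluster_id] = list(units)
        else:
            # Small clusters contribute all their units.
            sample[cluster_id] = rng.sample(units, min(n_per_cluster, len(units)))
    return sample


# Hypothetical frame: 10 villages of 20 households each.
villages = {f"village_{v}": [f"v{v}_hh{h}" for h in range(20)] for v in range(10)}
one_stage = cluster_sample(villages, n_clusters=3, seed=7)
two_stage = cluster_sample(villages, n_clusters=3, n_per_cluster=5, seed=7)
print({c: len(units) for c, units in one_stage.items()})
print({c: len(units) for c, units in two_stage.items()})
```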

Sampling with Replacement and Sampling without Replacement

What is replacement?

When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement. When we sample with replacement, the two sample values are independent. Practically, this means that what we get on the first one doesn’t affect what we get on the second. Mathematically, this means that the covariance between the two is zero. In sampling without replacement, the two sample values aren’t independent. Practically, this means that what we got for the first one affects what we can get for the second one. Mathematically, this means that the covariance between the two isn’t zero.

With or without?

In small populations and often in large ones, sampling is typically done “without replacement”, i.e. one deliberately avoids choosing any member of the population more than once.

Less commonly, sampling can also be conducted with replacement. This can help to address low response rates.

For a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the odds of choosing the same individual twice are low. This can be measured by calculating the covariance, i.e. how strongly the values of two draws are linked. The further the covariance is from zero, the more the results can be influenced. A covariance of zero would mean there is no difference between sampling with replacement and sampling without.
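
The covariance between two draws can be checked by brute force on a small population. The sketch below enumerates every equally likely ordered pair of draws; the function name and the example population are hypothetical.

```python
from itertools import permutations, product


def covariance_of_two_draws(population, replacement):
    """Covariance between the first and second draw, by full enumeration."""
    indices = range(len(population))
    if replacement:
        index_pairs = product(indices, repeat=2)   # the same unit may repeat
    else:
        index_pairs = permutations(indices, 2)     # two distinct units
    pairs = [(population[i], population[j]) for i, j in index_pairs]

    mean_first = sum(x for x, _ in pairs) / len(pairs)
    mean_second = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mean_first) * (y - mean_second) for x, y in pairs) / len(pairs)


population = [1, 2, 3, 4, 5]
print(covariance_of_two_draws(population, replacement=True))   # 0.0
print(covariance_of_two_draws(population, replacement=False))  # -0.5
```

Without replacement, the covariance equals minus the population variance divided by N − 1 (here −2 / 4), so it shrinks toward zero as the population grows, which is why the two schemes become nearly equivalent for large populations.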

The specific case of phone surveys

As explained in this paper, telephone surveys may introduce bias into population estimates by excluding non-telephone households. The bias can be significant since non-telephone households may differ from telephone households in ways that are not adequately handled by post-stratification. Many households, called “transients”, move in and out of the telephone population during the year, sometimes for economic reasons or because of relocation. The transient telephone population may be representative of the non-telephone population in general, since its members have recently been in the non-telephone population.

Sample Weight

Over-sampling in regions with small populations ensures that they have a large enough sample to be representative. Under-sampling is done in regions with large populations to save costs. Sample weights are mathematical adjustments applied to the data to correct for over-sampling, under-sampling, and different response rates to the survey in different regions.

How are the oversampled/ undersampled areas corrected in data analysis?

The samples are designed to permit data analysis of regional subsets within the sample population. When the expected number of cases for some of these regions is too small for analysis, it is necessary to oversample those areas. When the expected number of cases for some of these regions is unnecessarily large, those areas may be undersampled to accommodate logistical or budgetary constraints.

During analysis, it is then necessary to “weight down” the oversampled areas and “weight up” the undersampled areas. The development of the sampling weights takes this factor into account. Always use the weight variable found in the DHS data set. Even in surveys that come from a self-weighting sample, it is still necessary to use the sampling weights in analysis because response behavior may differ between response groups.
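
To illustrate the principle only (this is not how the weight variable of any particular data set is built), a basic design weight can be taken as the population of a region divided by the number of respondents sampled in it. The regions, figures and function names below are hypothetical, and the sketch ignores non-response adjustments.

```python
def design_weights(population_sizes, sample_sizes):
    """Base weight per region: people represented by each respondent."""
    return {region: population_sizes[region] / sample_sizes[region]
            for region in sample_sizes}


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# A small region is oversampled: both regions get 100 interviews.
population_sizes = {"small_region": 1_000, "large_region": 9_000}
sample_sizes = {"small_region": 100, "large_region": 100}
weights = design_weights(population_sizes, sample_sizes)

# Hypothetical answers (1 = yes): 20% yes in the small region, 60% in the large one.
respondents = ([("small_region", 1)] * 20 + [("small_region", 0)] * 80
               + [("large_region", 1)] * 60 + [("large_region", 0)] * 40)

values = [answer for _, answer in respondents]
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, [weights[region] for region, _ in respondents])
print(unweighted, weighted)
```

The unweighted share (0.40) gives the oversampled small region too much influence; the weighted share (0.56) reflects that nine out of ten people live in the large region.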

What does it mean to normalize the weights?

After the weights are initially calculated, they are normalized, or standardized, by dividing each weight by the average of the initial weights (equal to the sum of the initial weights divided by the number of cases) so that the sum of the normalized/standardized weights equals the number of cases over the entire sample. The standardization is done separately for each weight for the entire sample.

The entire set of household sample weights is multiplied by a constant, so that the total weighted number of households equals the total unweighted number of households at the national level.

Individual sample weights are normalized separately for women and men. Thus, the total weighted number of women equals the total unweighted number of women, and the total weighted number of men equals the total unweighted number of men. Women and men are normalized separately because all non-HIV calculations are performed on women and men separately. We do not provide survey estimates on the joint population of women and men combined for anything other than HIV prevalence.
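
The normalization described above can be sketched as follows: each weight is divided by the average weight, either over the whole sample or separately within each group (for instance women and men). The function names and example values are hypothetical.

```python
def normalize_weights(weights):
    """Divide each weight by the mean weight: normalized weights sum to n."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


def normalize_weights_by_group(weights, groups):
    """Normalize separately within each group (e.g. women and men)."""
    totals, counts = {}, {}
    for w, g in zip(weights, groups):
        totals[g] = totals.get(g, 0.0) + w
        counts[g] = counts.get(g, 0) + 1
    # Dividing by the group mean is the same as multiplying by count / total.
    return [w * counts[g] / totals[g] for w, g in zip(weights, groups)]


initial = [10.0, 10.0, 90.0, 90.0]
print(normalize_weights(initial))  # sums to 4, the number of cases

groups = ["women", "men", "women", "men"]
print(normalize_weights_by_group(initial, groups))  # each group sums to 2
```

Normalized weights keep the relative importance of each case unchanged, so weighted proportions and means are the same as with the initial weights; only weighted counts are rescaled to the sample size.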

Pilot Sampling

In the design phase of questionnaires, it is recommended that a pilot study be undertaken to test the reliability and validity of the tool.

The sampling phase should consist of the following steps:

1- Sample size calculation: to apply the statistical tests with enough statistical power, a sufficient sample size should be calculated for the piloting. There are no formulas or standard mathematical equations to determine the sample size. However, as a rule of thumb, the following criteria are recommended:

a)  Each question and dimension in the questionnaire should have at least 3-5 observations, meaning each question must be answered by at least three participants. For example, if the questionnaire consists of 10 questions and 2 dimensions the minimum sample size = (10 + 2) × 3 = 12 × 3 = 36 participants.

b)  A margin of at least 10% should be added to allow for missingness, errors, attrition, etc. So, using the previous example, four additional participants should be added, bringing the sample size to 40.

c)  The minimum recommended sample size, regardless of the number of questions and dimensions, is 30. A sample smaller than this would make the statistical tests lose considerable power. 

d)  If you are implementing more than one version, each version is a different questionnaire. For example, English and Arabic versions are different from each other, and their samples should not be added together. Also, you cannot use the sample from one version to validate the other. If you want to test both the English and Arabic versions, each questionnaire requires its own sample of at least 30 participants, or the recommended sample size described in (a) and (b). In addition, participants should not answer more than one version of the questionnaire.
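
The rules of thumb in (a) to (c) can be combined into a small helper. This is a sketch of the arithmetic described above, with hypothetical function and parameter names; following (d), it should be applied separately to each version of the questionnaire.

```python
import math


def pilot_sample_size(n_questions, n_dimensions, per_item=3,
                      margin_percent=10, minimum=30):
    """Pilot sample size following the rules of thumb (a) to (c)."""
    # (a) at least `per_item` observations per question and dimension
    base = (n_questions + n_dimensions) * per_item
    # (b) add a margin for missingness, errors and attrition, rounded up
    with_margin = base + math.ceil(base * margin_percent / 100)
    # (c) never go below the recommended minimum
    return max(minimum, with_margin)


print(pilot_sample_size(10, 2))  # 36 + 4 = 40, as in the example above
print(pilot_sample_size(5, 1))   # 18 + 2 = 20, raised to the minimum of 30
```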

2- Sampling methods: there are several sampling methods, each with its advantages and disadvantages. The following are the most commonly used methods in pilot studies:

a)  Purposive sample: a non-probability sample that is selected based on characteristics of a population and the objective of the study. Purposive sampling is also known as judgmental, selective, or subjective sampling. It is used when we want to target specific profiles and characteristics to ensure that the selection serves the objective of the study. Using proGres, a sample of participants can be selected based on certain criteria. For example, to cover the study’s aim, it may be required to have participants from every GCC country, only Syrians, with and without jobs, with household members, etc. Then the sample can be selected by the researcher(s) from the eligible list.

b)  Random sampling: a random sample of participants selected from the list of participants available. As with the purposive sample, inclusion and exclusion criteria are assigned. However, the sample is chosen randomly out of the eligible participants.

c)  Convenience sampling: this method is used to save time and resources. Convenience sampling is done by collecting those who are directly available to us without being concerned too much about their profiles.

Recommendation: The methods in (a) and (b) are recommended for the best results.