Interpreting results of field surveys using probability calculators
By Oleg O Bilukha and Curtis Blanton
Dr. Oleg Bilukha is a Medical Epidemiologist with the International Emergency and Refugee Health Branch (IERHB), Centers for Disease Control and Prevention (CDC), with over 10 years of public health and nutrition experience in emergencies. He is a member of the SMART Technical Advisory Group and of the Expert Reference Group of the Health and Nutrition Tracking Service.
Curtis Blanton is a Statistician with the International Emergency and Refugee Health Branch, Centers for Disease Control and Prevention (CDC). He has over 15 years' experience analysing and designing domestic and international surveys.
Field practitioners in humanitarian settings often face challenges analysing and interpreting the results of nutrition surveys. Most key variables of interest in field surveys are categorical, i.e. expressed as discrete categories such as yes/no, or normal/moderate/severe. Examples of categorical variables commonly measured in emergency surveys include prevalence of Global Acute Malnutrition (GAM), stunting, underweight and anaemia, and coverage of measles immunisation and vitamin A distribution programmes.
Some of these variables are 'inherently' categorical - for example, measles immunisation and vitamin A distribution are measured as yes/no. Other variables, for example anthropometric indicators and anaemia, are originally measured as continuous variables (e.g. haemoglobin concentration for anaemia or Z scores for anthropometric indicators), and are converted into categorical variables at the analysis stage using internationally established case definition cut-offs. For example, children 6-59 months of age are classified as stunted if their height-for-age Z score is <-2, and as non-stunted if the Z score is ≥-2; the prevalence of stunting is presented as the proportion of children classified as stunted among all children in a given sample or population. This article discusses analysis of key categorical variables measured in field surveys, irrespective of whether they are 'inherently' categorical or have been converted into categorical form from continuous data. The theoretical and practical discussion presented below applies equally to any categorical variable measured as a percentage or proportion of the total.
Interpreting survey results vis-à-vis thresholds
The most common way to analyse categorical data in the field is to calculate the prevalence estimate and the 95% confidence interval (95% CI) around that estimate. In most cases (unless data analysts have the capacity to perform more advanced statistical analyses), programme managers and decision-makers have to rely on these three numbers - prevalence estimate, lower confidence limit, and upper confidence limit - to interpret the results and make programmatic decisions.
The key underlying idea in using the estimated prevalence from a representative sample survey is that an (often relatively small) fraction or sample of the population can provide a reliable estimate of the true population prevalence. For example, the prevalence of GAM measured from a survey of 500 children (sample prevalence estimate) would be sufficiently close to the prevalence of GAM in all 100,000 children in the surveyed population (true population prevalence). Note that we cannot measure all 100,000 children, and therefore we will never know for sure what the true population prevalence is. Instead, we rely on the sample prevalence estimate and the 95% CI limits to provide a range where the true population prevalence is most likely to lie. In surveys where the collected data are valid and representative, there is a 95% probability that the true population prevalence lies between the lower and the upper limits of the 95% CI (note that there is still a 5% probability that the true population prevalence lies outside the 95% CI limits). For example, in a survey where the GAM prevalence estimate is 12% and the 95% CI limits are 8% and 16%, there is a 95% chance that the true population prevalence of GAM lies somewhere between 8% and 16%.
The key goal when analysing and interpreting categorical survey data is often to infer not only how high or low the true population prevalence is likely to be, but also how likely it is to exceed the pre-determined action thresholds (e.g., 5%, 10%, 15% for GAM;1 20% and 40% for anaemia;2 etc.). The chance of the true population prevalence falling within a given range is described by the area under the probability distribution curve. For example, Figure 1 presents a binomial probability distribution curve for the prevalence estimate of GAM from the survey example above. As can be seen, 95% of the area under the distribution curve falls between the lower and upper 95% CI limits, whereas 2.5% of the area under the curve falls below the lower 95% CI limit, and 2.5% of the area falls above the upper 95% CI limit. Therefore there is a 2.5% chance that the true population value is below the lower 95% CI limit, and a 2.5% chance that the true population value is higher than the upper 95% CI limit. Similarly, from Figure 2, since 50% of the area under the distribution curve lies below the survey prevalence estimate, and 50% lies above, we can conclude that there is an equal chance that the true population prevalence is below or above the survey prevalence estimate (in this example, the true population prevalence of GAM is equally likely to be below or above 12%).
When we look at the practices presently used in the field, the most common way of classifying GAM (or other indicators) relative to the thresholds is based solely on the magnitude of the survey prevalence estimate (e.g., if the GAM prevalence observed in the survey exceeds the threshold, then the area is declared above the threshold, and vice versa). From a statistical perspective, this means that GAM is declared above the threshold when the statistical probability of the true population value of GAM exceeding the threshold is above 50%. One drawback of this approach is that the width of the confidence interval becomes virtually irrelevant; in fact, it is often ignored when summarising the data for decision-making. Another question is whether 50% constitutes sufficient 'risk' or 'confidence' to make programmatic decisions.
When comparing survey results to pre-determined thresholds, the primary interest is to estimate the probability, or 'risk', that the true population prevalence exceeds the threshold. The higher the 'risk', the more seriously decision-makers would need to consider implementing appropriate interventions. The probability of the true population prevalence exceeding the threshold is described by the area under the distribution curve that falls above the threshold, as depicted in Figure 3. Using our previous survey example, the area under the curve above 10% represents the probability that the true population prevalence exceeds the 10% threshold.
'Threshold' probability calculator
To provide additional information for decision-making, we developed a 'threshold' probability calculator that provides the estimated probability of the true population prevalence exceeding the threshold. We used a one-sided t-test for proportions, where the alternative hypothesis tested is that the true population prevalence is lower than the threshold. The p-value for this test provides an estimated probability (or 'risk') that the true population prevalence exceeds the threshold.3,4
The calculator is in a spreadsheet format, where the user needs to enter some summary survey statistics to obtain the probabilities of exceeding the thresholds. There are three versions of the calculator (included as separate spreadsheets in the Excel file):
- To use for cluster survey designs, when the design effect (DEFF) for the indicator is known. In this case, the user needs to enter total survey sample size, the number of clusters, survey prevalence estimate, and the DEFF.
- To use for cluster survey designs, when DEFF for the indicator is not known. In this case, the user needs to enter total survey sample size, the number of clusters, survey prevalence estimate, and the upper and lower 95% CI limits for this estimate.
- To use in simple or systematic sample surveys. In this case, the user needs only to enter total survey sample size and survey prevalence estimate.
Figure 4 provides the screenshot of the calculator. The information mentioned above is entered in the green cells. The thresholds for which the probabilities are provided are in the yellow column. These thresholds can be defined/changed by the user. The probabilities of the true population value exceeding the threshold are calculated automatically and displayed in the orange column. Figure 4 provides an example of the survey that we used in discussions above (GAM prevalence estimate of 12% and the 95% CI limits 8% to 16%), assuming that this was a cluster survey with 30 clusters and a total sample size of 360 children.
From the values in the orange column in Figure 4 we can see that in this survey area, the probability of the true population value of GAM exceeding the 5% threshold is close to 100%, and the probabilities of exceeding the 10%, 15% and 20% thresholds are 86%, 9% and 0.1%, respectively. This provides much richer information on population 'risk' for decision-makers, compared with information based solely on the prevalence and confidence interval limits described above. For example, it tells the user that it is quite likely (86% probability) that the true value of GAM exceeds the 10% threshold, and quite unlikely (9% probability) that the true value of GAM exceeds the 15% threshold. We believe that this information, directly quantifying the 'risk' of the true population prevalence exceeding the threshold, combined with other contextual information on risk and protective factors, should prove useful for decision-making.
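The calculation behind these probabilities can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet's actual formulas: it assumes that the degrees of freedom are the number of clusters minus one, that the design effect is recovered from the reported 95% CI using a t critical value, and that the standard error for the test is computed at the threshold value. The function names (`t_cdf`, `t_quantile`, `threshold_probability`) are hypothetical. Under these assumptions the results for the example survey come out close to the figures quoted above.

```python
import math

def t_cdf(t, df, steps=2000):
    """Student's t CDF, by Simpson's rule on the density from 0 to |t|."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Inverse of t_cdf by bisection, for prob between 0.5 and 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def threshold_probability(prevalence, ci_low, ci_high, clusters, n, threshold):
    """Estimated probability that the true prevalence exceeds the threshold.

    All proportions are fractions (0.12 for 12%).
    Assumptions (not confirmed by the article): df = clusters - 1,
    DEFF derived from the CI width, SE evaluated at the threshold.
    """
    df = clusters - 1
    se_observed = (ci_high - ci_low) / (2 * t_quantile(0.975, df))
    srs_variance = prevalence * (1 - prevalence) / n
    deff = se_observed ** 2 / srs_variance
    se_at_threshold = math.sqrt(deff * threshold * (1 - threshold) / n)
    t_stat = (prevalence - threshold) / se_at_threshold
    return t_cdf(t_stat, df)

# Example survey from the text: GAM 12% (95% CI 8% to 16%), 30 clusters, n = 360
for threshold in (0.05, 0.10, 0.15, 0.20):
    prob = threshold_probability(0.12, 0.08, 0.16, 30, 360, threshold)
    print(f"P(true GAM > {threshold:.0%}) is about {prob:.1%}")
```

The numerical integration and bisection are used only to keep the sketch free of external libraries; a statistics package would supply the t distribution directly.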
Note that we do not intend to discuss what level of 'risk' (25%, 50%, 95% or other) is high enough to be taken 'seriously' and trigger action. We believe that these decisions should be context-specific, and action should be considered taking into account both the statistical 'risk' estimated from survey data, as well as other existing and potential risk factors.5 Note also that we do not necessarily endorse the appropriateness of currently used action thresholds for various indicators, or the concept of making programmatic decisions based on comparing the observed prevalence to preexisting thresholds. We only provide a convenient statistical tool for those field practitioners who feel compelled to conduct these types of analyses.
The calculator presented on Figure 4 can be used for any categorical variable for which results are expressed as a proportion (or percentage) of the total - for example, for prevalence of anaemia, immunization coverage, stunting, wasting, etc. As mentioned, the thresholds can be changed as necessary for a given indicator. For example, it is possible to test what is the probability that measles immunization coverage exceeds a minimum acceptable level, or whether anaemia prevalence exceeds programmatic action threshold that calls for blanket iron supplementation, etc.
Another challenge for field practitioners arises when the situation requires assessing the significance of the difference between two survey results. For example, consider testing the difference between surveys conducted in the same area in two different seasons or in two different years, or testing the differences between the results obtained from surveys in two neighbouring districts or livelihood zones. In these cases, field practitioners often use the 'overlapping confidence intervals test' - i.e. if the 95% CI limits around the estimates from two surveys overlap, the results are declared not statistically different, and if the confidence limits do not overlap, the results are considered statistically different. The problem is that in many instances when confidence intervals overlap slightly, the results may still be significantly different at the 95% confidence level. This is especially true if a one-sided test can be used, as discussed below.
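The reason slightly overlapping intervals can still correspond to a significant difference follows from how the two criteria combine the standard errors. The short derivation below is a sketch under stated assumptions: both intervals are built as estimate ± c·SE with roughly the same critical value c (about 2), and the two surveys are independent. The symbols d (difference between the two estimates), SE_1, SE_2 and c are introduced here for illustration.

```latex
\[
\text{CIs do not overlap} \iff |d| > c\,(SE_1 + SE_2),
\qquad
\text{difference is significant} \iff |d| > c\,\sqrt{SE_1^2 + SE_2^2}
\]
\[
\sqrt{SE_1^2 + SE_2^2} \;\le\; SE_1 + SE_2
\]
```

Any difference that falls between the two bounds is therefore statistically significant even though the intervals overlap. When the two standard errors are equal, the bound for significance is 1/√2 (about 71%) of the bound for non-overlap.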
To assist field practitioners in these situations, we developed a 'two-survey' calculator for testing the statistical significance of the difference between the estimates from two surveys (or from two strata of the same survey). The statistics in this calculator are based on a t-test for the difference between two proportions, testing an alternative hypothesis that the true population values in the two surveys are different from each other.6,7 The two-tailed probability that the true population values are different from each other is calculated as 1-p, where p is the p-value of the above t-test for two proportions. The calculator provides both 1-tailed and 2-tailed probabilities.
Similarly to the 'threshold' calculator, the 'two-survey' calculator is also available in Excel format and has three spreadsheets:
- For cluster survey designs where prevalence estimates and DEFF in both surveys are known.
- For cluster survey designs where prevalence estimates are known but DEFF are unknown.
- For simple or systematic random surveys.
The information that users need to enter for each of the surveys is the same as in the 'threshold' calculator.
Figure 5 presents a screenshot of the 'two-survey' calculator. Users enter information in the green cells, the p-value is presented in the turquoise cell, the 2-tailed probability is in the yellow cell, and the 1-tailed probability is in the blue cell.
Consider comparing GAM prevalence from two surveys conducted in neighbouring districts A and B (Figure 5). District A results are the ones we used as an example for the 'threshold' calculator (GAM prevalence of 12%, 95% CI from 8% to 16%, sample size 360, 30 clusters), and district B results are as follows: GAM prevalence of 19%, 95% CI from 15% to 23%, sample size 450, 32 clusters. Note that the 95% CIs for the two surveys overlap (8%-16% in survey A and 15%-23% in survey B), so by the 'overlapping confidence intervals test' the difference between the two surveys would be declared non-significant. From the output in Figure 5, however, we see that the p-value for the 2-tailed test (p=0.014) is significant at the 0.05 level, and the 2-tailed probability is 98.6%, meaning that there is about a 98.6% statistical probability that the true prevalences of GAM in districts A and B are different from each other.
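The comparison of districts A and B can be sketched as follows. As before, this is an illustration under assumptions rather than the spreadsheet's own formulas: each standard error is recovered from the reported 95% CI with a t critical value on (clusters - 1) degrees of freedom, the two standard errors are combined without pooling, and the test uses the total number of clusters minus two as degrees of freedom. The function names are hypothetical. With these choices the 2-tailed p-value for the example comes out close to the 0.014 reported above.

```python
import math

def t_cdf(t, df, steps=2000):
    """Student's t CDF, by Simpson's rule on the density from 0 to |t|."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Inverse of t_cdf by bisection, for prob between 0.5 and 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def se_from_ci(ci_low, ci_high, clusters):
    """Standard error implied by a 95% CI (assumes a t-based, symmetric CI)."""
    return (ci_high - ci_low) / (2 * t_quantile(0.975, clusters - 1))

def two_survey_test(p1, se1, clusters1, p2, se2, clusters2):
    """Return (2-tailed p-value, 2-tailed probability of a difference).

    Assumptions (not confirmed by the article): independent surveys,
    unpooled standard errors, df = clusters1 + clusters2 - 2.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    df = clusters1 + clusters2 - 2
    t_stat = abs(p1 - p2) / se_diff
    p_two_tailed = 2 * (1 - t_cdf(t_stat, df))
    return p_two_tailed, 1 - p_two_tailed

# District A: 12% (8% to 16%), 30 clusters; District B: 19% (15% to 23%), 32 clusters
se_a = se_from_ci(0.08, 0.16, 30)
se_b = se_from_ci(0.15, 0.23, 32)
p_value, probability = two_survey_test(0.12, se_a, 30, 0.19, se_b, 32)
print(f"2-tailed p-value: {p_value:.3f}, probability of a difference: {probability:.1%}")
```

Note that the sample sizes are not needed here because the confidence limits already reflect both the sample size and the design effect of each survey.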
So, when should we use a 1-tailed versus a 2-tailed test? For most comparisons between two surveys, a 2-tailed test would be the appropriate test to use. It is the more conservative of the two, and does not depend on a priori hypotheses. The 1-tailed test is more powerful (it always returns a higher probability that the two surveys differ from each other), but must be used cautiously and only in specific situations. Generally, we can use a 1-tailed test if we have an a priori hypothesis that one population's prevalence is higher than the other, and can clearly justify our thinking. For example, we could use a 1-tailed test in our example above if, before doing the surveys in districts A and B, we could publicly declare that we expect GAM to be higher in District B, and could explain why we expect that (e.g. because blanket supplementary feeding and general food distribution are implemented in District A and not B, or because District B and not District A experienced drought and had a poor harvest, etc.). Note that if our a priori guess turns out to be incorrect (e.g. we expected GAM to be higher in District A, and the surveys showed a higher GAM in District B), we cannot use a 1-tailed test.
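The relationship between the two tests can be made concrete with a small sketch. It assumes a symmetric test statistic (as with the t-test described above), so that the 1-tailed p-value is half the 2-tailed one when the observed difference is in the expected direction; the function name and its arguments are hypothetical and are not taken from the calculator.

```python
def one_tailed_p(two_tailed_p, observed_diff, expected_sign):
    """Convert a 2-tailed p-value into a 1-tailed one for a symmetric test.

    observed_diff: estimate B minus estimate A.
    expected_sign: +1 if B was expected to be higher a priori, -1 if lower.
    """
    half = two_tailed_p / 2
    # If the data contradict the a priori direction, the 1-tailed p-value
    # is large, which is why the 1-tailed test is of no use in that case.
    return half if observed_diff * expected_sign > 0 else 1 - half

# Districts A and B from the example: 2-tailed p = 0.014, B - A = +0.07
p_expected_b_higher = one_tailed_p(0.014, 0.07, +1)
p_expected_a_higher = one_tailed_p(0.014, 0.07, -1)
print(f"Expected B higher: p = {p_expected_b_higher:.3f}")
print(f"Expected A higher: p = {p_expected_a_higher:.3f}")
```

When the a priori expectation matches the data, the 1-tailed probability of a difference (1 minus this p-value) is higher than the 2-tailed probability, in line with the statement above that the 1-tailed test is the more powerful of the two.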
As was the case with the 'threshold' calculator, the 'two-survey' calculator can also be used for any categorical variable for which results are expressed as a proportion (or percentage) of the total - for example, for prevalence of anaemia, immunization coverage, stunting, wasting, etc.
In conclusion, we want to emphasise that the analyses performed by these calculators can also be performed using any common statistical software, such as SPSS, SAS or Stata. We propose the calculators solely for their convenience, recognising that field practitioners often do not have advanced skills in data management and analysis, or do not have access to statistical software that requires expensive licences.
The calculators described in this paper are available from the website of the International Emergency and Refugee Health Branch, CDC: http://www.cdc.gov/nceh/ierh/
The authors look forward to feedback from field practitioners on the use of these tools. Please send your questions, comments or suggestions to Dr. Oleg Bilukha: firstname.lastname@example.org
From the editors:
In the next issues of Field Exchange we are planning to publish additional reports on this topic, describing a variety of experiences of using these tools in the field to interpret results of nutrition surveys and make programmatic decisions. In the interim, if you are interested to learn more about these field experiences or share your thoughts, please contact Peter Hailey (email: email@example.com), David Doledec (email: firstname.lastname@example.org), and Grainne Moloney (email: email@example.com).
1 World Health Organisation. Management of Nutrition in Major Emergencies. Geneva: WHO; 2000.
2 World Health Organisation. Iron Deficiency Anaemia: Assessment, Prevention and Control. Geneva: WHO; 2001.
3 Campbell MK, Mollison J, Steen N, Grimshaw JM, Eccles M. Analysis of cluster randomised trials in primary care: a practical approach. Fam Pract 2000, 17:192-6.
4 Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions, 3rd ed. New York: John Wiley & Sons; 2003.
5 Bilukha OO, Blanton C. Interpreting results of cluster surveys in emergency settings: is the LQAS test the best option? Emerg Themes Epidemiol 2008, 5:25. http://www.ete-online.com/content/5/1/25
6 Murray DM. Design and Analysis of Group Randomized Trials. New York: Oxford University Press; 1998.
7 Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. New York: Hodder Arnold; 2000.
Reference this page
Oleg O Bilukha and Curtis Blanton (2010). Interpreting results of field surveys using probability calculators. Field Exchange 39, September 2010. p37. www.ennonline.net/fex/39/results