Improving the quality of nutritional survey data worldwide: Putting Child Kwashiorkor on the Map Initiative
By Lauren Browne
Lauren Browne is the Data Manager for the Kwashiorkor Mapping project and joined the team after interning with ACF-UK and Save the Children UK. She completed her Master of Science in Nutrition for Global Health at the London School of Hygiene and Tropical Medicine and previously served as a Peace Corps Volunteer for the US Government.
What we know: Kwashiorkor, or oedematous malnutrition, is overlooked in scientific and public health fora. The burden of kwashiorkor is unknown; meta-analysis has the potential to fill this information gap.
What this article adds: A CMAM Forum/ACF-UK/UNICEF/WHO collaboration undertook an updated mapping to estimate the numbers and location of kwashiorkor and identify high burden countries/areas. A total of 2,350 datasets from various UN/NGO/government sources were included. Significant limitations to the meta-analysis included barriers to data access, lack of standard formats (file names, datasheet types, varied coding and classifications between surveys), poor data quality, missing variables, no access to raw data, and duplicate surveys shared. Only one out of 36 MICS4 surveys included MUAC; all DHS surveys were missing both MUAC and oedema variables. There is a clear need for defined standards for all nutritional survey data, better overview of data quality and improved storage of raw original datasets. A minimal set of data, including MUAC and oedema, should be included in DHS, MICS and SMART surveys. Going forward, an expert inter-agency group should determine standard definitions, labels, codes and units for all indicators deemed to be of importance for inclusion in nutritional surveys. This group could facilitate management of an open access or licence-accessed central data repository for professionals and researchers.
Summary of the Putting Child Kwashiorkor on the Map initiative
Putting Child Kwashiorkor on the Map was a collaborative effort between the CMAM Forum, ACF-UK, UNICEF and WHO. The current phase of the project (Phase 2) was launched in late 2014 to help improve and strengthen the data used for the map produced in the first phase of mapping conducted in 2013 (Alvarez et al, 2013). The aims of Phase 2 were:
1. To refine and update the initial kwashiorkor map, provide a broad estimate of the numbers and location of kwashiorkor and identify high burden countries/areas; and
2. To strengthen the evidence base and support advocacy for inclusion of kwashiorkor in relevant methodology discussions at global level.
Non-governmental organisations (NGOs), United Nations (UN) agencies and governments involved with nutrition programmes were asked to share nutritional surveys. Requests were accompanied by a project information sheet and a data-sharing letter of agreement. A Technical Advisory Group (representatives from Centers for Disease Control and Prevention (CDC); CRED/University Uclouvain; Jimma University Ethiopia; Kenya Medical Research Institute (KEMRI); Mwanamugimu Nutrition Unit, Uganda; Médecins Sans Frontières (MSF); Washington University in St. Louis; University of Tampere and Valid International) guided the type of information to be collected, the database construction, the analyses and the final report.
Any nutritional survey adopting the SMART methodology (or similar methodology used before the development of SMART), with Probability Proportional to Size (PPS) or exhaustive sampling, simple random sampling or systematic sampling, and including the variables age, sex, weight, height, mid-upper arm circumference (MUAC) and presence or absence of bilateral pitting oedema for children aged 6-59 months, was deemed eligible for inclusion in a central database.
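The variable and age criteria above can be expressed as a simple screening step. The following is a minimal sketch only, not the project's actual code; the column names and function names are hypothetical, and real datasets would first need their variable labels mapped to these names.

```python
# Required variables as listed in the eligibility criteria.
# The lower-case column names are an assumption for illustration.
REQUIRED_VARIABLES = {"age", "sex", "weight", "height", "muac", "oedema"}


def missing_required(columns):
    """Return the required variables absent from a dataset's column names."""
    present = {name.strip().lower() for name in columns}
    return sorted(REQUIRED_VARIABLES - present)


def is_eligible(columns):
    """A dataset is eligible only if every required variable is present."""
    return not missing_required(columns)


def in_age_range(age_months):
    """Children aged 6-59 months are within the target population."""
    return 6 <= age_months <= 59


# Hypothetical dataset header with no MUAC column.
print(missing_required(["AGE", "Sex", "Weight", "Height", "Oedema"]))  # ['muac']
```

Returning the list of missing variables, rather than a simple yes/no, would also allow the reasons for exclusion to be tallied.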
The initial map from Phase 1 (557 surveys held by Brixton Health) was updated during Phase 2 with more robust estimates of the prevalence of kwashiorkor based on a total of 2,277 surveys collected from 11 NGOs (ACF, Concern Worldwide, GOAL, IMC, IRC, MSF, Plan International, Save the Children, Terre des Hommes, World Vision and Zerca y Lejos), 15 national governments/UNICEF, FEWS NET [1], FSNAU [2] and UNHCR for 55 countries. The eligible surveys were conducted from 1992 to 2015 and included the data of over 1.7 million children. Outcomes in terms of prevalence are included in an accompanying article in this edition of Field Exchange.
Findings and implications
One of the findings from the project was the “…need for systematic collection, storage, and standardisation of nutritional survey data, software and definitions… Inconsistencies were found across surveys, including lack of a standard format, varying codes for some indicators, loss of original files (often with past employees who left or through corrupted files), no clear contact person, etc. Variation was found in the type of software used, coding/labelling and units…” This article aims to expand on this finding and provide a more detailed description of the data issues encountered; specifically the barriers to data access, lack of standardisation, poor data quality, missing variables and receipt of raw and cleaned data.
Barriers to data access
Obtaining data permission was often a very lengthy process and some countries did not provide permission for use of nutritional surveys outside the country of origin. Furthermore, data agreements specified restrictions on use of the data and were time-bound. These problems are often encountered by researchers and have previously been discussed in Field Exchange (Guerrero, 2015).
Lack of standardisation
Surveys were received in five different formats (ENA [3] for SMART, EpiInfo/EpiData (REC), STATA, SPSS and Excel), which required time-consuming file conversions to the common CSV format needed to aggregate all the data within the analytical software (R Analytic Flow was the statistical programme utilised for the project). Some files received were corrupted, most likely due to ineffective conversions, while others were received in unfamiliar formats that could not be converted.
Twenty-nine (18%) of the ineligible datasets were excluded because file labelling was poor and inadequate descriptive information was provided about the survey, such as location.
The metadata provided for surveys varied widely and was not standardised; it was often missing, coded opaquely or classified differently. For instance, in those datasets that identified the population type, the definitions used by organisations to describe the surveyed population varied. Some surveys used general classifications (e.g. rural or urban) for the variable, while others disaggregated it into sub-groups (e.g. agrarian or pastoralist, instead of rural). Unknown codes were a problem for 11% (n=18) of the excluded datasets; some indicators, specifically oedema and sex, were coded differently by different agencies and even between surveys conducted by the same agency.
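One way to handle differing codes is to keep an explicit codebook per source and refuse to guess when a code is not listed. The sketch below illustrates the idea; the two codebooks are invented for illustration and do not describe any particular agency's conventions.

```python
def harmonise(value, codebook):
    """Translate a raw code into a standard value using a source-specific codebook.

    Unknown codes raise an error instead of being silently guessed.
    """
    key = str(value).strip().lower()
    if key not in codebook:
        raise ValueError(f"unknown code {value!r}")
    return codebook[key]


# Hypothetical codebooks: the same variable coded in two different ways.
AGENCY_A_OEDEMA = {"y": True, "n": False}
AGENCY_B_OEDEMA = {"1": True, "2": False}

print(harmonise(" Y ", AGENCY_A_OEDEMA))  # True
print(harmonise(2, AGENCY_B_OEDEMA))      # False
```

Without a codebook supplied alongside the dataset, a value such as "2" cannot be interpreted safely, which is why datasets with unknown codes had to be excluded.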
Poor data quality
Data entry errors were extremely common in the received datasets, with values often typed into the wrong columns or typed incorrectly. The MUAC variable was most often recorded incorrectly and was sometimes recorded in both millimetres and centimetres within the same dataset. Very extreme values came up frequently for MUAC but also occurred for weight and height.
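Mixed units and extreme values of this kind can be screened with a simple rule. The following sketch assumes that any MUAC value below 30 was recorded in centimetres and that values outside 60-300 mm are implausible; both thresholds are illustrative assumptions, not criteria used by the project.

```python
def muac_to_mm(value, cm_cutoff=30.0):
    """Convert a MUAC value to millimetres.

    Assumption: values below cm_cutoff were recorded in centimetres.
    """
    return value * 10.0 if value < cm_cutoff else float(value)


def is_extreme(muac_mm, low=60.0, high=300.0):
    """Flag values outside an assumed plausible range (in millimetres)."""
    return not (low <= muac_mm <= high)


# The same measurement entered in cm, in mm, and with an extra digit.
values_mm = [muac_to_mm(v) for v in (13.5, 135, 1350)]
print(values_mm)                           # [135.0, 135.0, 1350.0]
print([is_extreme(v) for v in values_mm])  # [False, False, True]
```

Flagged values would still need to be checked against the original records where these exist, since a rule of this kind cannot tell a typing error from a true measurement.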
Missing variables
A total of 2,515 datasets were received, with nearly 7% (n=165) not eligible for inclusion in the database since they were missing one or more of the needed key variables (age, sex, weight, height, MUAC and/or oedema). No Demographic and Health Surveys (DHS) datasets had all the required variables (all were missing both the MUAC and oedema variables). Only Multiple Indicator Cluster Survey (MICS) 4 databases were sourced, since only MICS4 could potentially have all the variables needed. Of 36 MICS4 databases received, 35 were missing the MUAC variable and so were ineligible for inclusion. Overall, 63% (n=105) of the excluded datasets were missing MUAC; fewer were missing oedema or other variables.
A total of 114 children with oedema had incomplete case records, meaning they did not have one or more of the accompanying variables recorded (age, sex, weight, height or MUAC) and were therefore not included in the database. Of these, 83% (n=95) were missing MUAC, with the majority of the rest missing weight and/or height.
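A tally of which variables are missing from incomplete records can be produced as follows. This is a minimal sketch with invented example records; the field names are assumptions, and empty strings are treated as missing alongside absent values.

```python
from collections import Counter

REQUIRED = ("age", "sex", "weight", "height", "muac", "oedema")


def missing_fields(record):
    """List the required fields that are absent or empty in one child's record."""
    return [name for name in REQUIRED if record.get(name) in (None, "")]


def tally_missing(records):
    """Count how often each required field is missing across records."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record))
    return counts


# Invented records of children with oedema, each incomplete in some way.
records = [
    {"age": 24, "sex": "f", "weight": 9.1, "height": 80.2, "muac": None, "oedema": "y"},
    {"age": 36, "sex": "m", "weight": None, "height": None, "muac": 128, "oedema": "y"},
    {"age": 18, "sex": "m", "weight": 8.4, "height": 76.0, "muac": "", "oedema": "y"},
]
tally = tally_missing(records)
print(tally["muac"], tally["weight"], tally["height"])  # 2 1 1

# The share reported above: 95 of 114 incomplete oedema records lacked MUAC.
print(round(100 * 95 / 114))  # 83
```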
Receipt of raw and cleaned data
Raw data was specifically requested, but agencies found it difficult to locate all the original raw datasets, especially from older surveys. Many agencies had lost the data and could only provide narrative reports.
It was unclear whether datasets had already been cleaned prior to receipt, so included surveys were cleaned to either the contributing organisation’s standards or the project’s standards, in unknown proportions, resulting in variability. Furthermore, agencies may have applied WHO and/or SMART flagging criteria and either deleted flagged records or left them in; which approach had been taken was not evident from the datasets received.
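The choice of flagging criteria matters because the two approaches can flag different records. The sketch below assumes weight-for-height z-scores (WHZ) have already been calculated and uses the commonly described rules: WHO flags fixed limits of below -5 or above +5, while SMART flags values more than 3 z-scores from the observed survey mean. The z-scores are invented for illustration.

```python
from statistics import mean


def who_flags(whz_values):
    """Flag WHZ values outside the fixed limits of -5 and +5."""
    return [z < -5 or z > 5 for z in whz_values]


def smart_flags(whz_values):
    """Flag WHZ values more than 3 z-scores from the observed survey mean."""
    centre = mean(whz_values)
    return [abs(z - centre) > 3 for z in whz_values]


# Invented z-scores; the third child is flagged by one rule but not the other.
whz = [-1.0, -0.5, -4.6, 0.2, -1.2]
print(who_flags(whz))    # [False, False, False, False, False]
print(smart_flags(whz))  # [False, False, True, False, False]
```

A dataset from which SMART-flagged records had already been deleted would therefore differ from the same dataset cleaned with WHO flags, without any visible trace of which rule was applied.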
Of the 2,350 eligible datasets, over 3% (n=73) were identified as duplicates, due to inter-agency collaboration during surveys and shared ownership of the data. Potential duplicate datasets were identified via the calculation of file-level checksums. However, the duplicate-detection code could not account for cleaning differences among data entry persons, so some duplicate surveys may have gone undetected. For example, if the same dataset had been cleaned by one collaborating organisation but not the other prior to sharing, the code would not have picked up the duplicate. Given the size of the database, it was not possible to systematically spot by eye all additional duplicates that the code could have missed. The provision of raw original data by all organisations involved would have prevented these difficulties and minimised the number of undetected duplicates.
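The limitation described above follows from how checksums work: two files match only if their contents are identical byte for byte. The sketch below illustrates this with invented file contents; the choice of SHA-256 and the file names are assumptions, not details of the project's code.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 checksum of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files):
    """Group file names that share a checksum. `files` maps name -> bytes."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(checksum(data), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented example: one survey shared by two agencies, plus a cleaned copy
# in which a single MUAC value was corrected from cm to mm.
raw = b"age,sex,muac\n24,m,135\n30,f,13.2\n"
cleaned = b"age,sex,muac\n24,m,135\n30,f,132\n"
files = {"agency_a.csv": raw, "agency_b.csv": raw, "agency_b_cleaned.csv": cleaned}

print(find_duplicates(files))  # [['agency_a.csv', 'agency_b.csv']]
```

The cleaned copy goes undetected even though it is the same survey, since a single corrected value changes the checksum.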
Recommendations for the improvement of survey quality
It is recommended that in the future, a minimal set of data (including especially MUAC and oedema, since these are admission criteria for services managing acute malnutrition) be collected across all nutritional surveys, including standard national surveys like SMART, MICS and DHS.
Systematic storage of raw datasets should be prioritised, preferably in a common format (e.g. CSV) often used in large international research projects. Datasets should be stored at headquarters or country level, together with the accompanying narrative reports.
It is important that nutritional survey datasets are properly standardised. It is recommended that an international, inter-agency technical advisory group determine standard definitions, labels, codes and units for all variables to be automatically included in nutrition surveys, including definitions for a minimal set of metadata. In addition, basic information must be integrated into each dataset, ideally in the file name.
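As an illustration of how basic information could be carried in a file name, the sketch below parses a purely hypothetical naming scheme of the form country code, area, agency, survey month and method. The scheme, its fields and the example names are assumptions for illustration; an inter-agency group would need to define the actual standard.

```python
import re

# Hypothetical scheme: <ISO3 country>_<area>_<agency>_<YYYY-MM>_<method>.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<country>[A-Z]{3})_(?P<area>[A-Za-z0-9-]+)_(?P<agency>[A-Za-z0-9-]+)"
    r"_(?P<date>\d{4}-\d{2})_(?P<method>[A-Za-z0-9-]+)\.csv$"
)


def parse_survey_filename(filename):
    """Return the metadata encoded in a file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


info = parse_survey_filename("ETH_ExampleZone_ExampleNGO_2014-06_SMART.csv")
print(info["country"], info["date"], info["method"])  # ETH 2014-06 SMART
print(parse_survey_filename("survey_final_v2.csv"))   # None
```

A file name that fails to parse can be rejected at the point of submission, rather than being excluded later for lack of descriptive information.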
Conclusions and the way forward
There is a clear need for defined standards for all nutritional survey data (especially surrounding file type and labels, codes, variables and metadata), better overview of data quality and improved storage of raw original datasets.
Going forward, an expert inter-agency group should determine standard definitions, labels, codes and units for all indicators deemed to be of importance for inclusion in nutritional surveys. In addition, if widely agreed, this group could facilitate management of an open access or licence-accessed central data repository for professionals and researchers.
For more information, email: Lauren Browne
1 Famine Early Warning Systems Network.
2 Food Security and Nutrition Analysis Unit, Somalia.
3 ENA (Emergency Nutrition Assessment) software is an analytical programme recommended by SMART.
Alvarez JL, Dent N, Browne L, Myatt M, & Briend A. Putting Child Kwashiorkor on the Map. CMAM Forum Technical brief. London, March 2016.
Guerrero S. Strength in Numbers. Field Exchange 50. August 2015. p76. ENN.
Reference this page
Lauren Browne (2016). Improving the quality of nutritional survey data worldwide: Putting Child Kwashiorkor on the Map Initiative. Field Exchange 52, June 2016. p49. www.ennonline.net/fex/52/surveychildkwashiorkor