OBJECTIVES: In 1994, WHO/International Council for the Control of Iodine Deficiency Disorders recommended replacing the WHO 1960 four-grade goiter classification with a simplified two-grade system. The effect of this change in criteria on the estimation of goiter prevalence in field studies is unclear. In areas of mild iodine deficiency disorders (IDD) where goiters are small, ultrasound is preferable to palpation to estimate goiter prevalence. However, in areas of moderate to severe IDD, goiter screening by palpation may be an acceptable alternative to thyroid ultrasound. To address these two issues, we compared WHO 1960 and 1994 criteria with thyroid ultrasound for determination of goiter prevalence in areas of mild and severe IDD in Morocco. DESIGN: A cross-sectional study of 400 children aged 6 to 13 years from two mountain villages (Ait M'hamed and Brikcha) in rural Morocco was carried out. METHODS: Urinary iodine concentration (UI), whole blood TSH and serum thyroxine were measured. Thyroid size was graded by inspection and palpation by two examiners using both WHO 1960 and 1994 criteria. Thyroid volume was determined by ultrasound. Variation between examiners and examination methods was assessed. Sensitivity and specificity of the two classification systems compared with ultrasound were calculated. RESULTS: Median UIs in Ait M'hamed and Brikcha were 183 and 24 µg/l respectively. In Ait M'hamed, using 1960 and 1994 criteria, goiter prevalence was 21 and 26% respectively, compared with 13% by ultrasound. In Brikcha, with 1960 and 1994 criteria, goiter prevalence was 64 and 67% respectively, compared with 64% by ultrasound. Agreement between observers was better with the 1994 criteria than with the 1960 criteria in Ait M'hamed (kappa=0.53 and 0.47 respectively), while in Brikcha observer agreement was similar with the two systems (kappa=0.67).
Using either the 1994 or 1960 criteria, agreement with ultrasound was only moderate in Ait M'hamed (kappa=0.41-0.44), but good in Brikcha (kappa=0.55-0.64). Overall, compared with ultrasound, sensitivity increased 3-4% using 1994 criteria, while specificity decreased 4-5%. CONCLUSIONS: The WHO 1994 criteria are simpler to use than the 1960 criteria and provide increased sensitivity with only a small reduction in specificity. Agreement between observers is better with the 1994 criteria than with the 1960 criteria, particularly in areas of mild IDD. Like the 1960 criteria, the 1994 criteria overestimate goiter prevalence in areas of mild IDD, compared with ultrasound. However, the 1994 palpation criteria provide an accurate estimate of goiter prevalence in areas of severe IDD, and may be an acceptable and affordable alternative to thyroid ultrasound in these areas.
M Zimmermann, A Saad, S Hess, T Torresani and N Chaouki
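The agreement statistics reported in the first abstract (sensitivity, specificity and kappa of palpation against ultrasound) can be illustrated with a minimal sketch. The abstract does not give the underlying 2x2 counts or name the software used, so the `TwoByTwo` class, the function names and the example counts below are all hypothetical, and Cohen's kappa is assumed to be the kappa statistic meant.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Palpation result cross-tabulated against ultrasound (the reference)."""
    tp: int  # goiter by palpation and by ultrasound
    fp: int  # goiter by palpation, normal by ultrasound
    fn: int  # normal by palpation, goiter by ultrasound
    tn: int  # normal by both methods

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(t: TwoByTwo) -> float:
    # Share of ultrasound-confirmed goiters that palpation detected.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Share of ultrasound-normal thyroids that palpation graded normal.
    return t.tn / (t.tn + t.fp)


def cohen_kappa(t: TwoByTwo) -> float:
    # Observed agreement corrected for the agreement expected by chance.
    observed = (t.tp + t.tn) / t.n
    expected = (
        (t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)
    ) / t.n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only; these are not study data.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
print(sensitivity(example), specificity(example), cohen_kappa(example))
```

The same `cohen_kappa` function applies to agreement between two examiners if the table cross-tabulates examiner 1 against examiner 2 instead of palpation against ultrasound.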
MB Zimmermann, L Molinari, M Spehl, J Weidinger-Toth, J Podoba, S Hess and F Delange
OBJECTIVE: Interpretation of thyroid ultrasonography for assessing goiter prevalence requires valid reference criteria from iodine-sufficient populations. Reports have suggested the current reference criteria for thyroid volume (T(vol)) of WHO/ICCIDD (International Council for the Control of Iodine Deficiency Disorders) may be too high. Our objective was to determine if inter-observer and/or inter-equipment variability contributes to the disagreement in sonographic T(vol) in children reported from iodine-sufficient areas. DESIGN: A 2-day workshop was held in which four experienced ultrasound examiners from around Europe measured T(vol) in 45 Swiss schoolchildren aged 6 to 12 years using four different portable ultrasound machines. One of the participating examiners (observer A) had generated the T(vol) data in European children that are the basis for the WHO/ICCIDD reference criteria. METHODS: Sonographic T(vol) was measured in each child by all four examiners on all four machines. Six hundred and eighty-four examinations were completed, with examiners having no knowledge of one another's results. Inter-observer and inter-equipment variation was calculated. RESULTS: Mean inter-equipment variation in T(vol) was 15.2% (95% CI: 14.1, 16.3%). There were no significant differences in T(vol) between equipment (P=0.51). For all observers, the mean inter-observer variation in T(vol) was 25.6% (95% CI: 23.9, 27.2%). At all ages and all body surface areas, there was a large systematic measurement bias (+30% volume) between the mean T(vol) of observer A and the mean T(vol) of observers B, C and D. Reanalysis using data from observers B, C and D reduced the mean inter-observer variation in T(vol) to 13.3% (95% CI: 11.9, 14.7%). A correction factor for the systematic difference of observer A for the P50 and P97 of T(vol) was estimated using analysis of covariance.
When applied to the WHO/ICCIDD reference data, it sharply reduced the discrepancy between the WHO/ICCIDD criteria and those from other iodine-sufficient children around the world. CONCLUSIONS: Inter-equipment error contributes minimally to reported differences in sonographic T(vol). Even among experienced examiners, inter-observer variation in sonographic T(vol) in children can be high, and probably contributes to the current disagreement on normative values in iodine-sufficient children. A systematic bias at least partially explains why the WHO/ICCIDD reference data differ from those reported from other iodine-sufficient children around the world. The findings argue strongly for the standardization of methods used for sonographic measurement of T(vol) in children.
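The second abstract does not state how inter-observer variation or the systematic bias was computed. As a rough sketch under stated assumptions, the code below takes "variation" to be the coefficient of variation (SD/mean) across observers for each child, averaged over children, and takes "bias" to be the relative difference between one observer's mean and the mean of the others. The function names and the readings are hypothetical; the study's analysis of covariance is not reproduced here.

```python
from statistics import mean, stdev


def mean_cv_percent(readings_by_child):
    """Average per-child coefficient of variation, in percent.

    readings_by_child: one list per child holding that child's T(vol)
    readings (ml), one reading per observer. Assumed definition only.
    """
    return 100 * mean(stdev(r) / mean(r) for r in readings_by_child)


def relative_bias_percent(reference, others):
    """Relative difference of one observer's mean from the others' mean."""
    return 100 * (mean(reference) - mean(others)) / mean(others)


# Hypothetical readings (ml), columns ordered as observers A, B, C, D.
readings = [
    [3.9, 3.0, 3.1, 2.9],
    [5.2, 4.0, 4.1, 3.9],
    [6.5, 5.0, 5.2, 4.8],
]

observer_a = [row[0] for row in readings]
observers_bcd = [value for row in readings for value in row[1:]]

all_observers_cv = mean_cv_percent(readings)
without_a_cv = mean_cv_percent([row[1:] for row in readings])
bias_a = relative_bias_percent(observer_a, observers_bcd)

print(all_observers_cv, without_a_cv, bias_a)
```

With these made-up readings, excluding observer A lowers the average coefficient of variation, which mirrors the direction of the reanalysis reported in the abstract but says nothing about its actual magnitude.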