OBJECTIVE: Interpretation of thyroid ultrasonography for assessing goiter prevalence requires valid reference criteria from iodine-sufficient populations. Reports have suggested the current reference criteria for thyroid volume (T(vol)) of WHO/ICCIDD (International Council for the Control of Iodine Deficiency Disorders) may be too high. Our objective was to determine whether inter-observer and/or inter-equipment variability contributes to the disagreement in sonographic T(vol) in children reported from iodine-sufficient areas. DESIGN: A 2-day workshop in which four experienced ultrasound examiners from around Europe measured T(vol) in 45 Swiss schoolchildren aged 6-12 years using four different portable ultrasound machines. One of the participating examiners (observer A) had generated the T(vol) data in European children that are the basis for the WHO/ICCIDD reference criteria. METHODS: Sonographic T(vol) was measured in each child by all four examiners on all four machines. Six hundred and eighty-four examinations were completed, with examiners having no knowledge of one another's results. Inter-observer and inter-equipment variation was calculated. RESULTS: Mean inter-equipment variation in T(vol) was 15.2% (95% CI: 14.1, 16.3%). There were no significant differences in T(vol) between equipment (P=0.51). For all observers, the mean inter-observer variation in T(vol) was 25.6% (95% CI: 23.9, 27.2%). At all ages and all body surface areas, there was a large systematic measurement bias (+30% volume) between the mean T(vol) of observer A and the mean T(vol) of observers B, C and D. Reanalysis using data from observers B, C and D reduced the mean inter-observer variation in T(vol) to 13.3% (95% CI: 11.9, 14.7%). A correction factor for the systematic difference of observer A for the P50 and P97 of T(vol) was estimated using analysis of covariance.
When applied to the WHO/ICCIDD reference data, it sharply reduced the discrepancy between the WHO/ICCIDD criteria and those from other iodine-sufficient children around the world. CONCLUSIONS: Inter-equipment error contributes minimally to reported differences in sonographic T(vol). Even among experienced examiners, inter-observer variation in sonographic T(vol) in children can be high, and probably contributes to the current disagreement on normative values in iodine-sufficient children. A systematic bias at least partially explains why the WHO/ICCIDD reference data differ from those reported from other iodine-sufficient children around the world. The findings argue strongly for the standardization of methods used for sonographic measurement of T(vol) in children.
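ILLUSTRATION: The abstract does not state how T(vol) or the variation percentages were computed, so the following Python sketch rests on assumptions. It assumes the commonly used ellipsoid approximation for lobe volume (0.479 x length x width x depth, in cm, giving ml) and assumes that inter-observer variation is expressed as the per-child coefficient of variation across observers, averaged over children. The readings are hypothetical and are not data from the study; they only show how one observer reading systematically higher could inflate the variation, and how it might fall when that observer is excluded.

```python
from statistics import mean, stdev


def lobe_volume_ml(length_cm, width_cm, depth_cm):
    """Ellipsoid approximation for one thyroid lobe (assumed formula)."""
    return 0.479 * length_cm * width_cm * depth_cm


def thyroid_volume_ml(right_lobe, left_lobe):
    """Total volume as the sum of both lobes; each lobe is (length, width, depth) in cm."""
    return lobe_volume_ml(*right_lobe) + lobe_volume_ml(*left_lobe)


def mean_cv_percent(readings_by_child):
    """Average over children of the coefficient of variation (%) across observers.

    This definition of inter-observer variation is an assumption,
    not necessarily the one used in the study.
    """
    cvs = [100.0 * stdev(r) / mean(r) for r in readings_by_child]
    return mean(cvs)


# Hypothetical T(vol) readings in ml, one row per child, columns = observers A, B, C, D.
# Observer A is made to read roughly 30% higher than the others.
readings = [
    [3.9, 3.0, 3.1, 2.9],
    [5.2, 4.0, 4.1, 3.9],
    [2.6, 2.0, 1.9, 2.1],
]

cv_all = mean_cv_percent(readings)                   # all four observers
cv_bcd = mean_cv_percent([r[1:] for r in readings])  # observers B, C and D only

print(f"Mean inter-observer CV, all observers: {cv_all:.1f}%")
print(f"Mean inter-observer CV, B, C and D:    {cv_bcd:.1f}%")
```

In this toy example the variation drops once the systematically higher observer is left out, which mirrors the direction of the reanalysis reported above. It does not reproduce the study's analysis of covariance or its correction factor.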