MANAGEMENT OF ENDOCRINE DISEASE: Imaging for the diagnosis of malignancy in incidentally discovered adrenal masses: a systematic review and meta-analysis

Objective: Adrenal masses are incidentally discovered in around 5% of CT scans. In 2013/2014, 81 million CT examinations were undertaken in the USA and 5 million in the UK. However, uncertainty remains around the optimal imaging approach for diagnosing malignancy. We aimed to review the evidence on the accuracy of imaging tests for differentiating malignant from benign adrenal masses.
Design: A systematic review and meta-analysis was conducted.
Methods: We searched MEDLINE, EMBASE, Cochrane CENTRAL Register of Controlled Trials, Science Citation Index, Conference Proceedings Citation Index, and ZETOC (January 1990 to August 2015). We included studies evaluating the accuracy of CT, MRI, or 18F-fluorodeoxyglucose (FDG)-PET against an adequate histological or imaging-based follow-up reference standard.
Results: We identified 37 studies suitable for inclusion after screening 5469 references and 525 full-text articles. Studies evaluated the accuracy of CT (n=16), MRI (n=15), and FDG-PET (n=9) and were generally small and at high or unclear risk of bias. Only 18 studies were eligible for meta-analysis. Limited data suggest that CT density >10 HU has high sensitivity for the detection of adrenal malignancy in participants with no prior indication for adrenal imaging; that is, masses with a density ≤10 HU are unlikely to be malignant. All other estimates of test performance are based on too few studies and patients to be reliable.
Conclusions: Despite their widespread use in routine assessment, there is insufficient evidence for the diagnostic value of individual imaging tests in distinguishing benign from malignant adrenal masses. Future research is urgently needed and should include prospective test validation studies for imaging and novel diagnostic approaches alongside detailed health economic analysis.


Introduction
An incidentally discovered adrenal mass is a frequent finding, occurring in around 5% of cross-sectional abdominal imaging carried out for purposes other than a suspected adrenal problem (1,2,3). The number of computed tomography (CT) scans performed in the USA rose from 3 million per annum in 1980 to 81.2 million in 2014 (4). Concurrently, in the UK, 5 million CT scans were undertaken in 2012/2013, up from 1 million in 1996/1997 (www.england.nhs.uk/statistics/statistical-work-areas/diagnostics-waiting-times-and-activity/imaging-and-radiodiagnostics-annual-data/). With this increasingly widespread use of cross-sectional imaging, the repeated and multimodality imaging used to work up adrenal incidentalomas represents a major challenge to health care budgets and a burden to affected patients. Evidence-based guidance on the use of imaging in adrenal incidentalomas is therefore urgently needed.
The prevalence of adrenal incidentalomas increases with age (3% at 40 years, 10% at 70 years) (5) and is very low in children (<0.5%) (6). The first key consideration in the diagnostic workup of an adrenal incidentaloma is whether the mass is hormone-producing, which requires exclusion of pheochromocytoma, Cushing syndrome, and, in hypertensive patients, primary aldosteronism. Second, and usually perceived as most important by the affected patient, the possibility of malignancy has to be considered.
In patients with a history of extra-adrenal malignancy, the detection of a new adrenal mass raises suspicion of metastasis but also requires careful exclusion of other causes. In cancer patients, the likelihood of an adrenal nodule being malignant is approximately 20%, and only 70% of adrenal lesions surgically removed on the basis of imaging results are ultimately confirmed as metastases by histology (7,8,9).
Although adrenal metastases are rarely detected in adrenal incidentaloma patients without a history of extra-adrenal malignancy, adrenocortical carcinoma (ACC) is not uncommon. Larger clinical and surgical adrenal incidentaloma series report an ACC prevalence of 1.4-12% (2,10,11,12), with the variability mostly driven by referral bias. Radiological studies describe lower rates of malignant and functionally active adrenal tumors, but usually lack uniform endocrine evaluation and an optimal reference standard, so malignant lesions could have been missed (3).
An adrenal incidentaloma is most frequently noted on CT or MRI scans carried out for other purposes. Both imaging modalities can assess the lipid content of the adrenal mass, which serves as the basis for differentiating between a benign (high lipid content) and a potentially malignant (low lipid content) mass. However, at least a third of benign adrenal adenomas have been shown to be lipid-poor (13,14). This lack of specificity leads many patients to undergo multiple scans and imaging modalities, often followed by surgery, with histology ultimately revealing a benign mass that did not require surgery in 30-55% of patients (2,15).
In addition to the general radiological criteria of mass size and appearance (heterogeneity, borders, invasion) (13,16), multiple imaging parameters are employed in the differential diagnosis of adrenal incidentaloma. These include unenhanced CT with assessment of tumor density, contrast-enhanced CT with timed washout measurement, MRI chemical shift analysis, and, more recently, 18F-fluorodeoxyglucose (FDG)-PET in combination with CT (PET-CT).
However, despite their widespread use in the workup of adrenal incidentalomas, the optimal choice, sequence, and performance of imaging tests to distinguish benign from malignant adrenal masses are unclear (17), and clinical practice remains more expert-based than evidence-based. Individual published reports are often unconvincing because of small sample sizes, heterogeneous populations, differing imaging techniques or cut-offs, and poor reference standards. As a result, many patients with adrenal tumors undergo multiple scans, annual follow-up imaging, and even unnecessary surgery (2); previous guidelines and reviews recommended annual follow-up imaging for up to 2 years in most adrenal incidentaloma patients not undergoing surgery (16,17,18).
We have carried out a systematic review and meta-analysis of the diagnostic performance of imaging tests in incidentally discovered adrenal masses, with the aim of facilitating evidence-based recommendations on the effective use of imaging in adrenal incidentalomas. Advances in the evaluation of diagnostic test accuracy have increased awareness of potential sources of bias (19,20,21,22); accordingly, in addition to summarizing study findings, we assess the validity and applicability of the available evidence base and identify its current limitations.

Methods
This review follows methods as set out in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (23).

Study selection
We considered all studies of CT, MRI, or FDG-PET in adult participants with incidentally identified adrenal masses for inclusion. These included both patients in whom imaging for any indication other than an adrenal mass led to the detection of an adrenal mass (true adrenal incidentalomas) and patients with an adrenal mass detected by imaging carried out for staging or follow-up of extra-adrenal malignancy. Studies that did not report the original indication for imaging are described but were not included in the meta-analyses. The target condition was ACC or adrenal metastasis from an extra-adrenal primary malignancy. We included all studies with reference standards in which (i) at least 50% of participants with ACC or another malignant adrenal mass had a histologically proven diagnosis (obtained through adrenalectomy or adrenal biopsy) and (ii) at least 50% of those with a benign adrenal mass had their final diagnosis reached by histology or by imaging-based follow-up of any duration.
In collaboration with clinical and radiological experts from the European Society of Endocrinology (ESE) and European Network for the Study of Adrenal Tumors (ENSAT) Clinical Practice Guideline Committee for the management of adrenal incidentalomas, we selected five commonly used diagnostic imaging thresholds for inclusion: (i) non-contrast CT: tumor density >10 Hounsfield units (HU); (ii) contrast-enhanced CT washout: absolute percentage washout (APW) and/or relative percentage washout (RPW) at any washout percentage or delay time; (iii) MRI chemical shift analysis: loss of signal intensity between in-phase and out-of-phase images (including both qualitative and quantitative estimates of signal loss); and, for FDG-PET or PET-CT, (iv) the maximum standardized uptake value (SUVmax) and (v) the ratio of SUVmax in the adrenal gland to that in the liver (adrenal-liver ratio (ALR)).
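The washout and ratio measures listed above are named but not defined here. The following minimal Python sketch shows how they are commonly computed, assuming the conventional formulas APW = (enhanced - delayed) / (enhanced - unenhanced) × 100, RPW = (enhanced - delayed) / enhanced × 100, and percentage signal loss = (in-phase - out-of-phase) / in-phase × 100; all function names are hypothetical. No washout or signal-loss cut-offs are encoded, because the review accepted any washout percentage or delay time; only the >10 HU rule comes from the text.

```python
def unenhanced_ct_positive(density_hu: float, threshold_hu: float = 10.0) -> bool:
    """Index test (i): positive (suspicious) if unenhanced density exceeds the threshold."""
    return density_hu > threshold_hu


def absolute_percentage_washout(unenhanced_hu: float, enhanced_hu: float, delayed_hu: float) -> float:
    """APW: washout as a share of enhancement above the unenhanced baseline (assumed formula)."""
    enhancement = enhanced_hu - unenhanced_hu
    if enhancement == 0:
        raise ValueError("enhanced and unenhanced attenuation must differ")
    return (enhanced_hu - delayed_hu) / enhancement * 100.0


def relative_percentage_washout(enhanced_hu: float, delayed_hu: float) -> float:
    """RPW: washout as a share of the enhanced attenuation (assumed formula)."""
    if enhanced_hu == 0:
        raise ValueError("enhanced attenuation must be non-zero")
    return (enhanced_hu - delayed_hu) / enhanced_hu * 100.0


def signal_intensity_loss(in_phase_si: float, out_of_phase_si: float) -> float:
    """Chemical shift MRI: percentage signal lost from in-phase to out-of-phase images."""
    if in_phase_si == 0:
        raise ValueError("in-phase signal intensity must be non-zero")
    return (in_phase_si - out_of_phase_si) / in_phase_si * 100.0


def adrenal_liver_ratio(adrenal_suv_max: float, liver_suv_max: float) -> float:
    """ALR: adrenal SUVmax divided by liver SUVmax."""
    if liver_suv_max <= 0:
        raise ValueError("liver SUVmax must be positive")
    return adrenal_suv_max / liver_suv_max


# Illustrative (hypothetical) measurements for a single lesion.
print(unenhanced_ct_positive(25.0))                   # True
print(absolute_percentage_washout(30.0, 90.0, 45.0))  # 75.0
print(relative_percentage_washout(90.0, 45.0))        # 50.0
print(signal_intensity_loss(200.0, 150.0))            # 25.0
print(adrenal_liver_ratio(3.0, 1.5))                  # 2.0
```

By construction, RPW uses only the enhanced and delayed attenuations, whereas APW also needs the unenhanced value.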
We excluded studies in which more than half of participants presented with endocrine symptoms or were otherwise suspected of hormone excess, and studies concerned with the diagnosis of adrenomedullary tumors: pheochromocytomas can usually be detected by measuring plasma or urinary metanephrines, and their imaging characteristics overlap with those of adrenocortical malignancy and adrenal metastases. Therefore, studies with more than 30% pheochromocytomas in the disease-positive group were excluded, unless data could be disaggregated to allow their exclusion from the analysis. We also excluded studies in pediatric populations, studies with a sample size <10, studies with data collection before 1990, and studies presenting insufficient data to construct a 2 × 2 diagnostic contingency table. Non-English-language studies and studies reported only in conference abstracts were excluded.
Title and abstract screening and full-text inclusion were carried out independently by two reviewers (I B, J Di). Any disagreements were resolved through discussion or referral to a third reviewer (C D, V C, L F R).

Data extraction and quality assessment
Data extraction was carried out independently by at least two authors (I B, J Di, L F R, V C, C D) using a standardized and piloted data extraction form. Details of the study design, participants, lesion characteristics, index test(s) or test combinations and index test positivity thresholds, reference standards, and 2 × 2 diagnostic contingency table data were extracted. Any malignant masses detected in addition to ACC or adrenal metastases (malignant pheochromocytomas, other malignant medullary tumors, or other malignancy) were considered disease positive, as their clinical management is sufficiently similar. If study data could not be fully disaggregated, up to 10% of the malignant group could be benign masses, and up to 10% of the benign group could be medullary tumors (pheochromocytomas, neuroblastoma, ganglioneuroma, or schwannoma). Discrepancies in data extraction were resolved by consensus or by a third reviewer. We assessed the risk of bias and concerns about the applicability of findings related to the patients, tests, reference standard, and execution of each study using the QUADAS-2 checklist (19), tailored to the review topic. Three authors (I B and V C plus J Di or L F R) independently rated each study, with disagreements resolved by consensus.
175:2 R54 Review J Dinnes, I Bancos and others Imaging in adrenal incidentalomas
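The 2 × 2 contingency table that each included study had to supply can be sketched as follows. This is a minimal Python illustration, not the review's extraction tool: the diagnosis labels, the lesion list, and the names `build_table` and `TwoByTwo` are hypothetical, and the index test shown is the >10 HU unenhanced CT rule. Following the extraction rule above, any malignant diagnosis counts as disease positive.

```python
from dataclasses import dataclass

# Hypothetical diagnosis labels; any malignant mass counts as disease positive.
MALIGNANT_DIAGNOSES = {"ACC", "metastasis", "malignant pheochromocytoma", "other malignancy"}


@dataclass(frozen=True)
class TwoByTwo:
    """Diagnostic 2 x 2 contingency table."""
    tp: int  # test positive, malignant
    fp: int  # test positive, benign
    fn: int  # test negative, malignant
    tn: int  # test negative, benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def build_table(lesions, threshold_hu: float = 10.0) -> TwoByTwo:
    """Cross-tabulate unenhanced CT density (index test) against final diagnosis (reference)."""
    tp = fp = fn = tn = 0
    for density_hu, diagnosis in lesions:
        test_positive = density_hu > threshold_hu
        malignant = diagnosis in MALIGNANT_DIAGNOSES
        if test_positive and malignant:
            tp += 1
        elif test_positive:
            fp += 1
        elif malignant:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical lesions: (unenhanced density in HU, final diagnosis).
lesions = [(12, "ACC"), (25, "metastasis"), (8, "adenoma"), (15, "adenoma"), (5, "adenoma")]
table = build_table(lesions)
print(table)  # TwoByTwo(tp=2, fp=1, fn=0, tn=2)
print(table.sensitivity, round(table.specificity, 3))  # 1.0 0.667
```

A lesion measuring exactly 10 HU is test negative under the strict ">10 HU" definition.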
Patient selection was regarded as at risk of bias if consecutive or random selection was not used, if patients were selected according to the availability of adrenalectomy data, or if patients were inappropriately excluded based on previous lesion assessments. Index test and reference standard implementation were considered at risk of bias when each was undertaken with knowledge of the other or when test thresholds were not prespecified; the reference standard was also considered at risk when final diagnoses of malignancy were not all based on histology, when tumor sampling was inadequate, or when benignity was assumed without histology or with <12 months of imaging follow-up. Non-blinded interpretation of other imaging tests was also considered a source of bias in test interpretation. Risk of bias in study execution was judged present when the reference standard was not applied to all patients, when participants were excluded from analyses, when the reference standards used for malignant and benign cases differed, or when there was no follow-up of suspected benign cases within 6 months.
Concerns about applicability were noted for participants when <90% had been recruited with incidentally discovered adrenal tumors or with known or prior malignancy; for index tests, when the test measure was described in insufficient detail to allow replication or standard thresholds were not used; and for the reference standard, when it did not allow full disaggregation of tumor types into malignant and benign.

Data synthesis and analysis
Data synthesis focused on estimating the accuracy of each test for the diagnosis of malignancy separately for two clinical pathways: (i) adrenal incidentaloma, that is, investigation of an adrenal tumor detected by imaging carried out for an indication other than suspected adrenal disease, and (ii) history of extra-adrenal malignancy, that is, imaging evaluation or staging in patients with known or prior non-adrenal malignancy. Because the accuracy of each test may differ between these clinical pathways, each study was characterized according to whether the majority (>50%) or nearly all (>90%) of participants were recruited in each pathway, and separate analyses were undertaken for each group. Studies that did not meet these criteria, or in which the reasons for imaging could not be ascertained, were excluded from the analysis. For the analysis of MRI chemical shift, we restricted inclusion to studies using 1.5 Tesla machines, which were the majority.
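The pathway assignment rule above can be expressed as a small decision function. This is a minimal Python sketch under stated assumptions: the function name and arguments are hypothetical, the two participant groups are assumed to be mutually exclusive, an unreported indication is passed as `None`, and the strict inequalities (>50%, >90%) follow this paragraph, although the footnote to Table 3 uses ≥, so the exact boundary handling is an assumption.

```python
from typing import Optional, Tuple


def classify_pathway(
    n_total: int,
    n_incidental: Optional[int],
    n_known_malignancy: Optional[int],
) -> Optional[Tuple[str, bool]]:
    """Assign a study to a clinical pathway, or return None if it cannot be analysed.

    Returns (pathway, nearly_all), where nearly_all is True if >90% of participants
    belong to that pathway. A count of None means the indication was not reported.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    for pathway, count in (("incidentaloma", n_incidental), ("known malignancy", n_known_malignancy)):
        if count is None:
            continue  # reason for imaging not reported for this group
        share = count / n_total
        if share > 0.5:
            return pathway, share > 0.9
    return None  # no majority pathway: excluded from the pathway analyses


print(classify_pathway(40, 38, 2))       # ('incidentaloma', True)
print(classify_pathway(40, 10, 30))      # ('known malignancy', False)
print(classify_pathway(40, 20, 20))      # None
print(classify_pathway(40, None, None))  # None
```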
Estimates of sensitivity and specificity and 95% CIs for the detection of malignancy were calculated using the binomial exact method when only one study was available or when there were no false negatives or false positives. Otherwise, the bivariate hierarchical model was used to obtain meta-analytical estimates of average sensitivity and specificity (25). Where possible, the model included random effects for sensitivity and specificity and their correlation, but it was simplified when too few studies were available (26).
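The binomial exact interval used for single studies and zero-cell cases can be sketched without external libraries. This is a minimal Python illustration of the Clopper-Pearson method, assuming that this is the exact method meant; the helper names are hypothetical, the bounds are found by simple bisection on the binomial CDF, and the bivariate hierarchical model (25) is beyond the scope of a short example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo: float = 0.0, hi: float = 1.0, iterations: int = 100) -> float:
    """Root of a monotone function f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Point estimate and exact (Clopper-Pearson) two-sided CI for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    if x == 0:
        lower = 0.0
    else:
        # Lower bound solves P(X >= x | p) = alpha / 2.
        lower = bisect_root(lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    if x == n:
        upper = 1.0
    else:
        # Upper bound solves P(X <= x | p) = alpha / 2.
        upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2.0)
    return x / n, lower, upper


# Hypothetical example: all 10 malignant lesions are above 10 HU (no false negatives).
estimate, lower, upper = clopper_pearson(10, 10)
print(round(estimate, 2), round(lower, 2), round(upper, 2))  # 1.0 0.69 1.0
```

When every diseased case is test positive, the exact lower bound reduces to (alpha/2)^(1/n), so a 100% point estimate from 10 lesions still has a 95% CI starting near 69%; at least 36 lesions, all positive, are needed for the lower bound to exceed 90%.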

Results
Summary study characteristics are presented in Table 1. CT was evaluated in 16 studies (non-contrast CT in 13 and contrast-enhanced CT washout in 6), MRI in 15 studies, and PET in 9 studies. Studies were generally small, with a median … Where reported, study populations were highly varied, with only 7 studies (19%) including a majority of participants with purely incidental findings and 11 (29%) focusing primarily on participants with known extra-adrenal malignancy (>50% of the population) (Table 1). Studies variously excluded masses with particular imaging characteristics, including CT density <10 HU (n = 3) and size <10 mm …

Study quality
The vast majority (84%) of studies were at high or unclear risk of bias across all quality domains assessed (Fig. 1A and Supplementary Figs 1, 2, 3). A third of studies (n = 12) included only participants selected for adrenalectomy, who were therefore at higher risk of malignancy, and four adopted a case-control approach with separate selection of participants with confirmed malignancy and those with benign disease (33,39,49,53). PET (Supplementary Fig. 3) and MRI evaluations (Supplementary Fig. …

Results according to clinical pathway
Poor reporting of the clinical pathways leading to the conduct of the imaging tests resulted in exclusion of 19 of 37 eligible studies from analysis (described in Supplementary Table 3).Characteristics of the 18 studies eligible for analysis are provided according to clinical pathway in Table 2 and results of test performance are reported in Table 3, with raw data for all test evaluations provided in Supplementary Table 4.

Test performance in the investigation of incidentally detected tumors
Seven studies presented data on test performance (two for CT (27,30), three for MRI (28,31,46), and two for PET-CT (29,61)) in patient groups in which more than 50% (in two studies, >90%) of tumors were incidentally detected. Two studies evaluating tumor density >10 HU on non-contrast CT (27,30) and one evaluating contrast-enhanced CT washout (27) showed high sensitivity and specificity. Only two (28,31) of the three MRI studies used 1.5 Tesla machines; these reported slightly lower sensitivity and specificity than CT for adrenal-liver and adrenal-spleen ratios and loss of signal intensity. The performance of PET for ALR and SUVmax measures was no better than that of CT.
The data suggest that CT density >10 HU has high sensitivity for the detection of malignancy, with the lower bound of the 95% CI above 90%. However, all other estimates of test performance are based on small numbers of studies with few patients, and their 95% CIs are notably wide, indicating uncertainty in the performance of all other imaging markers. The available data do not show whether any test performs adequately or better than the alternatives.

Test performance in the investigation of tumors in participants with current or prior non-adrenal malignancy
Eleven studies presented data on test performance (five for CT (7,33,34,35,37), five for MRI (32,34,35,36,60), and three for PET-CT (8,38,62)) in patient groups in which more than 50% (in nine studies, >90%) of tumors were detected in patients undergoing imaging after a previous non-adrenal malignancy. The five studies evaluating tumor density >10 HU on non-contrast CT (7,33,34,35,37) showed high sensitivity (93%) but variable specificity; contrast-enhanced CT washout was reported in only one study (33), which showed very low sensitivity (16%). Four (32,34,36,60) of the five MRI studies used 1.5 Tesla machines and reported high sensitivity (89-99%) for adrenal-liver, adrenal-spleen, and adrenal-muscle ratios and loss of signal intensity. Specificity varied (60-93%) but was high for most MRI measures. The performance of PET for ALR and SUVmax measures was similar to that of MRI.
Although more studies had evaluated CT, MRI, and PET in the pathway for follow-up of known malignancy than in that for incidentally discovered adrenal lesions, estimates of test performance are still based on too few studies and patients to be reliable.

Discussion
Our main finding cautiously suggests that, in patients without known extra-adrenal malignancy, a non-contrast CT tumor density of 10 HU is a diagnostically relevant cut-off, albeit based only on data from two small studies. The sensitivity of >10 HU for detecting malignancy was high (100%; 95% CI: 91, 100%), but the specificity was poor. Consequently, an incidentally discovered adrenal mass with a non-contrast CT tumor density of ≤10 HU is unlikely to be malignant. Tumor density ≤10 HU was less conclusive for ruling out malignancy in patients with a history of extra-adrenal malignancy, with a pooled false-negative rate of 7% (corresponding to the pooled sensitivity of 93%), although CIs were wide. With positive predictive values for the detection of malignancy in the order of 70-80% in both populations, roughly 20-30% of adrenal masses with tumor density >10 HU in these study populations are likely to be benign. These and all other pooled estimates have such wide CIs that no further conclusions can be drawn regarding the accuracy of imaging tests for the detection of malignancy in incidentally discovered adrenal masses.
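The dependence of predictive values on the pre-test probability of malignancy, which underlies the 70-80% positive predictive values quoted above, can be made explicit with Bayes' theorem. A minimal Python sketch follows; the function name, the specificity of 0.6, and both prevalences are hypothetical values chosen for illustration, not results of this review. Because many included studies selected participants for adrenalectomy, their prevalence of malignancy, and hence their predictive values, may not transfer to unselected incidentaloma populations.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """PPV and NPV from sensitivity, specificity, and pre-test prevalence (Bayes' theorem)."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Expected proportions of the population in each cell of the 2 x 2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else float("nan")
    return ppv, npv


# Hypothetical test: perfect sensitivity, specificity 0.6, at two illustrative prevalences.
for prevalence in (0.05, 0.30):
    ppv, npv = predictive_values(1.0, 0.6, prevalence)
    print(prevalence, round(ppv, 3), round(npv, 3))
# 0.05 0.116 1.0
# 0.3 0.517 1.0
```

With perfect sensitivity the negative predictive value is 1.0 at any prevalence, which is the formal sense in which a ≤10 HU result rules out malignancy; the positive predictive value, by contrast, rises steeply with prevalence.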
Possible clinical explanations for this uncertainty include variability in the lipid content of adenomas, tissue heterogeneity, the small size of metastatic lesions, and differences in the selection of regions of interest for HU measurement. However, most of the uncertainty stems from the small number of eligible studies and, hence, of patients available for analysis. Although a substantial number of studies address imaging characteristics in patients with an adrenal mass, more than 90% of the full-text papers retrieved had to be excluded. Many had small sample sizes, mixed populations, inadequate reporting of imaging techniques and thresholds, or unacceptable reference standards for malignant and benign masses. Even with our stringent eligibility criteria, the included studies were characterized by heterogeneity in study populations, imaging tests and thresholds, and reference standards, as well as by poor methodological quality. Given differences in patient spectrum according to the indication for adrenal imaging and their potential impact on accuracy (63,64,65), our meta-analysis was further restricted to studies in which a majority of participants either had an incidentaloma or were undergoing imaging because of known malignancy, which excluded a further 50% of the included studies. Heterogeneity in study conduct and poor methodological quality remained, further contributing to the uncertainty of the pooled estimates.
Our findings are disappointingly consistent with another systematic review of the literature on tests for adrenal incidentaloma published almost 15 years ago (66).Observed heterogeneity in tests and populations meant that no meta-analysis was undertaken and no clear conclusions could be drawn (66).Almost three-quarters (27/37) of the studies in our review were published in the interim period; however, methodological and reporting quality have not improved sufficiently to allow any new conclusions to be drawn.A more recent meta-analysis of FDG-PET (67) applied considerably less stringent inclusion criteria compared with our review, thereby including more studies (n = 21); however, highly heterogeneous data limited the conclusions that could be drawn.
Our findings of poor quality and reporting of test accuracy studies are similar to those from other fields (68,69,70). The introduction of the Standards for Reporting Diagnostic Accuracy (STARD) statement (71) has led to only small improvements in reporting (72), and our results indicate that, in this field as in many others, greater awareness of methodological considerations in the design and delivery of multicenter studies is required, and reporting must improve.
The strengths of this review include an in-depth, comprehensive literature search, a focused review question, and a stringent, predefined reference standard. Its limitations stem from the heterogeneity and low quality of the included studies. Unclear definitions of study populations, varied and often data-driven thresholds, and different techniques for the same imaging tests limit the interpretation and generalizability of the results. The weak conclusions of this systematic review and meta-analysis should be interpreted in light of the low volume and poor quality of the included studies (Fig. 1B).
Our results do not suggest that current imaging practice is inappropriate: small study numbers prevent us from providing substantive evidence either to support current practice or to prompt a change in it. Further studies are needed to answer the following key questions:
1. Do adrenal lesions with an unenhanced CT tumor density ≤10 HU need additional imaging, in particular in patients with a history of extra-adrenal malignancy?
2. What is the best second-line imaging study to accurately diagnose (or exclude) a malignant adrenal mass?
3. What additional factors (patient preference, radiation risks, costs) influence decisions on imaging choice?
4. How much tumor growth, and over what period of time, is indicative of a malignant adrenal mass?
In addition, future studies should include the systematic evaluation of alternative testing approaches and a detailed analysis of health economic impact. All these questions can only be answered by larger multicenter studies with prospective recruitment of consecutive series of participants in appropriately defined clinical pathways, and with imaging test interpretation blinded to the reference standard diagnosis and to the results of any other imaging tests. Diagnostic thresholds for determining benignity or malignancy must be prespecified to avoid data-driven threshold selection and overestimation of test accuracy. Reliance on a histological reference standard leads to study populations with a high pretest probability of malignancy; where imaging follow-up is used instead for masses with indeterminate imaging characteristics, it needs to be long enough to ensure that malignant masses are not missed. Centralized radiological and pathology review would further strengthen the results. Future investigators must also meet the updated STARD recommendations (20) so that study conduct and quality can be judged appropriately.
Studies published before 1990 were not considered to be representative of current imaging technologies. The full search strategy as designed for MEDLINE is available in Supplementary Table 1 (see the section on supplementary data given at the end of this article). The reference lists of included studies and relevant systematic reviews were checked for additional eligible studies.

Table 1
Summary of the characteristics of the 37 studies fulfilling the inclusion criteria. SUVmax, maximum standardized uptake value; ALR SUVmax, ratio of SUVmax in the adrenal gland compared with the liver. *Mean; †Range; ‡Mean of reported means.

Table 2
Characteristics of the 18 studies eligible for meta-analysis. ACC, adrenocortical carcinoma; ADC, apparent diffusion coefficient; ALR, adrenal to liver ratio; AMR, adrenal to muscle ratio; APW, absolute percentage washout; ASR, adrenal to spleen ratio; BPC, between-person comparison (multiple index tests evaluated in partial study population); CS, chemical shift; CT, computed tomography; Dis, diseased; HE, hormone excess; HU, Hounsfield units; I, incidental; IP, in-phase; KM, known malignancy; les, lesions; mets, metastases; NC, non-comparative study; NR, not reported; OP, opposed phase; P, prospective data collection; Pat., patients; PET, positron emission tomography; R, retrospective data collection; RPW, relative percentage washout; S, symptomatic; SI, signal intensity; SII, signal intensity index; SUVmax, maximum standardized uptake value; WPC, within-person comparison (multiple index tests evaluated in all study participants). *Masses considered to be malignant if their signal was more intense than the liver signal; †masses considered to be metastases if their signal was more intense than the liver signal and less intense than the kidney signal.

Table 3
Test performance according to clinical pathway: studies focusing on truly incidentally discovered adrenal masses (incidentaloma pathway) vs studies on adrenal masses discovered during follow-up monitoring for extra-adrenal malignancy (follow-up from previous malignancy pathway). ALR SUVmax, ratio of SUVmax in the adrenal gland compared with the liver; HU, Hounsfield units; n, number of cases; N, total population; PET, positron emission tomography; SUVmax, maximum standardized uptake value. *Refers to ≥50% with incidentaloma in studies in the incidentaloma pathway and ≥50% with current or prior non-adrenal malignancy in the follow-up from previous malignancy pathway; **refers to ≥90% with incidentaloma in studies in the incidentaloma pathway and ≥90% with current or prior non-adrenal malignancy in the follow-up from previous malignancy pathway.