3. Evidence base for policy formulation

3.1. Use of IGRAs in diagnosis of active TB

3.1.1. Study characteristics

Included studies were those that evaluated the performance of the most recent generation of commercial, RD1 antigen-based IGRAs (QuantiFERON-TB Gold In-Tube (QFT-GIT) [Cellestis, Victoria, Australia] and T-SPOT [Oxford Immunotec, Oxford, United Kingdom]) among adult (>15 years) active pulmonary TB suspects or cases in low- and middle-income countries.

Excluded studies were those that evaluated non-commercial IGRAs, PPD-based IGRAs, QuantiFERON-TB Gold (2G), or IGRAs performed in specimens other than blood; studies reporting longitudinal data focused on the effect of anti-TB treatment on IGRA response; studies including <10 eligible individuals; studies focused on extrapulmonary tuberculosis in children; studies reporting insufficient data to determine diagnostic accuracy measures; and conference abstracts, letters without original data, and reviews.

The initial search yielded 789 citations. After full-text review of 185 papers evaluating IGRAs for the diagnosis of active TB, 22 were determined to meet eligibility criteria, covering 33 unique evaluations of one or more IGRAs (hereafter referred to as studies) in 19 published and 3 unpublished reports. Of the 33 studies, 10 (30%) were from low-income countries, and 23 (70%) were from middle-income countries. Seventeen studies (52%) included HIV-infected individuals (n=1,057), and 27 (82%) studies involved ambulatory subjects (out-patients as well as hospitalized patients).

IGRAs were performed in persons suspected of having active TB in 19 (58%) studies and in persons with known active TB in 14 (42%) studies. Because of the focus on diagnostic accuracy for active TB and the high prevalence of LTBI in high TB-burden settings, IGRA specificity was estimated exclusively among studies enrolling TB suspects where the diagnostic workup ultimately showed no evidence of active disease.

3.1.2. Summary of results

The results demonstrated that in low- and middle-income countries:

  • The sensitivity of IGRAs in detecting active TB among persons suspected of having TB ranged from 73% to 83% and specificity ranged from 49% to 58%. One in four patients, on average, with culture-confirmed active TB could therefore be expected to be IGRA-negative in low- and middle-income countries, with serious consequences for patients in terms of morbidity and mortality;
  • There was no evidence that IGRAs have added value beyond conventional microbiological tests for the diagnosis of active TB. Among studies that enrolled TB suspects (ie. patients with diagnostic uncertainty), both IGRAs demonstrated suboptimal ‘rule-out’ values for active TB;
  • Even though data were limited, the sensitivity of both IGRAs was lower among HIV-positive patients (around 60-70%), suggesting that nearly one in three HIV-positive patients with active TB would be IGRA-negative;
  • There was no consistent evidence that either IGRA was more sensitive than the TST for active TB diagnosis, although comparisons with pooled estimates of TST sensitivity were difficult to interpret due to substantial heterogeneity;
  • The few available head-to-head comparisons between QFT-GIT and T-SPOT demonstrated higher sensitivity for the T-SPOT platform, though this difference did not reach statistical significance;
  • The specificity of both IGRAs for active TB was low, regardless of HIV status, and suggested that one in two patients without active TB would be IGRA-positive, with adverse consequences for patients because of unnecessary therapy for TB and a missed differential diagnosis;
  • Two unpublished reports found no incremental or added value of IGRA test results combined with important baseline patient characteristics (eg. demographics, symptoms, or chest radiograph findings), thus not supporting a meaningful contribution of IGRAs for diagnosis of active TB beyond readily available patient data and conventional tests;
  • The systematic review focused on the use of IGRAs to diagnose active pulmonary TB, data for extra-pulmonary TB being non-existent; nevertheless, consensus by the Expert Group was that recommendations for pulmonary TB could reasonably be extrapolated to extra-pulmonary TB;
  • Industry involvement was unknown in 18% of studies and acknowledged in 27% of studies, including donation of IGRA kits as well as work/financial relationships between authors and IGRA manufacturers.
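
The "one in four" and "one in two" statements above follow from the reported accuracy ranges: the share of TB cases missed is 1 minus sensitivity, and the share of patients without TB who test positive is 1 minus specificity. The following is a minimal Python sketch of that arithmetic. The function name, the group size of 1,000 patients and the pairing of the lower and upper ends of the two ranges are illustrative assumptions and are not taken from the systematic review.

```python
def misclassified_per_1000(sensitivity, specificity, per_group=1000):
    """Expected misclassifications per `per_group` patients in each group.

    Returns (false_negatives, false_positives):
      false_negatives: IGRA-negative results among patients WITH active TB
      false_positives: IGRA-positive results among patients WITHOUT active TB
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be fractions in [0, 1]")
    false_negatives = round((1.0 - sensitivity) * per_group)
    false_positives = round((1.0 - specificity) * per_group)
    return false_negatives, false_positives


# Lower and upper ends of the ranges reported in the text
# (sensitivity 73% to 83%, specificity 49% to 58%).
REPORTED_RANGES = [(0.73, 0.49), (0.83, 0.58)]

for sens, spec in REPORTED_RANGES:
    fn, fp = misclassified_per_1000(sens, spec)
    print(
        f"sensitivity {sens:.0%}, specificity {spec:.0%}: "
        f"{fn} IGRA-negative per 1000 TB cases, "
        f"{fp} IGRA-positive per 1000 patients without active TB"
    )
```

Across the reported ranges, roughly 170 to 270 of every 1,000 patients with active TB would be IGRA-negative, and roughly 420 to 510 of every 1,000 patients without active TB would be IGRA-positive.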

3.1.3. Strengths and limitations of the evidence base

Heterogeneity was substantial for the primary outcomes of sensitivity and specificity. Empirical random effects weighting, excluding studies contributing fewer than 10 eligible individuals, and separately synthesizing data for currently manufactured IGRAs were performed in order to minimize heterogeneity.
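
The review does not state which random effects estimator was used. As an illustration only, the sketch below pools study-level proportions on the logit scale with the DerSimonian-Laird method, one common form of random effects weighting. The study counts are hypothetical, and the univariate approach is a simplification: diagnostic meta-analyses often model sensitivity and specificity jointly.

```python
import math


def logit_and_variance(events, total):
    """Logit-transformed proportion and its approximate variance."""
    non_events = total - events
    if events <= 0 or non_events <= 0:
        # No continuity correction is applied in this sketch.
        raise ValueError("this sketch needs 0 < events < total")
    return math.log(events / non_events), 1.0 / events + 1.0 / non_events


def pool_random_effects(studies):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    `studies` is a list of (events, total) pairs, for example IGRA-positive
    results among culture-confirmed TB cases in each study.
    Returns (pooled_proportion, tau2), where tau2 is the estimated
    between-study variance on the logit scale.
    """
    if len(studies) < 2:
        raise ValueError("at least two studies are needed")
    effects, variances = zip(*(logit_and_variance(e, n) for e, n in studies))

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)

    # Random effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return 1.0 / (1.0 + math.exp(-pooled_logit)), tau2


# Hypothetical (events, total) counts, NOT data from the review.
HYPOTHETICAL_STUDIES = [(45, 60), (70, 85), (30, 50), (110, 140)]
pooled, tau2 = pool_random_effects(HYPOTHETICAL_STUDIES)
print(f"pooled sensitivity: {pooled:.2f}, tau^2: {tau2:.3f}")
```

When between-study variance is large, the random effects weights become more nearly equal across studies, so small studies carry relatively more influence than under fixed-effect weighting.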

No standard criteria exist for defining high TB incidence countries and the World Bank income classification is an imperfect surrogate for national TB incidence; nevertheless, results were fundamentally unchanged when restricted to countries with an arbitrarily chosen annual TB incidence of greater than or equal to 50/100,000 population.

It is possible that ongoing studies were missed despite systematic searching. It is also possible that studies that found poor IGRA performance were less likely to be published. Given the lack of statistical methods to account for publication bias in diagnostic meta-analyses, it would be prudent to assume some degree of overestimation of estimates due to publication bias.

The systematic review focused on test accuracy (ie. sensitivity and specificity) and indirect assessment of patient impact (false-positive and false-negative results). None of the studies reviewed provided information on patient-important outcomes, ie. showing that IGRAs used in a given situation resulted in a clinically relevant improvement in patient care and/or outcomes. In addition, no information was available on the values and preferences of patients.

3.1.4. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Tables 1 and 2. Based on these assessments, the Expert Group concluded that the quality of evidence for use of IGRAs and the TST in diagnosis of active TB was low and recommended that these tests should not be used in low- and middle-income countries as a replacement for conventional microbiological diagnosis of pulmonary and extra-pulmonary TB (strong recommendation).

The Expert Group also noted that current evidence did not support the use of IGRAs or the TST as part of the diagnostic work-up of adults suspected of active TB in low- and middle-income countries, irrespective of HIV status. This recommendation placed a high value on avoiding the consequences of unnecessary treatment (high false-positives) given the low specificity of IGRAs and the TST in these settings.

The systematic review results have subsequently been published.1

Policy recommendation: IGRAs (and the TST) should not be used in low- and middle-income countries for the diagnosis of pulmonary or extra-pulmonary TB, nor for the diagnostic work-up of adults (including HIV-positive individuals) suspected of active TB in these settings (strong recommendation). This recommendation places a high value on avoiding the consequences of unnecessary treatment (high false-positives) given the low specificity of IGRAs (and the TST) in these settings.

3.2. Use of IGRAs in children

3.2.1. Study characteristics

The initial search yielded 234 citations. After full-text review of 68 papers evaluating IGRAs in children, 32 were determined to meet eligibility criteria, covering 33 unique evaluations of one or more IGRAs (hereafter referred to as studies) in 18 countries. Of the 33 studies, three were from low-income countries, and 11 were from middle-income countries. The incidence of smear-positive TB was <25/100,000 in 18 of these countries and ≥25/100,000 in the remaining countries. Studies performed in high-income countries included between 11% and 100% immigrant children from countries with higher burdens of TB.

All studies included in the review assessed either or both commercial IGRAs, QuantiFERON (QFT, in its Gold and In-Tube versions) and T-SPOT.TB (T-SPOT), as well as the TST in children. Very few studies clearly reported on the sampling methods (consecutive, random or convenience) and representativeness of the patient spectrum. Blinding of clinicians to IGRA results was absent in most studies. Wide variation was evident in the criteria used to define the reference standard (active TB).

Among studies in low- and middle-income countries analysing the test performance for latent TB infection, 4 studies used “exposed” and “unexposed” as comparison groups and 5 studies allowed analysis of the correlation between different grades of exposure and test results. Six studies from low- and middle-income countries were included in the analysis of test performance in TB disease, with varying definitions for each group of TB suspects/patients and for the “no TB” categories.

3.2.2. Summary of results

The majority of IGRA studies in children had been performed in high-income countries and extrapolation to low- and middle-income settings with high background TB infection rates was not appropriate. However, based on available data, the results indicated that in low- and middle-income countries:

  • IGRAs and the TST had very similar accuracy for diagnosis of LTBI and active TB in children;
  • Major methodological inconsistencies between studies had a negative effect on the comparability of studies and results. A key constraint was the lack of appropriate reference standards for diagnosis of paediatric TB, limiting the interpretation of estimates of test accuracy in children other than those with definite TB;
  • A clear advantage of IGRAs over TST in detecting LTBI in exposed or unexposed individuals or in a gradient of exposure was not detected;
  • Lower sensitivity of both IGRAs and TST was found in study populations with >50% BCG coverage. The reasons were not clear; however, BCG coverage may capture populations from settings with a higher burden of TB, hence with different epidemiological background and underlying conditions that may impair test accuracy, such as co-infections with helminths and malnutrition;
  • Both IGRAs and TST showed lower sensitivity in HIV-infected children in one study assessed;
  • Overall, the ability of the TST and IGRAs to ‘rule out’ active TB was suboptimal. The main limitation for assessment of the specificity of the diagnostic assays among ‘no-TB’ groups was the small number of studies that described adequate methodology to exclude and diagnose active TB;
  • Indeterminate IGRA results varied across all studies, but higher rates were associated with young age, immune-suppression or helminth co-infection in individual studies on TB exposure;
  • In studies on active TB, no correlation was found between indeterminate results and age, HIV status, TB burden or BCG vaccination status;
  • Studies rarely addressed the operational aspects and implementation feasibility of IGRAs. Cost was noted as an important and limiting factor. Aspects inherent to the use of IGRAs in children, such as the difficulty of phlebotomy and the amount of blood needed in young children, are relevant implementation considerations;
  • A third of studies were supported by manufacturers of IGRAs, mainly through donation of test kits.

3.2.3. Strengths and limitations of the evidence base

Studies included assessed very different populations in diverse settings, with the biggest challenge and limitation related to major differences in methodological approaches across studies and non-standardised definitions of reference standards, TB exposure and TB disease.

Sample sizes in the different studies varied greatly and were less than ten in some of the subgroups analysed, which adversely affects the generalisability of the findings.

Empirical random effects weighting and separately synthesizing data for currently manufactured IGRAs were performed in order to minimize heterogeneity; however, heterogeneity remained substantial for the primary outcomes of sensitivity and specificity.

No standard criteria exist for defining high TB-incidence countries and the World Bank income classification is an imperfect surrogate for national TB incidence; nevertheless, results were fundamentally unchanged when restricted to countries with an arbitrarily chosen annual TB incidence of greater than or equal to 25/100,000.

It is possible that ongoing studies were missed despite systematic searching. It is also possible that studies that found poor IGRA performance were less likely to be published. Given the lack of statistical methods to account for publication bias in diagnostic meta-analyses, it would be prudent to assume some degree of overestimation of estimates due to publication bias.

The systematic review focused on test accuracy (ie. sensitivity and specificity) for the diagnosis of active TB and TB exposure as surrogate for LTBI. None of the studies reviewed provided information on patient-important outcomes, ie. showing that IGRAs or the TST used in a given situation resulted in a clinically relevant improvement in patient care and/or outcomes. In addition, no information was available on the values and preferences of patients.

3.2.4. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Tables 3 to 6. Based on these assessments, the Expert Group concluded that the quality of evidence for use of IGRAs in children was very low and recommended that these tests should not be used in low- and middle-income countries as an alternative to the TST for the diagnosis of latent TB infection in children, nor as an alternative to the TST in the work-up of a diagnosis of active TB disease in children, irrespective of HIV status (strong recommendation).

The Expert Group also noted that there may be additional harms associated with blood collection in children and that issues such as acceptability and cost had not been adequately addressed in any studies.

The systematic review results have subsequently been published.2

Policy recommendation: IGRAs should not replace the TST in low- and middle-income countries for the diagnosis of latent TB infection in children, nor for the diagnostic work-up of children (irrespective of HIV status) suspected of active TB in these settings (strong recommendation). It should also be noted that there may be additional harms associated with blood collection in children and that issues such as acceptability and cost had not been adequately addressed in any studies.

3.3. Use of IGRAs for the diagnosis of LTBI in HIV-infected individuals

3.3.1. Study characteristics

The initial search yielded 791 citations. After full-text review of 129 papers evaluating IGRAs in immunocompromised individuals, 29 were determined to meet eligibility criteria, covering 37 unique evaluations (hereafter referred to as studies). Of these, 22 studies were conducted in low- and middle-income countries.

There was a high degree of variation in study design and study populations. Fifteen of the 22 studies (68%) included only ambulatory HIV-positive individuals. IGRAs were performed in persons with or suspected of having active TB in 12 studies; 6 studies evaluated asymptomatic HIV-positive persons for LTBI; and 4 studies considered both asymptomatic and symptomatic individuals with HIV co-infection.

3.3.2. Summary of results

Results indicated that in low- and middle-income countries:

  • The optimal test for identifying HIV-infected persons who could benefit from IPT remains an unanswered question although WHO recently endorsed IPT as one of three key public health strategies to reduce the impact of TB on persons living with HIV;
  • The majority of persons latently infected with TB, including persons co-infected with HIV, do not develop active TB. The clinical utility of any diagnostic test for LTBI is therefore dependent on its ability to identify which persons are truly at increased risk for progression to active TB and could benefit from IPT;
  • All three studies of the predictive value of IGRAs in HIV-infected individuals showed that IGRAs have poor positive predictive value but high negative predictive value for active TB. While these results suggest that a negative IGRA result is reassuring (no person with a negative IGRA result developed culture-positive TB), the studies had serious limitations, including small sample sizes with short duration of follow-up and differential evaluation and/or follow-up of persons with positive and negative IGRA results;
  • Large prospective cohort studies have established that persons with a positive TST have a 1.4 to 1.7-fold higher rate of active TB within one year compared to persons with a negative TST result. Randomised controlled trials in HIV-infected persons demonstrated that IPT confers a 20-60% reduction in the risk of active TB and that this reduction occurs only in persons with positive TST results;
  • In spite of limited data on predictive value, it has been suggested that IGRAs may have a role in identifying TB infection in HIV-infected individuals given the known decreased performance of the TST in immunosuppressed persons. However, neither IGRA was consistently more sensitive than the TST in head-to-head comparisons and there were no data to show that individuals with TST-negative/IGRA-positive results had improved outcomes on IPT. The impact of immunosuppression on IGRA validity remains unclear;
  • Seven (32%) studies reported industry involvement, including donation of IGRA test kits and work/financial relationships between IGRA manufacturers and principal authors.

3.3.3. Strengths and limitations of the evidence base

The major limitation was the lack of an adequate reference standard to evaluate the accuracy of IGRAs for diagnosis of LTBI. The majority of studies were small (<100 patients in 12 of 22 studies); only five studies performed a head-to-head comparison of IGRA and TST results to a reference standard; and there were insufficient studies to perform meta-analysis in many sub-groups.

Given that both TST and IGRAs have suboptimal sensitivity and that discordant results are common, it would be relevant to evaluate outcomes when both tests are used, either simultaneously or sequentially, for diagnosing LTBI in HIV-infected persons.

3.3.4. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Tables 7 and 8. Based on these assessments, the Expert Group concluded that the quality of evidence for use of IGRAs in individuals living with HIV infection was very low and recommended that these tests should not be used in low- and middle-income countries as a replacement for the TST for the assessment of LTBI (strong recommendation).

The systematic review results have subsequently been published.3

Policy recommendation: IGRAs should not replace the TST in low- and middle-income countries for the diagnosis of latent TB infection in individuals living with HIV infection (strong recommendation). This recommendation also applies to HIV-positive children based on the generalisation of data from adults.

3.4. Use of IGRAs for screening of health care workers

3.4.1. Study characteristics

The initial search yielded 546 citations. After full-text review of 56 papers evaluating commercial IGRAs in health care workers (HCWs), 48 were deemed to have met the eligibility criteria. Of these, only five (12%) were done in low- and middle-income settings.

Studies varied greatly in design, execution, and reported outcomes. IGRA performance varied greatly across populations; therefore, results were also stratified by TB incidence (>100 estimated incident TB cases/100,000 population; ≤100/100,000 as reported to WHO) in the countries where the studies were done. Due to the variety of study designs and HCW screening guidelines, study populations included HCWs with widely differing risks of TB exposure.

3.4.2. Summary of results

Results indicated that in low- and middle-income countries:

  • Prevalence of LTBI in HCWs depended on the test used and the particular TB incidence setting. Two cross-sectional studies comparing IGRA and TST positivity rates in HCWs showed high TST positivity rates (40% to 66%) and slightly lower IGRA positivity rates (the difference was statistically significant in only one study, which also showed the lowest rate of BCG vaccination among participants);
  • Both the TST and IGRAs appeared to be associated with markers of TB exposure, but the magnitude of associations varied greatly; TST performance was adversely affected by BCG vaccination while IGRA performance seemed to be unaffected;
  • Both IGRAs and the TST had suboptimal sensitivity and discordant results were common. IFN-γ responses seemed to have natural variation and tended to fluctuate around the cut-off, causing apparent IGRA conversions and reversions. The exact cause of the conversions and reversions remained unclear, and might indicate spontaneous clearance of TB infection, or dynamic changes within the spectrum of latent TB infection;
  • The use of IGRAs for serial testing was complicated by lack of data on optimum cut-offs for serial testing, and unclear interpretation and prognosis of conversions and reversions;
  • Conversion rates were highest when a simple negative to positive change was used to define a conversion. This was true in both high and low incidence settings and had implications for deciding on criteria (cut-offs) for conversions and reversions;
  • There were no data to show that IGRAs performed better at identifying incidence of new TB infections among HCWs than the TST, irrespective of HIV status.
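
The dependence of conversion rates on the definition used can be illustrated with a short Python sketch. It compares a simple negative-to-positive definition with a stricter one that also requires a minimum follow-up value. The 0.35 IU/mL cut-off is the manufacturer's QFT-GIT threshold and is assumed here, since it is not stated in this document. The stricter threshold of 0.70 IU/mL and all paired results are hypothetical values chosen for illustration.

```python
CUTOFF = 0.35         # IU/mL; manufacturer's QFT-GIT cut-off (assumed, not from this document)
STRICT_MINIMUM = 0.70  # IU/mL; hypothetical stricter follow-up threshold


def simple_conversion(baseline, follow_up, cutoff=CUTOFF):
    """Conversion defined as any change from negative to positive."""
    return baseline < cutoff <= follow_up


def strict_conversion(baseline, follow_up, cutoff=CUTOFF, minimum=STRICT_MINIMUM):
    """Conversion that also requires the follow-up value to reach `minimum`."""
    return baseline < cutoff and follow_up >= minimum


# Hypothetical (baseline, follow-up) IFN-gamma results in IU/mL for five HCWs.
PAIRED_RESULTS = [(0.10, 0.40), (0.30, 0.36), (0.05, 1.20), (0.20, 0.25), (0.33, 0.80)]

simple_count = sum(simple_conversion(b, f) for b, f in PAIRED_RESULTS)
strict_count = sum(strict_conversion(b, f) for b, f in PAIRED_RESULTS)
print(f"simple definition: {simple_count} conversions")
print(f"stricter definition: {strict_count} conversions")
```

In this hypothetical series the two results that rise only slightly above the cut-off count as conversions under the simple definition but not under the stricter one, which is the pattern expected when responses fluctuate around the cut-off.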

3.4.3. Strengths and limitations of the evidence base

The systematic review used a comprehensive search strategy using multiple sources and databases to retrieve relevant studies, including unpublished studies and conference proceedings. Only two studies in low- or middle-income countries were identified. Serial testing data, evidence on the predictive value of IGRAs in HCWs, as well as reproducibility data were seriously limited.

3.4.4. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Tables 9 and 10. Based on these assessments, the Expert Group concluded that the quality of evidence for use of IGRAs for screening of health care workers in low- and middle-income countries was very low and recommended that these tests should not be used in health care worker screening programmes in these countries (strong recommendation). The Expert Group also noted the lack of WHO policy on using the TST in health care worker screening programmes.

The systematic review results have subsequently been published.4

Policy recommendation: IGRAs should not be used in health care worker screening programmes in low- and middle-income countries (strong recommendation).

3.5. Use of IGRAs in contact screening and outbreak investigations

3.5.1. Study characteristics

The initial search yielded 608 citations. After full-text review of 99 papers evaluating commercial IGRAs in screening of contacts and outbreak investigations, 65 studies conducted in high-income countries were excluded, as were 18 studies using pre-commercial and in-house IGRAs. The remaining 16 studies were deemed to have met the eligibility criteria.

Most studies were small (39-301 participants); however, the inclusion of one unpublished study doubled the total sample size (2,211 study participants). All studies included BCG vaccinated participants. HIV status was frequently unreported, but when it was documented, rates were low (0-1.5%) with the exception of the large unpublished study where the reported HIV infection rate was around 38% in the adult study population, and one study reporting an HIV infection rate of 5% in the paediatric study population.

Only one study did not include household contacts but evaluated HCWs exposed to a smear-positive TB case. The remaining 15 studies all included household contacts, while three studies also included school or work contacts. Nine (56%) of the studies exclusively examined child contacts, three studies included both child and adult contacts, and four studies exclusively included adult contacts. Most studies involved only contacts of confirmed active TB cases; however, five studies recruited a comparison group with no known TB exposure.

Studies varied in quality, with several quality indicators frequently unreported. For example, only three of 14 studies reported that study personnel were blinded to other test results or TB exposure when performing and interpreting test results.

3.5.2. Summary of results

Results indicated that in low- and middle-income countries:

  • The prevalence of positive tests varied greatly between studies and across assays. Prevalence of positive TST results ranged from 22% in children under 5 years of age to 84% in adult HCWs exposed to a smear-positive TB case. Prevalence of positive IGRA results ranged from 10% to 75%, respectively. The majority of studies showed comparable LTBI prevalence by TST or IGRA in contacts;
  • The most commonly observed discordance was of the TST-positive/IGRA-negative type;
  • Both IGRAs and the TST seemed to show positive associations with higher levels of exposure in cross-sectional studies, but the strength of the association (effect) varied across studies;
  • IGRAs appeared to be dynamic assays with frequent conversions and reversions;
  • Both IGRAs and TST seemed to have similar and modest predictive value;
  • Five of 15 studies reported industry involvement, most frequently the donation of IGRA test kits. One study reported one of its authors having been a paid consultant of the manufacturer of the IGRA assay evaluated.

3.5.3. Strengths and limitations of the evidence base

Due to significant heterogeneity in study designs and outcomes assessed in each study, it was not appropriate to pool the data. The majority of studies were cross-sectional and looked at concordance between TST and IGRAs. Studies that assessed associations between exposure and test positivity used different categorisation of exposure variables, making it difficult to compare results across studies.

3.5.4. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Table 11. Based on these assessments, the Expert Group concluded that the quality of evidence for use of IGRAs for LTBI screening in contact and outbreak investigations was very low and recommended that these tests should not be used in low- and middle-income countries as a replacement for the TST in either adults or children investigated as close contacts of patients with confirmed active TB (strong recommendation).

Policy recommendation: IGRAs should not replace the TST in low- and middle-income countries for the screening of latent TB infection in adult and paediatric contacts, or in outbreak investigations (strong recommendation).

3.6. The predictive value of IGRAs for incident active TB

3.6.1. Study characteristics

The initial search yielded 722 citations. After full-text review of 14 papers evaluating the predictive value of commercial IGRAs for active TB, 8 studies conducted in high-income countries were excluded, as were three studies using in-house IGRAs.

Three studies were deemed to have met the eligibility criteria. The at-risk populations included in the three studies were all different (older males with confirmed silicosis, school-going adolescents, and adult TB contacts including HIV-infected individuals). Included studies varied in quality, particularly with regard to comparability (adjustments made to effect measures) and outcome (ascertainment of incident TB, losses to follow-up, and reporting of incidence rates vs. cumulative incidence), leading to possible verification bias. One study incorporated IGRA results in its reference standard for TB, leading to incorporation bias.

3.6.2. Summary of results

Results indicated that in low- and middle-income countries:

  • The vast majority of individuals (>95%) with a positive IGRA result did not progress to active TB disease during follow-up, although a modest but statistically non-significant increase in incidence rates of TB in IGRA-positives compared to IGRA-negatives had been observed;
  • IGRA sensitivity for incident TB ranged from 75% to 88% (95% CI 46% to 99%, depending on the country/study population), while IGRA specificity ranged from 35% to 51% (95% CI 30% to 54%, depending on the country/study population). TST sensitivity for incident TB was similar, ranging from 73% to 76% (95% CI 50% to 93%, depending on the country/study population). TST specificity was equally low, ranging from 35% to 58% (95% CI 29% to 58%, depending on the country/study population). One study reported lower TST sensitivity and higher specificity but acknowledged that logistical issues at the clinical sites could have affected the TST results;
  • Both IGRAs and the TST appeared to have only modest predictive value and did not help to identify those who are at highest risk of progression to TB disease. Patient relevant outcomes based on sensitivity and specificity appeared comparable between the two tests.
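
The finding that more than 95% of IGRA-positive individuals did not progress to active TB is what would be expected when a test with these accuracy values is applied to a population in which incident TB is uncommon. The Python sketch below works through that calculation. The sensitivity (80%) and specificity (45%) lie within the ranges reported above, whereas the 2% cumulative incidence over follow-up is a hypothetical value assumed for illustration and is not taken from the included studies.

```python
def predictive_values(sensitivity, specificity, cumulative_incidence):
    """Positive and negative predictive value for incident active TB.

    All arguments are fractions in [0, 1]; `cumulative_incidence` is the
    proportion of the tested population developing active TB during follow-up.
    Returns (ppv, npv).
    """
    for value in (sensitivity, specificity, cumulative_incidence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions in [0, 1]")

    true_pos = sensitivity * cumulative_incidence
    false_neg = (1.0 - sensitivity) * cumulative_incidence
    true_neg = specificity * (1.0 - cumulative_incidence)
    false_pos = (1.0 - specificity) * (1.0 - cumulative_incidence)

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive values are undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity within the reported ranges; incidence is hypothetical.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.45, cumulative_incidence=0.02)
print(f"PPV: {ppv:.1%} of test-positives progress to active TB")
print(f"NPV: {npv:.1%} of test-negatives remain free of active TB")
```

Under these assumptions fewer than 5% of test-positive individuals progress to active TB, which illustrates why a positive result does little to identify those at highest risk of progression.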

3.6.3. GRADE evidence profiles and final policy recommendations

The GRADE evidence profiles are provided in Tables 12 and 13. Based on these assessments, the Expert Group concluded that the quality of evidence for the predictive value of IGRAs was very low and recommended that these assays should not be used in low- and middle-income countries to identify individuals at risk of active TB disease (strong recommendation).

The systematic review results have subsequently been published.5

Policy recommendation: Neither IGRAs nor the TST should be used in low- and middle-income countries for the identification of individuals at risk of developing active TB (strong recommendation).