NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Danan ER, Diem S, Sowerby C, et al. Genitourinary Syndrome of Menopause: A Systematic Review [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2024 Jul. (Comparative Effectiveness Review, No. 272.)

2. Methods

2.1. Review Approach

For all Key Questions (KQs), the systematic review (SR) followed Evidence-based Practice Center (EPC) program methodology, as laid out in the EPC Methods Guide. We registered the protocol for this SR in PROSPERO (registration number CRD42023400684).

2.2. Key Questions

After discussion with Key Informants and our team’s content and methods experts, we chose to interpret the term “screening” in KQ1 as identifying underreported, symptom-based conditions (similar to screening for anxiety and depression), rather than “screening” for an asymptomatic condition. Based on input from public commenters, Key Informants, and members of a Technical Expert Panel, we drafted the following KQs.

KQ1.

What are the effectiveness and harms of screening strategies to identify genitourinary syndrome of menopause (GSM) in postmenopausal women? Does screening impact patient reported symptoms or improve quality of life (QoL)?

KQ2.

What are the effectiveness and comparative effectiveness of hormonal, non-hormonal, and energy-based interventions when used alone or in combination for treatment of GSM symptoms? Which treatments show improvement for which symptoms?

KQ3.

What are the harms (and comparative harms) of hormonal, non-hormonal, and energy-based interventions for GSM symptoms?

KQ4.

What is the appropriate followup interval to assess improvement, sustained improvement, or regression of symptoms of GSM in women treated with hormonal, non-hormonal, and energy-based interventions?

KQ5.

What are the effectiveness, comparative effectiveness, and harms of endometrial surveillance among women who have a uterus and are using hormonal therapy for GSM?

For all KQs, how do the findings vary for women with a history of breast cancer or other hormone-related cancers, a high risk of cancer, or conditions such as primary ovarian insufficiency, women experiencing surgical menopause, gender diverse individuals, and within subgroups defined by severity of GSM symptoms, and patient characteristics (i.e., by age, race, socioeconomic status, etc.)?

2.3. Analytic Framework

Based on discussions with Key Informants and Technical Expert Panel members, we developed an analytic framework for the KQs (Figure 1).

Figure 1 is an Analytic Framework that depicts Key Questions 1-5 within the context of the review’s Population, Interventions, and Outcomes of interest. The figure illustrates how screening or case-finding (KQ1) may identify patients with GSM, who may then be treated with hormonal, non-hormonal, or energy-based interventions. These interventions may result in intermediate outcomes such as change in genitourinary, vulvovaginal, or sexual symptoms (KQ2) and/or patient-centered outcomes (e.g., change in psychological symptoms or QoL). Also, adverse effects (AEs) may occur at any point after patients are screened (KQ3). The prevalence and magnitude of patient-centered outcomes may vary based on time of follow-up (KQ4). The figure also includes the overarching question about sub-groups such as women with a history of breast cancer or hormone-related cancers, a high risk of cancer, or conditions such as primary ovarian insufficiency; women experiencing surgical menopause, gender diverse individuals, or groups defined by severity of GSM symptoms or patient characteristics (i.e., by age, race, socioeconomic status, etc.).

Figure 1

Analytic framework for genitourinary syndrome of menopause. Abbreviations: GSM=Genitourinary Syndrome of Menopause; KQ=Key Question

This figure depicts the Key Questions within the context of the PICOTS (Population, Intervention, Comparator, Outcomes, Timing, and Study design/setting) described above. In general, the figure illustrates how screening or case-finding may identify patients with GSM, who may then be treated with hormonal, non-hormonal, or energy-based interventions. These interventions may result in intermediate outcomes such as change in genitourinary, vulvovaginal, or sexual symptoms and/or patient-centered outcomes such as change in psychological symptoms or QoL. Also, adverse effects (AEs) may occur at any point after patients are screened.

2.4. Study Selection

We searched for published studies for all KQs in MEDLINE®, Embase®, and CINAHL® from database inception through December 11, 2023 (Appendix A). We included vocabulary and natural language terms, along with free-text words, relevant to the KQ. We supplemented our bibliographic database searches with citation searching of relevant systematic reviews and original research. All searches were independently peer reviewed.

After we removed duplicates, we uploaded citations into DistillerSR, a programmable online citation and article screening tool for SRs.43 Using prespecified inclusion and exclusion criteria (Table 1), titles and abstracts were initially screened by two independent reviewers for potential relevance to the Key Questions. Articles included by either reviewer underwent full-text screening. We screened abstracts with the assistance of DistillerSR’s Artificial Intelligence System (DAISY) until the DAISY-predicted score for likelihood of inclusion was less than 0.1 percent and the inclusion rate had fallen to less than 5 percent. The remaining abstracts (~2000) with an inclusion score less than 0.1 percent were not screened by a second reviewer, but a word search of the titles and abstracts was completed to ensure that any relevant articles were not missed. At the full-text screening stage, two independent reviewers agreed on the final inclusion or exclusion decision. Articles that met eligibility criteria were included for risk of bias (RoB) assessment. See Appendix B for references excluded at the full-text screening stage.
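The stopping rule for second-reviewer abstract screening can be sketched as a simple check. This is a minimal illustration in Python, not DistillerSR's implementation: the function name, the parameter names, and the choice to express both quantities as proportions are assumptions made here, and how DAISY computes its predicted score is not described in this report.

```python
def second_review_can_stop(predicted_inclusion_score: float,
                           recent_inclusion_rate: float,
                           score_cutoff: float = 0.001,
                           rate_cutoff: float = 0.05) -> bool:
    """Return True when both stopping conditions described in the text hold.

    predicted_inclusion_score: highest predicted likelihood of inclusion among
        the abstracts not yet dual-screened, as a proportion (0.001 = 0.1 percent).
    recent_inclusion_rate: proportion of recently screened abstracts that were
        included (0.05 = 5 percent). How "recent" is defined is an assumption.
    """
    return (predicted_inclusion_score < score_cutoff
            and recent_inclusion_rate < rate_cutoff)


# Hypothetical values: score below 0.1 percent and inclusion rate below 5 percent
print(second_review_can_stop(0.0005, 0.03))
# Hypothetical values: inclusion rate still above 5 percent
print(second_review_can_stop(0.0005, 0.08))
```

Both conditions must hold before dual screening stops; in the review, the remaining abstracts were still checked with a word search of titles and abstracts.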

After initial study selection, citations were grouped by type of intervention: hormonal, non-hormonal, or energy-based, based on the primary intervention studied (trials of combined interventions were assigned to a single category for organizational purposes but described separately). Because non-hormonal interventions were heterogeneous, we created an evidence map with limited data extraction. Because of the potential for long-term effects of energy-based treatments and the shorter duration of clinical experience with these relatively new interventions, we included uncontrolled observational studies of energy-based interventions that reported long-term harms (see Section 2.7, Data Synthesis).

Study selection criteria for each KQ are listed in Table 1 below. A minimum followup duration of 8 weeks was selected to allow time for outcomes to change. The study size criterion was set at 20 participants per study group, based on expert consensus that it is difficult to achieve balance through randomization in smaller trials.

Table 1

Study eligibility criteria.

2.5. Assessment of Risk of Bias

We did not assess RoB for studies included in the non-hormonal interventions evidence map. For remaining studies, we evaluated each study for RoB using the Cochrane Risk of Bias Tool 2.0 (RoB-2)45 for randomized controlled trials (RCTs) and the Risk of Bias in non-Randomized Studies - of Interventions (ROBINS-I) for observational studies.46 Components of the RoB-2 include participant group assignment (random sequence generation, allocation concealment), blinding (performance and detection bias), completeness of followup (attrition bias), analyses and outcome reporting consistent with predefined protocols (selective reporting bias), and other issues (such as appropriateness of analytic approach). We assessed RoB for each RCT in each of these domains and assigned each study a summary RoB rating of low, some concerns, or high.

Components of the ROBINS-I include assessing bias due to confounding, classification of interventions, selection of participants (into the study, or into the analysis), deviations from intended interventions, missing data, measurement of the outcome, and selection of the reported results. We assessed RoB for each observational study in each of these domains and assigned each study a summary RoB rating of low, moderate, serious, or critical.

One investigator assessed RoB and a second reviewed; discrepancies were reconciled via consensus.

2.6. Data Extraction and Data Management

We extracted data into DistillerSR43 and present detailed evidence tables in Appendix C. For trials that were rated low or some concerns RoB with the RoB-2 tool, or observational studies rated low or moderate RoB using the ROBINS-I tool, data elements extracted included author, year, trial registration number, study funding, setting, subject inclusion and exclusion criteria, intervention and control characteristics, sample size, followup duration, participant baseline age, race, and results of primary outcomes and adverse effects. Data were extracted by one reviewer and verified for accuracy by a second reviewer. When data were presented only in figures, we visually estimated point estimates from the figure for entry into the meta-analysis. When a point estimate needed for the meta-analysis was not reported, we calculated it from the available estimates following Cochrane guidance.47 We followed the same guidance to calculate mean differences and standard deviations. When a study provided insufficient data to calculate a standard deviation, we imputed it, again following Cochrane guidance.
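As an illustration of the kind of conversions involved, the sketch below shows three standard formulas commonly used to recover a standard deviation from other reported statistics. The report does not specify which conversions were applied to which studies, so the function names, the large-sample 95% CI approximation, and the assumed baseline-to-followup correlation are illustrative assumptions only.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """SD of one group from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """SD of one group from a 95% CI of its mean.

    Uses the large-sample normal approximation (CI width = 2 * 1.96 * SE);
    small samples would call for a t-distribution multiplier instead.
    """
    return math.sqrt(n) * (upper - lower) / 3.92


def sd_of_change(sd_baseline: float, sd_final: float, corr: float) -> float:
    """SD of change from baseline, given an ASSUMED baseline-final correlation."""
    return math.sqrt(sd_baseline ** 2 + sd_final ** 2
                     - 2 * corr * sd_baseline * sd_final)


# Hypothetical inputs, for illustration only
print(sd_from_se(0.5, 100))
print(sd_from_ci95(8.04, 11.96, 100))
print(sd_of_change(2.0, 2.0, 0.5))
```

The correlation passed to `sd_of_change` is rarely reported, so in practice it is borrowed from a similar study or varied in a sensitivity analysis.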

For trials that were rated high RoB with the RoB-2 tool, or observational studies rated serious or critical with the ROBINS-I tool, we extracted limited study characteristics data, including author, year, trial registration number, study funding, setting, subject inclusion and exclusion criteria, intervention and control characteristics, sample size, followup duration and outcomes reported. Consistent with standard practice for systematic reviews, we did not extract detailed results data for these studies, except for harms reported in long-term energy-based therapy studies. No synthesis or sensitivity analysis was performed using high RoB studies.

For studies included in the non-hormonal evidence map, we extracted limited study characteristics data, including author, year, trial registration number, study funding, setting, intervention and control characteristics, sample size, followup duration and outcomes reported. We did not extract detailed results data for these studies.
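The tiered extraction rules in this section can be summarized as a small lookup. This is a sketch only; the function name, the label strings, and the `evidence_map` flag are assumptions introduced for illustration, and the exception for harms in long-term energy-based therapy studies is noted in a comment rather than modeled.

```python
# Summary RoB ratings that triggered full data extraction, by tool
FULL_EXTRACTION = {
    "RoB-2": {"low", "some concerns"},
    "ROBINS-I": {"low", "moderate"},
}

# Summary RoB ratings that triggered limited (study characteristics only) extraction
LIMITED_EXTRACTION = {
    "RoB-2": {"high"},
    "ROBINS-I": {"serious", "critical"},
}


def extraction_level(tool: str, rating: str, evidence_map: bool = False) -> str:
    """Return 'full' or 'limited' for a study, following the rules in Section 2.6.

    Not modeled: harms from long-term energy-based therapy studies were
    extracted even when the study fell in the limited tier.
    """
    if evidence_map:
        # Non-hormonal evidence map studies had no RoB assessment
        return "limited"
    rating = rating.lower()
    if rating in FULL_EXTRACTION[tool]:
        return "full"
    if rating in LIMITED_EXTRACTION[tool]:
        return "limited"
    raise ValueError(f"Unknown rating {rating!r} for tool {tool!r}")


print(extraction_level("RoB-2", "some concerns"))
print(extraction_level("ROBINS-I", "serious"))
```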

2.7. Data Synthesis

After initial study selection, citations were grouped by type of intervention (hormonal, non-hormonal, or energy based), then organized by specific treatment-outcome comparisons. Each class of intervention required a different synthesis strategy as detailed below.

2.7.1. Hormonal Interventions

Hormonal interventions included vaginal estrogen therapy (including vaginal cream, tablets, inserts, or ring), vaginal or systemic dehydroepiandrosterone (DHEA), oxytocin vaginal gel, selective estrogen receptor modulators (SERMs), and vaginal or systemic testosterone. In comparisons of vaginal estrogen to placebo, all formulations and doses were grouped together.48 Vaginal estrogen formulations and doses were evaluated separately in trials designed for those comparisons. Systemic estrogen therapy was not included as an intervention of interest. For SERMs, we evaluated ospemifene separately from raloxifene and bazedoxifene because of its unique estrogen receptor agonist activity in vaginal tissue. For DHEA and testosterone, we included both vaginal and systemic formulations, but evaluated efficacy and safety separately for vaginal and systemic formulations.

For studies with low or some concerns RoB, we synthesized evidence for each unique comparison with meta-analysis, when possible and appropriate. In cases where we found too few studies to calculate a pooled estimate (i.e., fewer than 3 studies of the same intervention/comparison with the same outcome measure/assessment), we provide a narrative summary.49 We assessed the clinical and methodological heterogeneity and variation in treatment effect size to determine the appropriateness of pooling data.49 If pooling was possible, we planned to synthesize data using the metacont function with a random-effects model in R (a language and environment for statistical computing, https://www.R-project.org/). We used the Knapp-Hartung adjustment, Hedges’ g, and the restricted maximum likelihood estimator for τ² when calculating the standardized mean differences (SMDs).50 We planned to calculate SMDs with the corresponding 95 percent confidence intervals (CIs) for continuous outcomes when combining similar outcomes measured with different instruments.
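To make the effect size concrete, the following Python sketch computes Hedges’ g and an approximate 95 percent CI for a single two-group study. It is not the R metacont function used in the review: it uses the common small-sample correction J = 1 - 3/(4df - 1) and a textbook variance approximation, which may differ slightly from the formulas in a given software package. It covers only the per-study effect size; the random-effects pooling with the restricted maximum likelihood estimator and the Knapp-Hartung adjustment is not shown. All input values are hypothetical.

```python
import math


def hedges_g(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int):
    """Return (g, (ci_low, ci_high)) for two independent groups."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled              # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    g = j * d
    # Approximate variance of d, then scaled by j^2 for g
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    se_g = math.sqrt(j ** 2 * var_d)
    return g, (g - 1.96 * se_g, g + 1.96 * se_g)


# Hypothetical study: treatment mean 10 (SD 2, n 20) vs control mean 8 (SD 2, n 20)
g, (low, high) = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
print(round(g, 3), round(low, 3), round(high, 3))
```

Because g is expressed in pooled standard deviation units, studies that measured the same outcome with different instruments can be placed on a common scale before pooling.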

We assessed heterogeneity (inconsistency) of treatment effects on outcomes through visual inspection of the forest plots for overlap of CIs, along with 95% prediction intervals, τ², and the I² statistic, which quantifies the impact of heterogeneity on the meta-analysis.51 The I² statistic was interpreted as follows:52

  • 0% to 40%: heterogeneity across studies may not be important
  • 30% to 60%: may indicate moderate heterogeneity
  • 50% to 90%: may indicate substantial heterogeneity
  • 75% to 100%: considerable heterogeneity

When heterogeneity was identified, we examined individual study and subgroup characteristics to better understand the possible contributing sources. When I² exceeded 75 percent, we did not calculate a pooled estimate.
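The I² statistic and the pooling rule above can be sketched as follows. This Python example derives I² from Cochran's Q using inverse-variance weights, which is the standard definition; the function names, the default arguments, and the input values are assumptions for illustration, and the review's actual calculations were carried out in R.

```python
def i_squared(effects, variances):
    """I² (percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q is smaller than its degrees of freedom
    return max(0.0, (q - df) / q) * 100.0


def should_pool(effects, variances, threshold=75.0, min_studies=3):
    """Pool only with at least `min_studies` studies and I² not above `threshold`."""
    return (len(effects) >= min_studies
            and i_squared(effects, variances) <= threshold)


# Hypothetical per-study effect sizes and variances
consistent = ([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
divergent = ([0.0, 1.0, 2.0], [0.01, 0.01, 0.01])
print(i_squared(*consistent), should_pool(*consistent))
print(i_squared(*divergent), should_pool(*divergent))
```

In the divergent example Q is 200 on 2 degrees of freedom, giving an I² of about 99 percent, so no pooled estimate would be calculated under the 75 percent rule.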

2.7.2. Non-Hormonal Interventions

Non-hormonal interventions included over-the-counter non-hormonal vaginal lubricants and moisturizers, hyaluronic acid, herbal therapies/supplemental alternatives, phytoestrogens, vitamin D, vitamin E, probiotics, mind and body practices, educational interventions, non-hormonal pharmaceuticals, and pelvic floor physical therapy to treat vaginal or sexual symptoms of GSM. Phytoestrogens have variable agonist and antagonist activity on hormone receptors53 and were grouped with non-hormonal interventions to be consistent with prior literature.54, 55

Vaginal moisturizers are considered a first-line therapy for GSM, so were included for full RoB assessment and data extraction. We distinguished between vaginal lubricants and moisturizers based on intended use: lubricants are primarily used for short-term relief during sexual activity, whereas moisturizers are applied regularly and are intended to mimic the natural secretions in an estrogenized vagina.56 Many studies used a water-based lubricant as a placebo or control treatment, often applied regularly to match the treatment schedule for the active intervention; lubricants were not evaluated as an independent intervention. For comparisons of moisturizers to placebo, all moisturizers, including hyaluronic acid-based moisturizers, were grouped together.

We took an evidence map approach to synthesizing the remaining non-hormonal intervention studies.57, 58 We organized studies by the type of intervention, according to the National Center for Complementary and Integrative Health (NCCIH) framework,59 with narrative summaries of the body of evidence provided, including population and study characteristics, intervention(s), and outcomes reported.

2.7.3. Energy-Based Interventions

Energy-based interventions included carbon dioxide (CO2) laser, erbium-doped yttrium aluminum garnet (Er:YAG) laser, and radiofrequency. The most commonly used lasers for GSM include the fractional microablative CO2 laser and the nonablative Er:YAG laser.60 We included any treatment protocol and grouped findings by laser type and comparison. For prospective studies of low or some concerns RoB, we synthesized evidence for each unique treatment-outcome comparison with meta-analysis, when possible and appropriate (i.e., at least three studies of the same intervention/comparator with similar outcome assessment) using the methods described above for hormonal interventions. We narratively summarized outcomes not suitable for meta-analysis. For studies without a comparison group, we narratively summarized AEs reported at a minimum of 1-year post-intervention.

2.8. Grading the Strength of Evidence

The American Urological Association (AUA) intends to use this evidence report to develop guidelines on the topic of GSM. Because this organization uses Grading of Recommendations Assessment, Development and Evaluation (GRADE)61 for rating evidence certainty, we used the GRADE framework to assess the overall certainty of evidence (COE).

We present the overall certainty of the evidence for the eight outcomes identified in the Core Outcomes in Menopause (COMMA) review (these include (1) pain with sex, (2) vulvovaginal dryness, (3) vulvovaginal discomfort/irritation, (4) dysuria, (5) change in MBS, (6) distress, bother, or interference of genitourinary symptoms (i.e., QoL), (7) satisfaction with treatment, and (8) side effects of treatment).44 Remaining outcomes from Table 1 are described in the text without a summary COE statement.

We considered measures of vulvovaginal lubrication together with measures of vulvovaginal dryness. For “change in MBS,” we included studies in which most bothersome symptom (MBS) was used as a global scale. In a global MBS assessment, patients review a list of several GSM symptoms, often including vulvovaginal dryness, dyspareunia, and irritation or itching, and rate the severity of each symptom on a 4-point scale (none, mild, moderate, severe). Patients are asked to select which symptom is most bothersome at baseline and then followed for change in severity of that symptom after treatment. However, some studies used the MBS 4-point severity scale in different ways: either to restrict the study population to only those patients for whom dryness or dyspareunia was the MBS, for example, or to measure the change in severity of a single symptom for all patients in the study, regardless of whether that symptom was the patient’s MBS. In these instances, we used the change in severity outcomes to describe the specific symptoms being reported, not as a global measure of “change in MBS.” We interpreted “distress, bother, or interference of genitourinary symptoms” to mean the impact that GSM symptoms had on QoL. For side effects of treatment, we described serious, any, and common adverse effects, discontinuation due to adverse effects, as well as endometrial safety, and provide GRADE statements for the most serious effects reported.

The GRADE approach assesses five criteria that measure either internal validity (RoB, inconsistency, imprecision, publication bias) or external validity (directness of results).61 Briefly, for each prioritized outcome, we evaluated characteristics of the evidence across 5 domains: study limitations (RoB), imprecision (number of events, sample size, and precision of effect estimates reported by included studies),62 inconsistency (whether the direction and magnitude of effects are similar [or different] across the included studies), indirectness (how applicable the results were to our Key Questions), and publication bias (preferential reporting of positive results). The overall COE takes into consideration individual ratings in each of these 5 domains, but domains may not be weighted equally in determining the overall rating. If a study reported multiple measures for a single outcome, we described all results, but selected the highest quality or most commonly used measure for GRADE assessment (e.g., we selected a validated scale over a single-item measure).

For each treatment-outcome comparison, one reviewer rated the collective COE as high, moderate, low, or very low using GRADEpro GDT (www.gradepro.org). We then discussed those ratings as a team and came to a team consensus on overall COE ratings.

For each intervention, we present a summary of the evidence for the main outcomes in a summary of findings table, which provides key information about the direction of effect reported by included studies for each relevant comparison of alternative management strategies; numbers of participants and studies addressing each important outcome; and the rating of the overall confidence in summary statements for each outcome.63

We derived COE based on statistical rather than clinical significance (non-contextualized approach) in part because validated measures of clinical significance were not available. We assessed magnitude of effect by reporting differences between intervention and control.
