Sample GSM2431943
Status Public on Dec 20, 2016
Title DLPFC_Male_Control_p_UMichigan_102579
Sample type RNA
 
Source name DLPFC_Male_Control_p_UMichigan
Organism Homo sapiens
Characteristics cohort: Dep Cohort 6
site of processing: U_Michigan
diagnosis: Control
subject id: 102579
agonal factor: 0
tissue ph (cerebellum): 6.53
gender: M
race: Caucasian
age: 52
post-mortem interval: 18.8
suicide (1=yes): 0
tissue: Dorsolateral Prefrontal Cortex
qc_batch: 13
Extracted molecule total RNA
Extraction protocol Coronal slices of the brain were rapidly frozen on pre-cooled (to −120°C) aluminum plates, and stored at −80°C. Dorsolateral prefrontal cortex samples (area 9 plus 46) were trimmed to include an approximately equal ratio of gray and white matter, the latter being restricted to a region approximately the same thickness as layer VI of the cortex. Samples were taken from the left side. Following tissue dissection, total RNA was isolated by using TRIzol reagents (Invitrogen, Carlsbad, CA, USA), and shipped to the three research groups that collaborate on this project (University of California, Irvine; University of California, Davis; University of Michigan). For details, see (Evans 2003 Neurobiol Dis 14:240-250, Vawter 2003 Neuropsychopharm 29: 373-384, Li 2004 Biol Psychiatry 55:346-352, Li 2013 PNAS 110: 9950-9955).
Label biotin
Label protocol Sample labeling and hybridization followed the exact Affymetrix procedures.
 
Hybridization protocol Sample labeling and hybridization followed the exact Affymetrix procedures.
Scan protocol standard Affymetrix protocol
Data processing A fully cited description of the preprocessing can be found in the supplementary section of Li 2013 PNAS 110: 9950-9955:
Microarray experiments were performed in separate experimental cohorts, ranging from five to eight cohorts depending on the brain region. Each cohort contained a mixture of cases and controls, with most RNA samples analyzed in duplicate at two laboratories (some were analyzed at three laboratories). Probe-level intensity data (i.e., the CEL files) were generated using standard Affymetrix library files and further processed using a custom annotation file (see below).
While most cohorts were analyzed on the Affymetrix U133A platform, several of the latest cohorts were analyzed on the newer U133Plus-v2 platform, which contains all U133A probe sets as a subset. For these samples, we extracted the U133A subset of the data and combined it with the data from samples analyzed on the U133A platform. We applied RMA (Robust Multi-array Analysis) to summarize probe set expression levels, using the web interface at http://arrayanalysis.mbni.med.umich.edu. RMA output, in the form of log2 expression levels, was generated using the custom ENTREZ12.1 Chip Definition Files (CDF), which defined probe sets for 11,912 ENTREZ transcripts and 68 control probe sets (http://nmg-r.bioinformatics.nl/Packages_for_R2.12.html). We used our custom-defined CDF files rather than the probe annotation provided by Affymetrix in order to re-map all probes to the latest human genome build available and to annotate probes according to one of the most detailed gene models. The RMA results in this study thus represent 11,912 transcripts that were defined by ENTREZ in 03/2010 and are covered by probes on the U133A microarrays. All downstream analyses were performed in R, using contributed packages available in early 2010.
To gain an overview of sample heterogeneity, we calculated sample-sample similarities for each region using pairwise Pearson's correlation coefficients (r), and calculated the average r of each sample to all other samples of the same region. We used a threshold of average r = 0.85 to define outlier microarrays, which were removed. The outliers could result from either technical or biological differences. We also removed the microarrays produced at one laboratory for one cohort (denoted “batch 15” in the GEO metadata) because of their low average r and poor match with the duplicate microarrays from the second laboratory.
Although the RMA method normalizes probe intensity distributions across microarrays, the resulting probe set summaries still showed variation between cohorts, between microarray types (U133A versus U133 Plus v2), and between laboratories, and thus required further normalization. For each brain region, we quantile-normalized the probe set values and used pairwise correlation coefficients to define recognizable batches, which usually coincided with naturally occurring sample groups (>15 samples) defined by cohort or chip type. The 68 negative control probe sets on the microarray platform, representing spiked-in non-human transcripts, showed batch effects nearly identical to those seen using all probes (not shown), indicating that most of the batch variation is due to technical differences in reagents and instruments rather than to biological differences between samples in different batches. To adjust for batch effects, we median-centered the expression levels of each transcript within each batch, and confirmed, using the correlation matrices, that the batch effects were removed after the adjustment. We compared the result of this simple correction with the alternative, Bayesian batch-correction approach implemented in ComBat, and did not see meaningful differences in performance in terms of duplicate-sample concordance (results not shown).
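The outlier filter and the per-batch median-centering described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' R code: the function names are hypothetical, the matrix layout (rows are probe sets, columns are microarrays) is an assumption, and so is the reading that microarrays with an average r below 0.85 are the ones flagged.

```python
import numpy as np

def mean_correlation_to_others(expr):
    """Average Pearson r of each sample (column) to all other samples.

    expr: 2-D array, rows = probe sets, columns = microarrays (assumed layout).
    """
    r = np.corrcoef(expr, rowvar=False)      # samples x samples matrix
    n_samples = r.shape[0]
    # Subtract the self-correlation (1.0) before averaging over the others.
    return (r.sum(axis=1) - 1.0) / (n_samples - 1)

def flag_outliers(expr, threshold=0.85):
    """Boolean mask of samples whose average r falls below the threshold."""
    return mean_correlation_to_others(expr) < threshold

def median_center_by_batch(expr, batches):
    """Subtract, for each probe set, its median within each batch."""
    batches = np.asarray(batches)
    centered = np.array(expr, dtype=float)   # work on a copy
    for batch in np.unique(batches):
        cols = batches == batch
        batch_median = np.median(centered[:, cols], axis=1, keepdims=True)
        centered[:, cols] -= batch_median
    return centered
```

After centering, every probe set has a median of zero within each batch, so a constant offset between batches (for example, between chip types) no longer contributes to the sample-sample correlations.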
While this is contrary to the published comparison results showing that ComBat is a better algorithm for dealing with batch effects, its advantage is probably blunted in our dataset because (1) we have larger sample sizes per batch (typically >15) than what was tested in the published comparisons, and (2) we used median-centering rather than mean-centering. The latter is susceptible to the influence of outlier values, yet was used in earlier comparisons with ComBat. We note that ComBat decreased the scale of variation for most transcripts (as a consequence of improving the group variance estimation), which resulted in under-reporting of fold-changes between sample groups. We therefore opted to maintain the use of the median-centering approach in this study. After per-batch median-centering, we quantile-normalized the resulting values and averaged the replicate microarrays for the same samples, yielding a dataset of unique subjects for each region.
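The final two steps, quantile normalization followed by averaging of replicate microarrays, can be illustrated with a short sketch. It is written in Python with NumPy as an assumption for illustration only (the record states the analysis was done in R), the function names are hypothetical, and ties are broken arbitrarily by sort order rather than by rank averaging.

```python
import numpy as np

def quantile_normalize(expr):
    """Give every sample (column) the same empirical distribution.

    expr: 2-D array, rows = probe sets, columns = microarrays (assumed layout).
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)       # per-column sort order
    ranks = np.argsort(order, axis=0)      # rank of each value in its column
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(expr, axis=0).mean(axis=1)
    return reference[ranks]

def average_replicates(expr, subject_ids):
    """Average the columns that belong to the same subject.

    Returns the sorted unique subject IDs and a matrix with one column each.
    """
    expr = np.asarray(expr, dtype=float)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    averaged = np.column_stack(
        [expr[:, subject_ids == s].mean(axis=1) for s in subjects]
    )
    return subjects, averaged
```

For example, calling `average_replicates(quantile_normalize(expr), subject_ids)` on a matrix in which each subject was run at two laboratories would return one column per subject.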
 
Submission date Dec 19, 2016
Last update date Dec 20, 2016
Contact name Megan Hastings Hagenauer
E-mail(s) [email protected]
Organization name University of Michigan
Department MBNI
Lab Dr. Huda Akil & Dr. Stanley Watson
Street address 205 Zina Pitcher Pl.
City Ann Arbor
State/province MI
ZIP/Postal code 48109
Country USA
 
Platform ID GPL10526
Series (1)
GSE92538 Inference of cell-type composition from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis

Data table header descriptions
ID_REF
VALUE Log(2) transformed signal, following RMA normalization

Data table
ID_REF VALUE
10000_at 5.14037149
10001_at 5.86803069
10002_at 5.060496923
10003_at 4.571293169
100048912_at 5.884789373
10004_at 4.836605668
10005_at 7.02209635
10006_at 8.167824135
10007_at 8.019019407
10009_at 4.226838237
1000_at 7.736215452
10010_at 5.404702432
100127886_at 4.061198165
100127972_at 4.854142472
100128008_at 6.59550164
100128062_at 9.030286633
100128414_at 4.221388173
100128919_at 6.002526133
100129015_at 4.484180023
100129128_at 4.399775283

Total number of rows: 11973

Table truncated, full table size 238 Kbytes.
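
As a hedged illustration of how the data table above might be read, the sketch below parses whitespace-separated ID_REF/VALUE rows and converts a log2 value back to the linear scale. The function names are hypothetical, and the reading of each probe set ID as an Entrez Gene ID followed by "_at" is an assumption based on the custom ENTREZ CDF described under Data processing; the rows in the example are copied from the table excerpt.

```python
def parse_sample_table(text):
    """Parse 'ID_REF VALUE' rows into a dict of probe set ID -> log2 signal."""
    values = {}
    for line in text.strip().splitlines():
        probe_id, value = line.split()
        if probe_id == "ID_REF":           # skip the header row
            continue
        values[probe_id] = float(value)
    return values

def entrez_id(probe_id):
    """Strip the '_at' suffix (assumed naming convention of the custom CDF)."""
    return probe_id[:-3] if probe_id.endswith("_at") else probe_id

def log2_to_linear(log2_value):
    """Convert a log2 signal back to the linear intensity scale."""
    return 2.0 ** log2_value

excerpt = """
ID_REF VALUE
10000_at 5.14037149
1000_at 7.736215452
100048912_at 5.884789373
"""

table = parse_sample_table(excerpt)
for probe_id, log2_signal in table.items():
    print(entrez_id(probe_id), log2_signal, log2_to_linear(log2_signal))
```

Because the values are on a log2 scale, the difference between two values is a log2 fold change; a difference of 1 corresponds to a two-fold difference in signal.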




Supplementary file Size Download File type/resource
GSM2431943_M_133P_102579_DLPFC_1_C.cel.gz 8.6 Mb (ftp)(http) CEL
Processed data included within Sample table
