cohort: Schiz Cohort 1
site of processing: UC_Davis
diagnosis: Control
subject id: 104126
agonal factor: 0
tissue ph (cerebellum): 6.79
gender: M
race: Caucasian
age: 48
post-mortem interval: 20.2
suicide (1=yes): 0
tissue: Dorsolateral Prefrontal Cortex
qc_batch: 6
Extracted molecule
total RNA
Extraction protocol
Coronal slices of the brain were rapidly frozen on aluminum plates pre-cooled to −120°C and stored at −80°C. Dorsolateral prefrontal cortex samples (areas 9 and 46) were trimmed to include an approximately equal ratio of gray and white matter, the latter being restricted to a region approximately the same thickness as layer VI of the cortex. Samples were taken from the left side. Following tissue dissection, total RNA was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and shipped to the three research groups that collaborate on this project (University of California, Irvine; University of California, Davis; University of Michigan). For details, see Evans 2003 Neurobiol Dis 14:240-250, Vawter 2003 Neuropsychopharm 29: 373-384, Li 2004 Biol Psychiatry 55:346-352, and Li 2013 PNAS 110: 9950-9955.
Label
biotin
Label protocol
Sample labeling and hybridization followed the exact Affymetrix procedures.
Hybridization protocol
Sample labeling and hybridization followed the exact Affymetrix procedures.
Scan protocol
standard Affymetrix protocol
Data processing
A fully cited description of the preprocessing can be found in the supplementary section of Li 2013 PNAS 110: 9950-9955.
Microarray experiments were performed in separate experimental cohorts, ranging from five to eight cohorts depending on the brain region. Each cohort contained a mixture of cases and controls, with most RNA samples analyzed in duplicate at two laboratories (some were analyzed at three laboratories). Probe-level intensity data (i.e., the CEL files) were generated using standard Affymetrix library files and were further processed using a custom annotation file (see below). While most cohorts were analyzed on the Affymetrix U133A platform, several of the latest cohorts were analyzed on the newer U133 Plus v2 platform, which contains all U133A probe sets as a subset. We extracted the U133A subset of the data for these samples and combined it with the data from samples analyzed on the U133A platform.
We applied RMA (Robust Multi-array Analysis) to summarize probe set expression levels, using the web interface at http://arrayanalysis.mbni.med.umich.edu. RMA output, in the form of log2 expression levels, was generated using the custom ENTREZ12.1 Chip Definition Files (CDF), which defined probe sets for 11,912 ENTREZ transcripts and 68 control probe sets (http://nmg-r.bioinformatics.nl/Packages_for_R2.12.html). We used custom-defined CDF files rather than the probe annotation provided by Affymetrix in order to re-map all probes to the latest human genome build available and to annotate probes according to one of the most detailed gene models. The RMA results in this study thus represent 11,912 transcripts, as defined by ENTREZ in 03/2010, that are covered by probes on the U133A microarrays. All downstream analyses were performed in R, using contributed packages available in early 2010.
To gain an overview of sample heterogeneity, we calculated sample-sample similarities for each region using pairwise Pearson's correlation coefficients (r), and calculated the average r of each sample to all other samples of the same region. We chose a threshold of average r = 0.85 to define and remove outlier microarrays. The outliers could result from either technical or biological differences. We also removed the microarrays produced at one laboratory for one cohort (denoted “batch 15” in the GEO metadata) because of low average r and poor agreement with the duplicate microarrays from the second laboratory.
Although RMA normalizes probe intensity distributions across microarrays, the resulting probe set summaries still showed variation between cohorts, between microarray types (U133A versus U133 Plus v2), and between laboratories, thus requiring further normalization. For each brain region, we quantile-normalized the probe set values and used pairwise correlation coefficients to define recognizable batches, which usually coincided with naturally occurring sample groups (>15 samples) defined by cohort or chip type. The 68 negative control probe sets on the microarray platform, representing spiked-in non-human transcripts, showed batch effects nearly identical to those seen when using all probes (not shown), indicating that most of the batch variation is due to technical differences in reagents and instruments rather than to biological differences between samples in different batches.
To adjust for batch effects, we median-centered the expression levels of each transcript within each batch and confirmed, using the correlation matrices, that the batch effects were removed after the adjustment. We compared the result of this simple correction with the alternative Bayesian batch-correction approach implemented in ComBat, and did not see meaningful differences in performance in terms of duplicate-sample concordance (results not shown).
While this is contrary to the published comparison results showing that ComBat is a better algorithm for dealing with batch effects, its advantage is probably blunted in our dataset because (1) we have larger sample sizes per batch (typically >15) than those tested in the published comparisons, and (2) we used median-centering rather than mean-centering. The latter is susceptible to the influence of outlier values, yet was used in earlier comparisons with ComBat. We note that ComBat decreased the scale of variation for most transcripts (as a consequence of improving the group variance estimation), which resulted in under-reporting of fold-changes between sample groups. We therefore opted to retain the median-centering approach in this study. After per-batch median-centering, we quantile-normalized the resulting values and averaged the replicate microarrays for the same samples, yielding a dataset of unique subjects for each region.
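The downstream steps described above (outlier screening by average Pearson r, per-batch median-centering, quantile normalization, and replicate averaging) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the original analysis was performed in R, and all function names, argument names, and the transcripts-by-samples array layout below are assumptions made for illustration. The sketch also assumes that a microarray counts as an outlier when its average r falls below 0.85, and that tied values in quantile normalization are ranked by their order of appearance.

```python
import numpy as np

def average_correlation(expr):
    """Mean Pearson r of each sample (column) to all other samples.

    expr: transcripts x samples array of log2 expression values (assumed layout).
    """
    r = np.corrcoef(expr, rowvar=False)          # samples x samples matrix
    n = r.shape[0]
    # Drop each sample's self-correlation (the diagonal) from its average.
    return (r.sum(axis=1) - np.diag(r)) / (n - 1)

def flag_outliers(expr, threshold=0.85):
    """True for samples whose average r is below the threshold (assumed direction)."""
    return average_correlation(expr) < threshold

def median_center_by_batch(expr, batches):
    """Subtract, per transcript, the median computed within each batch."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(expr)
    for b in np.unique(batches):
        cols = batches == b
        out[:, cols] = expr[:, cols] - np.median(expr[:, cols], axis=1, keepdims=True)
    return out

def quantile_normalize(expr):
    """Give every sample the same distribution: the mean of the sorted columns."""
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0, kind="stable")   # rank order within each sample
    reference = np.sort(expr, axis=0).mean(axis=1)    # shared target distribution
    out = np.empty_like(expr)
    # The i-th smallest value in each column receives the i-th reference value.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

def average_replicates(expr, subject_ids):
    """Average the columns that share a subject ID; returns (matrix, sorted IDs)."""
    expr = np.asarray(expr, dtype=float)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    averaged = np.column_stack(
        [expr[:, subject_ids == s].mean(axis=1) for s in subjects]
    )
    return averaged, subjects
```

Following the order given in the text, one would drop the columns flagged by `flag_outliers`, apply `median_center_by_batch` with one batch label per remaining microarray, pass the result through `quantile_normalize`, and finally call `average_replicates` with one subject ID per microarray. How the batch labels themselves were derived from the correlation matrices is not specified in enough detail to sketch here.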
Inference of cell-type composition from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis
Data table header descriptions
ID_REF
VALUE
Log2-transformed signal following RMA normalization