GEO Accession viewer

Sample GSM2431991

Status

Public on Dec 20, 2016

Title

DLPFC_Female_Control_p_UMichigan_105725

Sample type

RNA

Source name

DLPFC_Female_Control_p_UMichigan

Organism

Homo sapiens

Characteristics

cohort: Schiz Cohort 2
site of processing: U_Michigan
diagnosis: Control
subject id: 105725
agonal factor: 0
tissue ph (cerebellum): 6.55
gender: F
race: Caucasian
age: 68
post-mortem interval: 33.5
suicide (1=yes): 0
tissue: Dorsolateral Prefrontal Cortex
qc_batch: 15
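
The characteristics above are plain "key: value" lines, so they can be read into a dictionary for downstream filtering. The following is a minimal sketch; the function name `parse_characteristics` is hypothetical, and values are kept as strings because the record does not state units (for example for age or post-mortem interval).

```python
def parse_characteristics(lines):
    """Turn GEO-style 'key: value' lines into a dict of strings.

    Hypothetical helper for illustration; not part of any GEO tooling.
    """
    fields = {}
    for line in lines:
        # Split on the first ": " only, so keys such as
        # "suicide (1=yes)" and "tissue ph (cerebellum)" stay intact.
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not key/value pairs
        fields[key.strip()] = value.strip()
    return fields


record = parse_characteristics([
    "diagnosis: Control",
    "tissue ph (cerebellum): 6.55",
    "suicide (1=yes): 0",
    "qc_batch: 15",
])
```

Numeric fields can then be converted explicitly where needed, for example `float(record["tissue ph (cerebellum)"])`.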

Extracted molecule

total RNA

Extraction protocol

Coronal slices of the brain were rapidly frozen on pre-cooled (to −120°C) aluminum plates, and stored at −80°C. Dorsolateral prefrontal cortex samples (area 9 plus 46) were trimmed to include an approximately equal ratio of gray and white matter, the latter being restricted to a region approximately the same thickness as layer VI of the cortex. Samples were taken from the left side. Following tissue dissection, total RNA was isolated by using TRIzol reagents (Invitrogen, Carlsbad, CA, USA), and shipped to the three research groups that collaborate on this project (University of California, Irvine; University of California, Davis; University of Michigan). For details, see (Evans 2003 Neurobiol Dis 14:240-250, Vawter 2003 Neuropsychopharm 29: 373-384, Li 2004 Biol Psychiatry 55:346-352, Li 2013 PNAS 110: 9950-9955).

Label

biotin

Label protocol

Sample labeling and hybridization followed the exact Affymetrix procedures.

Hybridization protocol

Sample labeling and hybridization followed the exact Affymetrix procedures.

Scan protocol

standard Affymetrix protocol

Data processing

A fully cited description of the preprocessing can be found in the supplementary section of Li 2013 PNAS 110: 9950-9955: Microarray experiments were performed in separate experimental cohorts, ranging from five to eight cohorts depending on the brain region. Each cohort contained a mixture of cases and controls, with most RNA samples analyzed in duplicate at two laboratories (some were analyzed at three laboratories). The probe-level intensity data (i.e., the CEL files) were generated using standard Affymetrix library files and further processed using a custom annotation file (see below).
While most cohorts were analyzed on the Affymetrix U133A platform, several of the latest cohorts were analyzed on the newer U133 Plus v2 platform, which contains all U133A probe sets as a subset. We extracted the U133A subset of the data for these samples and combined it with the data for the samples analyzed on the U133A platform. We applied RMA (Robust Multi-array Analysis) to summarize probe set expression levels, using the web interface at http://arrayanalysis.mbni.med.umich.edu. RMA output, in the form of logged (base 2) expression levels, was generated using the custom ENTREZ12.1 Chip Definition Files (CDF), which defined probe sets for 11,912 ENTREZ transcripts and 68 control probe sets (http://nmg-r.bioinformatics.nl/Packages_for_R2.12.html). We used our custom-defined CDF files rather than the probe annotation provided by Affymetrix in order to re-map all probes to the latest human genome build available and to annotate probes according to one of the most detailed gene models. The RMA results in this study thus represent 11,912 transcripts that were defined by ENTREZ in 03/2010 and are covered by probes on the U133A microarrays. All downstream analyses were performed in R, using contributed packages available in early 2010.
To gain an overview of sample heterogeneity, we calculated sample-sample similarities for each region using pairwise Pearson's correlation coefficients (r), and calculated the average r of each sample to all other samples of the same region. We chose the threshold of average r = 0.85 to define and remove outlier microarrays. The outliers could result from either technical or biological differences. We removed additional microarrays corresponding to data produced at one laboratory for one cohort due to low average r and poor match with the duplicate microarrays from the second laboratory (denoted “batch 15” in the GEO metadata).
Although the RMA method normalized probe intensity distributions across microarrays, the resulting probe set summaries still showed between-cohort, between-microarray-type (U133A versus U133 Plus v2), and between-laboratory variation, thus requiring further normalization. For each brain region, we quantile-normalized the probe set values and used pairwise correlation coefficients to define recognizable batches, which usually coincide with naturally occurring sample groups (>15 samples) according to cohorts or chip types. The 68 negative control probe sets on the microarray platform, representing spiked-in non-human transcripts, showed batch effects nearly identical to those seen using all probes (not shown), indicating that most of the batch variation is due to technical differences in reagents and instruments rather than to biological differences between samples in different batches. To adjust for batch effects, we median-centered the expression levels of each transcript within each batch, and confirmed, by using the correlation matrices, that the batch effects were removed after the adjustment.
We compared the result of this simple correction with the alternative, Bayesian batch-correction approach implemented in ComBat, and did not see meaningful differences in performance in terms of duplicate-sample concordance (results not shown). While this is contrary to the published comparison results showing that ComBat is a better algorithm for dealing with batch effects, its advantage is probably blunted in our dataset because (1) we have larger sample sizes per batch (typically >15) than were tested in the published comparisons, and (2) we used median-centering rather than mean-centering. The latter is susceptible to the influence of outlier values, yet was used in earlier comparisons with ComBat. We note that ComBat decreased the scale of variation for most transcripts (as a consequence of improving the group variance estimation), which resulted in under-reporting of fold-changes between sample groups. We therefore opted to maintain the use of the median-centering approach in this study. After per-batch median-centering, we quantile-normalized the resulting values and averaged the replicate microarrays for the same samples, yielding a dataset for unique subjects for each region.
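
The quality-control and batch-adjustment steps described above can be illustrated with a small sketch. This is not the authors' code (the record states that the analyses were performed in R); it is a plain-Python illustration under stated assumptions: each microarray is a list of log2 values in the same transcript order, arrays whose average r falls below 0.85 are the ones treated as outliers, batch and subject labels are supplied by the caller, and ties in quantile normalization are broken by sort order rather than averaged. All function names are hypothetical.

```python
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def flag_outliers(arrays, threshold=0.85):
    """Return names of arrays whose average r to all other arrays is low.

    Assumption: 'outlier' means average r below the threshold.
    """
    names = list(arrays)
    flagged = []
    for a in names:
        avg_r = mean(pearson_r(arrays[a], arrays[b]) for b in names if b != a)
        if avg_r < threshold:
            flagged.append(a)
    return flagged


def median_center_by_batch(arrays, batch_of):
    """Subtract each transcript's within-batch median from every array."""
    groups = {}
    for name in arrays:
        groups.setdefault(batch_of[name], []).append(name)
    centered = {}
    for members in groups.values():
        n_tx = len(arrays[members[0]])
        medians = [median(arrays[m][i] for m in members) for i in range(n_tx)]
        for m in members:
            centered[m] = [v - c for v, c in zip(arrays[m], medians)]
    return centered


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    names = list(arrays)
    sorted_cols = [sorted(arrays[n]) for n in names]
    n_tx = len(sorted_cols[0])
    reference = [mean(col[i] for col in sorted_cols) for i in range(n_tx)]
    normalized = {}
    for n in names:
        # Indices of this array's values from smallest to largest.
        order = sorted(range(n_tx), key=lambda i: arrays[n][i])
        values = [0.0] * n_tx
        for rank, idx in enumerate(order):
            values[idx] = reference[rank]
        normalized[n] = values
    return normalized


def average_replicates(arrays, subject_of):
    """Average replicate arrays that belong to the same subject."""
    groups = {}
    for name in arrays:
        groups.setdefault(subject_of[name], []).append(arrays[name])
    return {s: [mean(vals) for vals in zip(*reps)] for s, reps in groups.items()}
```

Following the order given in the text, one would call `flag_outliers` and drop the flagged arrays, then `quantile_normalize`, `median_center_by_batch`, `quantile_normalize` again, and finally `average_replicates`. Median-centering is used here, as in the text, because a median is less sensitive to outlier values than a mean.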

Submission date

Dec 19, 2016

Last update date

Dec 20, 2016

Contact name

Megan Hastings Hagenauer

E-mail(s)

[email protected]

Organization name

University of Michigan

Department

MBNI

Lab

Dr. Huda Akil & Dr. Stanley Watson

Street address

205 Zina Pitcher Pl.

City

Ann Arbor

State/province

ZIP/Postal code

48109

Country

USA

Platform ID

GPL10526

Series (1)

GSE92538

Inference of cell-type composition from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis

Data table header descriptions
ID_REF
VALUE	Log(2) transformed signal, following RMA normalization

Data table
ID_REF	VALUE
10000_at	4.881246457
10001_at	5.326297275
10002_at	5.058259217
10003_at	4.585334935
100048912_at	5.062558758
10004_at	5.154250238
10005_at	6.45753394
10006_at	7.821045177
10007_at	7.75042618
10009_at	4.234529003
1000_at	8.114687872
10010_at	5.436672639
100127886_at	3.991295064
100127972_at	4.553490106
100128008_at	7.3544562
100128062_at	9.196977931
100128414_at	4.141857033
100128919_at	6.040204425
100129015_at	4.582962888
100129128_at	4.555040023

Total number of rows: 11973

Table truncated, full table size 238 Kbytes.
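
The data table is a two-column, tab-separated listing of `ID_REF` and log2 `VALUE`. A minimal sketch for reading it follows. The function names are hypothetical, and the mapping from `ID_REF` to a gene identifier rests on an assumption: that probe set names from the custom ENTREZ CDF have the form `<Entrez Gene ID>_at`, which the IDs in the table suggest but the record does not state outright.

```python
def parse_sample_table(text):
    """Parse 'ID_REF<TAB>VALUE' rows into a dict of ID_REF -> log2 value."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].split("\t") != ["ID_REF", "VALUE"]:
        raise ValueError("expected an 'ID_REF<TAB>VALUE' header line")
    rows = {}
    for ln in lines[1:]:
        id_ref, value = ln.split("\t")
        rows[id_ref] = float(value)
    return rows


def entrez_id(id_ref):
    """Return the numeric ID in '<number>_at', or None for other names.

    Assumption: the numeric stem is an Entrez Gene ID; names with a
    non-numeric stem (such as control probe sets) yield None.
    """
    if not id_ref.endswith("_at"):
        return None
    stem = id_ref[: -len("_at")]
    return int(stem) if stem.isdigit() else None


def to_linear(log2_value):
    """Convert a log2 signal back to the linear scale."""
    return 2.0 ** log2_value


table = parse_sample_table(
    "ID_REF\tVALUE\n"
    "10000_at\t4.881246457\n"
    "100048912_at\t5.062558758\n"
    "1000_at\t8.114687872\n"
)
```

Because the values are log2-transformed, the difference between two values is a log2 fold-change, so `to_linear(a - b)` gives the corresponding linear-scale ratio.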

Supplementary file	Size	Download	File type/resource
GSM2431991_M_133P_105725_DLPFC_0_C.cel.gz	8.5 Mb	(ftp)(http)	CEL
Processed data included within Sample table