GEO Accession viewer

Series GSE123604

Status

Public on Dec 12, 2018

Title

A novel computational complete deconvolution method using RNA-seq data

Organism

Homo sapiens

Experiment type

Expression profiling by high throughput sequencing

Summary

The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting and single-cell-based experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either sample-specific proportions of each cell type or cell-type-specific gene expression profiles, not both, because they require the other as input. We introduce CDSeq, a computational complete deconvolution method that can estimate both sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously using bulk RNA-seq data only. We assessed our method’s performance using several synthetic and experimental mixtures of varied but known cell-type composition and compared it with two state-of-the-art deconvolution methods on the same mixtures. The results showed that CDSeq can estimate both sample-specific proportions of each component cell type and cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue.
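The distinction the summary draws between partial and complete deconvolution can be illustrated with a toy linear mixing model. The sketch below is not CDSeq. It assumes bulk expression is a proportion-weighted sum of cell-type profiles, and it shows the partial case, in which the profiles are supplied as input and only the proportions are estimated (here by ordinary least squares). The function names and all numeric values are invented for illustration and are not taken from GSE123604.

```python
import numpy as np

def mix_profiles(profiles, proportions):
    """Simulate bulk expression under a linear mixing assumption.

    profiles:    (genes, cell_types) expression of each pure cell type
    proportions: (cell_types, samples), each column sums to 1
    returns:     (genes, samples) simulated bulk expression
    """
    profiles = np.asarray(profiles, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    if not np.allclose(proportions.sum(axis=0), 1.0):
        raise ValueError("each sample's proportions must sum to 1")
    return profiles @ proportions

def estimate_proportions(bulk, profiles):
    """Partial deconvolution: recover proportions when profiles are known.

    Plain least squares, used here only as a stand-in; it does not
    enforce non-negativity or the sum-to-one constraint.
    """
    solution, *_ = np.linalg.lstsq(profiles, bulk, rcond=None)
    return solution

# Toy values (hypothetical): 3 genes, 2 cell types, 2 samples.
profiles = np.array([[10.0, 0.0],
                     [0.0, 20.0],
                     [5.0, 5.0]])
proportions = np.array([[1.0, 0.25],
                        [0.0, 0.75]])

bulk = mix_profiles(profiles, proportions)
recovered = estimate_proportions(bulk, profiles)
```

A complete deconvolution method, as described in the summary, would have to estimate both `profiles` and `proportions` from `bulk` alone, which this sketch does not attempt.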

Overall design

In brief, total mRNA was prepared from Namalwa (Burkitt’s lymphoma), Hs343T (fibroblast line derived from a mammary gland adenocarcinoma), hTERT-HME1 (normal mammary epithelial cells immortalized with hTERT), and MCF7 (estrogen receptor-positive breast cancer cell line). mRNA samples were diluted to 100 ng/μl and mixed in different proportions (Supplementary Table 2). Global mRNA abundance of the four pure cell lines and of the mixed RNA samples was profiled by RNA sequencing. Sequencing libraries were prepared using the TruSeq RNA sample preparation kit v2 (Illumina). 75-bp single-end sequencing was performed on the NextSeq sequencer (Illumina). After obtaining the fastq data, we first ran cutadapt (version 1.12) to trim adapter sequences. Second, we mapped reads to the genome using STAR (version 020201). Last, we used featureCounts (version 1.5.1) to generate raw read counts as the input for our algorithm.
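The three processing steps above (trim, align, count) can be sketched as a small helper that assembles one command line per tool for a single-end sample. This is an assumption-laden illustration, not the submitters' actual commands: the record does not state the options, adapter sequence, genome index, or annotation that were used, so every path, the adapter argument, and the function name `build_pipeline` are hypothetical. The helper only builds the argument lists and does not run anything.

```python
from pathlib import Path

def build_pipeline(sample, fastq, adapter, star_index, gtf, outdir="out"):
    """Return [cutadapt, STAR, featureCounts] argument lists for one sample.

    All inputs are placeholders chosen by the caller; nothing here is
    taken from the GEO record beyond the order of the three tools.
    """
    out = Path(outdir)
    trimmed = out / f"{sample}.trimmed.fastq.gz"
    star_prefix = out / f"{sample}."
    # STAR appends this suffix to the prefix for coordinate-sorted BAM output.
    bam = out / f"{sample}.Aligned.sortedByCoord.out.bam"
    counts = out / f"{sample}.counts.txt"
    return [
        # 1. Trim a 3' adapter from single-end reads.
        ["cutadapt", "-a", adapter, "-o", str(trimmed), str(fastq)],
        # 2. Align trimmed reads to a prebuilt STAR genome index.
        ["STAR", "--genomeDir", str(star_index),
         "--readFilesIn", str(trimmed),
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", str(star_prefix)],
        # 3. Count reads per gene against a GTF annotation.
        ["featureCounts", "-a", str(gtf), "-o", str(counts), str(bam)],
    ]

# Hypothetical inputs, for illustration only.
commands = build_pipeline(
    sample="sample1",
    fastq="sample1.fastq.gz",
    adapter="ADAPTER_SEQUENCE_PLACEHOLDER",
    star_index="star_index_dir",
    gtf="annotation.gtf",
)
```

Each list could then be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real inputs and the options are checked against the installed tool versions.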

Contributor(s)

Kang K, Meng Q, Shats I, Umbach D, Li M, Li Y, Li X, Li L

Citation(s)

Submission date

Dec 11, 2018

Last update date

May 13, 2021

Contact name

Kai Kang

E-mail(s)

[email protected]

Organization name

MIT

Department

CSAIL

Street address

32 Vassar Street

City

Cambridge

State/province

MA

ZIP/Postal code

02139

Country

USA

Platforms (1)

Illumina NextSeq 500 (Homo sapiens)

Samples (40)

GSM3507820	Tumor-MCF7_pure_1
GSM3507821	CAFs-Hs_343.T_pure_1
GSM3507822	Normal_breast-hMECs-hTERT_pure_1

Relations

BioProject

SRA

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	Download	File type/resource
GSE123604_RAW.tar	3.3 Gb	(http)(custom)	TAR (of BW)
GSE123604_all_samples_per_gene_counts.txt.gz	1.8 Mb	(ftp)(http)	TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record