Status |
Public on Dec 12, 2018 |
Title |
A novel computational complete deconvolution method using RNA-seq data |
Organism |
Homo sapiens |
Experiment type |
Expression profiling by high throughput sequencing
|
Summary |
The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting or single-cell based experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either sample-specific proportions of each cell type or cell-type-specific gene expression profiles, not both, because they require the other as input. We introduce a computational Complete Deconvolution method (CDSeq) that can estimate both sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously using bulk RNA-seq data only. We assessed our method’s performance using several synthetic and experimental mixtures of varied but known cell-type composition and compared it with the performance of two state-of-the-art deconvolution methods on the same mixtures. The results showed that CDSeq can estimate both sample-specific proportions of each component cell type and cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue. |
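To make the two unknowns concrete, the sketch below builds a bulk expression vector from cell-type-specific profiles and sample-specific proportions under a simple linear mixing assumption. This is an illustration of the deconvolution setting only, not CDSeq's actual model; the function name `mix`, the three-gene profiles, and the equal proportions are all invented toy values, not data from this series.

```python
# Toy values only: invented for illustration, not taken from the dataset.
CELL_LINES = ["Namalwa", "Hs343T", "hTERT-HME1", "MCF7"]

# Hypothetical cell-type-specific expression profiles
# (one row per cell line, one column per gene).
profiles = [
    [10.0, 0.0, 5.0],
    [0.0, 20.0, 5.0],
    [4.0, 4.0, 8.0],
    [2.0, 6.0, 0.0],
]


def mix(proportions, profiles):
    """Return the bulk vector sum_k proportions[k] * profiles[k].

    Assumes a linear mixture, which is a simplification.
    """
    if len(proportions) != len(profiles):
        raise ValueError("need one proportion per cell type")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_genes = len(profiles[0])
    return [
        sum(p * profile[g] for p, profile in zip(proportions, profiles))
        for g in range(n_genes)
    ]


# Equal parts of the four cell lines (hypothetical mixing ratio).
bulk = mix([0.25, 0.25, 0.25, 0.25], profiles)
print(bulk)
```

Partial deconvolution methods are given `bulk` plus either `profiles` or the proportions and solve for the other. A complete deconvolution method is given only `bulk`, measured across many samples, and must estimate both.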
|
|
|
Overall design |
In brief, total mRNA was prepared from Namalwa (Burkitt’s lymphoma), Hs343T (fibroblast line derived from a mammary gland adenocarcinoma), hTERT-HME1 (normal mammary epithelial cells immortalized with hTERT), and MCF7 (estrogen receptor positive breast cancer cell line). mRNA samples were diluted to 100 ng/μl and mixed in different proportions (Supplementary Table 2). Global mRNA abundance of the four pure cell lines and of the mixed RNA samples was profiled by RNA sequencing. Sequencing libraries were prepared using the TruSeq RNA sample preparation kit v2 (Illumina). 75-bp single-end sequencing was performed on the NextSeq sequencer (Illumina). After obtaining the fastq data, we first ran cutadapt (version 1.12) to trim adapter sequences. Second, we mapped reads to the genome using STAR (version 020201). Finally, we used featureCounts (version 1.5.1) to generate the raw read counts used as input for our algorithm.
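The three processing steps above can be sketched as a chain of command lines. The record does not state the parameters that were used, so everything below is an assumption: the helper name `build_commands`, the file names, the adapter sequence, and the choice of options. Only basic, widely used options of each tool appear, and the commands are built and printed rather than executed.

```python
import shlex


def build_commands(sample, fastq, genome_dir, gtf, adapter):
    """Build trim, align, and count commands for one single-end sample.

    Hypothetical helper: the options chosen here are assumptions,
    not the settings used for this series.
    """
    trimmed = f"{sample}.trimmed.fastq"
    # STAR appends "Aligned.sortedByCoord.out.bam" to the output prefix.
    bam = f"{sample}.Aligned.sortedByCoord.out.bam"
    return [
        # 1. Trim a 3' adapter from the reads.
        ["cutadapt", "-a", adapter, "-o", trimmed, fastq],
        # 2. Map the trimmed reads to the genome index.
        ["STAR", "--genomeDir", genome_dir, "--readFilesIn", trimmed,
         "--outFileNamePrefix", f"{sample}.",
         "--outSAMtype", "BAM", "SortedByCoordinate"],
        # 3. Count reads per gene from the alignment.
        ["featureCounts", "-a", gtf, "-o", f"{sample}.counts.txt", bam],
    ]


commands = build_commands(
    sample="mix01",                # hypothetical sample name
    fastq="mix01.fastq",           # hypothetical input file
    genome_dir="star_index",       # hypothetical STAR index directory
    gtf="annotation.gtf",          # hypothetical gene annotation
    adapter="AGATCGGAAGAGC",       # assumed adapter prefix, not from the record
)
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps file names with spaces safe if they are later passed to `subprocess.run`.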
|
|
|
Contributor(s) |
Kang K, Meng Q, Shats I, Umbach D, Li M, Li Y, Li X, Li L |
Citation(s) |
31790389 |
|
Submission date |
Dec 11, 2018 |
Last update date |
May 13, 2021 |
Contact name |
Kai Kang |
E-mail(s) |
[email protected]
|
Organization name |
MIT
|
Department |
CSAIL
|
Street address |
32 Vassar Street
|
City |
Cambridge |
State/province |
MA |
ZIP/Postal code |
02139 |
Country |
USA |
|
|
Platforms (1) |
GPL18573 |
Illumina NextSeq 500 (Homo sapiens) |
|
Samples (40)
|
|
Relations |
BioProject |
PRJNA509361 |
SRA |
SRP173265 |
Supplementary file | Size | Download | File type/resource |
GSE123604_RAW.tar | 3.3 Gb | (http)(custom) | TAR (of BW) |
GSE123604_all_samples_per_gene_counts.txt.gz | 1.8 Mb | (ftp)(http) | TXT |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |