The Project for High-Confidence Coding and Noncodi... (ID 381216) - BioProject

Accession: PRJNA381216 ID: 381216

The Project for High-Confidence Coding and Noncoding Transcriptome Maps

The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, whether assembled from stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also expand the universe of non-coding genomes.

This SuperSeries is composed of the SubSeries listed below.

Overall design: The directions of unstranded reads (from ENCODE, Human BodyMap Projects, GTEx and TCGA as well as from HeLa and mES cells) were predicted using k-order Markov chain models (kMC), generating reads with a predicted direction (RPDs), which were then used to assemble transcriptome maps (BIGTranscriptome). Those transcriptome maps were next used for quantification of RPDs. Refer to individual Series.
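The orientation step described in the overall design can be illustrated with a small sketch. This is not the CAFE implementation: it only shows the general idea of training a k-order Markov chain on sense-strand sequences and assigning an unstranded read the direction under which it has the higher likelihood. The function names (`train_kmc`, `orient_read`), the add-one pseudocount, and the toy training data are all assumptions made for illustration, and the alphabet is assumed to be A, C, G, T.

```python
import math
from collections import defaultdict

# Hypothetical helper names; a toy sketch, not the CAFE code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def train_kmc(sense_seqs, k, pseudocount=1.0):
    """Fit a k-order Markov chain on sequences known to be in sense orientation.

    Returns a function log_prob(context, base) giving the smoothed
    log P(base | previous k bases).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sense_seqs:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + 4 * pseudocount  # 4 = alphabet size
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(seq, k, log_prob):
    """Sum the transition log-probabilities along the sequence."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def orient_read(read, k, log_prob):
    """Pick the orientation with the higher likelihood under the model.

    Returns (sequence in predicted sense orientation, '+' or '-').
    """
    rc = reverse_complement(read)
    forward = log_likelihood(read, k, log_prob)
    reverse = log_likelihood(rc, k, log_prob)
    return (read, "+") if forward >= reverse else (rc, "-")


if __name__ == "__main__":
    # Toy training data standing in for stranded reads (an assumption).
    model = train_kmc(["AAGAAGAAGAAGAAGAAG"], k=2)
    print(orient_read("CTTCTTCTT", 2, model))
```

In this toy setting the decision is a two-hypothesis maximum likelihood choice per read. How the real pipeline chooses k, trains its models, and handles ties or ambiguous bases is not stated in this record.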

Accession	PRJNA381216; GEO: GSE97212
Type	Umbrella project
Publications	You BH et al., "High-confidence coding and noncoding transcriptome maps.", Genome Res, 2017 Jun;27(6):1050-1062
Submission	Registration date: 29-Mar-2017 Department of Life Science, Hanyang University
Relevance	Superseries

Project Data:

Resource Name	Number of Links
Sequence data
SRA Experiments	2
Publications
PubMed	1
PMC	1
Other datasets
BioSample	2
GEO DataSets	3

GEO Data Details

Parameter	Value
Data volume, Supplementary Mbytes	1916

SRA Data Details

Parameter	Value
Data volume, Gbases	17
Data volume, Mbytes	11414

The Project for High-Confidence Coding and Noncoding Transcriptome Maps encompasses the following 2 sub-projects:

Project Type	Number of Projects
Transcriptome or Gene expression	2

BioProject accession	Name	Title
PRJNA381218	High-confidence Coding and Noncoding Transcriptome Maps	High-confidence Coding and Noncoding Transcriptome Maps (Department of Life Science,...)
PRJNA335726	Mus musculus	Co-assembly of stranded and unstranded RNA-seq data improves coding and noncoding transcriptome maps (Department of Life Science,...)
