Status |
Public on Dec 19, 2017 |
Title |
Dsim_03 |
Sample type |
SRA |
|
|
Source name |
Whole embryos (staged)
|
Organism |
Drosophila simulans |
Characteristics |
strain: w[501] (reference genome strain); ucsd stock #: 14021-0251.195; Stage: 2-3 hours after egg laying
|
Growth protocol |
Stocks were maintained on standard cornmeal medium. Embryo collections were performed in population cages. 2- to 7-day-old flies were left to acclimatize to the cage for at least 48h and regularly fed with grape juice-agar plates generously loaded with fresh yeast paste. After two 2-hour pre-lays, embryos were collected in 1-hour windows and aged appropriately (22 time points, 0-22h). Embryos were washed with deionized water, dechorionated for 90 sec with 50% bleach, rinsed abundantly with water, and snap-frozen in liquid nitrogen.
|
Extracted molecule |
total RNA |
Extraction protocol |
Total RNA was extracted from embryos using a Beadbeater (Biospec, Cat. #607) with 1.0 mm zirconia beads (Biospec, #11079110zx) and the RNAdvance Tissue kit (Agencourt #A32649) according to the manufacturer’s instructions, including DNaseI treatment. We systematically checked on a Bioanalyzer RNA Nano chip (Agilent) that the RNA was of very high quality. Libraries were prepared as described in Batut et al., Genome Research 2012. In brief, 5'-monophosphate transcripts were depleted by TEX digest (Epicentre #TER51020). For every time series, each sample was labeled with a different sequence barcode during reverse-transcription, and all samples for the series were then pooled and processed together as a single library. The 5'-complete cDNA selection strategy relies on the combination of two orthogonal enrichment methods: reverse-transcriptase template-switching, and cap-trapping. The template-switching approach is based on the ability of reverse-transcriptase to add linker sequences to the ends of 5'-complete cDNAs – preferentially if they are made from capped transcripts. Cap-trapping relies on the biotinylation of capped RNA molecules and specific pulldown of their associated 5'-complete cDNAs. Quality control and library quantification were carried out on a Bioanalyzer DNA High Sensitivity chip. Each library was sequenced on one lane of an Illumina HiSeq 2000. reference: Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T.R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res (2012). reference: Batut, P. & Gingeras, T.R. RAMPAGE: Promoter Activity Profiling by Paired-End Sequencing of 5'-Complete cDNAs. Curr Protoc Mol Biol 104, 25B.11.1-25B.11.16 (2013).
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina HiSeq 2000 |
|
|
Data processing |
Sequences corresponding to the library identification barcode (first 6 bases of read 1) and the reverse-transcription primer (first 15 bases of read 2) were trimmed prior to mapping. Trimmed reads were mapped with STAR, with parameters as follows (--outSAMattributes All --outFilterScoreMinOverLread 0.85 --outFilterMatchNminOverLread 0.85 --outFilterMultimapNmax 50 --outFilterIntronMotifs None --seedSearchStartLmax 30 --outSAMunmapped Within --outFilterType BySJout --outFilterMismatchNmax 10 --clip5pNbases 6 15). All uniquely mapping reads were kept. As a rescue strategy for multiply mapping reads, if all alignments for those reads started within an annotated transposon and overlapped the same gene annotation, the alignment starting in the closest transposon insertion was selected. All non-rescued multi-mappers were discarded. PCR duplicates, defined as reads sharing the same alignment coordinates (start, end and splice sites), were removed from the individual datasets. To avoid over-collapsing, we took advantage of the fact that the long random sequence (15-mer) of our reverse-transcription primer often primes with mismatches. We used this sequence as a pseudo-random barcode allowing us to distinguish between true duplicates (same barcode) and independent identical inserts. Custom scripts were used to extract the 5'-most position of each RAMPAGE read, which corresponds to the TSS used for the transcription of the original RNA molecule. All collapsed datasets in the time series were combined prior to peak calling. The density of cDNA 5' ends across the genome was determined from this combined dataset, as well as the density of coverage by second (i.e., downstream) sequencing reads. Peaks were called by a sliding window algorithm that assesses the significance of local signal enrichment given a null distribution.
Downstream read coverage in the same window was used to correct for local transcript abundance, by subtracting from the raw signal a pseudocount proportional to this coverage. After FDR correction, significant windows in close proximity to each other were merged into peaks, and those were trimmed at the edges down to the first base with signal. (Parameters: window width 15 bases, null distribution negative binomial with k=0.5, background weight 0.5, FDR 0.1, merging range 150 bases.) Individual peaks were connected to annotated genes based on cDNA structure information. For each peak, if we could find at least 2 inserts having their 5' end in the peak and overlapping an annotated exon of a gene, the peak was functionally linked to that gene. If a peak could potentially be linked to several genes, ties were broken by removing all links that were 5-fold weaker than the strongest one. For quantification, the signal for each peak and each timepoint was derived from the uncollapsed datasets, and normalized to dataset size (defined as the total number of reads attributed to any genic TSS). Expression values in the matrix files are thus in reads per million (RPM). Expression values for genes are the sum of expression values for all TSCs attributed to the gene (in RPM). Gene expression matrices were upsampled 5-fold, subjected to Gaussian smoothing (width parameter=10), and converted to Z-scores using the GTEM package (see Goltsev et al., BMC Bioinformatics 2009). The time series was globally aligned to the D. melanogaster replicate 1 series using the MAlign tool in GTEM.
Genome_build: dsim r1.4 (FlyBase)
Supplementary_files_format_and_content: Signal density tracks (bigWig) were generated after collapsing PCR duplicates, and represent the density of RAMPAGE read 5' ends (i.e., the 5' mapping position) over each base across the genome.
Supplementary_files_format_and_content: Peak calls (i.e., TSCs) for the combined time series datasets are annotated in BED6 file format. Scores correspond to the total number of collapsed read 5' ends supporting the peak.
Supplementary_files_format_and_content: TSC expression and Gene expression matrices combine expression values for individual features, normalized by the total number of uniquely mapped reads per time point (in reads per million, RPM).
Supplementary_files_format_and_content: The aligned gene expression matrix represents expression values as Z-scores of the original RPM values, after upsampling, smoothing and global alignment of the time series.
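The barcode-aware duplicate collapsing and 5' end extraction described in the data processing text can be illustrated with a minimal Python sketch. This is not the submitters' custom script. The `Insert` record, its field names, and the coordinate convention (start <= end, so the 5' end of a minus-strand insert is `end`) are assumptions made for illustration, as is the choice to treat barcodes as identical only on an exact sequence match.

```python
from collections import namedtuple

# Hypothetical record layout (not from the original scripts):
# one aligned read pair with its coordinates, splice sites and 15-mer primer sequence.
Insert = namedtuple("Insert", "chrom strand start end splice_sites barcode")


def collapse_duplicates(inserts):
    """Keep the first insert seen for each (alignment coordinates, barcode) combination.

    Inserts with identical coordinates but different barcodes are treated as
    independent molecules and are all retained.
    """
    seen = set()
    kept = []
    for ins in inserts:
        key = (ins.chrom, ins.strand, ins.start, ins.end,
               ins.splice_sites, ins.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(ins)
    return kept


def five_prime_position(ins):
    """Return the 5'-most genomic position of an insert (assumes start <= end)."""
    return ins.start if ins.strand == "+" else ins.end


inserts = [
    Insert("2L", "+", 100, 250, (), "ACGTACGTACGTACG"),
    Insert("2L", "+", 100, 250, (), "ACGTACGTACGTACG"),  # same coordinates, same barcode
    Insert("2L", "+", 100, 250, (), "ACGTTCGTACGTACG"),  # same coordinates, different barcode
    Insert("2L", "-", 300, 420, ((350, 380),), "TTGCATTGCATTGCA"),
]
kept = collapse_duplicates(inserts)
tss_positions = [five_prime_position(ins) for ins in kept]
```

In this toy input the second insert is dropped as a true duplicate, while the third is kept because its barcode differs, which is the over-collapsing safeguard the text describes.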
|
|
|
Submission date |
Oct 28, 2016 |
Last update date |
May 15, 2019 |
Contact name |
Philippe Batut |
E-mail(s) |
[email protected]
|
Phone |
516-422-4122
|
Organization name |
CSHL
|
Lab |
Gingeras
|
Street address |
500 Sunnyside Blvd.
|
City |
Woodbury |
State/province |
NY |
ZIP/Postal code |
11797 |
Country |
USA |
|
|
Platform ID |
GPL13306 |
Series (2) |
GSE89302 |
Drosophila simulans embryonic development RAMPAGE time series |
GSE89335 |
Drosophila species embryonic development RAMPAGE time series |
|
Relations |
BioSample |
SAMN05954846 |
SRA |
SRX2310921 |
Supplementary file | Size | Download | File type/resource |
GSM2365689_Dsim_03_+.bw | 605.9 Kb | (ftp)(http) | BW |
GSM2365689_Dsim_03_-.bw | 561.9 Kb | (ftp)(http) | BW |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |
|
|
|
|
|