GEO Accession viewer

Sample GSM4468524

Status

Public on May 28, 2020

Title

WT replicate 1 Dataset#3

Sample type

SRA

Source name

flowers

Organism

Arabidopsis thaliana

Characteristics

genotype: WT
biological replicate: replicate 1
dataset: Dataset#3
reference sequence: AT1G29910|AT1G29920|AT2G21660|AT2G30570|AT2G46820|AT3G05880|AT4G02770|AT4G22150|AT4G38770|AT5G19140|AT5G42530|AT2G34420

Growth protocol

Arabidopsis plantlets of the Col-0 accession were grown on soil under 16 h light (21 °C) / 8 h darkness (18 °C) cycles.

Extracted molecule

total RNA

Extraction protocol

Total RNA was extracted from flowers with TRI Reagent® (Molecular Research Center) according to the manufacturer’s instructions.
The 3'RACEseq protocol is based on the ligation of a primer to the 3’ end of RNA, followed by targeted PCR amplification of amplicons suitable for Illumina sequencing.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

RACE

Instrument model

Illumina MiSeq

Description

Processed_data_file.txt

Data processing

After initial data processing by the MiSeq Control Software v2.5 (Illumina), base calls were retrieved and further analysed with a suite of in-house Python (v2.7) scripts using the Biopython (v1.63) and regex (v2.4) libraries.
The data processing pipeline was adapted from Sikorska et al. (2017). Reads with low-quality bases (≤Q10) within the 15-base random sequence of read 2 or within the 30 bases downstream of the delimiter sequence were filtered out.
Sequences with identical nucleotides in the 15-base random sequence were deduplicated.
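The quality filter and the deduplication described above can be sketched roughly as follows. This is an illustration only, not the authors' scripts: the function names, the `downstream_start` argument (index of the first base after the delimiter) and the choice to keep the first read seen for each random sequence are assumptions.

```python
RANDOM_LEN = 15   # length of the random sequence at the start of read 2
WINDOW = 30       # number of bases checked downstream of the delimiter
MIN_Q = 10        # a base with Phred quality <= MIN_Q counts as low quality


def passes_quality(quals, downstream_start):
    """Return True if no low-quality base falls in the two checked regions.

    quals: list of Phred scores for read 2.
    downstream_start: index of the first base after the delimiter
    (hypothetical input; the delimiter is not given in this record).
    """
    checked = quals[:RANDOM_LEN] + quals[downstream_start:downstream_start + WINDOW]
    return all(q > MIN_Q for q in checked)


def deduplicate(reads):
    """Keep the first read seen for each 15-base random sequence.

    reads: iterable of (read_id, read2_sequence) tuples.
    Keeping the first occurrence is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for read_id, seq in reads:
        tag = seq[:RANDOM_LEN]
        if tag not in seen:
            seen.add(tag)
            kept.append((read_id, seq))
    return kept
```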
Next, 20-nucleotide sequences corresponding to nucleotides of the transcript were searched for in reads 1 to identify the corresponding target mRNAs. One mismatch was tolerated. Matched reads 1 and their corresponding reads 2 were extracted and annotated.
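A minimal sketch of this target identification is given below. It uses a plain sliding-window Hamming comparison rather than the fuzzy matching of the regex library, and it assumes the 20-nucleotide sequence may occur anywhere in read 1. The `targets` dictionary and the probe sequences in the usage note are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_target(read1, targets, max_mismatches=1):
    """Return the first gene whose probe occurs in read1, else None.

    targets: dict mapping a gene identifier to its 20-nt probe sequence
    (hypothetical structure for this sketch).
    A window of read1 matches if it differs from the probe at no more
    than max_mismatches positions.
    """
    for gene, probe in targets.items():
        k = len(probe)
        for i in range(len(read1) - k + 1):
            if hamming(read1[i:i + k], probe) <= max_mismatches:
                return gene
    return None
```

For example, with the made-up probes `{"geneA": "ACGTACGTACGTACGTACGT", "geneB": "TTGACCATGGCAATCGGATC"}`, a read 1 containing the second probe with a single substitution is assigned to `geneB`, while a read carrying two substitutions is left unassigned.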
Reads 2 that contained the delimiter sequence were selected, and their random and delimiter sequences were then trimmed off.
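The selection and trimming of reads 2 could look roughly like the sketch below. It assumes that the delimiter immediately follows the 15-base random sequence and must match exactly. The delimiter sequence is not given in this record, so it is passed in as a parameter.

```python
def trim_read2(read2, delimiter, random_len=15):
    """Split read 2 into its random sequence and the sequence after the delimiter.

    Returns (random_sequence, downstream_sequence), or None when the
    delimiter is not found right after the random sequence.
    The exact-match, fixed-position rule is an assumption of this sketch.
    """
    start = random_len
    end = random_len + len(delimiter)
    if read2[start:end] != delimiter:
        return None
    return read2[:random_len], read2[end:]
```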
Then, the analysis was divided into two steps.
The aim of the first step was to identify the position of mRNA 3’ extremities and to detect untemplated nucleotides. To do this, the 30-nucleotide sequences downstream of the read 2 delimiter sequence were mapped to the corresponding reference sequence, which extends from the first nucleotide of the transcript that maps to the forward PCR2 primer to the end of the mRNA. Up to four mismatches were tolerated, except in the first five nucleotides downstream of the mapping site, which had to map perfectly. To map the 3’ end position of reads 2 with untemplated tails, the sequences of the unmatched reads 2 were successively trimmed from their 3’ end, one nucleotide at a time, until they could be mapped to the reference sequence or until a maximum of 30 nucleotides had been removed. For each successfully mapped read 2, untemplated nucleotides at the 3’ end were extracted.
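The iterative trimming of this first step can be illustrated with the sketch below. It is a simplified reading of the description, not the authors' implementation. It assumes that the read 2 sequence has already been converted to transcript orientation (so any untemplated tail sits at its 3’ end), that the last 30 nucleotides are used as the mapping window, and that trimming stops once fewer than 30 nucleotides remain. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_3prime_end(seq, reference, window=30, max_mm=4, anchor=5):
    """Return the reference index of the last mapped nucleotide, or None.

    The last `window` nucleotides of seq are compared with every reference
    window of the same length, scanning from the 3' end of the reference.
    The `anchor` nucleotides closest to the 3' end must match exactly and
    at most `max_mm` mismatches are tolerated overall.
    """
    probe = seq[-window:]
    k = len(probe)
    if k < window:
        return None
    for end in range(len(reference), k - 1, -1):
        cand = reference[end - k:end]
        if cand[-anchor:] == probe[-anchor:] and hamming(cand, probe) <= max_mm:
            return end - 1
    return None


def find_untemplated_tail(seq, reference, max_trim=30, window=30):
    """Trim seq one nucleotide at a time from its 3' end until it maps.

    Returns (reference_index_of_3prime_end, untemplated_tail), or None if
    no mapping is found after removing up to max_trim nucleotides.
    """
    for n in range(max_trim + 1):
        trimmed = seq[:len(seq) - n]
        if len(trimmed) < window:
            break
        pos = map_3prime_end(trimmed, reference, window=window)
        if pos is not None:
            return pos, seq[len(seq) - n:]
    return None
```

With this rule, a tail nucleotide that happens to match the next reference position is counted as templated, which is one reason a sketch like this should not be used to reproduce the deposited results.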
The aim of the second step was to analyze long mRNA poly(A) tails. Sequencing of long homopolymeric stretches causes a rapid decrease in sequencing quality, making it impossible to map exactly the 3’ end of mRNAs with long poly(A) tails. We thus looked for long T stretches of at least 10 Ts in the reads 2 that failed to map to the reference sequence. Poly(A) tails were searched for with the constraint that each tail must begin within the first 30 cycles, which means that the maximal length of the added 3’ end modification is limited to 29 nucleotides.
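The search for long poly(A) tails in this second step can be sketched as follows. The sketch assumes that cycles are counted from the first base after the delimiter, that read 2 is the reverse complement of the RNA (so a poly(A) tail appears as a T stretch and any 3’ modification precedes it), and that the observed T stretch length is reported as is. The function names are hypothetical.

```python
import re

T_STRETCH = re.compile(r"T{10,}")          # at least 10 consecutive Ts
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_long_polya(trimmed_read2, max_start=30):
    """Look for a long T stretch beginning within the first max_start cycles.

    trimmed_read2: read 2 after removal of the random and delimiter sequences.
    Returns (t_stretch_length, modification) where modification is the
    sequence preceding the T stretch, reverse-complemented to transcript
    orientation, or None if no qualifying stretch is found.
    """
    match = T_STRETCH.search(trimmed_read2)
    if match is None or match.start() >= max_start:
        return None
    modification = revcomp(trimmed_read2[:match.start()])
    return match.end() - match.start(), modification
```

Because the stretch must start at one of the first 30 positions, the sequence preceding it is at most 29 nucleotides long, which matches the limit stated above.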
Finally, results from steps 1 and 2 were compiled and 3’ extensions were analyzed.
Genome_build: TAIR10
Supplementary_files_format_and_content: One processed data file is given. It includes the processed data of the 22 Arabidopsis mRNAs for each genotype and biological replicate. Each line corresponds to one individual read.
Supplementary_files_format_and_content: For each read, we indicate read ID (read.ID), gene AGI (Gene), poly(A) tail sequence (polyA), poly(A) tail length (polyA.size), non-A extension sequence (modification), length of the non-A extension (modification.size), tail sequence (extension), tail size (extension.size), a tag (classification) indicating the category of the tail, the 15N random sequence used for deduplication (random), biological replicate (rep) and genotype (genotype, WT or urt1).
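As a usage illustration, the per-read table described above could be summarised along the lines of the sketch below. The column names are those listed in this record. The tab delimiter, the function name and the rows in the example are assumptions, and the example shows only a subset of the columns.

```python
import csv
import io
from collections import Counter


def count_tail_classes(handle, delimiter="\t"):
    """Count reads per (genotype, classification) pair.

    handle: open text file or file-like object containing a header row.
    The tab delimiter is an assumption; adjust it to the actual file.
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        counts[(row["genotype"], row["classification"])] += 1
    return counts


# Made-up rows with placeholder classification tags, for illustration only.
example = io.StringIO(
    "read.ID\tGene\tclassification\tgenotype\n"
    "r1\tAT1G29910\ttagA\tWT\n"
    "r2\tAT1G29910\ttagA\tWT\n"
    "r3\tAT1G29920\ttagB\turt1\n"
)
counts = count_tail_classes(example)
```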

Submission date

Apr 09, 2020

Last update date

May 28, 2020

Contact name

Dominique Gagliardi

E-mail(s)

[email protected]

Organization name

CNRS

Department

IBMP