Sample GSM4468524
Status Public on May 28, 2020
Title WT replicate 1 Dataset#3
Sample type SRA
 
Source name flowers
Organism Arabidopsis thaliana
Characteristics genotype: WT
biological replicate: replicate 1
dataset: Dataset#3
reference sequence: AT1G29910|AT1G29920|AT2G21660|AT2G30570|AT2G46820|AT3G05880|AT4G02770|AT4G22150|AT4G38770|AT5G19140|AT5G42530|AT2G34420
Growth protocol Arabidopsis plantlets of the Col-0 accession were grown on soil under 16 h light (21 °C) / 8 h darkness (18 °C) cycles.
Extracted molecule total RNA
Extraction protocol Total RNA was extracted from flowers with TRI Reagent® (Molecular Research Center) according to the manufacturer’s instructions.
The 3'RACEseq protocol is based on the ligation of a primer to the 3’ end of RNA, followed by targeted PCR amplification of amplicons suitable for Illumina sequencing.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection RACE
Instrument model Illumina MiSeq
 
Description Processed_data_file.txt
Data processing After initial data processing by the MiSeq Control Software v2.5 (Illumina), base calls were retrieved and further analysed with a suite of in-house Python scripts (v2.7) using the biopython (v1.63) and regex (v2.4) libraries.
The data processing pipeline was adapted from Sikorska et al. (2017). Reads with low-quality bases (≤Q10) within the 15-base random sequence of read 2, or within the 30 bases downstream of the delimiter sequence, were filtered out.
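The quality filter described above can be sketched as follows. This is a minimal illustration in Python 3 using only the standard library, not the submitters' scripts. It assumes Phred+33 quality encoding, and the function names and the window coordinates passed by the caller are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Phred+33 encoding is an assumption of this sketch.
    """
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, start, length, min_q=10):
    """Return True if every base in the window has a score above min_q.

    A window that runs past the end of the read is rejected, because the
    bases that must be checked are not all present.
    """
    window = phred_scores(quality_string)[start:start + length]
    return len(window) == length and all(q > min_q for q in window)
```

A read would be kept only if both windows pass, for example `passes_quality(qual, 0, 15)` for the random sequence and a second call for the 30 bases downstream of the delimiter, with the start offset depending on the delimiter length.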
Sequences with identical nucleotides in the 15-base random sequence were deduplicated.
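A minimal sketch of this deduplication step, assuming the random sequence occupies the first 15 bases of read 2 and that the first read seen for each random sequence is the one retained. Both points are assumptions of the sketch, as is the choice to key on the random sequence alone.

```python
def deduplicate(reads, umi_length=15):
    """Keep the first read seen for each distinct leading random sequence.

    `reads` is an iterable of (read_id, sequence) pairs; the random sequence
    is assumed to be the first `umi_length` bases of each sequence.
    """
    seen = set()
    unique = []
    for read_id, sequence in reads:
        umi = sequence[:umi_length]
        if umi not in seen:
            seen.add(umi)
            unique.append((read_id, sequence))
    return unique
```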
Next, 20-nucleotide sequences corresponding to each target transcript were searched for in reads 1 to identify the corresponding target mRNAs. One mismatch was tolerated. Matched reads 1 and their corresponding reads 2 were extracted and annotated.
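The target assignment could look roughly like the sketch below. It uses a plain Hamming-distance scan (substitutions only) rather than the regex library named in this record, and the probe dictionary and function names are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(read, probe, max_mismatches=1):
    """Return the first offset where probe matches read with at most
    max_mismatches substitutions, or -1 if there is no such offset."""
    for i in range(len(read) - len(probe) + 1):
        if hamming(read[i:i + len(probe)], probe) <= max_mismatches:
            return i
    return -1


def assign_target(read1, probes, max_mismatches=1):
    """Return the first gene whose probe is found in read 1, else None.

    `probes` maps a gene identifier to its 20-nucleotide probe sequence.
    """
    for gene, probe in probes.items():
        if find_with_mismatches(read1, probe, max_mismatches) >= 0:
            return gene
    return None
```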
Reads 2 that contained the delimiter sequence were selected, and their random and delimiter sequences were trimmed off.
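A minimal sketch of the selection and trimming, assuming the delimiter sits immediately after the 15-base random sequence and must match exactly. The delimiter sequence itself is not given in this record, so it is a parameter here.

```python
def trim_random_and_delimiter(read2, delimiter, umi_length=15):
    """Split read 2 into (random_sequence, insert).

    Returns None when the delimiter is not found directly after the random
    sequence. Exact matching and this layout are assumptions of the sketch.
    """
    start = umi_length
    end = start + len(delimiter)
    if read2[start:end] != delimiter:
        return None
    return read2[:umi_length], read2[end:]
```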
Then, the analysis was divided into two steps.
The aim of the first step was to identify the position of mRNA 3’ extremities and to detect untemplated nucleotides. To do this, the 30-nucleotide sequences downstream of the read 2 delimiter sequence were mapped to the corresponding reference sequence, which spans from the first nucleotide of the transcript matched by the forward PCR2 primer to the end of the mRNA. Up to four mismatches were tolerated, except in the first five nucleotides downstream of the mapping site, which had to match perfectly. To map the 3’ end position of reads 2 with untemplated tails, the sequences of the unmatched reads 2 were successively trimmed from their 3’ end, one nucleotide at a time, until they could be mapped to the reference sequence or until a maximum of 30 nucleotides had been removed. For each successfully mapped read 2, untemplated nucleotides at the 3’ end were extracted.
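The iterative trimming in step 1 can be sketched as below. The sketch assumes each sequence has already been converted to transcript (sense) orientation, so the untemplated tail is at the right-hand end of the string, and it interprets the perfect-match constraint as applying to the five 3’-most bases of the mapped window. Both are assumptions, as are the function names and the choice to return the first hit.

```python
def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_window(window, reference, max_mismatches=4, anchor=5):
    """Return the reference coordinate just past the window's 3' end, or None.

    The last `anchor` bases of the window must match the reference exactly;
    up to `max_mismatches` substitutions are allowed overall.
    """
    n = len(window)
    for start in range(len(reference) - n + 1):
        candidate = reference[start:start + n]
        if candidate[n - anchor:] != window[n - anchor:]:
            continue
        if count_mismatches(candidate, window) <= max_mismatches:
            return start + n
    return None


def map_with_tail(insert, reference, window_len=30, max_trim=30):
    """Trim 0..max_trim bases from the 3' end until the preceding window maps.

    Returns (three_prime_end_position, untemplated_tail), or None when no
    trimming depth yields a mappable window.
    """
    for trimmed in range(max_trim + 1):
        end = len(insert) - trimmed
        if end < window_len:
            break
        position = map_window(insert[end - window_len:end], reference)
        if position is not None:
            return position, insert[end:]
    return None
```

Reads for which `map_with_tail` returns None would be the ones passed on to the poly(A) search in step 2.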
The aim of the second step was to analyze long mRNA poly(A) tails. Sequencing of long homopolymeric stretches causes a rapid decrease in sequencing quality, making it impossible to map exactly the 3’ end of mRNAs with long poly(A) tails. We thus looked for long stretches of at least 10 Ts in the reads 2 that failed to map to the reference sequence. Poly(A) tails were searched for with the constraint that they must begin within the first 30 cycles, which means that the maximal length of the added 3’ end modification is limited to 29 nucleotides.
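A minimal sketch of the step 2 search, working on read 2 after removal of the random and delimiter sequences. It assumes that cycles are counted from the first base after the delimiter and that the T stretch is uninterrupted; real reads with sequencing errors inside the stretch would need a more tolerant pattern. The function name and returned keys are hypothetical.

```python
import re

# At least 10 consecutive Ts; read 2 reads the poly(A) tail as poly(T).
POLY_T = re.compile(r"T{10,}")


def find_poly_a(read2_insert, max_start=30):
    """Locate the first T stretch and the bases sequenced before it.

    Returns None when no stretch of >= 10 Ts exists or when it begins after
    the first `max_start` cycles. The bases preceding the stretch are
    returned in read 2 orientation, i.e. as the reverse complement of the
    3' modification on the mRNA.
    """
    match = POLY_T.search(read2_insert)
    if match is None or match.start() >= max_start:
        return None
    return {
        "modification_read2": read2_insert[:match.start()],
        "poly_t_length": match.end() - match.start(),
    }
```

With a zero-based start index below 30, the modification preceding the T stretch can be at most 29 nucleotides long, which matches the limit stated above.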
Finally, results from steps 1 and 2 were compiled and 3’ extensions were analyzed.
Genome_build: TAIR10
Supplementary_files_format_and_content: One processed data file is given. It includes the processed data of the 22 Arabidopsis mRNAs for each genotype and biological replicate. Each line corresponds to one individual read.
Supplementary_files_format_and_content: For each read, we indicate read ID (read.ID), gene AGI (Gene), poly(A) tail sequence (polyA), poly(A) tail length (polyA.size), non-A extension sequence (modification), length of the non-A extension (modification.size), tail sequence (extension), tail size (extension.size), a tag (classification) indicating the category of the tail, the 15N random sequence used for deduplication (random), biological replicate (rep) and genotype (genotype, WT or urt1).
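As an illustration of how these columns might be used, the sketch below computes the mean poly(A) tail length per genotype. It assumes the processed file is tab-delimited with a header row carrying the column names listed above; the record does not state the delimiter, so it is a parameter, and the function name is hypothetical.

```python
import csv
from collections import Counter


def mean_polya_size_by_genotype(handle, delimiter="\t"):
    """Average the polyA.size column for each value of the genotype column.

    `handle` is an open text file (or any iterable of lines) whose first
    line is a header naming at least `genotype` and `polyA.size`.
    """
    totals = Counter()
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        totals[row["genotype"]] += int(row["polyA.size"])
        counts[row["genotype"]] += 1
    return {genotype: totals[genotype] / counts[genotype] for genotype in counts}
```

It would be called on an open file handle, for example `with open("Processed_data_file.txt") as fh: mean_polya_size_by_genotype(fh)`.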
 
Submission date Apr 09, 2020
Last update date May 28, 2020
Contact name Dominique Gagliardi
E-mail(s) [email protected]
Organization name CNRS
Department IBMP
Street address 12, rue du General Zimmer
City Strasbourg
ZIP/Postal code 67084
Country France
 
Platform ID GPL17970
Series (2)
GSE148406 URT1-mediated uridylation shapes poly(A) tails in Arabidopsis
GSE148449 Molecular connection between the TUTase URT1 and decapping activators
Relations
BioSample SAMN14568345
SRA SRX8092594

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
