Sample GSM5031743
Status Public on Jan 31, 2021
Title Mix2_2
Sample type SRA
 
Source name Myeloma cell lines (1200 cells)
Organism Homo sapiens
Characteristics cell type: Myeloma cell lines
sample: sample6
cell_number: 1200
Extracted molecule polyA RNA
Extraction protocol Samples were processed using the Drop-seq DolomiteBio Nadia encapsulator system.
For nanopore sequencing, cDNA was amplified with 25 SMART PCR reactions and sequencing libraries were prepared using the Oxford Nanopore LSK-109 library preparation kit.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model PromethION
 
Description Myeloma cells
Data processing We performed basecalling on the raw fast5 data using Guppy (v) (guppy_basecaller –compress-fastq -c dna_r9.4.1_450bps_hac.cfg -x “cuda:1”) in GPU mode from Oxford Nanopore Technologies running on a GTX 1080 Ti graphics card. For each read we identify the barcode and UMI sequence by searching for the polyA region and flanking regions before and after the barcode/UMI. Accurately sequenced barcodes were identified based on their dual nucleotide complementarity. Unambiguous barcodes were then used as a guide to error correct the ambiguous barcodes in a second pass correction analysis approach. We performed fuzzy searching using a Levenshtein distance of 4 (unless otherwise stated in the figure legend) and replaced the original ambiguous barcode with the unambiguous sequence. A whitelist of barcodes was then generated using UMI-tools whitelist (umi_tools whitelist --bc-pattern=CCCCCCCCCCCCCCCCCCCCCCCCNNNNNNNNNNNNNNNN --set-cell-number=1000) [3]. This whitelist was used to assess the quality of our cells to read count ratio and used as an input for UMI-tools extract. Next the barcode and UMI sequence of each read was extracted and placed within the read2 header file using UMI-tools extract (umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCCCCCCCCCNNNNNNNNNNNNNNNN --whitelist=whitelist.txt). Reads were then aligned to the transcriptome using minimap2 [10] (-ax splice -uf --MD --sam-hit-only --junc-bed) using the reference transcriptome for human hg38 and mouse mm10. The resulting sam file was converted to a bam file and then sorted and indexed using samtools [11]. The transcript name was then added as a XT tag within the bam file using pysam. Finally, UMI-tools count (umi_tools count –per-gene –gene-tag=XT –per-cell –double-barcode) was used to count features to cells before being converted to a market matrix format. We modified UMI-tools count to handle the double nucleotide UMIs as defined below. 
This counts matrix was then used as an input into the standard Seurat pipeline.
Genome_build: hg38
Supplementary_files_format_and_content: mtx
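
The second-pass barcode correction described under Data processing can be illustrated with a minimal sketch. It is not the submitters' code. It assumes that "dual nucleotide complementarity" means each barcode base is read as a pair of identical nucleotides (consistent with the 24-character barcode pattern, but an assumption nonetheless), and the function names is_dimer_consistent, levenshtein and correct_barcode are hypothetical. A barcode is replaced only when exactly one trusted barcode lies within the distance threshold.

```python
from typing import Iterable, Optional


def is_dimer_consistent(seq: str) -> bool:
    """Return True if seq consists of identical-base pairs, e.g. 'AACCGG'.

    Assumption: this is what makes a barcode 'unambiguous' in the record.
    """
    if len(seq) % 2 != 0:
        return False
    return all(seq[i] == seq[i + 1] for i in range(0, len(seq), 2))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_barcode(barcode: str,
                    trusted: Iterable[str],
                    max_distance: int = 4) -> Optional[str]:
    """Map an ambiguous barcode to the single closest trusted barcode.

    Returns None when no trusted barcode is within max_distance or when
    two or more trusted barcodes tie for the closest match.
    """
    trusted = list(trusted)
    if barcode in trusted:
        return barcode
    best, best_dist, tie = None, max_distance + 1, False
    for candidate in trusted:
        dist = levenshtein(barcode, candidate)
        if dist < best_dist:
            best, best_dist, tie = candidate, dist, False
        elif dist == best_dist and best is not None:
            tie = True
    return None if tie else best


# Toy data, not taken from the deposited reads.
reads = ["AACCGGTT", "AACCGGTA", "TTGGCCAA"]
trusted_barcodes = [r for r in reads if is_dimer_consistent(r)]
corrected = [correct_barcode(r, trusted_barcodes) for r in reads]
print(corrected)
```

Rejecting ties is a conservative choice made for this sketch; the record does not say how the original pipeline resolves equidistant candidates.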
 
Submission date Jan 22, 2021
Last update date Feb 01, 2021
Contact name Adam Cribbs
E-mail(s) [email protected]
Organization name University of Oxford
Department NDORMS
Street address Windmill Road
City Oxford
ZIP/Postal code OX37LD
Country United Kingdom
 
Platform ID GPL26167
Series (1)
GSE162053 High throughput error correction using dual nucleotide dimer blocks allows direct single-cell nanopore transcriptome sequencing
Relations
BioSample SAMN17496940
SRA SRX9920602

Supplementary file Size Download File type/resource
GSM5031743_Mix2_2_genes.barcodes.txt.gz 11.9 Kb (ftp)(http) TXT
GSM5031743_Mix2_2_genes.genes.txt.gz 17.0 Kb (ftp)(http) TXT
GSM5031743_Mix2_2_genes.mtx.gz 2.3 Mb (ftp)(http) MTX
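
The three supplementary files above form a Matrix Market triple: a sparse count matrix plus one label file for each axis. The sketch below shows one way to read them with the Python standard library only. The helper names read_labels and read_mtx are hypothetical, and the record does not state whether rows are genes or cells, so the orientation should be checked against the lengths of the two label files.

```python
import gzip
from typing import Dict, List, Tuple


def read_labels(path: str) -> List[str]:
    """Read a gzipped text file with one label (barcode or gene) per line."""
    with gzip.open(path, "rt") as handle:
        return [line.strip() for line in handle if line.strip()]


def read_mtx(path: str) -> Tuple[int, int, Dict[Tuple[int, int], float]]:
    """Read a gzipped Matrix Market file in coordinate format.

    Returns (n_rows, n_cols, entries), where entries maps zero-based
    (row, col) index pairs to the stored value.
    """
    n_rows = n_cols = None
    entries: Dict[Tuple[int, int], float] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Lines starting with '%' are the banner or comments.
            if line.startswith("%") or not line.strip():
                continue
            fields = line.split()
            if n_rows is None:
                # First data line holds: rows, columns, number of entries.
                n_rows, n_cols = int(fields[0]), int(fields[1])
                continue
            # Matrix Market indices are one-based.
            row, col = int(fields[0]) - 1, int(fields[1]) - 1
            entries[(row, col)] = float(fields[2])
    if n_rows is None:
        raise ValueError("no size line found in " + path)
    return n_rows, n_cols, entries


# Example usage, assuming the files were downloaded to the working directory:
# barcodes = read_labels("GSM5031743_Mix2_2_genes.barcodes.txt.gz")
# genes = read_labels("GSM5031743_Mix2_2_genes.genes.txt.gz")
# n_rows, n_cols, counts = read_mtx("GSM5031743_Mix2_2_genes.mtx.gz")
# print(n_rows, n_cols, len(barcodes), len(genes))
```

If one matrix dimension equals the number of barcodes and the other the number of genes, the orientation follows directly. For larger analyses a sparse-matrix library would be a more practical choice than a dictionary.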
Raw data are available in SRA
Processed data provided as supplementary file
