GEO Accession viewer

Sample GSM6725092

Status

Public on Dec 11, 2022

Title

sc_rep_promoter_series_oBC_CS1_repB2

Sample type

SRA

Source name

Mix of K562, HEK293T, HepG2

Organism

Homo sapiens

Characteristics

tissue: N/A
cell line: Mix of K562, HEK293T, HepG2
cell type: Mix
genotype: Polyclonal pools of cells with high-MOI piggyBac integration of single-cell reporters carrying a promoter series. Each cellular pool was bottlenecked to a few hundred clones, and the pools were mixed by hand prior to single-cell sequencing
treatment: N/A

Growth protocol

K562 cells were grown in RPMI 1640 medium (ThermoFisher, cat. num. 11875119) supplemented with 10% FBS and 1x penicillin/streptomycin (ThermoFisher, cat. num. 15140122). HepG2 and HEK293T cells were grown in DMEM (ThermoFisher, cat. num. 10313021) with 10% FBS and 1x penicillin/streptomycin. Cells were kept at 37 °C and 5% CO2 and, except during clonal expansion, were passaged every two days (K562, HEK293T) or when they reached confluency (HepG2, typically every three days).
All cells were transfected in mid-exponential phase. K562 cells were transfected by MaxCyte electroporation following the manufacturer's protocol (1.5 M cells, 15 μg reporter plasmid mix (see above), and 0.5 μg Super PiggyBac Transposase (SBI) in a 50 μL volume). Two replicates of 1 M HepG2 and HEK293 cells were transfected using Lipofectamine 2000 (ThermoFisher) with 4 μg reporter plasmid mix and 0.2 μg Super PiggyBac Transposase (SBI). The medium was changed the next day, and cells were passaged as usual thereafter. After 5 days, cells were put on puromycin selection and grown for an additional 10 days to allow complete dilution of the non-integrated plasmids. At 15 days post-transfection, cells were bottlenecked to an estimated 250 and 500 starting clones in multiple replicates, and the populations were expanded before performing bulk vs. single-cell experiments on mid-exponential cultures.

Extracted molecule

total RNA

Extraction protocol

The bulk vs. single-cell quantification experiment was performed in two replicates: the first (replicate A) with populations bottlenecked at an expected 250 clones, and the second (replicate B) with populations bottlenecked at an expected 500 clones. For each replicate, cells from each line were simultaneously (1) harvested separately and methanol-fixed for bulk quantification, and (2) prepared as a single-cell suspension, hand-mixed in expected equal proportions, and profiled by single-cell transcriptomics. Briefly, for bulk methanol fixation, K562 cells (and HEK293 and HepG2 cells after lifting off the plate with 0.05% trypsin) were washed once with ice-cold PBS, resuspended in 80% ice-cold methanol at a concentration of 1 M/mL, and stored at -80 °C until further processing. For single-cell processing, cells were washed twice with PBS + 0.04% BSA and diluted to 1,000 cells/μL. The cell dilutions were mixed in estimated equal proportions and loaded at an expected 10k total cells on the 10x Chromium platform following the manufacturer's protocol (Single Cell 3' v3.1 with Feature Barcoding, 10x Genomics), one lane per replicate. Replicate B showed a good emulsion despite some evidence of a partial wetting failure. Sample processing otherwise proceeded similarly for the two replicates.
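The equal-proportion pooling above reduces to simple arithmetic. The sketch below uses a hypothetical helper (not the authors' script) to compute the volume of each 1,000 cells/μL suspension needed for three lines to contribute equally to 10k cells. It assumes cell counts are exact; in practice counting error means the realized proportions deviate from equal.

```python
def per_line_volume_ul(total_cells: float, n_lines: int, cells_per_ul: float) -> float:
    """Volume (uL) of each line's suspension so that all lines contribute equally."""
    if n_lines <= 0 or cells_per_ul <= 0:
        raise ValueError("n_lines and cells_per_ul must be positive")
    cells_per_line = total_cells / n_lines
    return cells_per_line / cells_per_ul


# Three lines (K562, HEK293T, HepG2) diluted to 1,000 cells/uL, 10k cells total
vol = per_line_volume_ul(total_cells=10_000, n_lines=3, cells_per_ul=1_000)
print(f"{vol:.2f} uL of each line, {3 * vol:.2f} uL pooled")
```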
For single-cell reporters, three libraries are generated: the standard 10x 3' gene expression library (GEx) and two custom libraries, one for each reporter RNA (oBC and mBC), obtained by nested PCR from the amplified cDNA. Briefly, single-cell library preparation followed the manufacturer's protocol (v3.1 manual CG000205 Rev D, 10x Genomics), with some modifications.
First, the cDNA of one replicate (replicate B) was split into two equal halves after GEM-RT cleanup (step 2.1.s) and before cDNA amplification (each half brought to the same final volume with Elution Solution 1), to allow a direct comparison of the UMIs captured with different enrichment strategies (hereafter replicates B1 and B2). For cDNA amplification of replicates A and B1, primers specific to the mBC (oSR38) and oBC (o246) reporter transcripts were spiked into the reaction (similar to TAP-seq) at a final concentration of 0.5 μM to boost UMI capture; no primers were spiked into replicate B2, allowing direct comparison with replicate B1. Following cDNA amplification, both the bead- and supernatant-derived material (steps 2.3Ax and 2.3Bxiv, respectively) were saved for downstream processing.
Gene expression libraries for all replicates were prepared following the manufacturer's protocol from 25% of the bead-fraction amplified cDNA.
oBC-enriched libraries were prepared as follows. For replicate B2 (no spiked-in primers), an outer PCR1 was performed on 25% of the supernatant amplified cDNA with primers oSR40 + o246 using KAPA Robust (Roche), tracking by qPCR until the inflection point (50 μL 2x master mix, 12.5 μL supernatant cDNA, 5 μL 10 μM o246, 5 μL 10 μM oSR40, 0.5 μL SYBR Green, and water to 100 μL; run parameters: 3 min at 95 °C, then cycles of 20 s at 95 °C, 20 s at 60 °C, and 20 s at 72 °C). Amplicons were cleaned up with 1.75x AMPure, and 1/10 of the eluate was carried forward to the inner PCR together with the remaining replicates. For replicates A and B1, the outer PCR was performed during cDNA amplification via the spiked-in primers, and 25% of the supernatant amplified cDNA was taken directly as input. The semi-nested inner PCR was performed with primer NextP5_index1 and indexed primers o425-o427, with the same parameters as PCR1, and stopped before the inflection point. Libraries were purified with 1.5x AMPure.
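As a check on the PCR1 recipe above, the following sketch (plain arithmetic; the variable names are hypothetical) computes the water volume needed to reach 100 μL and the final primer concentrations via C1V1 = C2V2. With 5 μL of each 10 μM primer in 100 μL, each primer ends at 0.5 μM, the same final concentration quoted for the spiked-in primers during cDNA amplification.

```python
# Outer PCR1 components for replicate B2, volumes in uL (from the recipe above)
components_ul = {
    "KAPA Robust 2x master mix": 50.0,
    "supernatant cDNA": 12.5,
    "o246 (10 uM)": 5.0,
    "oSR40 (10 uM)": 5.0,
    "SYBR Green": 0.5,
}
TOTAL_UL = 100.0
PRIMER_STOCK_UM = 10.0

# Water makes up the remainder of the reaction volume
water_ul = TOTAL_UL - sum(components_ul.values())

# C1 * V1 = C2 * V2  =>  C2 = C1 * V1 / V2, applied to each 10 uM primer stock
final_primer_um = {
    name: PRIMER_STOCK_UM * vol / TOTAL_UL
    for name, vol in components_ul.items()
    if "(10 uM)" in name
}
# The 2x master mix should end at 1x in the final volume
master_mix_x = 2.0 * components_ul["KAPA Robust 2x master mix"] / TOTAL_UL

print(f"water: {water_ul} uL")
print(f"final primer concentrations (uM): {final_primer_um}")
print(f"master mix: {master_mix_x}x")
```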
Because our Pol II reporter construct carries a capture sequence (CS2) downstream of the mBC, reporter mRNAs can be captured by both the poly-dT and the CS2 reverse-transcription primers on the 10x beads. To systematically compare the capture efficiency of the two primer types, two different libraries were generated (poly-dT captured and CS2 captured). For poly-dT captured libraries, as for the oBC libraries, we first performed an outer PCR on replicate B2 (no spiked-in primers during cDNA amplification) using primers oSR38 + o207 and the same PCR conditions as for oBC, except for a 50 s elongation time and a 65 °C annealing temperature; here, 25% of the bead-derived amplified cDNA was used as template. Following 1x AMPure cleanup, 10% of the eluate was taken for PCR2. PCR2 was performed on all replicates (directly using 25% of the bead-derived amplified cDNA for replicates A and B1) with primers o324 + o495 and the same parameters as PCR1, tracking by qPCR and purifying with 1x AMPure. A final PCR was performed to index amplicons with primer o076 and indexed primers o496-o498, and the resulting amplicons were purified with 1x AMPure. The CS2 libraries were prepared entirely analogously to the poly-dT captured libraries, except with the following primers: PCR1 for replicate B2 (SR38 + SR40), PCR2 for all replicates (o529 + oSR40), and PCR3 for all replicates (NextP5_index1 + indexed primers o530-o532).
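The PCR steps above are monitored by qPCR and stopped at, or just before, the inflection point of the amplification curve. The source does not say how the inflection point was called; the sketch below assumes one simple proxy, the midpoint of the cycle-to-cycle step with the largest fluorescence increase, and demonstrates it on a synthetic logistic curve. Real curves are noisy and would normally be baseline-subtracted or smoothed first.

```python
import math
from typing import Sequence


def inflection_cycle(fluorescence: Sequence[float]) -> float:
    """Estimate the inflection point of a qPCR amplification curve.

    fluorescence[k] is the reading after cycle k + 1. Returns the midpoint
    (in cycles) of the step with the largest increase, e.g. 15.5 if the
    steepest rise is between cycles 15 and 16.
    """
    if len(fluorescence) < 2:
        raise ValueError("need readings for at least two cycles")
    steps = [b - a for a, b in zip(fluorescence, fluorescence[1:])]
    i = max(range(len(steps)), key=steps.__getitem__)
    # steps[i] spans cycles i + 1 and i + 2, so its midpoint is i + 1.5
    return i + 1.5


# Synthetic logistic amplification curve with its true inflection at cycle 15.5
curve = [1.0 / (1.0 + math.exp(-(c - 15.5))) for c in range(1, 31)]
print(inflection_cycle(curve))
```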
Read structure:
GEx: read 1, cell barcode + UMI (no custom primer, 66 cycles); index 1, library index (no custom primer, 10 cycles); read 2, transcriptome (no custom primer, 76 cycles)
oBC: read 1, cell barcode + UMI (no custom primer, 66 cycles); index 1, library index (primer o432, 10 cycles); read 2, oBC (primer o433, 76 cycles)
oBC (re-seq): read 1, cell barcode + UMI (no custom primer, 34 cycles); index 1, library index (primer o432, 10 cycles); read 2, oBC (primer o433, 38 cycles)
mBC (poly-dT): read 1, cell barcode + UMI (no custom primer, 66 cycles); index 1, library index (primer o494, 10 cycles); read 2, mBC (primer o334, 76 cycles)
mBC (CS2): read 1, cell barcode + UMI (no custom primer, 30 cycles); index 1, library index (primer o534, 15 cycles); read 2, mBC (primer o334, 18 cycles)
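Every read 1 configuration above reads at least 28 cycles, which covers the 10x Single Cell 3' v3/v3.1 layout of a 16 nt cell barcode followed by a 12 nt UMI (hence the trimming of read 1 to 28 cycles under Data processing). A minimal sketch of that split, using a hypothetical helper; cellranger performs this parsing, plus barcode correction, internally.

```python
from typing import Tuple

BARCODE_LEN = 16  # 10x Single Cell 3' v3/v3.1 cell barcode length
UMI_LEN = 12      # 10x Single Cell 3' v3/v3.1 UMI length


def split_read1(seq: str) -> Tuple[str, str]:
    """Split a read 1 sequence into (cell barcode, UMI).

    Any bases beyond BARCODE_LEN + UMI_LEN are ignored, mirroring a trim
    of read 1 to 28 cycles.
    """
    needed = BARCODE_LEN + UMI_LEN
    if len(seq) < needed:
        raise ValueError(f"read 1 is shorter than {needed} nt")
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:needed]


barcode, umi = split_read1("ACGTACGTACGTACGT" + "GATCGATCGATC" + "TTTTTT")
print(barcode, umi)
```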

Library strategy

RNA-Seq

Library source

transcriptomic single cell

Library selection

cDNA

Instrument model

Illumina NextSeq 500

Description

10x Genomics (custom oBC library)
oBC_counts_sc_rep_promoter_series.txt
assigned_oBC_CRE_mBC_joined_counts_sc_rep_promoter_series.txt

Data processing

GEx:
Data were converted to FASTQ using bcl2fastq, and FASTQs were minimally processed (read 1 trimmed to 28 cycles with seqtk, files renamed) for compatibility with cellranger (version 6.0.1, 10x Genomics), which was run with reference GRCh38-2020-A. Each cellranger count output was processed with Seurat. Briefly, cell barcodes were filtered to those with >700 gene expression UMIs and a mitochondrial UMI fraction between 2 and 15%, yielding 5,787, 4,278, and 3,834 cell barcodes for replicates A, B1, and B2, respectively. The 10x data were normalized, scaled, and clustered using standard commands (NormalizeData with the LogNormalize method; FindVariableFeatures to find the top 1,000 variable features; ScaleData over all genes; RunPCA, retaining the top 50 principal components [PCs] computed on the variable features; FindNeighbors on the top PCs; FindClusters with resolution 0.1; and RunUMAP with n.neighbors = 20 using the top PCs as input features). The UMAP revealed three clear clusters, hypothesized to correspond to the three profiled cell lines. Replicates B1 and B2 also displayed an intermediate cluster, likely resulting from the partial wetting failure of the lane; this cluster shared marker genes with the neighboring clusters and was excluded as plausibly composed of doublets. To confirm the identity of each cluster, in addition to assessing canonical marker genes (e.g., HBG1/2 in K562, ALB in HepG2), we compared pseudo-bulked expression (mean UMI count for each gene) with bulk expression in the three lines (the average of stranded bulk RNA-seq datasets from ENCODE [83] in K562 and HepG2, and in HEK293), finding an unambiguous correspondence of each cluster to a single line (average R² on log-transformed expression of 0.72 for matches vs. 0.39 for non-matches).
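The cluster-identity check above can be sketched as follows: average each cluster's UMI counts per gene (pseudo-bulk), log-transform it alongside each bulk profile, and assign the cluster to the line with the highest R². The pseudocount of 1, the log10 base, and the function names are assumptions; the source specifies only a mean across UMI counts and a log-transformed R². The demo uses synthetic Poisson data, not the study's data.

```python
import numpy as np


def pseudobulk(counts: np.ndarray, labels: np.ndarray, cluster) -> np.ndarray:
    """Mean UMI count per gene (columns) over the cells (rows) in one cluster."""
    return counts[labels == cluster].mean(axis=0)


def log_r_squared(x: np.ndarray, y: np.ndarray, pseudocount: float = 1.0) -> float:
    """Squared Pearson correlation of log10(x + pc) and log10(y + pc)."""
    r = np.corrcoef(np.log10(x + pseudocount), np.log10(y + pseudocount))[0, 1]
    return float(r ** 2)


def match_clusters(counts: np.ndarray, labels: np.ndarray, bulk: dict) -> dict:
    """Map each cluster label to the bulk profile (cell line) with the highest R^2."""
    assignment = {}
    for cluster in np.unique(labels):
        pb = pseudobulk(counts, labels, cluster)
        scores = {line: log_r_squared(pb, profile) for line, profile in bulk.items()}
        assignment[cluster.item()] = max(scores, key=scores.get)
    return assignment


# Synthetic demo: two "lines" with unrelated bulk profiles over 200 genes
rng = np.random.default_rng(0)
bulk = {"A": rng.gamma(1.0, 10.0, 200), "B": rng.gamma(1.0, 10.0, 200)}
counts = np.vstack([
    rng.poisson(0.5 * bulk["A"], size=(100, 200)),
    rng.poisson(0.5 * bulk["B"], size=(100, 200)),
])
labels = np.array([0] * 100 + [1] * 100)
print(match_clusters(counts, labels, bulk))
```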
Following the preliminary filtering described above, cell barcodes corresponding to doublets were removed by two methods. First, each large cluster was further sub-clustered using the same method as above, revealing focal subclusters that shared marker genes with large neighboring clusters and usually had nearly 2-fold more total RNA UMIs; cell barcodes in these subclusters were excluded as doublets. Second, Scrublet was run on the filtered cell barcode set (>700 RNA UMIs, 2 to 15% mitochondrial RNAs), and a doublet score threshold of 0.25 was selected based on the separation of the two peaks of the bimodal simulated score distribution. Cells either belonging to doublet subclusters or having a Scrublet doublet score > 0.25 were filtered out (we observed high concordance between the two approaches). Finally, cells with anomalously high gene expression UMI counts (>4,000) or anomalously high multiplicity of integration (>100), also likely doublets, were removed, leaving 5,505 high-confidence cells for replicate A (K562: 2,184; HEK293T: 2,090; HepG2: 1,231), 3,533 for replicate B1 (K562: 1,303; HEK293T: 1,238; HepG2: 992), and 3,172 for replicate B2 (K562: 1,298; HEK293T: 1,056; HepG2: 818).
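The sequence of cell filters above can be expressed as a single predicate. This is a sketch with a hypothetical per-cell record: the source does not state whether the 2% and 15% bounds are inclusive (assumed inclusive here), nor how multiplicity of integration was computed (modeled as a precomputed integer field).

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    gex_umis: int                # gene expression UMIs
    mito_frac: float             # fraction of GEx UMIs from mitochondrial genes
    in_doublet_subcluster: bool  # flagged by sub-clustering of each large cluster
    scrublet_score: float        # Scrublet doublet score
    moi: int                     # multiplicity of integration (hypothetical field)


def passes_filters(c: CellQC) -> bool:
    # 1. Preliminary filter: >700 GEx UMIs and 2-15% mitochondrial UMIs
    if c.gex_umis <= 700 or not 0.02 <= c.mito_frac <= 0.15:
        return False
    # 2. Doublets flagged by either method
    if c.in_doublet_subcluster or c.scrublet_score > 0.25:
        return False
    # 3. Anomalously high UMI count or multiplicity of integration
    if c.gex_umis > 4000 or c.moi > 100:
        return False
    return True


cells = [
    CellQC("cell1", 2500, 0.06, False, 0.05, 20),   # kept
    CellQC("cell2", 650, 0.06, False, 0.05, 20),    # too few UMIs
    CellQC("cell3", 2500, 0.06, False, 0.40, 20),   # Scrublet doublet
    CellQC("cell4", 2500, 0.06, False, 0.05, 150),  # MOI too high
]
kept = [c.barcode for c in cells if passes_filters(c)]
print(kept)
```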
mBC:
Data were converted to FASTQ using bcl2fastq, and FASTQs were minimally processed (read 1 trimmed to 28 cycles and read 2 to 22 cycles with seqtk, files renamed) for compatibility with cellranger (version 6.0.1, 10x Genomics), which was run to perform cell barcode error correction. The resulting position-sorted BAM files were then parsed for mBC reads with a custom Python script as follows. Reads aligning to the reference genome, or lacking either a corrected cell barcode or UMI (tags CB and UB in the BAM file), were discarded. Only reads with the exact expected 7 nt sequence (TCGACAA) downstream of the mBC (positions 16 to 22) were retained. The list of all UMIs for each cell barcode-mBC pair was stored, discarding chimeric UMIs (UMIs for which, within a given cell barcode, the proportion of reads associated with a given mBC relative to all mBCs falls below 0.2). mBCs consisting entirely of Gs (empty reads) were discarded. Finally, UMI counts were error corrected: for each mBC and cell barcode, the Hamming distance between all UMIs was calculated, a graph was built by connecting UMIs at Hamming distance ≤ 1, and the number of connected components in the resulting graph was taken as the error-corrected UMI count for that cell barcode-mBC pair. These error-corrected UMI counts were taken as the per-cell quantification of reporter mRNA expression (see below for a normalization strategy to correct for gene expression UMIs). Because cell barcodes derived from the capture sequence vs. the poly-dT reverse-transcription primer on the same bead differ (bases 8 and 9 reverse complemented), and were not error corrected by cellranger in our application, we converted CS2 cell barcodes to their poly-dT counterparts to enable matching across the different libraries.
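The chimera filter and the graph-based UMI collapse described above can be sketched in plain Python. The input is assumed to be one (cell barcode, UMI, mBC) tuple per read that has already passed the alignment, CB/UB, and constant-sequence checks; the function names are hypothetical, and computing the chimera fraction per UMI within each cell is one reading of the description above. Counting connected components at Hamming distance ≤ 1 amounts to single-linkage clustering of UMIs (the "cluster" method in UMI-tools terminology).

```python
from collections import Counter, defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def n_components(umis) -> int:
    """Connected components of the graph linking UMIs at Hamming distance <= 1."""
    umis = sorted(umis)
    parent = list(range(len(umis)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(umis)), 2):
        if hamming(umis[i], umis[j]) <= 1:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(umis))})


def corrected_umi_counts(reads, min_frac: float = 0.2) -> dict:
    """reads: iterable of (cell_barcode, umi, mbc), one tuple per read.

    Returns {(cell_barcode, mbc): error-corrected UMI count}.
    """
    reads_per_triplet = Counter(reads)
    reads_per_cell_umi = Counter()
    for (cell, umi, _), n in reads_per_triplet.items():
        reads_per_cell_umi[(cell, umi)] += n

    umis_per_pair = defaultdict(set)
    for (cell, umi, mbc), n in reads_per_triplet.items():
        if set(mbc) == {"G"}:
            continue  # all-G mBC: empty read
        if n / reads_per_cell_umi[(cell, umi)] < min_frac:
            continue  # chimeric UMI for this mBC
        umis_per_pair[(cell, mbc)].add(umi)
    return {pair: n_components(umis) for pair, umis in umis_per_pair.items()}


reads = (
    [("C1", "AAAA", "M1")] * 5
    + [("C1", "AAAT", "M1")] * 2    # 1 mismatch from AAAA: same molecule
    + [("C1", "GCGC", "M1")] * 3    # distinct molecule
    + [("C1", "AAAA", "M2")] * 1    # 1/6 of AAAA's reads: chimeric, dropped
    + [("C1", "CCCC", "GGGG")] * 2  # all-G mBC: dropped
)
print(corrected_umi_counts(reads))
```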
oBC:
oBC libraries were processed in an entirely analogous way to the mBC libraries, with the following modifications: two sequencing runs were combined into a single FASTQ prior to processing, read 2 was trimmed to 23 cycles, and only reads with GCTTTAA (the constant region after the oBC) at positions 17 to 23 were retained. The number of UMIs per oBC per cell barcode was likewise taken as the error-corrected (Hamming distance 1) count and used as our measure of oBC expression in single cells (see below for a normalization strategy to correct for gene expression UMIs). As for the CS2 mBC data above, we converted CS1 cell barcodes to poly-dT cell barcodes.
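The constant-region filters for mBC reads (TCGACAA at positions 16 to 22) and oBC reads (GCTTTAA at positions 17 to 23) can be captured by one hypothetical helper. It assumes, consistent with the read 2 trimming lengths above but not stated explicitly, that the barcode occupies all read 2 positions before the constant region (15 nt for mBC, 16 nt for oBC).

```python
from typing import Optional

MBC_CONSTANT, MBC_START = "TCGACAA", 16  # 1-based start in read 2 trimmed to 22 nt
OBC_CONSTANT, OBC_START = "GCTTTAA", 17  # 1-based start in read 2 trimmed to 23 nt


def extract_barcode(read2: str, constant: str, start: int) -> Optional[str]:
    """Return the barcode preceding `constant` if the constant matches exactly
    at 1-based position `start` of read 2; otherwise None (read discarded)."""
    s = start - 1
    if read2[s:s + len(constant)] != constant:
        return None
    return read2[:s]


print(extract_barcode("ACGTACGTACGTACGA" + "GCTTTAA", OBC_CONSTANT, OBC_START))
print(extract_barcode("ACGTACGTACGTACG" + "TCGACTA", MBC_CONSTANT, MBC_START))
```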
Supplementary files format and content: GEx_obj_sc_rep_promoter_series: Seurat object of quality filtered cells.
Supplementary files format and content: assigned_oBC_CRE_mBC_joined_counts_sc_rep_promoter_series: Final assigned joined cell-oBC-CRE_mBC table, restricted to oBCs with >11 UMI counts and to uniquely matchable oBC-promoter-mBC triplets. column 1: cell barcode; column 2: replicate ID; column 3: oBC; column 4: mBC; column 5: CRE class (all promoters here); column 6: CRE identity; column 7: read counts oBC; column 8: UMI counts oBC; column 9: read counts mBC; column 10: UMI counts mBC
Supplementary files format and content: mBC_CS2_counts_sc_rep_promoter_series: raw count table (restricted to mBC from the list determined in the subassembly) per mBC per cell (CS2 captured). Column 1: cell barcode; column 2: replicate ID; column 3: mBC; column 4: read counts; column 5: UMI counts; column 6: capture modality.
Supplementary files format and content: mBC_poly_dT_counts_sc_rep_promoter_series: raw count table (restricted to mBC from the list determined in the subassembly) per mBC per cell (poly-dT captured). Column 1: cell barcode; column 2: replicate ID; column 3: mBC; column 4: read counts; column 5: UMI counts; column 6: capture modality.
Supplementary files format and content: oBC_counts_sc_rep_promoter_series: raw count table (restricted to oBC from the list determined in the subassembly and UMI count >1 to decrease file size) per oBC per cell (CS1 captured). Column 1: cell barcode; column 2: replicate ID; column 3: oBC; column 4: read counts
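For readers working with the processed files, a minimal parser for the joined cell-oBC-CRE_mBC table described above is sketched below. Only the column order is documented; the tab delimiter, the optional header row, and the field names are assumptions, and the example row is made up.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class JoinedRow(NamedTuple):
    cell_barcode: str  # column 1
    replicate: str     # column 2
    obc: str           # column 3
    mbc: str           # column 4
    cre_class: str     # column 5 (all promoters in this sample)
    cre_id: str        # column 6
    obc_reads: int     # column 7
    obc_umis: int      # column 8
    mbc_reads: int     # column 9
    mbc_umis: int      # column 10


def read_joined_table(handle: TextIO, skip_header: bool = False) -> Iterator[JoinedRow]:
    """Yield typed rows from the joined table (assumed tab-separated)."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    for rec in reader:
        if not rec:
            continue  # tolerate blank lines
        yield JoinedRow(*rec[:6], *(int(x) for x in rec[6:10]))


example = io.StringIO("CELL1\tA\tOBC1\tMBC1\tpromoter\tPROM1\t240\t31\t55\t7\n")
rows = list(read_joined_table(example))
print(rows[0].cre_id, rows[0].obc_umis, rows[0].mbc_umis)
```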

Submission date

Nov 10, 2022

Last update date

Dec 11, 2022

Contact name

Jean-Benoit Lalanne

E-mail(s)

[email protected]

Organization name

University of Washington

Department

Genome Sciences

Lab

Jay Shendure

Street address

3720 15th Ave NE

City

Seattle

State/province

ZIP/Postal code

98195

Country

USA

Platform ID

GPL18573

Series (2)

GSE217689	Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters [scQer_promoters_cell_lines]
GSE217690	Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters

Relations

BioSample

SAMN31678312

SRA

SRX18229813

Supplementary data files not provided

Raw data are available in SRA

Processed data are available on Series record