GEO Accession viewer

Sample GSM7839602

Status

Public on Oct 17, 2023

Title

oBC_CRE_mutated_TFBS_series

Sample type

SRA

Source name

Bacteria (Escherichia coli)

Organism

Escherichia coli

Characteristics

cell type: Bacteria (Escherichia coli)

Extracted molecule

genomic DNA

Extraction protocol

Plasmids were purified with Zymo MidiPrep kit.
Promoter series (to assess positional effects): To obtain the list of mBCs corresponding to each library, we prepared libraries for sequencing similarly to the DNA arm of bulk MPRA, as described before. Starting from 5 ng of each plasmid library, two rounds of PCR were performed (4 cycles of PCR1 with primers oJBL039+oJBL753 to append a pseudo 10 bp UMI, followed by 8 cycles with oJBL361 and Nextera v2 P7 indexed primers). Amplicons were cleaned up with Ampure XP beads (1x) and pooled for sequencing. Sequencing was performed on a Nextseq2000 using a custom set of primers (read 1: oJBL369, 15 cycles to read the mBC; index 1: oJBL335 [Nextera index 1], 10 cycles to read the sample index; read 2: oJBL494 [Nextera read 2], 10 cycles to read the pseudo-UMI; index 2: oJBL371, 15 cycles to read the reverse complement of the mBC). The data were processed to a piled-up file (counting the number of reads and UMIs per barcode per library). The count distributions per barcode per library were inspected and found to be bimodal. The bona fide barcodes present in the libraries were taken to be those in the high-count mode (count threshold set at the minimum of the bimodal distribution), leading to 153.9k barcodes across the 20 libraries. To ensure no inter-library barcode collision (given that the libraries were pooled for the experiment), the final list was filtered to keep only barcodes present in a single library out of the 20 pooled for the experiment, leading to a final set of 134.1k barcodes (13% multiply represented barcodes removed). Barcode complexity per library spanned 588 to 22.1k, with an interquartile range of 3.6k to 11.3k.
Libraries p058 & p059: The connection between mBC and CRE was obtained using a strategy similar to that connecting oBCs to CREs. Briefly, the plasmid library was tagmented as described before (section 2.4.4). Instead of primers upstream of the oBC, primers downstream of the mBC were used. Following 13 cycles of semi-specific PCR (primers: indexed Nextera P5 + oJBL358), fragments in the range 350-700 bp were size selected on PAGE and paired-end sequenced on a Nextseq500 (read 1, 32 cycles, Nextera_read1 primer: tagmented CRE; read 2, 32 cycles, primer oJBL371: mBC; index 2, 10 cycles, Nextera_index2 primer: sample index). Bioinformatic processing of the data was similar to that for the oBC-CRE subassembly, yielding 15.5k mBCs with a median of 65 mBCs/CRE for p058 and 27.8k mBCs with a median of 117 mBCs/CRE for p059 passing the individual library controls (see below for the additional filter to avoid inter-library collisions in the pool).
Library p092: Given that the architecture of the reporter was similar to the original p055 (except without the minimal promoter), we used the exact same strategy as previously described, obtaining 49.4k valid mBCs and a median of 207 mBCs/CRE.
Libraries p091 & p096: Since the CREs were inserted downstream of the GFP reporter, the CREs could be directly subassembled to the mBCs in this context. To do so, we again used tagmentation followed by 13 cycles of semi-specific PCR (primers: indexed Nextera P7 + oJBL708) and a PAGE size selection (600 bp to 900 bp). The library was paired-end sequenced on a Nextseq500 (read 1, 32 cycles, oJBL707 primer: start of inserted CRE; index 1, 18 cycles, Nextera_index1 primer: sample index; read 2, 32 cycles, primer oJBL371: mBC). Data processing for subassembly was similar to that for the oBC-CRE mapping in p055, leading to the identification of 32.0k and 47.3k valid mBCs, and a median of 162 and 241 mBCs/CRE, for libraries pJBL091 and pJBL096 respectively.
Promoter series (p033_v2 series): The list of mBCs was obtained by sequencing PCR products generated in two steps (PCR1 with Kapa HiFi in 20 uL, 4 cycles, primers oJBL039+oJBL358; Ampure 1x cleanup; PCR2 with Kapa HiFi in 20 uL, 10 cycles, primers oJBL077+oJBL362-o366 indexed series). The product was sequenced as a spike-in on a Nextseq 2000 with custom primers (read 1: 148 cycles, primer oJBL369; index 1: 10 cycles, empty [blank read]; index 2: 10 cycles, primer oJBL370). Read 1 was sufficiently long to cover both the mBC and the pseudo-UMI installed by PCR1 of the library preparation. The read 1 fastq was then trimmed into separate fastqs for the two pieces of information for downstream processing. mBCs and UMIs were then piled up as for MPRA amplicons. The bona fide set of mBCs present in each respective library was then determined from the high-count mode of the bimodal distribution of UMI counts per mBC, yielding a median of 2.8k mBCs per promoter (span of 0.3k to 4.3k mBCs across the different promoters, with UBCp being less well represented). See manuscript for details of other subassemblies (e.g., processing of the Nanopore data).
Custom amplicons for reporter constructs subassemblies
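The barcode selection described in this protocol (keep the high-count mode of a bimodal count distribution, then drop barcodes found in more than one pooled library) can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitters' code: the function names, the choice of a 20-bin histogram on log10 counts to locate the trough, and the toy barcode counts are all hypothetical.

```python
import math
from collections import defaultdict


def valley_threshold(counts, n_bins=20):
    """Estimate the count at the trough between the two modes.

    Assumption: the trough is located on a histogram of log10(counts);
    the record does not state how the minimum was actually found.
    """
    logs = [math.log10(c) for c in counts if c > 0]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # guard against identical counts
    hist = [0] * n_bins
    for x in logs:
        hist[min(int((x - lo) / width), n_bins - 1)] += 1
    half = n_bins // 2
    left_peak = max(range(half), key=hist.__getitem__)
    right_peak = max(range(half, n_bins), key=hist.__getitem__)
    between = range(left_peak, right_peak + 1)
    lowest = min(hist[i] for i in between)
    tied = [i for i in between if hist[i] == lowest]
    trough = tied[len(tied) // 2]  # middle of the emptiest stretch
    return 10 ** (lo + (trough + 0.5) * width)


def bona_fide(barcode_counts, n_bins=20):
    """Keep barcodes whose count falls in the high-count mode."""
    threshold = valley_threshold(barcode_counts.values(), n_bins)
    return {bc for bc, c in barcode_counts.items() if c >= threshold}


def drop_collisions(lib_to_barcodes):
    """Remove any barcode that appears in more than one library."""
    owners = defaultdict(set)
    for lib, barcodes in lib_to_barcodes.items():
        for bc in barcodes:
            owners[bc].add(lib)
    return {
        lib: {bc for bc in barcodes if len(owners[bc]) == 1}
        for lib, barcodes in lib_to_barcodes.items()
    }


# Toy data (hypothetical): two libraries sharing one high-count barcode, TTTT.
toy = {
    "libA": {"AAAC": 100, "AAGT": 120, "ACCA": 90, "TTTT": 110, "CGCG": 1,
             "GCGC": 2, "ATAT": 1, "TATA": 3, "GGGA": 1, "CCCA": 2},
    "libB": {"TTTT": 95, "GATC": 105, "CTAG": 130, "AACC": 115, "GGTT": 1,
             "CCAA": 2, "TTGG": 1, "AATT": 1, "GGCC": 3, "ACAC": 2},
}
kept = {lib: bona_fide(counts) for lib, counts in toy.items()}
final = drop_collisions(kept)
```

In this toy case the shared barcode passes the count threshold in both libraries and is therefore removed from both, mirroring the single-library filter applied to the 20 pooled libraries.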

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

NextSeq 2000

Description

oBC to CRE mapping for plasmid library mutated CREs (scQers in mEBs, v2)

Data processing

See library construction above and manuscript for details of data processing.
Supplementary files format and content:
final_subassembly_bulk_MPRA_mEB_v2_poolA.txt.gz & final_subassembly_bulk_MPRA_mEB_v2_poolB.txt.gz: mBC map for bulk MPRA in mEBs (testing for different architectures). column 1: originating library short identifier, column 2: reporter architecture identifier, column 3: position of CRE relative to reporter, column 4: reporter ORF, column 5: promoter id, column 6: logical boolean indicating whether U6/oBC cassette is present on construct, column 7: logical boolean indicating whether cHS4 insulators are present on construct, column 8: pool id, column 9: CRE class, column 10: CRE id, column 11: mBC.
final_subassembly_bulk_MPRA_promoter_architecture_positional_effects.txt.gz: list of mBCs associated with libraries of promoters with different reporter architectures. column 1: originating library short identifier, column 2: reporter architecture identifier, column 3: reporter ORF, column 4: promoter id, column 5: logical boolean indicating whether U6/oBC cassette is present on construct, column 6: logical boolean indicating whether cHS4 insulators are present on construct, column 7: mBC.
final_subassembly_scQer_mEB_v2.txt.gz: list of oBC-CRE-mBC triplets for scQer (second round) in mEBs. column 1: library identifier, column 2: mBC, column 3: oBC, column 4: CRE class, column 5: identity CRE downstream (right, promoter proximal), column 6: orientation CRE right, column 7: identity CRE upstream (left, promoter distal), column 8: orientation CRE upstream. Note: fields 7 and 8 are not NA only for the pairwise CRE constructs; all other constructs carry a single CRE, in contrast to the CRE pairs libraries, so these fields are NA.
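As an illustration of the eight-column layout described for final_subassembly_scQer_mEB_v2.txt.gz, the sketch below reads such a file into dictionaries and flags CRE-pair constructs. The record does not state the delimiter or whether a header line is present, so tab-delimited text without a header is assumed here; the column names and helper functions are hypothetical labels derived from the descriptions above.

```python
import csv
import gzip

# Hypothetical column names, in the order given in the record.
SCQER_COLUMNS = [
    "library", "mBC", "oBC", "CRE_class",
    "CRE_right", "orientation_right",
    "CRE_left", "orientation_left",
]


def read_scqer_triplets(path, delimiter="\t"):
    """Read oBC-CRE-mBC triplets; assumes no header line in the file."""
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        for fields in csv.reader(handle, delimiter=delimiter):
            if len(fields) != len(SCQER_COLUMNS):
                continue  # skip blank or malformed lines
            row = dict(zip(SCQER_COLUMNS, fields))
            # Fields 7 and 8 are NA for single-CRE constructs.
            for key in ("CRE_left", "orientation_left"):
                if row[key] == "NA":
                    row[key] = None
            rows.append(row)
    return rows


def is_pair(row):
    """True when an upstream (left) CRE is recorded for the construct."""
    return row["CRE_left"] is not None
```

If the actual file does carry a header line, it would need to be skipped before the rows are used.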

Submission date

Oct 12, 2023

Last update date

Oct 23, 2023

Contact name

Jean-Benoit Lalanne

E-mail(s)

[email protected]

Organization name

University of Washington

Department

Genome Sciences

Lab

Jay Shendure

Street address

3720 15th Ave NE

City

Seattle

State/province

ZIP/Postal code

98195

Country

USA

Platform ID

GPL32081

Series (2)

GSE217690	Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters
GSE245260	Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters [plasmid libraries subassembly, v2]

Relations

BioSample

SAMN37798526

SRA

SRX22182227

Supplementary data files not provided

Raw data are available in SRA