GEO Accession viewer

Sample GSM5824332

Status

Public on Jul 05, 2022

Title

Human tumor 2

Sample type

SRA

Source name

Human SCLC tumor

Organism

Homo sapiens

Characteristics

tissue: SCLC tumor
collection method: Upper right lobectomy
stage/treatment: Stage 1B mixed SCLC/LCNEC, relapsed (EP standard of care)

Treatment protocol

Both patients received etoposide and cisplatin. Tumor 1 came from a patient who also received prophylactic cranial irradiation.

Growth protocol

The two human SCLC tumors were collected in collaboration with Vanderbilt University Medical Center. Tumor tissue was immediately placed in cold RPMI on ice for dissociation prior to sequencing.

Extracted molecule

total RNA

Extraction protocol

Tissue was washed in RBC lysis buffer, passed through a 70 μm filter, and washed in PBS. Cells were dissociated with cold DNase and proteases and triturated every 5-10 minutes to improve dissociation.
Tumor 1: Library preparation for scRNA-seq was performed according to previously described protocols (Banerjee et al., 2020), and cells were sequenced using BGI MGI-seq. Tumor 2: Tumor tissue was immediately placed in cold RPMI on ice for dissociation. Library preparation for scRNA-seq was performed as described previously (Banerjee et al., 2020). Cells were prepared for sequencing using TruDrop (Southard-Smith et al., 2020) and sequenced on a NovaSeq.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

DNBSEQ-T7

Data processing

The dropEst pipeline (Petukhov et al., 2018) was used to process the scRNA-seq data and to generate count matrices of each gene in each cell. Specifically, cell barcodes and UMIs were extracted by dropTag; reads were aligned to the human reference transcriptome (hg38) using STAR (Dobin et al., 2013); and dropEst corrected cell barcode errors and produced gene-by-cell count matrices, along with three additional count matrices for exonic, intronic, and exon/intron-spanning reads. Spliced and unspliced reads were annotated, and the RNA expression dynamics of single cells were estimated with velocyto (La Manno et al., 2018).
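The spliced/unspliced split is what velocyto uses to estimate expression dynamics. In its steady-state model, unspliced (u) and spliced (s) abundances of a gene follow du/dt = α - βu and ds/dt = βu - γs; with β set to 1, the degradation rate γ is estimated as the slope of u against s, and the velocity of each cell is v = u - γs. The plain-Python sketch below illustrates that idea for one gene under a simplifying assumption: γ is fit by least squares through the origin on all cells, whereas velocyto (and scVelo) fit it on cells at the extremes of the expression distribution. The toy counts are made up, and the function names are hypothetical.

```python
"""Steady-state RNA velocity for a single gene (simplified, illustrative sketch)."""


def fit_gamma(spliced, unspliced):
    """Least-squares slope of unspliced ~ gamma * spliced, fit through the origin.

    Simplification: uses all cells, not only extreme-quantile cells as velocyto does.
    """
    denom = sum(s * s for s in spliced)
    if denom == 0:
        return 0.0
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def velocity(spliced, unspliced):
    """Per-cell velocity v = u - gamma * s for one gene."""
    gamma = fit_gamma(spliced, unspliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy spliced/unspliced counts for one gene across four cells (made-up numbers).
s = [1.0, 2.0, 3.0, 4.0]
u = [0.5, 1.0, 1.5, 3.0]
print(fit_gamma(s, u))  # slope of u on s
print(velocity(s, u))   # negative: gene being down-regulated; positive: up-regulated
```

A positive velocity means more unspliced RNA than the steady-state line predicts (the gene is being induced); a negative value suggests repression.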
We initially filter out cells with fewer than 100 detected genes using scanpy.pp.filter_cells with min_genes = 100. This reduces the total number of cells (across both tumors) from 8,649 to 4,863. We then remove genes with fewer than 3 total spliced counts across all samples using scanpy.pp.filter_genes with min_counts = 3. This reduces the number of genes from 43,306 to 15,344. These filtering steps remove cells and genes with few or no reads, in preparation for the further filtering steps below.
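To make the two thresholds concrete, the sketch below reproduces in plain Python what the scanpy calls select: a cell passes `min_genes` if at least that many genes have a nonzero count, and a gene passes `min_counts` if its total count across all cells is at least that value. It is an illustration, not scanpy's implementation; the matrix is a made-up cells-by-genes list of lists (standing in for, e.g., the spliced counts), and the toy thresholds are far smaller than the 100 and 3 used above.

```python
"""Plain-Python sketch of the two initial filters on a cells x genes count matrix."""


def filter_cells(counts, min_genes):
    """Keep cells (rows) with at least `min_genes` genes detected (count > 0)."""
    return [row for row in counts if sum(1 for c in row if c > 0) >= min_genes]


def filter_genes_by_counts(counts, gene_names, min_counts):
    """Keep genes (columns) whose total count across all cells is >= `min_counts`."""
    keep = [j for j in range(len(gene_names))
            if sum(row[j] for row in counts) >= min_counts]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [gene_names[j] for j in keep]


# Toy data: 3 cells x 4 genes (made-up counts).
counts = [[1, 0, 2, 0],
          [0, 0, 1, 0],
          [3, 1, 0, 0]]
genes = ["A", "B", "C", "D"]

cells_kept = filter_cells(counts, min_genes=2)  # drops the middle cell (1 gene detected)
matrix, genes_kept = filter_genes_by_counts(cells_kept, genes, min_counts=3)
print(cells_kept)
print(genes_kept, matrix)
```

Order matters: filtering cells first changes the per-gene totals, which is why gene "C" (total 3 before cell filtering, 2 after) is dropped here.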
We then use Dropkick v1.2.6 (https://github.com/KenLauLab/dropkick) to label and filter out low-quality cells in each sample. Once each sample has dropkick scores and labels (cells with a dropkick score > 0.5 are treated as high quality), we concatenate the datasets, adding a batch key for each tumor.
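The score threshold and the batch-keyed concatenation can be sketched without any single-cell library. Below, each cell is a plain dict; the key `dropkick_score` is an assumed name for the per-cell score dropkick assigns, and the barcodes and scores are invented. In an AnnData workflow the same result would come from subsetting on the score column and concatenating the per-tumor objects with a batch label.

```python
"""Sketch: keep cells with dropkick score > 0.5 per sample, then pool with a batch label.

Cells are plain dicts; the 'dropkick_score' key and all values are hypothetical.
"""


def keep_high_quality(cells, threshold=0.5):
    """Return cells whose dropkick score is strictly above `threshold`."""
    return [c for c in cells if c["dropkick_score"] > threshold]


def concat_with_batch(samples):
    """Concatenate {batch_name: cells} into one list, tagging each cell with its batch."""
    pooled = []
    for batch, cells in samples.items():
        for c in cells:
            pooled.append({**c, "batch": batch})
    return pooled


samples = {
    "tumor1": [{"barcode": "AAAC", "dropkick_score": 0.91},
               {"barcode": "AAAG", "dropkick_score": 0.12}],
    "tumor2": [{"barcode": "CCCA", "dropkick_score": 0.50},
               {"barcode": "CCCT", "dropkick_score": 0.77}],
}
filtered = {name: keep_high_quality(cells) for name, cells in samples.items()}
pooled = concat_with_batch(filtered)
print([(c["batch"], c["barcode"]) for c in pooled])
```

Filtering before pooling keeps the quality call per sample, which matters because dropkick fits its model to each sample's own count distribution. Note that a score of exactly 0.5 does not pass the strict "> 0.5" rule.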
We normalize the data using scanpy.pp.normalize_total with default arguments (as described in the Scanpy API, v1.8), such that after normalization each cell has a total count equal to the median of total counts per cell before normalization. Next, numpy.log1p is used to log-transform the data (adding 1 keeps zeros from being transformed to negative infinity). Finally, the log-transformed, normalized counts are scaled using scanpy.pp.scale, which mean-centers each gene and rescales it to unit variance. We then use scanpy.tl.pca to compute a 50-component PCA embedding of the data using all genes, and use scvelo.pp.neighbors and scvelo.tl.umap to generate a UMAP dimensionality reduction.
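The normalize, log1p, and scale sequence is simple arithmetic, sketched below in plain Python on a tiny made-up matrix. To our understanding, scanpy.pp.scale divides by the unbiased (ddof=1) standard deviation and leaves zero-variance genes unscaled; the sketch follows that convention, but it is an illustration of the arithmetic rather than scanpy's implementation. PCA and UMAP are omitted.

```python
"""Plain-Python sketch of normalize_total (median target) -> log1p -> per-gene scaling."""
import math
import statistics


def normalize_to_median(counts):
    """Rescale each cell so its total equals the median of the original cell totals."""
    totals = [sum(row) for row in counts]
    target = statistics.median(totals)
    return [[x * target / t if t > 0 else 0.0 for x in row]
            for row, t in zip(counts, totals)]


def log1p_matrix(matrix):
    """Apply log(1 + x) elementwise, so zero counts stay at zero."""
    return [[math.log1p(x) for x in row] for row in matrix]


def scale_genes(matrix):
    """Center each gene (column) to mean 0 and divide by its standard deviation."""
    n_genes = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_genes)]
    means = [statistics.fmean(col) for col in cols]
    # Sample standard deviation (ddof=1); a constant gene gets sd 1 to avoid dividing by 0.
    sds = [statistics.stdev(col) or 1.0 for col in cols]
    return [[(row[j] - means[j]) / sds[j] for j in range(n_genes)] for row in matrix]


# Toy data: 3 cells x 3 genes; cell totals are 4, 2, 12, so the median target is 4.
counts = [[2, 0, 2], [1, 1, 0], [6, 0, 6]]
normalized = normalize_to_median(counts)  # every row now sums to 4
scaled = scale_genes(log1p_matrix(normalized))
print(normalized)
print(scaled)
```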
We use Scrublet to estimate the number of likely doublets in the data. One tumor had 6 predicted doublets (out of 7,741 original cells before filtering) and the other had 3 (out of 580 original cells); all predicted doublets were removed.
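Doublet removal comes down to masking cells that the detector flags, done separately per tumor since doublets can only form within a single capture. In Scrublet, the `scrub_doublets()` method returns per-cell doublet scores and boolean predictions; the sketch below assumes those boolean flags are already available and uses invented barcodes and flags.

```python
"""Sketch: drop cells that a doublet detector (e.g., Scrublet) flagged, per sample."""


def remove_flagged(barcodes, predicted_doublets):
    """Return the barcodes not flagged as doublets, plus the number removed."""
    kept = [b for b, is_doublet in zip(barcodes, predicted_doublets) if not is_doublet]
    return kept, len(barcodes) - len(kept)


# Hypothetical per-sample barcodes and boolean doublet predictions.
samples = {
    "tumor1": (["AAAC", "AAAG", "AACA"], [False, True, False]),
    "tumor2": (["CCCA", "CCCT"], [False, False]),
}
for name, (barcodes, flags) in samples.items():
    kept, n_removed = remove_flagged(barcodes, flags)
    print(name, "removed", n_removed, "kept", kept)
```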
Because these samples may contain non-tumor cells, we annotated clusters by cell type based on expression of tissue compartment markers. We first clustered cells with the Leiden algorithm and then analyzed marker expression across the resulting clusters. We removed immune cells by filtering clusters on PTPRC expression, fibroblasts on COL1A1 expression, and endothelial cells on CLDN5 expression; EPCAM was used to identify epithelial cells. We found several small clusters of immune cells and a single small population of likely fibroblasts. One cluster contained a few cells with low CLDN5 expression but had higher average EPCAM expression, so we chose not to remove it. Removing the remaining non-cancer cells reduced the number of cells from 4,863 to 4,485.
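The cluster-level marker filter can be sketched as a per-cluster mean followed by a cutoff. The marker-to-compartment pairs come from the text; the cutoff of 1.0 and the rule that a cluster is kept when its mean EPCAM is at least as high as the offending marker are hypothetical stand-ins for what was, in the analysis above, a manual judgment. Expression values and cluster labels are invented.

```python
"""Sketch: flag Leiden clusters whose mean marker expression suggests a non-tumor compartment."""
from statistics import fmean

# Marker -> compartment pairs from the text; the cutoff below is a hypothetical value.
EXCLUDE_MARKERS = {"PTPRC": "immune", "COL1A1": "fibroblast", "CLDN5": "endothelial"}


def cluster_means(expr, clusters, gene):
    """Mean expression of `gene` per cluster; `expr` maps gene -> per-cell values."""
    groups = {}
    for value, label in zip(expr[gene], clusters):
        groups.setdefault(label, []).append(value)
    return {label: fmean(values) for label, values in groups.items()}


def clusters_to_remove(expr, clusters, cutoff=1.0):
    """Flag clusters whose mean marker expression exceeds `cutoff`, unless the
    cluster's mean EPCAM is at least as high (kept as likely epithelial)."""
    epcam = cluster_means(expr, clusters, "EPCAM")
    flagged = {}
    for gene, compartment in EXCLUDE_MARKERS.items():
        for label, mean in cluster_means(expr, clusters, gene).items():
            if mean > cutoff and epcam[label] < mean:
                flagged[label] = compartment
    return flagged


# Toy log-normalized expression for six cells in three clusters (made up).
clusters = ["0", "0", "1", "1", "2", "2"]
expr = {
    "PTPRC": [0.0, 0.0, 3.0, 2.5, 0.0, 0.0],
    "COL1A1": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "CLDN5": [0.0, 0.0, 0.0, 0.0, 1.5, 1.2],
    "EPCAM": [2.0, 3.0, 0.0, 0.0, 2.5, 2.0],
}
flagged = clusters_to_remove(expr, clusters)
kept_cells = [i for i, label in enumerate(clusters) if label not in flagged]
print(flagged, kept_cells)
```

Here cluster "2" expresses some CLDN5 but more EPCAM, so it is kept, mirroring the case described in the text; cluster "1" is flagged as immune.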
After removing these cells, we re-filter genes with a low threshold (min_cells = 3) to remove any genes that were expressed only in the low-quality cells. This reduces the number of genes from 15,344 to 13,938. We therefore moved forward with the analysis using 1,618 cells and 13,938 genes across the two tumors.
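Unlike the earlier `min_counts` gene filter, `min_cells` counts how many cells detect a gene rather than summing its counts. The plain-Python sketch below illustrates that criterion (what scanpy.pp.filter_genes with min_cells selects, not its implementation) on a made-up matrix chosen so that one gene would pass a total-count filter but fails the cell-count filter.

```python
"""Plain-Python sketch of a min_cells gene filter on a cells x genes count matrix."""


def filter_genes_by_cells(counts, gene_names, min_cells):
    """Keep genes (columns) detected (count > 0) in at least `min_cells` cells."""
    keep = [j for j in range(len(gene_names))
            if sum(1 for row in counts if row[j] > 0) >= min_cells]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [gene_names[j] for j in keep]


# Toy data: 4 cells x 3 genes (made up). Gene "Y" has 5 total counts but only 2 cells.
counts = [[1, 0, 5],
          [2, 0, 0],
          [1, 1, 0],
          [0, 4, 0]]
matrix, genes_kept = filter_genes_by_cells(counts, ["X", "Y", "Z"], min_cells=3)
print(genes_kept, matrix)
```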
Genome_build: Hg38
Supplementary_files_format_and_content: 3359-PK-1-GCCAAT-ATCAGT_S1_L001.loom, 236D_236D_V300044428.loom. Loom format containing spliced, unspliced, ambiguous and spanning matrices.

Submission date

Jan 19, 2022

Last update date

Jul 05, 2022

Contact name

Marisol Ramirez

E-mail(s)

[email protected]

Organization name

Vanderbilt University Medical Center

Department

Center for Quantitative Sciences, Department of Biostatistics

Street address

2220 Pierce Avenue, 571 Preston Research Building

City

NASHVILLE

State/province

TENNESSEE

ZIP/Postal code

37232-6848

Country

USA

Platform ID

GPL29480

Series (2)

GSE193960	Archetype tasks link intratumoral heterogeneity to plasticity in recalcitrant small cell lung cancer [Human tumor]
GSE193961	Archetype tasks link intratumoral heterogeneity to plasticity in recalcitrant small cell lung cancer

Relations

BioSample

SAMN25117886

SRA

SRX13829011

Supplementary file	Size	File type/resource
GSM5824332_236D_236D_V300044428.loom.gz	32.8 Mb	LOOM
Raw data are available in SRA
Processed data provided as supplementary file