Sample GSM5824332
Status Public on Jul 05, 2022
Title Human tumor 2
Sample type SRA
 
Source name Human SCLC tumor
Organism Homo sapiens
Characteristics tissue: SCLC tumor
collection method: Upper right lobectomy
stage/treatment: Stage 1B mixed SCLC/LCNEC, relapsed (EP standard of care)
Treatment protocol Both patients received etoposide and cisplatin. Tumor 1 came from a patient who also received prophylactic cranial irradiation
Growth protocol The two human SCLC tumors were collected in collaboration with Vanderbilt University Medical Center. Tumor was immediately placed on cold RPMI on ice for dissociation for sequencing.
Extracted molecule total RNA
Extraction protocol Tissue was washed in an RBC lysis buffer, passed through a 70 μm filter, and washed in PBS. Cells were dissociated with cold DNase and proteases and triturated every 5-10 minutes to increase dissociation.
Tumor 1: Library preparation for scRNA-seq was performed according to previous protocols (Banerjee et al., 2020), and cells were sequenced using BGI MGI-seq. Tumor 2: Tumor was immediately placed in cold RPMI on ice for dissociation. Library preparation for scRNA-seq was performed as described previously (Banerjee et al., 2020). Cells were prepared for sequencing using TruDrop (Southard-Smith et al., 2020) and sequenced on NovaSeq.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model DNBSEQ-T7
 
Data processing The DropEst pipeline was used to process scRNA-seq data and to generate count matrices of each gene in each cell (Petukhov et al., 2018). Specifically, cell barcodes and UMIs were extracted by dropTag, and reads were aligned to the human reference transcriptome hg38 using STAR (Dobin et al., 2013). dropEst then corrected cell barcode errors and produced the gene-by-cell count matrix, along with three other count matrices for exonic, intronic, and exon/intron-spanning reads. Spliced and unspliced reads were annotated and RNA expression dynamics of single cells were estimated by velocyto (La Manno et al., 2018).
We initially filter out cells with < 100 genes using scanpy.pp.filter_cells with min_genes = 100. This reduces the total number of cells (in both tumors) from 8,649 to 4,863. We then remove genes with < 3 total spliced counts across all samples using scanpy.pp.filter_genes with min_counts = 3. This reduces the number of genes from 43,306 to 15,344. These filtering steps remove cells and genes with few or no reads, in preparation for the further filtering steps below.
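The two thresholds above can be illustrated with a minimal NumPy sketch. It is not the submitters' code: it assumes a dense cells x genes count matrix, and the function name and toy data are hypothetical. With an AnnData object, the calls named in the text would be scanpy.pp.filter_cells(adata, min_genes=100) and scanpy.pp.filter_genes(adata, min_counts=3).

```python
import numpy as np

def filter_cells_and_genes(counts, min_genes=100, min_counts=3):
    """Apply a per-cell gene threshold, then a per-gene count threshold.

    counts: cells x genes array of non-negative counts (assumed dense here).
    """
    counts = np.asarray(counts)
    # Keep cells in which at least `min_genes` genes have a nonzero count.
    cell_mask = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[cell_mask]
    # Keep genes whose total count over the remaining cells is at least `min_counts`.
    gene_mask = counts.sum(axis=0) >= min_counts
    return counts[:, gene_mask], cell_mask, gene_mask

# Toy example with small thresholds so the effect is visible.
toy = np.array([
    [1, 0, 2, 0],
    [0, 0, 1, 0],
    [3, 1, 0, 0],
])
filtered, cell_mask, gene_mask = filter_cells_and_genes(toy, min_genes=2, min_counts=3)
print(filtered.shape)
```

The gene filter is applied after the cell filter, so gene totals are computed only over the cells that were kept.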
We then use dropkick v1.2.6 (https://github.com/KenLauLab/dropkick) to label and filter out low-quality cells from each sample. Once each sample has dropkick scores and labels (a dropkick score > 0.5 identifies a high-quality cell), we concatenate the datasets with a batch key for each tumor.
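As a rough illustration of the thresholding and concatenation step, the sketch below assumes the per-cell dropkick scores are already available as plain arrays and that both samples share the same gene order. The function name, the batch names, and the toy data are hypothetical. For AnnData objects, anndata.concat with its label and keys arguments is one way to attach such a batch key.

```python
import numpy as np

def filter_and_concatenate(samples, threshold=0.5):
    """Keep cells scoring above `threshold` and stack samples with a batch label.

    samples: dict mapping a batch name to (counts, scores), where counts is a
    cells x genes array and scores holds one quality score per cell.
    """
    kept, batch = [], []
    for name, (counts, scores) in samples.items():
        mask = np.asarray(scores) > threshold  # strict inequality, as in the text
        kept.append(np.asarray(counts)[mask])
        batch.extend([name] * int(mask.sum()))
    return np.vstack(kept), np.array(batch)

# Toy data: two samples measured on the same two genes.
samples = {
    "tumor1": (np.array([[5, 0], [1, 1], [0, 2]]), [0.9, 0.2, 0.7]),
    "tumor2": (np.array([[4, 4], [0, 1]]), [0.5, 0.8]),
}
merged, batch = filter_and_concatenate(samples)
print(merged.shape, batch)
```

A cell scoring exactly 0.5 is dropped here because the stated criterion is a score strictly greater than 0.5.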
We normalize the data using scanpy.pp.normalize_total with default arguments (as described in the Scanpy API v1.8), such that after normalization, each cell has a total count equal to the median of total counts for cells before normalization. Next, numpy.log1p is used to log-transform the data (the +1 keeps zeroes from being transformed to negative infinity). Finally, the log-transformed, normalized counts are scaled using scanpy.pp.scale, which mean-centers each gene and rescales it to unit variance. We then use scanpy.tl.pca to compute a 50-component PCA embedding of the data, using all genes, and use scvelo.pp.neighbors and scvelo.tl.umap to generate a UMAP dimensionality reduction.
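The normalization, log transform, and scaling described above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions rather than the Scanpy implementation: it assumes a dense matrix in which every cell has a nonzero total, it uses the sample standard deviation (ddof=1), and the function names and toy data are hypothetical.

```python
import numpy as np

def normalize_to_median(counts):
    """Rescale each cell so its total equals the median total across cells."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)  # assumed nonzero for every cell
    return counts / totals[:, None] * np.median(totals)

def log_and_scale(normalized):
    """Apply log(1 + x), then center each gene and scale it to unit variance."""
    logged = np.log1p(normalized)
    mean = logged.mean(axis=0)
    std = logged.std(axis=0, ddof=1)  # assumption: sample standard deviation
    std[std == 0] = 1.0               # leave constant genes at zero after centering
    return (logged - mean) / std

toy = np.array([[1, 3], [2, 2], [6, 2]])
normalized = normalize_to_median(toy)
scaled = log_and_scale(normalized)
print(normalized.sum(axis=1))
```

After normalize_to_median, every row of the toy matrix sums to the median of the original totals; after log_and_scale, each gene column has mean zero and unit variance.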
We use Scrublet to determine the number of possible doublets in the data. One tumor had 6 predicted doublets (out of 7741 original cells before filtering); the other had 3 (out of 580 original cells), which were removed.
Because these samples may contain non-tumor cells, we annotated clusters by cell type based on expression of tissue compartment markers. First, we clustered cells with the Leiden algorithm and then analyzed expression of markers across these clusters. To remove immune cells, we filtered clusters by expression of PTPRC. To remove fibroblasts, we filtered cells using COL1A1 expression, and we used CLDN5 expression to remove endothelial cells. We also used EPCAM to identify epithelial cells. We found several small clusters of immune cells and a single small population of likely fibroblasts. A single cluster had a few cells with low expression of CLDN5 and higher average expression of EPCAM, so we chose not to remove this cluster. Removing the other non-cancer cells reduced the number of cells from 4,863 to 4,485.
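One way to picture the marker-based cluster annotation is the sketch below. It is a simplified illustration, not the procedure used for this sample: the mean-expression threshold of 1.0, the function names, and the toy data are all hypothetical, and borderline clusters (such as the CLDN5-low, EPCAM-high cluster mentioned above) would still need manual review.

```python
import numpy as np

# Markers named in the text, mapped to the compartment they indicate.
NON_TUMOR_MARKERS = {"PTPRC": "immune", "COL1A1": "fibroblast", "CLDN5": "endothelial"}

def cluster_means(expr, clusters, genes):
    """Mean expression of each gene within each cluster."""
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    means = {}
    for label in np.unique(clusters).tolist():
        rows = expr[clusters == label]
        means[label] = dict(zip(genes, rows.mean(axis=0).tolist()))
    return means

def flag_non_tumor(means, threshold=1.0):
    """Flag clusters whose mean marker expression exceeds a hypothetical threshold."""
    flagged = {}
    for label, gene_means in means.items():
        for marker, cell_type in NON_TUMOR_MARKERS.items():
            if gene_means[marker] > threshold:
                flagged[label] = cell_type
                break
    return flagged

genes = ["PTPRC", "COL1A1", "CLDN5", "EPCAM"]
expr = np.array([
    [0.0, 0.0, 0.1, 2.0],
    [0.1, 0.0, 0.0, 3.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.5, 0.2, 0.0, 0.1],
])
clusters = np.array(["0", "0", "1", "1"])  # e.g. Leiden labels, one per cell

means = cluster_means(expr, clusters, genes)
flagged = flag_non_tumor(means)
keep = ~np.isin(clusters, list(flagged))  # cells outside flagged clusters
print(flagged)
```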
After removing so many cells, we re-filter the genes with a low threshold (min_cells = 3) to remove any genes that were only expressed in the low-quality cells. This reduces the number of genes from 15,344 to 13,938. Therefore, we moved forward with the analysis with 1618 cells and 13,938 genes across two tumors.
Genome_build: Hg38
Supplementary_files_format_and_content: 3359-PK-1-GCCAAT-ATCAGT_S1_L001.loom, 236D_236D_V300044428.loom. Loom format containing spliced, unspliced, ambiguous and spanning matrices.
 
Submission date Jan 19, 2022
Last update date Jul 05, 2022
Contact name Marisol Ramirez
E-mail(s) [email protected]
Organization name Vanderbilt University Medical Center
Department Center for Quantitative Sciences, Department of Biostatistics
Street address 2220 Pierce Avenue, 571 Preston Research Building
City NASHVILLE
State/province TENNESSEE
ZIP/Postal code 37232-6848
Country USA
 
Platform ID GPL29480
Series (2)
GSE193960 Archetype tasks link intratumoral heterogeneity to plasticity in recalcitrant small cell lung cancer [Human tumor]
GSE193961 Archetype tasks link intratumoral heterogeneity to plasticity in recalcitrant small cell lung cancer
Relations
BioSample SAMN25117886
SRA SRX13829011

Supplementary file Size Download File type/resource
GSM5824332_236D_236D_V300044428.loom.gz 32.8 Mb (ftp)(http) LOOM
Raw data are available in SRA
Processed data provided as supplementary file
