GEO Accession viewer

Sample GSM1608505

Status

Public on Aug 20, 2015

Title

GM12878_HIC_1

Sample type

SRA

Source name

Lymphoblastoid Cell Line

Organism

Homo sapiens

Characteristics

cell line: GM12878

Treatment protocol

Cross-linking was performed in 1% formaldehyde for 10 minutes at room temperature, followed by quenching with glycine at a final concentration of 125 mM.
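The final concentrations above can be converted into volumes of stock solution to add. The sketch below assumes a 37% formaldehyde stock, a 2.5 M glycine stock, and a hypothetical 10 mL culture. None of these values is stated in the protocol. The helper name `volume_to_add` is also illustrative:

```python
def volume_to_add(current_volume_ml: float, stock_conc: float, final_conc: float) -> float:
    """Volume of stock (mL) to add to `current_volume_ml` to reach `final_conc`.

    Solves stock_conc * v = final_conc * (current_volume_ml + v) for v, so the
    volume of the addition itself is accounted for. Both concentrations must
    use the same unit (e.g. % or mM).
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return current_volume_ml * final_conc / (stock_conc - final_conc)


# ASSUMED stocks (not stated in the protocol): 37% formaldehyde, 2.5 M (2500 mM) glycine.
culture_ml = 10.0  # hypothetical culture volume
formaldehyde_ml = volume_to_add(culture_ml, stock_conc=37.0, final_conc=1.0)
glycine_ml = volume_to_add(culture_ml + formaldehyde_ml, stock_conc=2500.0, final_conc=125.0)
print(f"formaldehyde: {formaldehyde_ml:.3f} mL, glycine: {glycine_ml:.3f} mL")
```

The glycine volume is computed on the already-fixed volume because the quench is added after the formaldehyde.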

Growth protocol

LCLs were grown to a density of 0.6-0.8 × 10^6 cells/mL in RPMI 1640 supplemented with 15% fetal bovine serum and 1% penicillin-streptomycin (PenStrep).
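The extraction protocol below uses 25 million cells for Hi-C and 20 million cells per ChIP. Combined with this harvest density, those counts give the culture volume to collect. Here is a back-of-the-envelope sketch across the full density range. The helper `culture_volume_ml` is hypothetical:

```python
def culture_volume_ml(cells_needed: float, density_per_ml: float) -> float:
    """Culture volume (mL) that contains `cells_needed` cells at `density_per_ml`."""
    return cells_needed / density_per_ml


# Cell numbers from the extraction protocol: 25 million (Hi-C), 20 million (ChIP-seq).
for label, cells in [("Hi-C", 25e6), ("ChIP-seq", 20e6)]:
    # Harvest density range from the growth protocol: 0.6-0.8 x 10^6 cells/mL.
    smallest = culture_volume_ml(cells, 0.8e6)  # densest culture -> smallest volume
    largest = culture_volume_ml(cells, 0.6e6)   # sparsest culture -> largest volume
    print(f"{label}: {smallest:.1f}-{largest:.1f} mL")
```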

Extracted molecule

genomic DNA

Extraction protocol

For Hi-C, 25 million GM12878 cells were cross-linked and the chromatin was digested with HindIII. DNA overhangs were biotinylated, and fragments were proximity-ligated under dilute conditions to favor ligation of fragments in three-dimensional proximity. The DNA was then sheared, and biotinylated fragments were enriched with streptavidin beads and prepared for Illumina sequencing.
For ChIP-seq, nuclear lysates were sonicated using a Branson 250 Sonifier (power setting 2, 100% duty cycle, 7 × 30 s intervals). Clarified lysates corresponding to 20 million cells were treated with 1-5 µg of antibody coupled to Protein G Dynabeads (Invitrogen #10003D, New York). The protein-DNA complexes were washed with RIPA buffer and eluted in 1% SDS in TE at 65°C.
ChIP DNA sequencing libraries were generated according to the Illumina TruSeq DNA Sample Preparation Kit instructions (Illumina Part # FC-121-2001, San Diego, CA).
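HindIII recognizes AAGCTT and cuts after the first base (A^AGCTT). As a result, the restriction fragments used in the Hi-C analysis can be enumerated directly from the reference sequence. The minimal sketch below runs on a toy sequence. It is not HiCUP's digester. The 0-based half-open coordinates, the integer midpoints, and the helper name `hindiii_fragments` are all assumptions of this sketch:

```python
import re
from typing import List, Tuple

HINDIII_SITE = "AAGCTT"  # palindromic, so searching one strand finds every site
HINDIII_CUT_OFFSET = 1   # the enzyme cuts after the first base of the site


def hindiii_fragments(seq: str) -> List[Tuple[int, int, int]]:
    """Split `seq` at HindIII cut sites.

    Returns (start, end, midpoint) per fragment, 0-based and half-open.
    AAGCTT cannot overlap itself, so non-overlapping regex matches find every site.
    """
    seq = seq.upper()
    cuts = [m.start() + HINDIII_CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(start, end, (start + end) // 2) for start, end in zip(bounds, bounds[1:])]


toy_chrom = "GGGAAGCTTCCCCCCAAGCTTTT"
for frag_id, (start, end, mid) in enumerate(hindiii_fragments(toy_chrom)):
    print(frag_id, start, end, mid)
```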

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2000

Description

HiC.GM12878.correlations.txt.gz
library strategy: HiC
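The sketch below loads HiC.GM12878.correlations.txt.gz using the columns listed under Data processing (i, j, x, cor, pos_i, pos_j, pair_id). The on-disk layout is not documented here, so the tab delimiter and the header row are assumptions. Adjust them after inspecting the file. The `FragmentPair` and `read_correlations` names are hypothetical:

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class FragmentPair(NamedTuple):
    i: int        # ID of the first fragment
    j: int        # ID of the second fragment
    x: int        # number of shared interaction partners
    cor: float    # proportion of partners that are shared
    pos_i: float  # midpoint of fragment i
    pos_j: float  # midpoint of fragment j
    pair_id: str  # unique identifier for the interaction


def read_correlations(path: str, min_cor: float = 0.2) -> Iterator[FragmentPair]:
    """Yield rows with cor > min_cor from a gzipped, tab-separated file (layout assumed)."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            pair = FragmentPair(
                i=int(row["i"]), j=int(row["j"]), x=int(row["x"]),
                cor=float(row["cor"]),
                pos_i=float(row["pos_i"]), pos_j=float(row["pos_j"]),
                pair_id=row["pair_id"],
            )
            if pair.cor > min_cor:
                yield pair
```

The default `min_cor` matches the 0.2 cutoff already applied to the file. Passing a larger value gives a stricter subset.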

Data processing

Personal genomes were created by incorporating each individual's SNPs into hg19. SNPs were obtained from the 1000 Genomes Project or imputed using SHAPEIT and IMPUTE2.
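Building a personal genome amounts to substituting each individual's alleles into the reference. The toy sketch below shows that substitution, assuming 0-based positions and single-base SNPs only. It builds a single sequence. The description does not say whether separate haplotype sequences were built from the phased genotypes. `apply_snps` is a hypothetical helper:

```python
from typing import Dict, Tuple


def apply_snps(reference: str, snps: Dict[int, Tuple[str, str]]) -> str:
    """Return a copy of `reference` with SNP alleles substituted.

    `snps` maps 0-based positions to (ref_allele, alt_allele) single bases.
    Raises ValueError if the reference base does not match, which usually
    signals a coordinate or genome-build mismatch.
    """
    bases = list(reference)
    for pos, (ref, alt) in sorted(snps.items()):
        if bases[pos].upper() != ref.upper():
            raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {bases[pos]}")
        bases[pos] = alt
    return "".join(bases)


toy_reference = "ACGTACGTAC"
toy_snps = {2: ("G", "T"), 7: ("T", "C")}
print(apply_snps(toy_reference, toy_snps))
```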
For ChIP-seq, reads were aligned to the personal genomes with BWA 0.6.1 (options -q 20 and -t 4; all other parameters at their defaults).
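With BWA's `aln` algorithm, `-t` sets the number of threads and `-q` sets a quality threshold for trimming the 3' end of reads. The BWA manual defines the trimmed length as argmax_x Σ_{i=x+1}^{l} (INT − q_i), applied when q_l < INT. This sketch models that rule. Several points are assumptions. First, the source gives only the options, so the use of `aln` is assumed (under `bwasw`, `-q` is a gap-open penalty). Second, the 35-base minimum and the early stop when the running sum turns negative follow BWA's source as we understand it:

```python
from typing import Sequence


def bwa_trimmed_length(quals: Sequence[int], trim_qual: int = 20, min_len: int = 35) -> int:
    """Length kept after BWA-style 3' quality trimming (sketch of `bwa aln -q`).

    `quals` are Phred scores (already offset-corrected). Scanning from the 3'
    end, accumulate (trim_qual - q); the cut point is where the running sum
    peaks. Stops early once the sum turns negative and never trims below
    `min_len` bases.
    """
    running, best, keep = 0, 0, len(quals)
    for pos in range(len(quals) - 1, min_len - 1, -1):
        running += trim_qual - quals[pos]
        if running < 0:
            break
        if running > best:
            best, keep = running, pos  # keep bases [0, pos)
    return keep


read_quals = [35] * 40 + [8] * 10  # 50-bp read with a low-quality 3' tail
print(bwa_trimmed_length(read_quals))  # 40
```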
For ChIP-seq, peaks were called using MACS2 on merged replicates, subsampled to 50 million single reads. Genome-wide signal tracks (bigWig) were generated using wiggler, also on the subsampled data.
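The description does not say how the merged replicates were subsampled to 50 million reads. One standard way to draw a fixed number of reads uniformly from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below uses that swapped-in technique, not the authors' method, on a toy stream. The helper name `reservoir_sample` is hypothetical:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for n, item in enumerate(items):
        if n < k:
            reservoir.append(item)
        else:
            # Item n replaces a random slot with probability k / (n + 1).
            slot = rng.randint(0, n)
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Toy stand-in for merged replicate reads; the real target would be k = 50_000_000.
merged_reads = (f"read{i}" for i in range(1_000))
subsample = reservoir_sample(merged_reads, k=100)
print(len(subsample))
```

Holding 50 million reads in memory is costly. Fraction-based tools such as `samtools view -s` are a common alternative when an approximate rather than exact read count is acceptable.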
For Hi-C, we aligned reads using HiCUP, which maps reads to an in silico HindIII digest of the hg19 assembly. Our Hi-C interaction analysis is performed at restriction-fragment resolution: we obtained the interaction count for each pair of fragments. To estimate the proximity between two restriction fragments A and B, we calculated their co-variance as the fraction of fragments that interact with both A and B, normalized by the number of fragments interacting only with A or only with B. The processed data file contains unique interactions with the following entries:
i: ID of the first interacting fragment
j: ID of the second interacting fragment
x: number of interaction partners shared by fragments i and j
cor: proportion of the interaction partners of i and j that are shared between i and j
pos_i: midpoint of fragment i
pos_j: midpoint of fragment j
pair_id: unique identifier for the interaction
The file contains only interactions with cor > 0.2.
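The shared-partner score can be sketched on a toy contact list. The description of the denominator is ambiguous. This sketch assumes cor is the Jaccard index: shared partners divided by all partners of i or j. That reading matches the file's description of cor as the proportion of partners that are shared. The `i_j` pair_id format and all helper names are hypothetical:

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[int, int, int, float, int, int, str]


def partner_sets(contacts: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each fragment ID to the set of fragments it has a nonzero count with."""
    partners: Dict[int, Set[int]] = defaultdict(set)
    for a, b in contacts:
        if a != b:  # skip same-fragment pairs
            partners[a].add(b)
            partners[b].add(a)
    return partners


def shared_partner_table(contacts: Iterable[Tuple[int, int]],
                         midpoints: Dict[int, int],
                         min_cor: float = 0.2) -> List[Row]:
    """Rows (i, j, x, cor, pos_i, pos_j, pair_id) for pairs with cor > min_cor.

    ASSUMPTION: cor = |shared partners| / |partners of i or j| (Jaccard index).
    """
    partners = partner_sets(contacts)
    rows: List[Row] = []
    for i, j in combinations(sorted(partners), 2):
        shared = partners[i] & partners[j]
        cor = len(shared) / len(partners[i] | partners[j])
        if cor > min_cor:
            rows.append((i, j, len(shared), cor, midpoints[i], midpoints[j], f"{i}_{j}"))
    return rows


# Toy contacts between fragment IDs (pairs with a nonzero interaction count).
toy_contacts = [(0, 1), (0, 2), (3, 1), (3, 2), (3, 4)]
toy_midpoints = {0: 2, 1: 10, 2: 19, 3: 30, 4: 45}
for row in shared_partner_table(toy_contacts, toy_midpoints):
    print(row)
```

Looping over all fragment pairs is quadratic. On genome-wide restriction-fragment data, one would instead enumerate only pairs that share at least one partner, for example all pairs among the partners of each fragment.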
For ChIA-PET, data analysis was carried out using Mango, software developed in-house (paper submitted). PETs were trimmed to remove linker sequences, and only PETs with the same linker sequence at both ends were kept for further processing. The resulting reads were aligned to the genome using the Bowtie software suite (Langmead et al., 2009). Duplicate reads, which may result from PCR duplication, were removed. MACS2 (Zhang et al., 2008) was used to call binding peaks, which are subsequently used as anchor regions for detecting interactions in the next step. The probability of observing a PET linking any two peaks was modeled as a function of both the genomic distance between them and the read depth of each peak. Using this model, statistical confidence estimates are assigned to interactions. The resulting P-values are corrected for multiple hypothesis testing using the Benjamini-Hochberg method and filtered to a user-defined false discovery rate (FDR). The processed files are in BEDPE format and contain the following entries:
chrom1: chromosome of anchor 1
start1: start position of anchor 1
end1: end position of anchor 1
chrom2: chromosome of anchor 2
start2: start position of anchor 2
end2: end position of anchor 2
name: unique ID for each interaction
peak1: number of mid-range PETs in anchor 1
peak2: number of mid-range PETs in anchor 2
PETs: number of PETs linking anchor 1 to anchor 2
distance: distance between anchor 1 and anchor 2
P_IAB_distance: probability of observing a PET spanning the distance that separates these two anchors
P_combos_distance: probability of observing two anchors separated by the distance that separates these two anchors
P_IAB_depth: probability of observing a PET linking two anchors with the same read depths as these two anchors
P_combos_depth: probability of observing a pair of loci with the same read depths as these two anchors
p_binom: binomial probability of observing a single PET linking these two loci
P: P-value of the interaction (calculated using the binomial distribution)
Q: P-value after Benjamini-Hochberg correction (over all possible pairs of loci, not just those linked by at least one PET)
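The Benjamini-Hochberg step can be sketched in a few lines. This is a generic implementation, not Mango's code. The `n_tests` parameter is an assumption added to mirror the statement that Q is corrected over all possible pairs of loci. The toy p-values and FDR threshold are made up:

```python
from typing import List, Optional, Sequence


def benjamini_hochberg(pvalues: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    `n_tests` may exceed len(pvalues) to correct over a larger family of
    hypotheses whose untested members are treated as p = 1 (e.g. all possible
    anchor pairs rather than only pairs linked by a PET).
    """
    m = len(pvalues) if n_tests is None else n_tests
    if m < len(pvalues):
        raise ValueError("n_tests cannot be smaller than the number of p-values")
    order = sorted(range(len(pvalues)), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(len(pvalues), 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
fdr = 0.05  # user-defined threshold
kept = [i for i, q in enumerate(qvals) if q <= fdr]
print([round(q, 4) for q in qvals], kept)
```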
Genome_build: hg19
Supplementary_files_format_and_content: Peaks are in the narrowPeak format. bigWig files were generated using wiggler.
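narrowPeak is ENCODE's BED6+4 format. It has the six standard BED columns followed by signalValue, pValue, qValue, and peak. Both pValue and qValue are given as -log10, or -1 when unassigned. The peak column is the summit offset from chromStart (-1 when not called). Below is a minimal parser sketch. The class and function names are hypothetical. Falling back to the interval midpoint when no summit is called is a choice of this sketch:

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based start (chromStart)
    end: int             # end, exclusive (chromEnd)
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float       # -log10 p-value, -1 if not assigned
    q_value: float       # -log10 q-value, -1 if not assigned
    peak: int            # summit offset from start, -1 if not called

    @property
    def summit(self) -> int:
        """Absolute summit position; falls back to the interval midpoint."""
        return self.start + self.peak if self.peak >= 0 else (self.start + self.end) // 2


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, got {len(f)}")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


example = parse_narrowpeak_line("chr1\t9980\t10480\tpeak_1\t120\t.\t5.2\t14.1\t12.0\t230\n")
print(example.chrom, example.summit, example.q_value)
```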

Submission date

Feb 12, 2015

Last update date

May 15, 2019

Contact name

Fabian Grubert

E-mail(s)

[email protected]

Organization name

Stanford University

Street address

300 Pasteur Drive