NCBI
dbSNP

Method Detail
Submitter Method Handle: DEVINE_LAB
Submitter Method ID: METHOD_WGS
Submitted method description:
We used a multi-step computational pipeline for INDEL identification.
1. Vector screening using NCBI VecScreen system and trimming based on quality score.
Traces are trimmed when scores are below Phred 25 for five bases in a row.
Moreover, trimmed length has to be at least 100 bases with an average score greater than
or equal to Phred 25.
2. Repeat-mask traces using RepeatMasker and MaskerAid.
3. Megablast against the Golden Path Build 35 sequence with these options: -q 100 p 95 F F
4. For each trace that hits the Golden Path, identify anchor sequence with a minimum of
50 bases and 100% match to a single location. Based on the anchor sequence, traces are
then unmasked and aligned against the mapped locations using Bl2seq (NCBI). INDELs up
to 16 bases in length are recorded from this analysis if they are flanked on both sides
by five bases with Phred quality scores of 25 or greater.
5. When mismatches occur at the beginning or end of a trace, and the number of mismatched
bases is greater than or equal to 10, a dedicated program (FindMatch) is used to look
for matching bases upstream or downstream. If a match is found (at least 95% identity),
and the surrounding 5 bases on both sides have a quality score of 25 or more, an INDEL is recorded.
6. Identified INDELs were mapped, where possible, to the completed Golden Path Build 1 Version 1
chimp sequence to identify the ancestral allele.
7. Double-hit status was determined for each INDEL on the basis of the trace allele matching either
the ancestral chimp allele or at least one other trace allele. Identified single base INDELs
without double-hit status were discarded.
8. Traces used with this method were generated by the Baylor and Whitehead Genome Centers
as part of an effort to identify SNPs in the human genome. We obtained 8,278,155 of these
traces from the Trace DB archive at NCBI. The DNA samples used to generate these traces
were obtained from eight humans of African American descent as described for method WI-WGS-200306.
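
The quality and double-hit filters in steps 1, 4 and 7 can be illustrated with a short Python sketch. This is not the submitter's code. The function names, the argument layout, and the reading of step 1 (cut the trace at the start of the first run of five consecutive bases below Phred 25) are assumptions made for illustration. Only the numeric thresholds come from the description above.

```python
from typing import Optional, Sequence

MIN_PHRED = 25         # quality threshold used throughout the description
RUN_LENGTH = 5         # consecutive low-quality bases that trigger trimming
MIN_TRIMMED_LEN = 100  # minimum length of a trimmed trace (step 1)
FLANK = 5              # high-quality bases required on each side (step 4)


def trim_trace(quals: Sequence[int]) -> Optional[int]:
    """Return the trimmed length, or None if the trace fails step 1.

    Assumed reading: the trace is cut at the start of the first run of
    RUN_LENGTH consecutive bases scoring below MIN_PHRED.
    """
    end = len(quals)
    run = 0
    for i, q in enumerate(quals):
        run = run + 1 if q < MIN_PHRED else 0
        if run == RUN_LENGTH:
            end = i - RUN_LENGTH + 1
            break
    if end < MIN_TRIMMED_LEN:
        return None
    if sum(quals[:end]) / end < MIN_PHRED:
        return None
    return end


def flanks_ok(quals: Sequence[int], start: int, span: int) -> bool:
    """Step 4: require FLANK bases of Phred >= MIN_PHRED on both sides.

    `start` is the trace index where the INDEL begins and `span` is the
    number of trace bases it covers (0 for a deletion in the trace).
    """
    left = list(quals[max(0, start - FLANK):start])
    right = list(quals[start + span:start + span + FLANK])
    if len(left) != FLANK or len(right) != FLANK:
        return False  # INDEL too close to the end of the trace
    return all(q >= MIN_PHRED for q in left + right)


def has_double_hit(allele: str, ancestral: Optional[str],
                   other_alleles: Sequence[str]) -> bool:
    """Step 7: the trace allele matches the chimp allele or another trace."""
    return allele == ancestral or allele in other_alleles


def keep_indel(indel_len: int, double_hit: bool) -> bool:
    """Step 7: single-base INDELs without double-hit status are discarded."""
    return double_hit or indel_len > 1
```

For example, under these assumptions a trace with 120 bases at Phred 30 followed by five bases at Phred 10 is trimmed to 120 bases, and a single-base INDEL seen in only one trace with no matching chimp allele is dropped.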

This method was used in the following submission:

Submitter Handle   Batch Type   Submitter Batch ID      Release Build ID
DEVINE_LAB         Assay        EMORY_INDEL_GP35_WGS    126
