| We developed the SNP Xplore pipeline to mine single-nucleotide substitutions and short insertion and deletion polymorphisms from the Celera whole-genome shotgun assembly (WGA) (1, 2). The pipeline comprises three major processes: |
| 1. Identify potential nucleotide variants from the WGA2 alignment, which includes all Celera reads and public BACtigs. All fragments imported into the Assembler were cleaned and trimmed by the assembly process (1). Each potential variant must pass the sequence quality value (QV), neighbor quality value (NQV), and heterozygosity checks. By default, the QV of the polymorphic base must be > 23 and the minimum neighbor QV (4 bp) must be > 21. For indel variations, for which only the neighbor QV is available, the default minimum NQV is > 23. In addition to this simple general QV rule, Xplore implements our plurality rule, which rescues SNPs with marginal QV: for deeply covered minor alleles, the QV threshold is lowered. Each observation supporting the minor allele decreases the threshold, but the QV cutoff never drops below 16. |
| 2. Measure the SNP features (3). Each potential SNP was passed through our quality assessment metrics to obtain a set of assessment values, including neighbor variant density (NB20), binomial probability P(n,k), fragment edge measure (EM), SNP type, Phred quality value, neighbor quality value, neighbor nucleotide composition, pairwise dissimilarity rate, and donor and allele frequency. |
| 3. Selection process. SNPs were selected on a composite consideration of the individual criteria in the quality assessment metrics. For the high-quality SNPs in this dump, we used the composite index matrixN, which in our tests gave the highest ratio of validated to non-validated SNPs in our in-house resequencing project. |
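The quality checks in step 1 can be illustrated with a minimal Python sketch. This is not the Xplore implementation. The thresholds (23, 21, 23 and the floor of 16) come from the text above; everything else is an assumption made for illustration: the function names, the decrement of one QV unit per supporting observation (`STEP_PER_SUPPORT`), the choice to count only observations beyond the first, and the choice to apply the plurality rule to the polymorphic-base QV only.

```python
from typing import Optional, Sequence

# Defaults quoted in the text.
BASE_QV_MIN = 23      # polymorphic base must have QV > 23
NEIGHBOR_QV_MIN = 21  # minimum neighbor QV (4 bp) must be > 21
INDEL_NQV_MIN = 23    # indels: minimum neighbor QV must be > 23
QV_FLOOR = 16         # plurality rule never lowers the cutoff below 16

# ASSUMPTION: the text does not state how much each supporting
# observation lowers the threshold; 1 QV unit is a placeholder.
STEP_PER_SUPPORT = 1


def plurality_threshold(minor_support: int) -> int:
    """QV cutoff for the polymorphic base after the plurality adjustment.

    ASSUMPTION: only observations beyond the first lower the cutoff, so a
    minor allele seen once is held to the unadjusted default.
    """
    extra = max(minor_support - 1, 0)
    return max(BASE_QV_MIN - STEP_PER_SUPPORT * extra, QV_FLOOR)


def passes_qv(base_qv: Optional[int],
              neighbor_qvs: Sequence[int],
              minor_support: int = 1,
              is_indel: bool = False) -> bool:
    """Return True if a candidate variant passes the QV/NQV checks.

    neighbor_qvs holds the quality values of the neighboring bases and
    must be non-empty. base_qv is ignored for indels, where only the
    neighbor QV is available.
    """
    min_nqv = min(neighbor_qvs)
    if is_indel:
        return min_nqv > INDEL_NQV_MIN
    if base_qv is None:
        raise ValueError("base_qv is required for substitutions")
    # ASSUMPTION: the plurality rule relaxes the base QV cutoff only.
    return base_qv > plurality_threshold(minor_support) and min_nqv > NEIGHBOR_QV_MIN
```

For example, under these assumptions `passes_qv(22, [30, 30, 30, 30])` fails the default rule, while the same base with `minor_support=3` passes because the cutoff has dropped to 21. The heterozygosity check is not modeled here because the text gives no parameters for it.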
| References |
| (1) Venter, Craig; et al. Science vol. 291, no. 5507, p. 1304-1351, 2001 |
| (2) Istrail, Sorin; et al. PNAS vol. 101, no. 7, p. 1916-1921, 2004 |
| (3) Gu, Zhiping; Edwards, Nathan; Cai, Shuang; Levitsky, Alexander; Fosler, Carl; Chiang, Chia-Chien; Wan, Chunhua; Ingber, Daniel; Li, Peter. Mining quality SNPs from Whole Genome Shotgun Assembly, in preparation |