| This method was based on comparing ESTs from the same gene. Clustered ESTs were |
| downloaded directly from UniGene. To correct possible misclustering, UniGene    |
| clusters were mapped onto genomic sequence by homology search followed by       |
| dynamic programming. Many clustering errors (e.g. similar ESTs from paralogous  |
| genes mixed into one cluster) were removed by screening the ESTs within each    |
| cluster against the genomic sequence. SNPs were then detected by applying       |
| Bayesian statistical methods to these corrected EST clusters. Briefly, trace    |
| chromatogram data for the EST sequences in UniGene were processed with PHRED.   |
| To identify likely SNPs, single-base mismatches were reported from the multiple |
| sequence alignments that the programs BRO and POA produced for each UniGene     |
| cluster. BRO corrected possibly misreported EST orientations, while POA         |
| identified and analyzed nonlinear alignment structures indicative of gene       |
| mixing or chimeric ESTs, which could otherwise produce spurious SNPs. Bayesian  |
| inference was used to weigh the evidence for a true polymorphism against        |
| sequencing error, misalignment or ambiguity, misclustering, and chimeric EST    |
| sequences, taking into account data such as raw chromatogram peak height,       |
| sharpness, overlap and spacing; sequencing error rates; context sensitivity;    |
| and cDNA library origin.                                                        |
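The cluster-cleaning step (screening each EST against the cluster's genomic sequence) can be illustrated with a minimal sketch. It assumes a simple unit-cost fitting alignment, computed by dynamic programming, as a stand-in for the homology search and alignment actually used. It also assumes a hypothetical identity cutoff of 0.95 and tries both strands. The names `fitting_edit_distance`, `screen_cluster` and `min_identity` are illustrative only and do not come from the original pipeline. The idea is that an EST from a paralogous gene fits the locus with lower identity than ESTs from the gene itself, so it can be flagged and dropped.

```python
# Hypothetical sketch (not the original pipeline): flag ESTs in a cluster that
# fit the cluster's genomic locus poorly, using a unit-cost fitting alignment.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fitting_edit_distance(est: str, genome: str) -> int:
    """Fewest substitutions/insertions/deletions needed to align the whole EST
    to some substring of the genomic sequence (genomic end gaps are free)."""
    # prev[j]: best cost of aligning the EST prefix processed so far against a
    # genomic segment ending at position j; row 0 is all zeros (free start).
    prev = [0] * (len(genome) + 1)
    for i in range(1, len(est) + 1):
        cur = [i] + [0] * len(genome)  # EST prefix vs empty genome prefix
        for j in range(1, len(genome) + 1):
            mismatch = 0 if est[i - 1] == genome[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + mismatch,  # match or substitution
                prev[j] + 1,             # EST base absent from the genome
                cur[j - 1] + 1,          # genomic base absent from the EST
            )
        prev = cur
    return min(prev)  # free trailing gap: best end position in the genome


def screen_cluster(ests: dict[str, str], genome: str, min_identity: float = 0.95):
    """Split a cluster into ESTs that match the genomic sequence well enough
    (on either strand) and ESTs rejected as likely misclustered.
    min_identity is a hypothetical threshold, not a published value."""
    kept: dict[str, str] = {}
    rejected: list[str] = []
    for name, seq in ests.items():
        if not seq:
            rejected.append(name)
            continue
        distance = min(fitting_edit_distance(seq, genome),
                       fitting_edit_distance(reverse_complement(seq), genome))
        identity = 1 - distance / len(seq)
        if identity >= min_identity:
            kept[name] = seq
        else:
            rejected.append(name)
    return kept, rejected


# Toy usage: est2 is est1 on the opposite strand; est3 carries two mismatches.
genome = "TTTTGATTACAGGCCGGGG"
cluster = {
    "est1": "GATTACAGGCC",
    "est2": reverse_complement("GATTACAGGCC"),
    "est3": "GATTTCAGGAC",
}
kept, rejected = screen_cluster(cluster, genome)
print(sorted(kept), rejected)  # ['est1', 'est2'] ['est3']
```

The quadratic dynamic program keeps the sketch short. A real screen against long genomic regions would first use a fast homology search to locate the locus and then align only within that window.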
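The mismatch-reporting and Bayesian steps can be sketched, in much simplified form, for the columns of one cluster's multiple sequence alignment. This sketch swaps in PHRED quality scores for the raw chromatogram features the method assessed. It compares two hypotheses: "monomorphic" (every read derives from one true base) and "biallelic SNP" (each read derives from either of two alleles with probability 1/2). It also makes these assumptions:

- Reads are independent.
- Sequencing errors are spread evenly over the three wrong bases.
- Bases and allele pairs have uniform priors.
- The prior probability of polymorphism defaults to a hypothetical 0.001.

It ignores misclustering, chimeras, context effects and library origin, which the described method also weighed. All function names are illustrative.

```python
import math
from itertools import combinations

BASES = "ACGT"


def base_likelihood(observed: str, true_base: str, phred: int) -> float:
    """P(observed call | true base). The PHRED error probability is capped at
    0.75 (a random call) and split evenly over the three wrong bases."""
    error = min(10 ** (-phred / 10), 0.75)
    return 1 - error if observed == true_base else error / 3


def _logsumexp(values: list[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def candidate_columns(alignment: list[str]) -> list[int]:
    """Alignment columns containing at least two different bases (gaps '-' ignored)."""
    return [
        j for j in range(len(alignment[0]))
        if len({row[j] for row in alignment if row[j] != "-"}) >= 2
    ]


def snp_posterior(calls: list[tuple[str, int]], prior: float = 0.001) -> float:
    """P(polymorphic | calls) for one column, given (base, PHRED quality) pairs.
    The default prior is a hypothetical value, not one taken from the method."""
    # H0: every read comes from a single true base t (uniform over 4 bases).
    log_h0 = [
        sum(math.log(base_likelihood(b, t, q)) for b, q in calls) - math.log(4)
        for t in BASES
    ]
    # H1: biallelic site {allele1, allele2}; each read derives from either
    # allele with probability 1/2 (uniform over the 6 allele pairs).
    log_h1 = [
        sum(math.log(0.5 * base_likelihood(b, allele1, q)
                     + 0.5 * base_likelihood(b, allele2, q))
            for b, q in calls) - math.log(6)
        for allele1, allele2 in combinations(BASES, 2)
    ]
    log_snp = math.log(prior) + _logsumexp(log_h1)
    log_mono = math.log(1 - prior) + _logsumexp(log_h0)
    return math.exp(log_snp - _logsumexp([log_snp, log_mono]))


# Toy usage: four aligned reads with PHRED qualities (the gap quality is unused).
alignment = ["ACGTAC", "ACGTAC", "ACATAC", "ACATA-"]
quals = [[30] * 6, [30] * 6, [30] * 6, [30] * 5 + [0]]
for j in candidate_columns(alignment):
    calls = [(row[j], q[j]) for row, q in zip(alignment, quals) if row[j] != "-"]
    print(j, round(snp_posterior(calls), 3))
```

Working in log space keeps the products of many per-read likelihoods from underflowing in deep clusters. Under this toy model, two high-quality reads for each allele give a posterior near 1. A single low-quality mismatch among high-quality matching reads stays well below the prior-weighted threshold of a SNP call.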