NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

SNP FAQ Archive [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2005-.

  • This publication is provided for historical reference only and the information may be out of date.

Ascertainment Bias

Last Update: February 18, 2014.

What is ascertainment bias, and how does it relate to the calculation of SNP frequencies?

Ascertainment bias is a term in population genetics that describes systematic deviations from an expected theoretical result attributable to the sampling processes used to find (ascertain) SNPs and measure (estimate) their population-specific allele frequencies.

The SNP "derived" allele frequency ranges in nature from 1/2N (i.e., one mutant chromosome in the entire species gene pool of 2N chromosomes) to (2N-1)/2N (i.e., only one chromosome left in the species gene pool to represent the "ancestral" allelic state; all other chromosomes carry the derived allele). This distribution, however, is imperfectly measured when finite sub-samples are drawn from the population. The smaller the sample used in the SNP detection (sampling) process, the more imperfect the fit between the observed distribution of derived allele frequencies and the "true" distribution in nature. The following graph illustrates the bias:

[Figure: Cnt_Ascertmnt_Bias-Image001.jpg]

Modified from Fig.2, Ref. (1).

If SNPs in dbSNP are ascertained in samples of only a few chromosomes, then the SNPs discovered will be disproportionately common in the population, compared with the SNPs that larger samples of the same genomic sequence would reveal.
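The effect of discovery sample size can be illustrated with a short calculation. The sketch below is not taken from dbSNP or from Ref. (1); it assumes that chromosomes are drawn independently from a large population (simple binomial sampling), and the function name discovery_probability is hypothetical. Under that assumption, a SNP is ascertained only if both alleles appear in the sample, so the chance of discovery is 1 - p^n - (1-p)^n for derived allele frequency p and n sampled chromosomes.

```python
def discovery_probability(p, n):
    """Chance that a SNP with derived allele frequency p looks polymorphic
    (both alleles seen at least once) in n independently drawn chromosomes.

    Assumption: binomial sampling from a large population.
    """
    all_derived = p ** n            # every sampled chromosome carries the derived allele
    all_ancestral = (1.0 - p) ** n  # every sampled chromosome carries the ancestral allele
    return 1.0 - all_derived - all_ancestral


if __name__ == "__main__":
    sample_sizes = (2, 4, 20, 100)
    for p in (0.01, 0.05, 0.20, 0.50):
        row = ", ".join(
            f"n={n}: {discovery_probability(p, n):.3f}" for n in sample_sizes
        )
        print(f"p={p:.2f} -> {row}")
```

With small n, the discovery probability for rare alleles is close to zero while common alleles are found readily, which is why a panel of a few chromosomes tends to yield a set of SNPs skewed toward common variants.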

Experimental validation typically means observing the SNP in additional samples unrelated to the original set of chromosomes surveyed to define the SNP. Therefore, common SNPs will "validate" at a higher rate than SNPs with a very rare minor allele, because larger samples are needed to recapture (and hence confirm) the rare variation. Because all samples used to ascertain (discover) SNPs or estimate their allele frequencies in specific population samples are of finite size, there will be some degree of ascertainment bias in every batch of data submitted to dbSNP.
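The dependence of validation on minor allele frequency can be sketched numerically. This is an illustrative model, not a description of any dbSNP validation procedure: it assumes that "validation" means seeing the minor allele at least once in m additional, independently sampled chromosomes, and the function names recapture_probability and chromosomes_needed are hypothetical.

```python
import math


def recapture_probability(q, m):
    """Chance of seeing a minor allele of frequency q at least once
    in m independently drawn chromosomes (assumed validation criterion)."""
    return 1.0 - (1.0 - q) ** m


def chromosomes_needed(q, target=0.95):
    """Smallest number of chromosomes for which the recapture
    probability reaches the target, under the same assumption."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - q))


if __name__ == "__main__":
    for q in (0.20, 0.05, 0.01):
        m = chromosomes_needed(q)
        print(
            f"minor allele frequency {q:.2f}: "
            f"{m} chromosomes for a {recapture_probability(q, m):.3f} chance of recapture"
        )
```

Under this model the required validation panel grows roughly in inverse proportion to the minor allele frequency, so a fixed-size panel confirms common SNPs far more often than rare ones.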

References

1. Marth GT, Czabarka E, Murvai J, Sherry ST. (2004) The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics. Jan;166(1):351-372.
