RS with Multiple Mappings
Introduction
In the current release of dbSNP data, about 28 thousand refSNPs (rs) are mapped to multiple chromosome positions, and rs1376641341 is one of them. As shown below, it has four mappings in GRCh38, including two primary hits, one on chromosome 1 and one on chromosome 2. In GRCh37, it has two mappings, one on a chromosome 1 fix patch and the other on chromosome 2.
Some Analysis
Although this is considered a data issue, it cannot be avoided completely under dbSNP's current workflow of assembly-to-assembly mapping. Because of the complexity of the human genome, there is a tradeoff between accuracy and completeness. Specifically, when a variation is reported in a highly repetitive or duplicated region of the reference genome, it is much more likely to be mapped to several chromosome positions.
Take rs1376641341 again as an example of how a single SNP comes to have multiple chromosome mappings. A single submission (ss3090307137) from TOPMED is assigned to rs1376641341, and this submission was asserted at a chromosome position (chr1:144,077,475) in GRCh38. Unfortunately, this region of chromosome 1 was misassembled in GRCh37: sequences from 3 segmental duplications were collapsed into one location, and there were also gaps. After considerable work, a FIX patch scaffold was created that disambiguated the region and closed the gaps. The patch sequence was subsequently incorporated into the chromosome in GRCh38. This explains why, in GRCh37, rs1376641341 has no mapping position on chromosome 1 itself but has one on the fix patch scaffold. In addition, because of the sequence ambiguity, rs1376641341 has two mapping positions on alternate sequences in GRCh38, one of which maps onto chromosome 2 in GRCh37.
The figure below shows the mappings between GRCh38 and GRCh37 positions for rs1376641341. It is possible to filter out certain mapping results during the assembly-to-assembly remapping process, but any new rule may introduce new issues that do not currently exist. There may therefore be no perfect solution, and a tradeoff always remains.
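To make the tradeoff concrete, the sketch below applies one hypothetical filtering rule to the GRCh37 placements of rs1376641341 described above. The rule (prefer_chromosome), the Placement record, and the sequence labels are assumptions made purely for illustration; they are not part of the dbSNP remapping workflow.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    assembly: str
    sequence: str  # human-readable label, not a real accession
    kind: str      # "chromosome", "fix_patch", or "alt"


# GRCh37 placements of rs1376641341 as described in the text.
GRCH37_PLACEMENTS = [
    Placement("GRCh37", "chr1 fix patch", "fix_patch"),
    Placement("GRCh37", "chr2", "chromosome"),
]


def prefer_chromosome(placements):
    """Hypothetical rule: if any chromosome placement exists, drop the rest."""
    on_chromosome = [p for p in placements if p.kind == "chromosome"]
    # Fall back to the unfiltered list so an rs never loses all placements.
    return on_chromosome if on_chromosome else list(placements)


kept = prefer_chromosome(GRCH37_PLACEMENTS)
print([p.sequence for p in kept])
```

Under this rule only the chromosome 2 placement survives in GRCh37, even though the submission was asserted on chromosome 1 in GRCh38. The rule removes the multiple mapping but discards the placement that corresponds to the asserted location, which is the kind of new issue a filter can introduce.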
Lists of RS
In the current dbSNP build (153), we have identified ~28K rs with multiple chromosome positions in GRCh38, and ~22M rs with a single chromosome position along with one or more mapping positions on ALT/PATCH sequences. The two lists of rs can be downloaded from the dbSNP FTP site (https://ftp.ncbi.nih.gov/snp/others/).
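The two categories can be illustrated with a minimal sketch that groups placement records by rs and counts how many fall on chromosome sequences. It assumes one (rs id, sequence accession) pair per placement and treats RefSeq accessions beginning with NC_ as chromosomes and everything else as ALT/PATCH scaffolds. The function name classify, the placeholder accessions, and the rs_hypothetical ids are made up for the example. The layout of the downloadable lists is not assumed here.

```python
from collections import defaultdict

# One (rs id, sequence accession) pair per placement. The NC_ values are
# GRCh38 chromosome accessions; every other value is a made-up placeholder.
PLACEMENTS = [
    ("rs1376641341", "NC_000001.11"),      # chromosome 1
    ("rs1376641341", "NC_000002.12"),      # chromosome 2
    ("rs1376641341", "NT_PLACEHOLDER_A"),  # alternate sequence (placeholder)
    ("rs1376641341", "NT_PLACEHOLDER_B"),  # alternate sequence (placeholder)
    ("rs_hypothetical_1", "NC_000007.14"),
    ("rs_hypothetical_1", "NW_PLACEHOLDER_C"),
    ("rs_hypothetical_2", "NC_000009.12"),
]


def classify(placements):
    """Return (rs with >1 chromosome placement,
               rs with exactly 1 chromosome placement plus ALT/PATCH)."""
    on_chrom = defaultdict(int)
    off_chrom = defaultdict(int)
    for rs_id, accession in placements:
        if accession.startswith("NC_"):
            on_chrom[rs_id] += 1
        else:
            off_chrom[rs_id] += 1
    multi_chrom = sorted(rs for rs, n in on_chrom.items() if n > 1)
    single_plus_alt = sorted(
        rs for rs, n in on_chrom.items() if n == 1 and off_chrom[rs] > 0
    )
    return multi_chrom, single_plus_alt


multi_chrom, single_plus_alt = classify(PLACEMENTS)
print(multi_chrom)
print(single_plus_alt)
```

With this sample input, rs1376641341 lands in the first list and rs_hypothetical_1 in the second, while rs_hypothetical_2, which has a single chromosome placement and nothing else, appears in neither.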