- Introduction
- Compare external genome against existing family
- Find neighbours of a member within a viral family
- Get a list of family members within the identity range
- Methods
- How to cite PASC
|
Introduction
PASC (PAirwise Sequence Comparison) is a web tool for analyzing the distribution of
pairwise identities within viral families. Identities are pre-computed for every pair
of genomes within each family, and their distribution is plotted as a histogram in which
each bar corresponds to an interval of identities.
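The binning behind such a histogram can be sketched as follows. This is only an illustration, not PASC's own code: the function name `identity_histogram`, the `bin_width` parameter, and the choice to put 100% into the last bar are assumptions made for the example.

```python
from collections import Counter


def identity_histogram(identities, bin_width=1.0):
    """Count pairwise identities (in percent) per interval of width bin_width.

    Hypothetical helper: the bin width PASC actually uses is not stated here.
    """
    n_bins = int(round(100.0 / bin_width))
    counts = Counter()
    for value in identities:
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"identity out of range: {value}")
        # An identity of exactly 100% is placed in the last bar.
        index = min(int(value // bin_width), n_bins - 1)
        counts[index] += 1
    # One (lower bound, upper bound, count) tuple per bar, empty bars included.
    return [(i * bin_width, (i + 1) * bin_width, counts[i]) for i in range(n_bins)]


if __name__ == "__main__":
    for low, high, count in identity_histogram([12.0, 55.2, 58.9, 100.0], bin_width=10.0):
        print(f"{low:5.1f}-{high:5.1f}%: {'#' * count}")
```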
|
Compare external genome against existing family
Go to the list of families and select the family you want to compare with by clicking its name. In the "Sequence:" box, specify the query genome by entering its Accession/GI number or by pasting the sequence in FASTA format, or upload a file containing the sequence by clicking the "Browse" button. Please note that only complete genomes should be used as query sequences; results from partial sequences are not suitable for the purpose of this tool. Up to 50 sequences can be added in one submission. After you submit your sequences, PASC computes pairwise identities between the user-provided genomes and the existing genome sequences of the family. At the end of the process, for each input genome, you are presented with a list of pairwise identities, sorted from highest to lowest, between this input genome and (1) the other input genomes (if there is more than one) and (2) the 5 to 10 closest matches among the existing genomes of the family. The identity distribution chart shows the currently selected genome in a different color. You can click a genome's number to make it the current genome, or click an identity to see details of the alignment.
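Before submitting a multi-sequence FASTA file, it can help to check locally that it stays within the limit of 50 sequences mentioned above. The following is a minimal sketch under that assumption; the function names are hypothetical, and it does not contact PASC or verify that the genomes are complete.

```python
MAX_SEQUENCES = 50  # limit per submission, as stated in the text above


def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_submission(text):
    """Return the number of records, or raise if the input cannot be submitted as is."""
    n_records = count_fasta_records(text)
    if n_records == 0:
        raise ValueError("no FASTA records found")
    if n_records > MAX_SEQUENCES:
        raise ValueError(f"{n_records} sequences given, at most {MAX_SEQUENCES} allowed")
    return n_records


if __name__ == "__main__":
    # Made-up toy input; real queries should be complete genomes.
    example = ">query_1\nACGTACGT\n>query_2\nGGCCTTAA\n"
    print(check_submission(example))
```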
|
Find neighbours of a member within a viral family
To see the best matches for a family member, specify that member using its GI.
The system checks whether this GI belongs to the current family and, if so, retrieves the stored identities
and alignment data rather than recomputing them.
|
Get a list of family members within the identity range
Click a bar of the identity distribution plot to retrieve the list of viral genomes with pairwise identities within
the selected range. You can retrieve the details of each alignment by clicking its value in the identity column.
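Conceptually, selecting a range on the plot is a filter over the pre-computed identities. The sketch below assumes the identities are held in a dictionary keyed by genome pair; that layout, the function name `pairs_in_range`, and the genome labels are hypothetical and chosen only for illustration.

```python
def pairs_in_range(pairwise, low, high):
    """Return (genome_a, genome_b, identity) rows with low <= identity <= high.

    Rows are sorted from the highest identity to the lowest.
    """
    rows = [
        (genome_a, genome_b, identity)
        for (genome_a, genome_b), identity in pairwise.items()
        if low <= identity <= high
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up identities (percent) between three placeholder genomes.
    pairwise = {
        ("genome_A", "genome_B"): 91.3,
        ("genome_A", "genome_C"): 62.0,
        ("genome_B", "genome_C"): 88.7,
    }
    for row in pairs_in_range(pairwise, 85.0, 95.0):
        print(row)
```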
|
Methods
The identity distribution chart is plotted from pairwise alignments computed between all members of the selected family or genus.
The alignments are produced with a BLAST-based method, which calculates pairwise identity as the number of identical bases in
local hits divided by the average sequence length of the genome pair. Local hits are found by blastn (nucleotide to nucleotide) and tblastn (translated sequences in all 6 frames from one genome to another).
The process is typically distributed across multiple CPUs to reduce computing time for large sets of organisms and for those with longer genomes.
To increase the speed of the tool, sequences with the same taxid and identities higher than a predefined value (which varies from 95% to 99.5% for different viral groups) are represented by one sequence in the dataset. The excluded sequences are referred to here as redundant sequences.
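The identity formula described above can be illustrated with a short calculation. This sketch assumes that the local hits have already been found and do not overlap, so their identical-base counts can simply be added; how PASC combines or trims overlapping hits is not described here. The function name and the numbers are made up for the example.

```python
def pairwise_identity(identical_bases_per_hit, length_a, length_b):
    """Identity (percent) = identical bases in local hits / average genome length.

    Assumes non-overlapping local hits (an assumption of this sketch).
    """
    if length_a <= 0 or length_b <= 0:
        raise ValueError("genome lengths must be positive")
    total_identical = sum(identical_bases_per_hit)
    average_length = (length_a + length_b) / 2
    return 100.0 * total_identical / average_length


if __name__ == "__main__":
    # Two hits with 4000 and 2500 identical bases; genomes of 10000 and 12000 bases.
    print(round(pairwise_identity([4000, 2500], 10000, 12000), 2))
```

Dividing by the average length rather than by the length of the aligned regions means that regions without any local hit lower the identity of the pair.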
|
How to cite PASC
Bao Y, Chetvernin V, Tatusova T. Improvements to pairwise sequence comparison
(PASC): a genome-based web tool for virus classification. Arch Virol. 2014 Dec;159(12):3293-3304. doi:10.1007/s00705-014-2197-x
|