PASC

Introduction
Compare external genome against existing family
Find neighbours of a member within a viral family
Get a list of family members within the identity range
Methods
How to cite PASC

Introduction

PASC (PAirwise Sequence Comparison) is a web tool for analyzing the distribution of pairwise identities within viral families. Identities are pre-computed for every pair of genomes within each family, and their distribution is plotted as a histogram in which each bar corresponds to an interval of identities.
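The histogram described above can be illustrated with a minimal sketch. The function name `identity_histogram`, the `bin_width` parameter, and the choice to express identities as percentages are assumptions made for illustration; they are not taken from PASC itself.

```python
from collections import Counter


def identity_histogram(identities, bin_width=1.0):
    """Count pairwise identities (in percent) per interval of width bin_width.

    Returns a dict mapping (interval_start, interval_end) to the number of
    pairs whose identity falls in that interval. Hypothetical helper, not
    part of PASC.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    n_bins = int(round(100.0 / bin_width))
    counts = Counter()
    for value in identities:
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"identity out of range: {value}")
        # An identity of exactly 100% is placed in the last interval.
        index = min(int(value // bin_width), n_bins - 1)
        counts[index] += 1
    return {
        (i * bin_width, (i + 1) * bin_width): counts[i]
        for i in sorted(counts)
    }
```

Each key in the returned dictionary plays the role of one bar in the chart, and the value is the height of that bar.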

Compare external genome against existing family

Go to the list of families and click the name of the family you want to compare against. In the "Sequence:" box, specify the query genome by its Accession/GI number, paste it in FASTA format, or upload a file containing the sequence by clicking the "Browse" button. Only complete genomes should be used as query sequences; results from partial sequences are not suitable for the purpose of this tool. Up to 50 sequences can be added in one submission.

After you submit your sequences, PASC computes pairwise identities between the user-provided genomes and the existing genome sequences of the family. When the process finishes, each input genome is presented with a list of pairwise identities, sorted from highest to lowest, between that genome and (1) the other input genomes, if more than one was submitted, and (2) the 5 to 10 closest matches among the existing genomes of the family. The identity distribution chart shows the currently selected genome in a different color. Click a genome's number to make it the current genome, or click an identity to see details of the alignment.
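The ordering of the result list can be sketched as follows. This is only an illustration of sorting identities from highest to lowest and keeping the closest matches; the function name `rank_matches`, the `limit` parameter, and the genome labels in the usage note are hypothetical.

```python
def rank_matches(identities, limit=None):
    """Sort genome identities from highest to lowest.

    identities: dict mapping a genome label to its pairwise identity
                (in percent) with the query genome.
    limit:      optional maximum number of matches to keep.
    Hypothetical helper, not part of PASC.
    """
    ranked = sorted(identities.items(), key=lambda item: item[1], reverse=True)
    return ranked if limit is None else ranked[:limit]
```

For example, `rank_matches({"genome_A": 71.2, "genome_B": 93.5, "genome_C": 88.0}, limit=2)` keeps `genome_B` and `genome_C`, in that order.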

Find neighbours of a member within a viral family

To see the best matches for a family member, specify that member by its GI. If the GI belongs to a member of the current family, the system recognizes it and retrieves the stored identities and alignment data rather than recomputing them.

Get a list of family members within the identity range

Click on the identity distribution plot to retrieve the list of viral genomes whose pairwise identities fall within the selected range. To see the details of an alignment, click its value in the identity column.
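Selecting a range on the plot amounts to filtering the stored pairs by identity. The sketch below assumes each pair is stored as a `(genome_a, genome_b, identity)` tuple and that both bounds are inclusive; both choices, and the name `pairs_in_range`, are assumptions for illustration.

```python
def pairs_in_range(pairs, low, high):
    """Return the pairs whose identity lies within [low, high].

    pairs: iterable of (genome_a, genome_b, identity) tuples, with
           identity in percent. Hypothetical layout, not PASC's own.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return [pair for pair in pairs if low <= pair[2] <= high]
```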

Methods

The identity distribution chart is plotted from pairwise alignments computed between every pair of members of the selected family or genus.

The alignment uses a BLAST-based method, which calculates pairwise identity as the number of identical bases in local hits divided by the average sequence length of the genome pair. Local hits are found by blastn (nucleotide to nucleotide) and blastp (translated sequences in all 6 frames from one genome to another).
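The identity formula stated above can be written out as a short sketch. It assumes the number of identical bases has already been summed over the local hits; how PASC combines or de-duplicates overlapping hits is not described here, so that step is left out. The function name and the percentage scaling are assumptions.

```python
def pairwise_identity(identical_bases, length_a, length_b):
    """Identity (in percent) = identical bases / average genome length.

    identical_bases: total identical bases across the local hits
                     (assumed to be pre-computed).
    length_a, length_b: lengths of the two genomes in bases.
    """
    if length_a <= 0 or length_b <= 0:
        raise ValueError("genome lengths must be positive")
    average_length = (length_a + length_b) / 2
    return 100.0 * identical_bases / average_length
```

For two genomes of 10,000 and 12,000 bases sharing 9,000 identical bases in local hits, the average length is 11,000 and the identity is 9,000 / 11,000, or about 81.8%.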

The process is typically scaled to multiple CPUs to reduce computing time for large sets of organisms and for those with longer genomes.
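One way such work could be divided is to enumerate every pair of family members and deal the pairs out to workers in round-robin order. The sketch below shows only the partitioning step, not the alignment or the process management, and it is not a description of how PASC actually schedules its jobs; `partition_pairs` and `workers` are hypothetical names.

```python
from itertools import combinations


def partition_pairs(members, workers):
    """Split all unordered pairs of members into `workers` batches.

    A family of n members yields n * (n - 1) / 2 pairs; they are
    assigned to batches in round-robin order so that batch sizes
    differ by at most one.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    batches = [[] for _ in range(workers)]
    for index, pair in enumerate(combinations(members, 2)):
        batches[index % workers].append(pair)
    return batches
```

Each batch could then be handed to a separate CPU, with the resulting identities merged afterwards.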

To increase the speed of the tool, sequences with the same taxid and identities higher than a predefined value (which varies between 95% and 99.5% for different viral groups) are represented by one sequence in the dataset. The excluded sequences are referred to here as redundant sequences.
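The redundancy rule above can be sketched as a greedy selection: a sequence becomes redundant if an already chosen representative has the same taxid and an identity above the threshold. The greedy order, the function name `select_representatives`, and the `identity` callback are assumptions made for illustration; the text does not say how PASC chooses which sequence of a group is kept.

```python
def select_representatives(sequences, identity, threshold):
    """Pick one representative per group of near-identical sequences.

    sequences: list of (sequence_id, taxid) tuples, in priority order.
    identity:  callable returning the pairwise identity (percent) of two
               sequence ids (hypothetical lookup).
    threshold: identities above this value count as redundant.
    Returns (representative_ids, redundant), where redundant maps each
    excluded id to the representative that stands in for it.
    """
    representatives = []
    redundant = {}
    for seq_id, taxid in sequences:
        # Only representatives with the same taxid are compared.
        match = next(
            (rep_id for rep_id, rep_taxid in representatives
             if rep_taxid == taxid and identity(seq_id, rep_id) > threshold),
            None,
        )
        if match is None:
            representatives.append((seq_id, taxid))
        else:
            redundant[seq_id] = match
    return [rep_id for rep_id, _ in representatives], redundant
```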

How to cite PASC

Bao Y, Chetvernin V, Tatusova T. Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification. Arch Virol. 2014 Dec;159(12):3293-3304. doi:10.1007/s00705-014-2197-x
