Genotyping

Overview

  This Web resource helps identify the genotype (subtype) of a viral sequence. It comprises the Genotyping and Alignment tools. A detailed description of the NCBI genotyping tool has been published in Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W654-9.
 
  In the Genotyping tool, a window is slid along the query sequence, and each window is compared by BLAST to the reference sequences for different virus subtypes. The BLAST scores for each reference sequence are plotted in one of the subtype-specific colors listed on the right side of the graphics. The thick "subtype" bar at the top consists of rectangles, one for each BLAST window. Each rectangle is painted with the color corresponding to the best cumulative score, provided that score is greater than the chosen threshold. The top-scoring genotype is displayed for each window as you mouse over it, either as a tooltip next to the pointer or as a label on the status bar, depending on the browser. Because similar colors or colored patterns may designate different subtypes, we recommend that you mouse over the subtype bar and pay attention to the subtype values. This approach allows visualization of similarities for any part of the query and, therefore, is especially useful for the analysis of recombinant sequences.
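
  The windowing step described above can be sketched in Python as follows. This is an illustration only, not the tool's actual code: the function name iter_windows is hypothetical, the values 300 and 100 are the documented defaults for "window" and "increment", and dropping a trailing partial window is an assumption. Each yielded window would then be compared by BLAST to the reference set.

```python
def iter_windows(seq, window=300, increment=100):
    """Yield (start, subsequence) pairs for a window slid along seq.

    Assumptions (not confirmed by the tool's documentation):
    - a trailing window shorter than `window` is dropped;
    - when the window covers the whole sequence, or the increment is 0,
      a single window spanning the full sequence is produced.
    """
    if increment <= 0 or window >= len(seq):
        yield 0, seq
        return
    for start in range(0, len(seq) - window + 1, increment):
        yield start, seq[start:start + window]


if __name__ == "__main__":
    query = "ACGT" * 125  # toy 500-base query
    for start, sub in iter_windows(query):
        print(start, len(sub))
```

  With the default values, a 500-base query gives three windows starting at positions 0, 100, and 200.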
 
  The Alignment tool compares one or more query sequences to the reference genome sequence or master sequence from one of the reference sets of subtypes. This tool also uses BLAST. Multiple query sequences are anchored to the master sequence in order of descending similarity. The alignment representation is under development and will change soon.
 

Query sequence format

  The query may be submitted as a GenBank or Genome Accession, as a GI, or as a nucleotide sequence in FASTA format. In the present version of the genotyping tool, only one query sequence can be processed; any others will be ignored without warning. The alignment tool accepts multiple queries, including mixed accessions/GI/FASTA queries in which accessions/GI precede FASTA-formatted sequences. Different FASTA sequences in one query must have different names; otherwise, only the first sequence will be taken into account, and a warning will pop up.
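
  The rule for duplicate FASTA names can be illustrated with a short Python sketch. It is not the Alignment tool's own parser. The function name read_fasta_unique is hypothetical, taking the name to be the first whitespace-delimited token of the definition line is an assumption, and so is the reading that the first sequence carrying a given name is the one kept.

```python
def read_fasta_unique(text):
    """Parse multi-FASTA text, keeping the first sequence for each name.

    Returns (sequences, duplicates): a dict of name -> sequence and a list
    of names that appeared again and were skipped.
    Assumption: the name is the first token after '>' on the definition line.
    """
    records = {}
    duplicates = []
    name = None
    keep = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            keep = name not in records
            if keep:
                records[name] = []
            else:
                duplicates.append(name)  # a caller could warn about these
        elif name is not None and keep:
            records[name].append(line)
    sequences = {n: "".join(parts) for n, parts in records.items()}
    return sequences, duplicates


if __name__ == "__main__":
    seqs, dups = read_fasta_unique(">s1\nACGT\nAC\n>s2\nGG\n>s1\nTTTT\n")
    print(seqs, dups)
```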
 

Reference sequences format

  An existing reference set can be modified or replaced with a new one from the "advanced" page of the genotyping tool. To create an all-new custom reference set, choose "Clear all" from the "Select" option box first; then type in new values in the format "GenBank accession"|"Genotype name", e.g.,

PLEASE NOTE:
  • The sequence does not have to be from a local database (it can be from any existing one). However, using accessions only from the local database will increase performance.
  • The name of the subtype will be truncated to 10 characters if longer.
  • The FASTA definition line will be truncated to 10 characters if longer.
  • The maximum number of different colors currently used for plotting is 18, because only a limited number of colors can be easily distinguished in browsers. However, the raw data can be saved to a local computer via the "non-graphical output" option and further processed with a spreadsheet program.
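
  The entry format and the truncation rule above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the tool's implementation: the function name parse_reference_entries is hypothetical, the quotation marks in the format description are assumed to mark placeholders rather than characters to type, and the accessions in the usage lines are made-up placeholders.

```python
MAX_NAME_LEN = 10  # subtype names longer than this are truncated


def parse_reference_entries(text):
    """Parse lines of the form accession|genotype name.

    Returns a list of (accession, name) pairs, with each name cut to
    MAX_NAME_LEN characters. Blank lines are skipped. Raising ValueError
    on malformed lines is an assumption made for this sketch.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        accession, sep, name = line.partition("|")
        if not sep or not accession.strip() or not name.strip():
            raise ValueError(f"expected 'accession|genotype name', got: {line!r}")
        entries.append((accession.strip(), name.strip()[:MAX_NAME_LEN]))
    return entries


if __name__ == "__main__":
    # ACC0001 and ACC0002 are placeholders, not real GenBank accessions.
    print(parse_reference_entries("ACC0001|A\nACC0002|Recombinant-form\n"))
```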

Local reference sequences format

  You can enter one or more local reference sequences as GenBank Accessions, GIs, or in FASTA format, but you may not mix different format types.

FASTA example:

GI example

Accession number example

PLEASE NOTE:
  • The FASTA definition line will be truncated to 10 characters if longer.
  • The maximum number of different colors currently used for plotting is 18, because only a limited number of colors can be easily distinguished in browsers. However, the raw data can be saved to a local computer via the "non-graphical output" option and further processed with a spreadsheet program.

Window size and increment

  The default values are 300 for "window" and 100 for "increment". These parameters may be changed from the "advanced" page. The minimum value is 50 for "window" and 10 for "increment". However, if either parameter is set to a value greater than the sequence length, the program will reset the window size to the sequence length and the increment to 0 (zero). Lower values of increment and window size increase memory consumption and degrade performance. The window alignment option allows one to obtain an alignment for a window-size portion of a query sequence.
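
  The parameter rules above can be summarized in a small Python sketch. It is only an illustration: the function name normalize_params is hypothetical, and rejecting values below the minimum with an error is an assumption, since this page does not say how the tool handles them.

```python
MIN_WINDOW = 50
MIN_INCREMENT = 10


def normalize_params(seq_len, window=300, increment=100):
    """Return the (window, increment) pair that would effectively be used.

    Values below the documented minimums raise ValueError (an assumption;
    the tool's real behavior is not documented here). If either value
    exceeds the sequence length, the window becomes the sequence length
    and the increment becomes 0, as described in the text.
    """
    if window < MIN_WINDOW:
        raise ValueError(f"window must be at least {MIN_WINDOW}")
    if increment < MIN_INCREMENT:
        raise ValueError(f"increment must be at least {MIN_INCREMENT}")
    if window > seq_len or increment > seq_len:
        return seq_len, 0
    return window, increment


if __name__ == "__main__":
    print(normalize_params(1000))  # defaults fit the sequence
    print(normalize_params(200))   # window exceeds the sequence length
```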
 

Similarity threshold

  Sets the similarity threshold as a percentage. Only those similarities that are greater than the value of this parameter will contribute to the score used to draw the "subtype" bar in the resulting plot. The default value is "30".
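
  One way to picture the effect of the threshold is the Python sketch below. It is not the tool's scoring code. The function name best_subtype is hypothetical, and summing the percent similarities per subtype is an assumed stand-in for the cumulative score, whose exact definition is not given on this page.

```python
from collections import defaultdict


def best_subtype(hits, threshold=30.0):
    """Pick the subtype with the highest cumulative score for one window.

    hits is a list of (subtype, percent_similarity) pairs. Only
    similarities strictly greater than threshold contribute. Returns None
    when nothing passes, in which case no rectangle would be drawn.
    Assumption: the cumulative score is a plain sum of similarities.
    """
    totals = defaultdict(float)
    for subtype, similarity in hits:
        if similarity > threshold:
            totals[subtype] += similarity
    if not totals:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    window_hits = [("A", 95.0), ("B", 40.0), ("A", 20.0)]
    print(best_subtype(window_hits))
```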
 

Other parameters

  Background color allows changing the background of the plot.
  Table output only, if set, will force table-like output. Then the table can be saved on a local data storage device and further processed with a spreadsheet program.
 

Window alignment

  Available after genotyping is done. It allows one to obtain alignment for a window-size portion of a query sequence.
  The "subtype" bar at the top of the plot consists of rectangle images, one for each BLAST window. Each rectangle is painted with the color corresponding to the genotype with the best score in that window. If none of the reference sequences produced a cumulative score greater than the chosen threshold, no rectangle will be drawn.
  To BLAST a window, click on the corresponding rectangle of the "subtype" bar. Read the number from the popup window. Enter it in the text area under the "Blast window" button in the top left corner of the resulting page. Press the "Blast window" button.
 

Global alignment

  Available after genotyping is done. Select either all or a particular reference sequence to align with the query.



Comments and suggestions to: [[email protected]]

Revised: August 5, 2004