Annotation Examples
- mRNA sequence
- Prokaryotic gene
- Eukaryotic gene
- Promoter region
- Viral sequence
- HIV-1
- Transposon or insertion sequence
- Microsatellite sequence
- Repeat regions
- Pseudogene
- Translocation and/or fusion protein
- Cloning vector
- Gapped sequence
- Phylogenetic or population set
- EST submissions
- GSS submissions
- STS submissions
- HTGS submissions
- FLICs submissions
mRNA sequence
Relevant feature information for an mRNA (cDNA) sequence encoding a protein:
- coding region intervals, including start and stop codons
- protein name
- gene name, if available
- amino acid sequence, if available
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Homo sapiens prolidase (PEPD) mRNA, complete cds.
     source          1..1888
                     /organism="Homo sapiens"
                     /chromosome="19"
                     /map="19q12-q13.2"
                     /cell_type="fibroblasts"
     gene            1..1888
                     /gene="PEPD"
     CDS             17..1498
                     /gene="PEPD"
                     /EC_number="3.4.13.9"
                     /note="imidodipeptidase"
                     /product="prolidase"
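Feature table coordinates are 1-based and inclusive, so the CDS interval 17..1498 in this example covers 1498 - 17 + 1 = 1482 nucleotides, a whole number of codons once the stop codon is counted. The following is a minimal Python sketch of that sanity check, not part of any GenBank submission tool. The helper names (`parse_span`, `span_length`, `is_whole_codons`) are hypothetical, and the parser assumes a simple `start..end` span with no `join`, `complement`, or partial markers.

```python
def parse_span(span: str) -> tuple[int, int]:
    """Parse a simple 'start..end' location into 1-based inclusive integers."""
    start, end = span.split("..")
    return int(start), int(end)


def span_length(span: str) -> int:
    """Number of nucleotides covered by a 1-based inclusive span."""
    start, end = parse_span(span)
    return end - start + 1


def is_whole_codons(span: str) -> bool:
    """True if the span length is a multiple of three."""
    return span_length(span) % 3 == 0


# CDS interval taken from the prolidase example
print(span_length("17..1498"))      # 1482
print(is_whole_codons("17..1498"))  # True
```

A complete CDS whose length is not a multiple of three usually points to a mistyped interval or a missing stop codon, which is worth checking before submission.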
Prokaryotic gene
Relevant feature information for a prokaryotic genomic sequence encoding a protein:
- coding region intervals, including start and stop codons, if present
- protein name
- gene name, if known
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Escherichia coli RecA protein (recA) gene, complete cds.
     source          1..3300
                     /organism="Escherichia coli"
                     /strain="K-12"
     gene            783..1961
                     /gene="recA"
     CDS             783..1961
                     /gene="recA"
                     /function="DNA repair protein"
                     /product="RecA protein"
Eukaryotic gene
Relevant feature information for a eukaryotic genomic sequence encoding a protein:
- coding region intervals, including start and stop codons, if present, and all exon intervals
- protein name
- gene name, if known
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Caenorhabditis elegans tyrosine kinase PTK-2 (ptk-2) gene, complete cds.
     source          1..3180
                     /organism="Caenorhabditis elegans"
     gene            211..3011
                     /gene="ptk-2"
     mRNA            join(211..288,533..703,763..890,940..1024,
                     1084..1380,1838..1962,2018..2099,2301..3011)
                     /gene="ptk-2"
                     /product="protein kinase PTK-2"
     CDS             join(250..288,533..703,763..890,940..1024,
                     1084..1380,1838..1962,2018..2099,2301..2456)
                     /gene="ptk-2"
                     /product="protein kinase PTK-2"
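In this example the CDS is a `join` of eight exon segments, and each coding segment lies inside the corresponding mRNA segment. The sketch below, in Python, shows one way to check both properties before submitting. The function names (`parse_join`, `total_length`, `within`) are hypothetical, and the regular expression is an assumption that only covers plain numeric segments. It does not handle partial markers such as `<1` or `>788`, or `complement(...)`.

```python
import re


def parse_join(location: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs from 'join(a..b,c..d)' or a single 'a..b'."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def total_length(segments: list[tuple[int, int]]) -> int:
    """Sum of 1-based inclusive segment lengths."""
    return sum(end - start + 1 for start, end in segments)


def within(inner: list[tuple[int, int]], outer: list[tuple[int, int]]) -> bool:
    """True if every inner segment lies inside some outer segment."""
    return all(
        any(o_start <= i_start and i_end <= o_end for o_start, o_end in outer)
        for i_start, i_end in inner
    )


# Locations copied from the ptk-2 example
MRNA = ("join(211..288,533..703,763..890,940..1024,"
        "1084..1380,1838..1962,2018..2099,2301..3011)")
CDS = ("join(250..288,533..703,763..890,940..1024,"
       "1084..1380,1838..1962,2018..2099,2301..2456)")

cds_segments = parse_join(CDS)
print(total_length(cds_segments))                 # 1083
print(total_length(cds_segments) % 3 == 0)        # True
print(within(cds_segments, parse_join(MRNA)))     # True
```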
Promoter region
Relevant feature information for promoter, genomic 5' flanking sequence, or genomic 3' flanking sequence:
- protein or gene name for the sequence to which the promoter or flanking region belongs
- intervals of any transcribed regions or coding regions, if present on the sequence
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Homo sapiens enhancer-binding protein 2 (EBP2) gene, promoter region and partial cds.
     source          1..3061
                     /organism="Homo sapiens"
                     /chromosome="15"
                     /map="15q13"
                     /cell_line="H441"
                     /tissue_type="lung"
     gene            1..>3061
                     /gene="EBP2"
     promoter        1..2947
                     /gene="EBP2"
     TATA_signal     2918..2923
                     /gene="EBP2"
     mRNA            2948..>3061
                     /gene="EBP2"
                     /product="enhancer-binding protein 2"
     5'UTR           2948..3010
                     /gene="EBP2"
     CDS             3011..>3061
                     /gene="EBP2"
                     /product="enhancer-binding protein 2"
Viral sequence
Relevant feature information for a viral sequence:
- include strain, serotype, host, country, and collection_date when known
- coding region intervals, including start and stop codons, if present
- protein name
- gene name, if known
- amino acid sequence, if known
- if no coding region is present, other description of the sequence
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Human adenovirus 3 strain RKI-4263/07 hexon (H) gene, partial cds.
     source          1..1520
                     /organism="Human adenovirus 3"
                     /mol_type="genomic DNA"
                     /strain="RKI-4263/07"
                     /serotype="3"
                     /host="Homo sapiens"
                     /db_xref="taxon:45659"
                     /country="Germany"
                     /collection_date="Apr-2007"
     gene            <1..>1520
                     /gene="H"
     CDS             <1..>1520
                     /note="major capsid protein"
                     /codon_start=1
                     /product="hexon"
HIV-1
Relevant feature information for an HIV-1 sequence:
- name of the country from which the virus was isolated
- clone and isolate information
- coding region intervals, including start and stop codons, if present
- protein names
- gene names, if known
- amino acid sequences, if known
- if no coding region is present, other description of the sequence
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
HIV-1 isolate X clone 5601 from USA, complete genome.
     source          1..9720
                     /organism="Human immunodeficiency virus type 1"
                     /clone="5601"
                     /isolate="X"
                     /country="USA"
     repeat_region   1..634
                     /rpt_type=long_terminal_repeat
     gene            789..2291
                     /gene="gag"
     CDS             789..2291
                     /gene="gag"
                     /product="gag protein"
     gene            2084..5095
                     /gene="pol"
     CDS             2084..5095
                     /gene="pol"
                     /product="pol protein"
     gene            5040..5618
                     /gene="vif"
     CDS             5040..5618
                     /gene="vif"
                     /product="vif protein"
     gene            5558..5848
                     /gene="vpr"
     CDS             5558..5848
                     /gene="vpr"
                     /product="vpr protein"
     gene            5829..8476
                     /gene="tat"
     CDS             join(5829..6043,8386..8476)
                     /gene="tat"
                     /product="tat protein"
     gene            5968..8660
                     /gene="rev"
     CDS             join(5968..6043,8386..8660)
                     /gene="rev"
                     /product="rev protein"
     gene            6060..6305
                     /gene="vpu"
     CDS             6060..6305
                     /gene="vpu"
                     /product="vpu protein"
     gene            6223..8802
                     /gene="env"
                     /pseudo
     gene            8804..9070
                     /gene="nef"
     CDS             8804..9070
                     /gene="nef"
                     /product="nef protein"
     repeat_region   9086..9719
                     /rpt_type=long_terminal_repeat
     polyA_signal    9612..9617
Transposon or insertion sequence
Relevant feature information for transposons or insertion sequences:
- specific name of the transposon or IS, if available
- nucleotide spans corresponding to the transposon/IS
- name and nucleotide intervals of any host gene/product disrupted by the transposon/IS
- name and nucleotide intervals of any gene/product in the transposon/IS (eg, transposase)
- nucleotide spans of any other features (LTRs, repeat regions)
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Bacillus subtilis strain RS2 transposon BLT transposase (tnpA) gene, complete cds.
     source          1..1221
                     /organism="Bacillus subtilis"
                     /strain="RS2"
     repeat_region   21..1127
                     /rpt_type="dispersed"
                     /mobile_element="transposon: BLT"
     repeat_region   21..61
                     /rpt_type=inverted
     gene            128..1034
                     /gene="tnpA"
     CDS             128..1034
                     /gene="tnpA"
                     /product="transposase"
     repeat_region   1085..1127
                     /rpt_type=inverted
Microsatellite sequence
Relevant feature information for a microsatellite sequence:
- unique microsatellite/clone name for each sequence
- interval of any repeat region(s) within the microsatellite sequence, if known
- whether the sequences are considered STS sequences
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example #1:
Chorthippus parallelus clone IIB-G5 microsatellite sequence.
     source          1..288
                     /organism="Chorthippus parallelus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:37639"
                     /clone="IIB-G5"
     repeat_region   1..288
                     /rpt_type=tandem
                     /satellite="microsatellite"
Example #2:
Noturus exilis voucher KU 40271 microsatellite Noex254 sequence.
     source          1..556
                     /organism="Noturus exilis"
                     /mol_type="genomic DNA"
                     /specimen_voucher="KU 40271"
                     /db_xref="taxon:61323"
                     /clone="Noex_02_03_H06"
                     /PCR_primers="fwd_seq: catgtttgcacaaagggaaa, rev_seq: atgtggatgcagattgtgga"
     repeat_region   77..100
                     /rpt_type=tandem
                     /rpt_unit_seq="ca"
                     /rpt_unit_range=77..100
                     /satellite="microsatellite:Noex254"
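When the repeat interval is not yet known, it can be located from the sequence itself. The Python sketch below finds runs of a given repeat unit and reports them as 1-based inclusive intervals, the same convention used for `repeat_region` above. The function name `find_tandem_repeats`, the `min_copies` threshold, and the toy sequence are all assumptions made for illustration. The toy sequence is not the Noturus exilis record, and only perfect, uninterrupted repeats are detected.

```python
import re


def find_tandem_repeats(sequence: str, unit: str, min_copies: int = 5):
    """Return (start, end, copies) for each perfect tandem run of `unit`.

    start and end are 1-based and inclusive, as in a feature table.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}", re.IGNORECASE)
    return [
        (m.start() + 1, m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(sequence)
    ]


# Toy sequence: 6 flanking bases, 12 copies of "ca", 6 flanking bases
toy = "ggatcc" + "ca" * 12 + "ttgacg"
print(find_tandem_repeats(toy, "ca"))  # [(7, 30, 12)]
```

The reported interval can then be entered as the `repeat_region` span, with the unit given in `/rpt_unit_seq`.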
Repeat regions
Relevant feature information for sequences containing repeat regions:
- repeat region intervals
- repeat family, if known (eg, Alu, Mer)
- repeat type (tandem, inverted, flanking, terminal, direct, dispersed, nested, long_terminal_repeat, non_ltr_retrotransposon_polymeric_tract, centromeric_repeat, telomeric_repeat, x_element_combinatorial_repeat, y_prime_element, or other)
- repeat unit description/intervals, if region contains more than one repeat
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Homo sapiens repeat regions.
     source          1..2050
                     /organism="Homo sapiens"
                     /chromosome="6"
                     /map="6q25"
     repeat_region   8..126
                     /rpt_type=dispersed
                     /rpt_family="B2"
     repeat_region   197..344
                     /rpt_type="direct"
                     /rpt_unit="197..220"
     repeat_region   389..673
                     /rpt_family="AluSx"
                     /rpt_type=dispersed
     repeat_region   847..876
                     /rpt_type="tandem"
                     /rpt_unit="ca"
                     /satellite="microsatellite:BT21"
     repeat_region   2000..2050
                     /rpt_type=long_terminal_repeat
Pseudogene
Relevant feature information for a pseudogene sequence:
- gene intervals
- gene name
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Mus musculus DNA methyltransferase (Dmt1) pseudogene, complete sequence.
     source          1..2131
                     /organism="Mus musculus"
                     /strain="SvJ/129"
     gene            123..1444
                     /gene="Dmt1"
                     /note="DNA methyltransferase 1"
                     /pseudo
Translocation and/or fusion protein
Relevant feature information for a sequence resulting from a chromosomal translocation:
- nucleotide location of the translocation breakpoint, if known
- map information for the translocation breakpoint (e.g., t(18;X)(q11.2;p11.2))
- coding region intervals, including start and stop codons, if present
- protein name
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Homo sapiens SYT/SSX4 fusion protein mRNA, complete cds.
     source          1..2935
                     /organism="Homo sapiens"
                     /tissue_type="sarcoma"
                     /map="t(18;X)(q11.2;p11.2)"
     source          1..1242
                     /organism="Homo sapiens"
                     /chromosome="18"
                     /map="18q11.2"
     CDS             1..1479
                     /product="SYT/SSX4 fusion protein"
     source          1243..2935
                     /organism="Homo sapiens"
                     /chromosome="X"
                     /map="Xp11.2"
     3'UTR           1480..2935
Cloning vector
Relevant feature information for a cloning vector:
- unique name for the cloning vector
- coding region intervals, including start and stop codons
- protein names, gene names
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Cloning vector pRB223, complete sequence.
     source          1..4361
                     /organism="Cloning vector pRB223"
     gene            86..1276
                     /gene="tet"
     CDS             86..1276
                     /gene="tet"
                     /product="tetracycline resistance protein"
     RBS             1905..1909
                     /note="Shine-Dalgarno sequence"
     rep_origin      2535
     gene            complement(3293..4194)
                     /gene="bla"
     CDS             complement(3293..4153)
                     /gene="bla"
                     /product="beta-lactamase"
     misc_feature    4069..4125
                     /note="multiple cloning site"
     RBS             complement(4161..4165)
                     /gene="bla"
                     /note="Shine-Dalgarno sequence"
     promoter        complement(4188..4194)
                     /gene="bla"
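Several features in this example use `complement(...)`, meaning the feature is read from the strand opposite to the submitted sequence. The interval is still given in coordinates of the submitted strand, so the feature sequence is the reverse complement of that slice. The Python sketch below illustrates the idea. The function names (`reverse_complement`, `extract`) and the toy sequence are hypothetical, only a single simple span is supported, and ambiguity codes other than `n` are not handled.

```python
# Translation table for the four bases plus n, in both cases
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(sequence: str, location: str) -> str:
    """Return the feature sequence for 'a..b' or 'complement(a..b)'."""
    on_minus = location.startswith("complement(") and location.endswith(")")
    span = location[len("complement("):-1] if on_minus else location
    start, end = (int(x) for x in span.split(".."))
    fragment = sequence[start - 1:end]  # convert 1-based inclusive to a slice
    return reverse_complement(fragment) if on_minus else fragment


# Toy sequence with a minus-strand reading frame at positions 3..11
toy = "ggttatttcatcc"
print(extract(toy, "3..11"))              # ttatttcat
print(extract(toy, "complement(3..11)"))  # atgaaataa
```

Read this way, the toy minus-strand feature begins with atg and ends with taa, which is the kind of check that helps confirm a `complement` interval such as the bla CDS was entered on the intended strand.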
Gapped sequence
A gapped sequence includes both known, directly sequenced data and unknown data. The unknown sections of sequence are represented by strings of 'nnn' between the known, directly sequenced, contiguous data. All pieces of a gapped sequence must be from the same source and be in the same orientation and in the correct order.
Relevant feature information for a gapped sequence:
- if a gap length is estimated, insert the equivalent number of nnns between the directly determined, contiguous sections of sequence
- if the gap length is unknown, insert a string of 100 nnns to represent the gap between the sections of sequence
- add a misc_feature for each gap with a /note qualifier to describe it as either 'gap of unknown length' or 'gap of estimated length, # nts'
- add all other appropriate features (exons, introns, CDS, gene, etc)
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
Example:
Homo sapiens MHC class I antigen (HLA-B) gene, HLA-B_458_01445 allele, exons 2, 3 and partial cds.
     source          1..788
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
     gene            <1..>788
                     /gene="HLA-B"
                     /allele="HLA-B_458_01445"
     mRNA            join(<1..270,513..>788)
                     /gene="HLA-B"
                     /allele="HLA-B_458_01445"
                     /product="MHC class I antigen"
     CDS             join(<1..270,513..>788)
                     /gene="HLA-B"
                     /allele="HLA-B_458_01445"
                     /codon_start=3
                     /product="MHC class I antigen"
                     /protein_id="ACR38915.1"
                     /db_xref="GI:238055051"
                     /translation="SHSMRYFDTAMSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRE
                     EPRAPWIEQEGPEYWDRNTQIFKTNTQTDRESLRNLRGYYNQSEAGSHTLQSMYGCDV
                     GPDGRLLRGHDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAARVAEQDRAYLE
                     GTCVEWLRRYLENGKDTLERA"
     exon            1..270
                     /gene="HLA-B"
                     /allele="HLA-B_458_01445"
                     /number=2
     gap             271..512
                     /estimated_length=242
     exon            513..788
                     /gene="HLA-B"
                     /allele="HLA-B_458_01445"
                     /number=3
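The gap rules listed for this section can be sketched in Python as follows: an estimated gap becomes that many n characters, an unknown gap becomes 100 n characters, and each gap gets a 1-based inclusive interval with a descriptive note. The function name `build_gapped` and its arguments are hypothetical. The note wording follows the list above, whereas the example record describes its gap with a `gap` feature and `/estimated_length`, so adapt the output to whichever form your submission uses.

```python
def build_gapped(pieces, gap_lengths, unknown_gap=100):
    """Join sequence pieces with runs of 'n' and describe each gap.

    pieces      -- contiguous sequences, same orientation, in the correct order
    gap_lengths -- one entry per gap: an estimated length, or None if unknown
    Returns (sequence, [(start, end, note), ...]) with 1-based inclusive spans.
    """
    if len(gap_lengths) != len(pieces) - 1:
        raise ValueError("need exactly one gap length between each pair of pieces")
    sequence = pieces[0]
    features = []
    for piece, gap in zip(pieces[1:], gap_lengths):
        n_count = unknown_gap if gap is None else gap
        start = len(sequence) + 1
        sequence += "n" * n_count
        end = len(sequence)
        if gap is None:
            note = "gap of unknown length"
        else:
            note = f"gap of estimated length, {gap} nts"
        features.append((start, end, note))
        sequence += piece
    return sequence, features


# Placeholder pieces with the same lengths as exons 2 and 3 in the example
seq, gaps = build_gapped(["a" * 270, "c" * 276], [242])
print(len(seq))  # 788
print(gaps)      # [(271, 512, 'gap of estimated length, 242 nts')]
```

With pieces of 270 and 276 nucleotides and an estimated gap of 242, the sketch reproduces the gap interval 271..512 and the total length of 788 shown in the example.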
Phylogenetic or population set
A set comprises a group of sequences that represent the same gene or locus in different organisms or in different isolates, strains, or clones of the same organism. A set can be, for example, phylogenetic (different organisms), population (same organism), or environmental (unclassified or unknown organisms).
Relevant feature information for population or phylogenetic studies:
- unique descriptive information for each sequence (eg, clone, strain, isolate, or organism names)
- creating a set allows the sequences to be retrieved as a group through Entrez PopSet
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
STS submissions
Relevant feature information for STS submissions:
- submit directly to dbSTS: the STS division of GenBank
- submit using BankIt and provide:
  - chromosome and/or specific map locations
  - clone name
  - clone library [catalog number, reference, lab source, and/or specific (in-house) name or number]
  - PCR conditions and primer binding sites
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.
HTGS submissions
Requirements for HTGs submissions:
- large genome centers should submit these through an FTP account to the High Throughput Genomic (HTG) Sequences division of GenBank
- one-time submitters should submit to [email protected]
FLICs submissions
Relevant feature information for FLIC submissions:
- explicit labeling as FLICs
- protein name
- gene name
- CDS intervals, including start/stop codons
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.