Virus report
Virus record identifiers, sample information, genomic locations, and products
The downloaded virus package contains a virus data report in JSON Lines format in the file:
ncbi_dataset/data/data_report.jsonl
Each line of the virus data report file is a hierarchical JSON object that represents a single virus record. The schema of the virus record is defined in the tables below, where each row describes either a single field in the report or a sub-structure, which is a collection of fields.
The outermost structure of the report is VirusAssembly.
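Because the report is stored as JSON Lines, it can be read one record at a time rather than loaded as a single JSON document. The following is a minimal Python sketch of that idea, not part of the NCBI tooling: the helper name iter_virus_records is hypothetical, and the path assumes the package was extracted into the current directory.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_virus_records(path: str) -> Iterator[dict]:
    """Yield one parsed virus record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, should any be present
                yield json.loads(line)


if __name__ == "__main__":
    # Assumed location of the report inside an extracted virus package.
    report = "ncbi_dataset/data/data_report.jsonl"
    if Path(report).exists():
        for record in iter_virus_records(report):
            # "accession" is the top-level record identifier in the sample report.
            print(record.get("accession"))
```

Reading line by line keeps memory use proportional to a single record, which can matter for packages that contain many virus genomes.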
Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option.
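The hierarchical layout of a record can be walked with ordinary dictionary and list access. The sketch below assumes the nesting seen in the sample report (annotation, then genes, then cds, then maturePeptide, each with a nucleotide range list) and is only an illustration: the function name iter_mature_peptides and the trimmed example record are hypothetical. Note that begin and end appear as strings in the sample, so the sketch converts them to integers.

```python
from typing import Iterator, Tuple


def iter_mature_peptides(record: dict) -> Iterator[Tuple[str, str, str, int, int]]:
    """Yield (gene, cds, peptide, begin, end) per nucleotide range of each mature peptide."""
    for gene in record.get("annotation", {}).get("genes", []):
        for cds in gene.get("cds", []):
            # Not every CDS carries mature peptides; default to an empty list.
            for peptide in cds.get("maturePeptide", []):
                for rng in peptide.get("nucleotide", {}).get("range", []):
                    # begin/end are serialized as strings in the sample report.
                    yield (
                        gene.get("name", ""),
                        cds.get("name", ""),
                        peptide.get("name", ""),
                        int(rng["begin"]),
                        int(rng["end"]),
                    )


# A heavily trimmed record using values from the sample report.
example = {
    "accession": "NC_045512.2",
    "annotation": {
        "genes": [
            {
                "name": "ORF1ab",
                "cds": [
                    {
                        "name": "ORF1ab polyprotein",
                        "maturePeptide": [
                            {
                                "name": "leader protein",
                                "nucleotide": {
                                    "range": [{"begin": "266", "end": "805"}]
                                },
                            }
                        ],
                    }
                ],
            }
        ]
    },
}

for row in iter_mature_peptides(example):
    print(row)  # ('ORF1ab', 'ORF1ab polyprotein', 'leader protein', 266, 805)
```

Using .get with empty defaults keeps the walk from failing on records or CDS entries that omit optional sub-structures such as maturePeptide.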
Sample report
{
"accession": "NC_045512.2",
"annotation": {
"genes": [
{
"cds": [
{
"maturePeptide": [
{
"accession": "YP_009725297.1",
"cdd": [
{
"accession": "CDD:288369",
"name": "Non structural protein Nsp1",
"range": {
"begin": "13",
"end": "127"
}
}
],
"name": "leader protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "266",
"end": "805"
}
],
"seqId": "NC_045512.2:266-805",
"sequenceHash": "BFEE0830",
"title": "leader protein [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"leader protein",
"non-structural protein 1",
"nonstructural protein 1",
"nsp1"
],
"protein": {
"accessionVersion": "YP_009725297.1",
"seqId": "YP_009725297.1",
"sequenceHash": "DFF407F9",
"title": "leader protein [polyprotein_range=YP_009724389.1:1-180] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725298.1",
"name": "nsp2",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "806",
"end": "2719"
}
],
"seqId": "NC_045512.2:806-2719",
"sequenceHash": "E741D86",
"title": "nsp2 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 2",
"nonstructural protein 2",
"nsp2"
],
"protein": {
"accessionVersion": "YP_009725298.1",
"seqId": "YP_009725298.1",
"sequenceHash": "58F71ADE",
"title": "nsp2 [polyprotein_range=YP_009724389.1:181-818] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725299.1",
"cdd": [
{
"accession": "CDD:289172",
"name": "Protein of unknown function (DUF3655)",
"range": {
"begin": "102",
"end": "169"
}
},
{
"accession": "CDD:366746",
"name": "Macro domain",
"range": {
"begin": "240",
"end": "340"
}
},
{
"accession": "CDD:314498",
"name": "Single-stranded poly(A) binding domain",
"range": {
"begin": "533",
"end": "675"
}
},
{
"accession": "CDD:288939",
"name": "Coronavirus polyprotein cleavage domain",
"range": {
"begin": "680",
"end": "743"
}
},
{
"accession": "CDD:370080",
"name": "Papain like viral protease",
"range": {
"begin": "746",
"end": "1064"
}
},
{
"accession": "CDD:292868",
"name": "Nucleic acid-binding domain (NAR)",
"range": {
"begin": "1089",
"end": "1201"
}
},
{
"accession": "CDD:391938",
"name": "even-transmembrane G protein-coupled receptor",
"range": {
"begin": "1494",
"end": "1563"
}
},
{
"accession": "CDD:341315",
"name": "TM helix 5 [structural motif]",
"range": {
"begin": "1497",
"end": "1519"
}
},
{
"accession": "CDD:341315",
"name": "TM helix 6 [structural motif]",
"range": {
"begin": "1527",
"end": "1551"
}
}
],
"name": "nsp3",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "2720",
"end": "8554"
}
],
"seqId": "NC_045512.2:2720-8554",
"sequenceHash": "6A235ABB",
"title": "nsp3 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 3",
"nonstructural protein 3",
"nsp3"
],
"protein": {
"accessionVersion": "YP_009725299.1",
"seqId": "YP_009725299.1",
"sequenceHash": "21B55819",
"title": "nsp3 [polyprotein_range=YP_009724389.1:819-2763] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725300.1",
"cdd": [
{
"accession": "CDD:374495",
"name": "Coronavirus nonstructural protein 4 C-terminus",
"range": {
"begin": "406",
"end": "498"
}
}
],
"name": "nsp4",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "8555",
"end": "10054"
}
],
"seqId": "NC_045512.2:8555-10054",
"sequenceHash": "4BA01958",
"title": "nsp4 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 4",
"nonstructural protein 4",
"nsp4"
],
"protein": {
"accessionVersion": "YP_009725300.1",
"seqId": "YP_009725300.1",
"sequenceHash": "2C781714",
"title": "nsp4 [polyprotein_range=YP_009724389.1:2764-3263] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725301.1",
"cdd": [
{
"accession": "CDD:368429",
"name": "Coronavirus endopeptidase C30",
"range": {
"begin": "29",
"end": "306"
}
}
],
"name": "3C-like proteinase",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "10055",
"end": "10972"
}
],
"seqId": "NC_045512.2:10055-10972",
"sequenceHash": "C28D0EE5",
"title": "3C-like proteinase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"3C-like proteinase",
"3CLpro",
"Mpro",
"main proteinase",
"non-structural protein 5",
"nonstructural protein 5",
"nsp5A_3CLpro",
"nsp5B_3CLpro"
],
"protein": {
"accessionVersion": "YP_009725301.1",
"seqId": "YP_009725301.1",
"sequenceHash": "5CE30DBB",
"title": "3C-like proteinase [polyprotein_range=YP_009724389.1:3264-3569] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725302.1",
"name": "nsp6",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "10973",
"end": "11842"
}
],
"seqId": "NC_045512.2:10973-11842",
"sequenceHash": "99170F63",
"title": "nsp6 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 6",
"nonstructural protein 6",
"nsp6"
],
"protein": {
"accessionVersion": "YP_009725302.1",
"seqId": "YP_009725302.1",
"sequenceHash": "A72B0D80",
"title": "nsp6 [polyprotein_range=YP_009724389.1:3570-3859] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725303.1",
"cdd": [
{
"accession": "CDD:285878",
"name": "nsp7 replicase",
"range": {
"begin": "1",
"end": "83"
}
}
],
"name": "nsp7",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "11843",
"end": "12091"
}
],
"seqId": "NC_045512.2:11843-12091",
"sequenceHash": "DDAA03C2",
"title": "nsp7 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 7",
"nonstructural protein 7",
"nsp7"
],
"protein": {
"accessionVersion": "YP_009725303.1",
"seqId": "YP_009725303.1",
"sequenceHash": "A87703C6",
"title": "nsp7 [polyprotein_range=YP_009724389.1:3860-3942] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725304.1",
"cdd": [
{
"accession": "CDD:285879",
"name": "nsp8 replicase",
"range": {
"begin": "1",
"end": "198"
}
}
],
"name": "nsp8",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "12092",
"end": "12685"
}
],
"seqId": "NC_045512.2:12092-12685",
"sequenceHash": "759508E4",
"title": "nsp8 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 8",
"nonstructural protein 8",
"nsp8"
],
"protein": {
"accessionVersion": "YP_009725304.1",
"seqId": "YP_009725304.1",
"sequenceHash": "27D30877",
"title": "nsp8 [polyprotein_range=YP_009724389.1:3943-4140] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725305.1",
"cdd": [
{
"accession": "CDD:285872",
"name": "nsp9 replicase",
"range": {
"begin": "1",
"end": "113"
}
}
],
"name": "nsp9",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "12686",
"end": "13024"
}
],
"seqId": "NC_045512.2:12686-13024",
"sequenceHash": "66A7051A",
"title": "nsp9 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 9",
"nonstructural protein 9",
"nsp9",
"ssRNA-binding protein"
],
"protein": {
"accessionVersion": "YP_009725305.1",
"seqId": "YP_009725305.1",
"sequenceHash": "1A720513",
"title": "nsp9 [polyprotein_range=YP_009724389.1:4141-4253] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725306.1",
"cdd": [
{
"accession": "CDD:286486",
"name": "RNA synthesis protein NSP10",
"range": {
"begin": "12",
"end": "131"
}
}
],
"name": "nsp10",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "13025",
"end": "13441"
}
],
"seqId": "NC_045512.2:13025-13441",
"sequenceHash": "4B5A0671",
"title": "nsp10 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"GFL",
"growth-factor-like protein",
"non-structural protein 10",
"nonstructural protein 10",
"nsp10"
],
"protein": {
"accessionVersion": "YP_009725306.1",
"seqId": "YP_009725306.1",
"sequenceHash": "839705B4",
"title": "nsp10 [polyprotein_range=YP_009724389.1:4254-4392] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725307.1",
"cdd": [
{
"accession": "CDD:284009",
"name": "Coronavirus RPol N-terminus",
"range": {
"begin": "14",
"end": "366"
}
}
],
"name": "RNA-dependent RNA polymerase",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "13442",
"end": "13468"
},
{
"begin": "13468",
"end": "16236"
}
],
"seqId": "NC_045512.2:13442-13468,13468-16236",
"sequenceHash": "74392C07",
"title": "RNA-dependent RNA polymerase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"NiRAN",
"RNA-dependent RNA polymerase",
"RdRp",
"non-structural protein 12",
"nonstructural protein 12",
"nsp12"
],
"protein": {
"accessionVersion": "YP_009725307.1",
"seqId": "YP_009725307.1",
"sequenceHash": "6D522979",
"title": "RNA-dependent RNA polymerase [polyprotein_range=YP_009724389.1:4393-5324] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725308.1",
"cdd": [
{
"accession": "CDD:350692",
"name": "DEXXQ-box helicase domain of Upf1-like helicase",
"range": {
"begin": "272",
"end": "443"
}
},
{
"accession": "CDD:224037",
"name": "Superfamily I DNA and/or RNA helicase [Replication, recombination and repair]",
"range": {
"begin": "323",
"end": "592"
}
}
],
"name": "helicase",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "16237",
"end": "18039"
}
],
"seqId": "NC_045512.2:16237-18039",
"sequenceHash": "6CA71C02",
"title": "helicase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"helicase",
"non-structural protein 13",
"nonstructural protein 13"
],
"protein": {
"accessionVersion": "YP_009725308.1",
"seqId": "YP_009725308.1",
"sequenceHash": "17B91B6E",
"title": "helicase [polyprotein_range=YP_009724389.1:5325-5925] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725309.1",
"cdd": [
{
"accession": "CDD:284002",
"name": "pfam06471",
"range": {
"begin": "3",
"end": "527"
}
}
],
"name": "3'-to-5' exonuclease",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "18040",
"end": "19620"
}
],
"seqId": "NC_045512.2:18040-19620",
"sequenceHash": "444B1903",
"title": "3'-to-5' exonuclease [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"3'-to-5' exonuclease",
"non-structural protein 14",
"nonstructural protein 14",
"nsp14"
],
"protein": {
"accessionVersion": "YP_009725309.1",
"seqId": "YP_009725309.1",
"sequenceHash": "8ED173E",
"title": "3'-to-5' exonuclease [polyprotein_range=YP_009724389.1:5926-6452] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725310.1",
"cdd": [
{
"accession": "CDD:284002",
"name": "pfam06471",
"range": {
"begin": "1",
"end": "68"
}
}
],
"name": "endoRNAse",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "19621",
"end": "20658"
}
],
"seqId": "NC_045512.2:19621-20658",
"sequenceHash": "C4DC1059",
"title": "endoRNAse [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"endoRNAse",
"non-structural protein 15",
"nonstructural protein 15",
"nsp15"
],
"protein": {
"accessionVersion": "YP_009725310.1",
"seqId": "YP_009725310.1",
"sequenceHash": "76160F5B",
"title": "endoRNAse [polyprotein_range=YP_009724389.1:6453-6798] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725311.1",
"cdd": [
{
"accession": "CDD:368920",
"name": "Coronavirus NSP13",
"range": {
"begin": "2",
"end": "297"
}
}
],
"name": "2'-O-ribose methyltransferase",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "20659",
"end": "21552"
}
],
"seqId": "NC_045512.2:20659-21552",
"sequenceHash": "58050E29",
"title": "2'-O-ribose methyltransferase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"2'-o-MT",
"2'-o-ribose methyltransferase",
"non-structural protein 16",
"nonstructural protein 16",
"nsp16",
"nsp16_OMT"
],
"protein": {
"accessionVersion": "YP_009725311.1",
"seqId": "YP_009725311.1",
"sequenceHash": "D5F30D79",
"title": "2'-O-ribose methyltransferase [polyprotein_range=YP_009724389.1:6799-7096] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
}
],
"name": "ORF1ab polyprotein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "266",
"end": "13468"
},
{
"begin": "13468",
"end": "21555"
}
],
"seqId": "NC_045512.2:266-13468,13468-21555",
"sequenceHash": "9F004FDB",
"title": "ORF1ab polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF1ab",
"ORF1ab polyprotein",
"open reading frame 1",
"open reading frame 1ab",
"orf1",
"orf1b",
"polyprotein 1ab",
"pp1ab"
],
"protein": {
"accessionVersion": "YP_009724389.1",
"range": [
{
"begin": "1",
"end": "7096"
}
],
"seqId": "YP_009724389.1:1-7096",
"sequenceHash": "89013D3D",
"title": "ORF1ab polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTD1",
"name": "Replicase polyprotein 1ab"
}
},
{
"maturePeptide": [
{
"accession": "YP_009742608.1",
"cdd": [
{
"accession": "CDD:288369",
"name": "Non structural protein Nsp1",
"range": {
"begin": "13",
"end": "127"
}
}
],
"name": "leader protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "266",
"end": "805"
}
],
"sequenceHash": "BFEE0830"
},
"otherNames": [
"leader protein",
"non-structural protein 1",
"nonstructural protein 1",
"nsp1"
],
"protein": {
"accessionVersion": "YP_009742608.1",
"seqId": "YP_009742608.1",
"sequenceHash": "DFF407F9",
"title": "leader protein [polyprotein_range=YP_009725295.1:1-180] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742609.1",
"name": "nsp2",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "806",
"end": "2719"
}
],
"sequenceHash": "E741D86"
},
"otherNames": [
"non-structural protein 2",
"nonstructural protein 2",
"nsp2"
],
"protein": {
"accessionVersion": "YP_009742609.1",
"seqId": "YP_009742609.1",
"sequenceHash": "58F71ADE",
"title": "nsp2 [polyprotein_range=YP_009725295.1:181-818] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742610.1",
"cdd": [
{
"accession": "CDD:289172",
"name": "Protein of unknown function (DUF3655)",
"range": {
"begin": "102",
"end": "169"
}
},
{
"accession": "CDD:366746",
"name": "Macro domain",
"range": {
"begin": "240",
"end": "340"
}
},
{
"accession": "CDD:314498",
"name": "Single-stranded poly(A) binding domain",
"range": {
"begin": "533",
"end": "675"
}
},
{
"accession": "CDD:288939",
"name": "Coronavirus polyprotein cleavage domain",
"range": {
"begin": "680",
"end": "743"
}
},
{
"accession": "CDD:370080",
"name": "Papain like viral protease",
"range": {
"begin": "746",
"end": "1064"
}
},
{
"accession": "CDD:292868",
"name": "Nucleic acid-binding domain (NAR)",
"range": {
"begin": "1089",
"end": "1201"
}
},
{
"accession": "CDD:391938",
"name": "even-transmembrane G protein-coupled receptor",
"range": {
"begin": "1494",
"end": "1563"
}
},
{
"accession": "CDD:341315",
"name": "TM helix 5 [structural motif]",
"range": {
"begin": "1497",
"end": "1519"
}
},
{
"accession": "CDD:341315",
"name": "TM helix 6 [structural motif]",
"range": {
"begin": "1527",
"end": "1551"
}
}
],
"name": "nsp3",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "2720",
"end": "8554"
}
],
"sequenceHash": "6A235ABB"
},
"otherNames": [
"non-structural protein 3",
"nonstructural protein 3",
"nsp3"
],
"protein": {
"accessionVersion": "YP_009742610.1",
"seqId": "YP_009742610.1",
"sequenceHash": "21B55819",
"title": "nsp3 [polyprotein_range=YP_009725295.1:819-2763] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742611.1",
"cdd": [
{
"accession": "CDD:374495",
"name": "Coronavirus nonstructural protein 4 C-terminus",
"range": {
"begin": "406",
"end": "498"
}
}
],
"name": "nsp4",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "8555",
"end": "10054"
}
],
"sequenceHash": "4BA01958"
},
"otherNames": [
"non-structural protein 4",
"nonstructural protein 4",
"nsp4"
],
"protein": {
"accessionVersion": "YP_009742611.1",
"seqId": "YP_009742611.1",
"sequenceHash": "2C781714",
"title": "nsp4 [polyprotein_range=YP_009725295.1:2764-3263] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742612.1",
"cdd": [
{
"accession": "CDD:368429",
"name": "Coronavirus endopeptidase C30",
"range": {
"begin": "29",
"end": "306"
}
}
],
"name": "3C-like proteinase",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "10055",
"end": "10972"
}
],
"sequenceHash": "C28D0EE5"
},
"otherNames": [
"3C-like proteinase",
"3CLpro",
"Mpro",
"main proteinase",
"non-structural protein 5",
"nonstructural protein 5",
"nsp5A_3CLpro",
"nsp5B_3CLpro"
],
"protein": {
"accessionVersion": "YP_009742612.1",
"seqId": "YP_009742612.1",
"sequenceHash": "5CE30DBB",
"title": "3C-like proteinase [polyprotein_range=YP_009725295.1:3264-3569] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742613.1",
"name": "nsp6",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "10973",
"end": "11842"
}
],
"sequenceHash": "99170F63"
},
"otherNames": [
"non-structural protein 6",
"nonstructural protein 6",
"nsp6"
],
"protein": {
"accessionVersion": "YP_009742613.1",
"seqId": "YP_009742613.1",
"sequenceHash": "A72B0D80",
"title": "nsp6 [polyprotein_range=YP_009725295.1:3570-3859] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742614.1",
"cdd": [
{
"accession": "CDD:285878",
"name": "nsp7 replicase",
"range": {
"begin": "1",
"end": "83"
}
}
],
"name": "nsp7",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "11843",
"end": "12091"
}
],
"sequenceHash": "DDAA03C2"
},
"otherNames": [
"non-structural protein 7",
"nonstructural protein 7",
"nsp7"
],
"protein": {
"accessionVersion": "YP_009742614.1",
"seqId": "YP_009742614.1",
"sequenceHash": "A87703C6",
"title": "nsp7 [polyprotein_range=YP_009725295.1:3860-3942] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742615.1",
"cdd": [
{
"accession": "CDD:285879",
"name": "nsp8 replicase",
"range": {
"begin": "1",
"end": "198"
}
}
],
"name": "nsp8",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "12092",
"end": "12685"
}
],
"sequenceHash": "759508E4"
},
"otherNames": [
"non-structural protein 8",
"nonstructural protein 8",
"nsp8"
],
"protein": {
"accessionVersion": "YP_009742615.1",
"seqId": "YP_009742615.1",
"sequenceHash": "27D30877",
"title": "nsp8 [polyprotein_range=YP_009725295.1:3943-4140] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742616.1",
"cdd": [
{
"accession": "CDD:285872",
"name": "nsp9 replicase",
"range": {
"begin": "1",
"end": "113"
}
}
],
"name": "nsp9",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "12686",
"end": "13024"
}
],
"sequenceHash": "66A7051A"
},
"otherNames": [
"non-structural protein 9",
"nonstructural protein 9",
"nsp9",
"ssRNA-binding protein"
],
"protein": {
"accessionVersion": "YP_009742616.1",
"seqId": "YP_009742616.1",
"sequenceHash": "1A720513",
"title": "nsp9 [polyprotein_range=YP_009725295.1:4141-4253] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009742617.1",
"cdd": [
{
"accession": "CDD:286486",
"name": "RNA synthesis protein NSP10",
"range": {
"begin": "12",
"end": "131"
}
}
],
"name": "nsp10",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "13025",
"end": "13441"
}
],
"sequenceHash": "4B5A0671"
},
"otherNames": [
"GFL",
"growth-factor-like protein",
"non-structural protein 10",
"nonstructural protein 10",
"nsp10"
],
"protein": {
"accessionVersion": "YP_009742617.1",
"seqId": "YP_009742617.1",
"sequenceHash": "839705B4",
"title": "nsp10 [polyprotein_range=YP_009725295.1:4254-4392] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
},
{
"accession": "YP_009725312.1",
"name": "nsp11",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "13442",
"end": "13480"
}
],
"seqId": "NC_045512.2:13442-13480",
"sequenceHash": "CA400AB",
"title": "nsp11 [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"non-structural protein 11",
"nonstructural protein 11",
"nsp11"
],
"protein": {
"accessionVersion": "YP_009725312.1",
"seqId": "YP_009725312.1",
"sequenceHash": "32B0077",
"title": "nsp11 [polyprotein_range=YP_009725295.1:4393-4405] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"proteinCompleteness": "COMPLETE"
}
],
"name": "ORF1a polyprotein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "266",
"end": "13483"
}
],
"seqId": "NC_045512.2:266-13483",
"sequenceHash": "96BED0ED",
"title": "ORF1a polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF1a",
"ORF1a polyprotein",
"open reading frame 1a",
"orf1a",
"polyprotein 1ab",
"pp1a"
],
"protein": {
"accessionVersion": "YP_009725295.1",
"range": [
{
"begin": "1",
"end": "4405"
}
],
"seqId": "YP_009725295.1:1-4405",
"sequenceHash": "A6EBC4B0",
"title": "ORF1a polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC1",
"name": "Replicase polyprotein 1a"
}
}
],
"geneId": 43740578,
"name": "ORF1ab"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:370471",
"name": "Spike receptor binding domain",
"range": {
"begin": "330",
"end": "583"
}
},
{
"accession": "CDD:279881",
"name": "Coronavirus S2 glycoprotein",
"range": {
"begin": "1",
"end": "1273"
}
}
],
"name": "surface glycoprotein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "21563",
"end": "25384"
}
],
"seqId": "NC_045512.2:21563-25384",
"sequenceHash": "B32D3CC0",
"title": "surface glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF 2",
"open reading frame 2",
"spike",
"spike protein",
"surface glycoprotein"
],
"protein": {
"accessionVersion": "YP_009724390.1",
"range": [
{
"begin": "1",
"end": "1273"
}
],
"seqId": "YP_009724390.1:1-1273",
"sequenceHash": "DF1539B0",
"title": "surface glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC2",
"name": "spike glycoprotein"
}
}
],
"geneId": 43740568,
"name": "S"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:288183",
"name": "Coronavirus accessory protein 3a",
"range": {
"begin": "1",
"end": "274"
}
}
],
"name": "ORF3a protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "25393",
"end": "26220"
}
],
"seqId": "NC_045512.2:25393-26220",
"sequenceHash": "EC770D42",
"title": "ORF3a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"3a",
"ORF 3a",
"ORF3a",
"open reading frame 3a"
],
"protein": {
"accessionVersion": "YP_009724391.1",
"range": [
{
"begin": "1",
"end": "275"
}
],
"seqId": "YP_009724391.1:1-275",
"sequenceHash": "CA130D0F",
"title": "ORF3a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC3",
"name": "Protein 3a"
}
}
],
"geneId": 43740569,
"name": "ORF3a"
},
{
"cds": [
{
"name": "envelope protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "26245",
"end": "26472"
}
],
"seqId": "NC_045512.2:26245-26472",
"sequenceHash": "CFFD0414",
"title": "envelope protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"E",
"ORF 4",
"ORF4",
"envelope",
"envelope protein",
"open reading frame 4"
],
"protein": {
"accessionVersion": "YP_009724392.1",
"range": [
{
"begin": "1",
"end": "75"
}
],
"seqId": "YP_009724392.1:1-75",
"sequenceHash": "8A7D03C2",
"title": "envelope protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC4",
"name": "Envelope small membrane protein"
}
}
],
"geneId": 43740570,
"name": "E"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:279907",
"name": "Coronavirus M matrix/glycoprotein",
"range": {
"begin": "4",
"end": "221"
}
}
],
"name": "membrane glycoprotein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "26523",
"end": "27191"
}
],
"seqId": "NC_045512.2:26523-27191",
"sequenceHash": "4A3D0AA4",
"title": "membrane glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"M",
"ORF 5",
"ORF5",
"matrix glycoprotein",
"matrix protein",
"membrane",
"membrane glycoprotein",
"open reading frame 5"
],
"protein": {
"accessionVersion": "YP_009724393.1",
"range": [
{
"begin": "1",
"end": "222"
}
],
"seqId": "YP_009724393.1:1-222",
"sequenceHash": "404E09E2",
"title": "membrane glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC5",
"name": "Membrane protein"
}
}
],
"geneId": 43740571,
"name": "M"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:288948",
"name": "Open reading frame 6 from SARS coronavirus",
"range": {
"begin": "1",
"end": "61"
}
}
],
"name": "ORF6 protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "27202",
"end": "27387"
}
],
"seqId": "NC_045512.2:27202-27387",
"sequenceHash": "252702F1",
"title": "ORF6 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF 6",
"ORF6",
"open reading frame 6"
],
"protein": {
"accessionVersion": "YP_009724394.1",
"range": [
{
"begin": "1",
"end": "61"
}
],
"seqId": "YP_009724394.1:1-61",
"sequenceHash": "543902B5",
"title": "ORF6 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC6",
"name": "Non-structural protein 6"
}
}
],
"geneId": 43740572,
"name": "ORF6"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:370117",
"name": "SARS coronavirus X4 like",
"range": {
"begin": "16",
"end": "98"
}
}
],
"name": "ORF7a protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "27394",
"end": "27759"
}
],
"seqId": "NC_045512.2:27394-27759",
"sequenceHash": "25E705AF",
"title": "ORF7a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"7a",
"ORF 7a",
"ORF7a",
"open reading frame 7a"
],
"protein": {
"accessionVersion": "YP_009724395.1",
"range": [
{
"begin": "1",
"end": "121"
}
],
"seqId": "YP_009724395.1:1-121",
"sequenceHash": "3B2A0535",
"title": "ORF7a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC7",
"name": "Protein 7a"
}
}
],
"geneId": 43740573,
"name": "ORF7a"
},
{
"cds": [
{
"name": "ORF7b",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "27756",
"end": "27887"
}
],
"seqId": "NC_045512.2:27756-27887",
"sequenceHash": "ADD30274",
"title": "ORF7b [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"7b",
"ORF 7b",
"ORF7b",
"open reading frame 7b"
],
"protein": {
"accessionVersion": "YP_009725318.1",
"range": [
{
"begin": "1",
"end": "43"
}
],
"seqId": "YP_009725318.1:1-43",
"sequenceHash": "2469019F",
"title": "ORF7b [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTD8",
"name": "Protein non-structural 7b"
}
}
],
"geneId": 43740574,
"name": "ORF7b"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:152528",
"name": "Coronavirus NS8 protein",
"range": {
"begin": "1",
"end": "118"
}
}
],
"name": "ORF8 protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "27894",
"end": "28259"
}
],
"seqId": "NC_045512.2:27894-28259",
"sequenceHash": "43A50622",
"title": "ORF8 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF 8",
"ORF8",
"open reading frame 8"
],
"protein": {
"accessionVersion": "YP_009724396.1",
"range": [
{
"begin": "1",
"end": "121"
}
],
"seqId": "YP_009724396.1:1-121",
"sequenceHash": "47D40569",
"title": "ORF8 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC8",
"name": "Non-structural protein 8"
}
}
],
"geneId": 43740577,
"name": "ORF8"
},
{
"cds": [
{
"cdd": [
{
"accession": "CDD:279305",
"name": "Coronavirus nucleocapsid protein",
"range": {
"begin": "14",
"end": "368"
}
}
],
"name": "nucleocapsid phosphoprotein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "28274",
"end": "29533"
}
],
"seqId": "NC_045512.2:28274-29533",
"sequenceHash": "4D3C10AF",
"title": "nucleocapsid phosphoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF 9",
"ORF9",
"nucleocapsid",
"nucleocapsid phosphoprotein",
"open reading frame 9"
],
"protein": {
"accessionVersion": "YP_009724397.2",
"range": [
{
"begin": "1",
"end": "419"
}
],
"seqId": "YP_009724397.2:1-419",
"sequenceHash": "9B7912B4",
"title": "nucleocapsid phosphoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "P0DTC9",
"name": "Nucleoprotein"
}
}
],
"geneId": 43740575,
"name": "N"
},
{
"cds": [
{
"name": "ORF10 protein",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"range": [
{
"begin": "29558",
"end": "29674"
}
],
"seqId": "NC_045512.2:29558-29674",
"sequenceHash": "761401EA",
"title": "ORF10 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"otherNames": [
"ORF 10",
"ORF10",
"open reading frame 10"
],
"protein": {
"accessionVersion": "YP_009725255.1",
"range": [
{
"begin": "1",
"end": "38"
}
],
"seqId": "YP_009725255.1:1-38",
"sequenceHash": "231201DA",
"title": "ORF10 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
},
"uniProtKb": {
"id": "A0A663DJA2",
"name": "ORF10 protein"
}
}
],
"geneId": 43740576,
"name": "ORF10"
}
]
},
"bioprojects": [
"PRJNA485481"
],
"completeness": "COMPLETE",
"geneCount": 11,
"host": {
"lineage": [
{
"name": "cellular organisms",
"taxId": 131567
},
{
"name": "Eukaryota",
"taxId": 2759
},
{
"name": "Opisthokonta",
"taxId": 33154
},
{
"name": "Metazoa",
"taxId": 33208
},
{
"name": "Eumetazoa",
"taxId": 6072
},
{
"name": "Bilateria",
"taxId": 33213
},
{
"name": "Deuterostomia",
"taxId": 33511
},
{
"name": "Chordata",
"taxId": 7711
},
{
"name": "Craniata",
"taxId": 89593
},
{
"name": "Vertebrata",
"taxId": 7742
},
{
"name": "Gnathostomata",
"taxId": 7776
},
{
"name": "Teleostomi",
"taxId": 117570
},
{
"name": "Euteleostomi",
"taxId": 117571
},
{
"name": "Sarcopterygii",
"taxId": 8287
},
{
"name": "Dipnotetrapodomorpha",
"taxId": 1338369
},
{
"name": "Tetrapoda",
"taxId": 32523
},
{
"name": "Amniota",
"taxId": 32524
},
{
"name": "Mammalia",
"taxId": 40674
},
{
"name": "Theria",
"taxId": 32525
},
{
"name": "Eutheria",
"taxId": 9347
},
{
"name": "Boreoeutheria",
"taxId": 1437010
},
{
"name": "Euarchontoglires",
"taxId": 314146
},
{
"name": "Primates",
"taxId": 9443
},
{
"name": "Haplorrhini",
"taxId": 376913
},
{
"name": "Simiiformes",
"taxId": 314293
},
{
"name": "Catarrhini",
"taxId": 9526
},
{
"name": "Hominoidea",
"taxId": 314295
},
{
"name": "Hominidae",
"taxId": 9604
},
{
"name": "Homininae",
"taxId": 207598
},
{
"name": "Homo",
"taxId": 9605
},
{
"name": "Homo sapiens",
"taxId": 9606
}
],
"organismName": "Homo sapiens",
"sciName": "Homo sapiens",
"taxId": 9606
},
"isAnnotated": true,
"isolate": {
"collectionDate": "2019-12",
"name": "Wuhan-Hu-1"
},
"length": 29903,
"location": {
"geographicLocation": "China",
"geographicRegion": "Asia"
},
"maturePeptideCount": 26,
"molType": "ssRNA(+)",
"nucleotide": {
"accessionVersion": "NC_045512.2",
"seqId": "NC_045512.2",
"sequenceHash": "A926D55E",
"title": "Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome"
},
"proteinCount": 12,
"releaseDate": "2020-01-13",
"sourceDatabase": "RefSeq",
"updateDate": "2020-07-18",
"virus": {
"lineage": [
{
"name": "Viruses",
"taxId": 10239
},
{
"name": "Riboviria",
"taxId": 2559587
},
{
"name": "Orthornavirae",
"taxId": 2732396
},
{
"name": "Pisuviricota",
"taxId": 2732408
},
{
"name": "Pisoniviricetes",
"taxId": 2732506
},
{
"name": "Nidovirales",
"taxId": 76804
},
{
"name": "Cornidovirineae",
"taxId": 2499399
},
{
"name": "Coronaviridae",
"taxId": 11118
},
{
"name": "Orthocoronavirinae",
"taxId": 2501931
},
{
"name": "Betacoronavirus",
"taxId": 694002
},
{
"name": "Sarbecovirus",
"taxId": 2509511
},
{
"name": "Severe acute respiratory syndrome-related coronavirus",
"taxId": 694009
},
{
"name": "Severe acute respiratory syndrome coronavirus 2",
"taxId": 2697049
}
],
"organismName": "Severe acute respiratory syndrome coronavirus 2",
"pangolinClassification": "B",
"sciName": "Severe acute respiratory syndrome coronavirus 2",
"taxId": 2697049
}
}
VirusAssembly Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | The accession.version of the viral nucleotide sequence. Includes both GenBank and RefSeq accessions | NC_045512.2 |
isAnnotated | is-annotated | Is Annotated | bool | The viral genome has been annotated by either the submitter (GenBank) or by NCBI (RefSeq) | |
isolate | isolate- | Isolate | VirusAssembly.Isolate | ||
sourceDatabase | sourcedb | Source database | string | Indicates if the source of the viral nucleotide record is from a GenBank submitter or from NCBI-derived curation (RefSeq) | RefSeq GenBank |
proteinCount | protein-count | Protein count | uint32 | The total count of annotated proteins including both proteins and polyproteins but not processed mature peptides | |
host | host- | Host | Organism | Taxon from which the virus sample was isolated | |
virus | virus- | Virus | Organism | Viral taxon | |
bioprojects repeated | bioprojects | BioProjects | string | Associated BioProject accessions, when available | PRJNA485481 |
location | geo- | Geographic | VirusAssembly.CollectionLocation | ||
updateDate | update-date | Update date | string | Date the viral nucleotide accession was last updated in NCBI Virus | |
releaseDate | release-date | Release date | string | Date the viral nucleotide accession was first released in NCBI Virus | |
completeness | completeness | Completeness | VirusAssembly.Completeness | ||
length | length | Length | uint32 | Length of the viral nucleotide sequence | |
geneCount | gene-count | Gene count | uint32 | Total count of genes annotated on the viral nucleotide sequence | |
maturePeptideCount | matpeptide-count | Mature peptide count | uint32 | Total count of processed mature peptides annotated on the viral nucleotide sequence | |
biosample | biosample-acc | BioSample accession | string | Associated BioSample accessions | SAMN15394129 |
molType | mol-type | Molecule type | string | ICTV (International Committee on Taxonomy of Viruses) viral classification based on nucleic acid composition, strandedness and method of replication | |
annotation | annot- | Annotation | VirusAnnotation | ||
nucleotide | | | SeqRangeSetFasta | The whole genomic nucleotide record of the CDS feature. | |
purposeOfSampling | purpose-of-sampling | Purpose of Sampling | PurposeOfSampling | ||
sraAccessions repeated | sra-accs | SRA Accessions | string | SRA accessions linked to the GenBank genome |
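As a rough illustration of how the top-level VirusAssembly fields in the table above might be read, the Python sketch below parses JSON Lines text one record at a time and pulls out a handful of scalar fields. The helper names `iter_virus_records` and `summarize` are hypothetical and not part of any NCBI tool; the inline record is a trimmed copy of the sample report.

```python
import json
from typing import Iterable, Iterator


def iter_virus_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed VirusAssembly record per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(record: dict) -> dict:
    """Pick a few top-level VirusAssembly fields; absent fields become None."""
    keys = (
        "accession",
        "sourceDatabase",
        "completeness",
        "length",
        "geneCount",
        "proteinCount",
        "maturePeptideCount",
    )
    return {key: record.get(key) for key in keys}


# Trimmed stand-in for one line of the report (values copied from the sample report).
sample_line = json.dumps(
    {
        "accession": "NC_045512.2",
        "completeness": "COMPLETE",
        "geneCount": 11,
        "length": 29903,
        "maturePeptideCount": 26,
        "proteinCount": 12,
        "sourceDatabase": "RefSeq",
    }
)

# The trailing empty string mimics a blank line at the end of a file.
summaries = [summarize(rec) for rec in iter_virus_records([sample_line, ""])]
```

For a downloaded package, the same generator could be fed an open file handle for ncbi_dataset/data/data_report.jsonl instead of the inline list, since iterating over a text file yields one line at a time.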
ConservedDomain Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | CDD accession | |
name | name | Name | string | ||
range | range- | Range | Range | Range on the protein |
LineageOrganism Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
taxId | coming soon | coming soon | uint32 | NCBI Taxonomy identifier | 11118 |
name | coming soon | coming soon | string | Scientific name | Coronaviridae |
Organism Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
taxId | tax-id | Taxonomic ID | uint32 | NCBI Taxonomy identifier | 9606 2697049 |
organismName | organism-name | Organism Name | string | Scientific name | Homo sapiens Severe acute respiratory syndrome coronavirus 2 |
commonName | common-name | Common Name | string | Common name | human pangolin MERS SARS2 |
lineage repeated | | | LineageOrganism | Lineage ordered from superkingdom level to increasingly more specific taxonomic entries | |
strain | strain | Strain | string | | SE11 |
pangolinClassification | pangolin | Pangolin Classification | string | | B.1.1.7 |
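Because lineage is an ordered list of LineageOrganism entries, it can be scanned to test whether a host or virus falls under a given taxon. The following is a minimal Python sketch under that reading of the table; `lineage_names` and `in_taxon` are hypothetical helper names, and the inline data is the tail of the virus lineage from the sample report.

```python
from typing import List


def lineage_names(organism: dict) -> List[str]:
    """Return lineage names in the order given (broadest taxon first)."""
    return [node["name"] for node in organism.get("lineage", [])]


def in_taxon(organism: dict, tax_id: int) -> bool:
    """True if tax_id appears anywhere in the organism's lineage."""
    return any(node.get("taxId") == tax_id for node in organism.get("lineage", []))


# Last few lineage entries of the "virus" object in the sample report.
virus = {
    "lineage": [
        {"name": "Coronaviridae", "taxId": 11118},
        {"name": "Orthocoronavirinae", "taxId": 2501931},
        {"name": "Betacoronavirus", "taxId": 694002},
        {"name": "Sarbecovirus", "taxId": 2509511},
        {"name": "Severe acute respiratory syndrome-related coronavirus", "taxId": 694009},
        {"name": "Severe acute respiratory syndrome coronavirus 2", "taxId": 2697049},
    ],
    "organismName": "Severe acute respiratory syndrome coronavirus 2",
    "taxId": 2697049,
}

names = lineage_names(virus)
is_coronaviridae = in_taxon(virus, 11118)  # 11118 is Coronaviridae in the sample
```

In the sample report the final lineage entry repeats the organism's own taxId, so a membership test on the lineage also matches the organism itself; whether that holds for every record is an assumption.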
Range Structure
A 1-based range on a sequence record.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
begin | start | Start | uint64 | ||
end | stop | Stop | uint64 | ||
orientation | orientation | Orientation | Orientation | ||
order | order | Order | uint32 |
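In the sample report, begin and end appear as quoted strings even though the table lists them as uint64. This matches the usual protocol buffers JSON mapping, which writes 64-bit integers as strings. The Python sketch below converts the bounds and computes a length, assuming ranges are inclusive at both ends (consistent with the sample, where the ORF7a protein range runs from 1 to 121). `range_length` is a hypothetical helper name.

```python
def range_length(rng: dict) -> int:
    """Length of a 1-based Range, assuming both bounds are inclusive.

    Bounds may arrive as strings (as in the JSON report) or as ints.
    """
    begin = int(rng["begin"])
    end = int(rng["end"])
    if begin < 1 or end < begin:
        raise ValueError(f"unexpected range bounds: {begin}..{end}")
    return end - begin + 1


# Ranges copied from the ORF7a entry of the sample report.
orf7a_nucleotide = {"begin": "27394", "end": "27759"}
orf7a_protein = {"begin": "1", "end": "121"}

nucleotide_length = range_length(orf7a_nucleotide)
protein_length = range_length(orf7a_protein)
```

Under the inclusive assumption the nucleotide interval is three times the protein length plus one codon, which is what a coding sequence with a stop codon would give.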
SeqRangeSetFasta Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
seqId | seq-id | Sequence ID | string | The seqId may include location information in addition to a sequence accession | |
accessionVersion | accession | Accession | string | Accession and version of the viral nucleotide sequence | |
title | title | Title | string | ||
sequenceHash | hash | Hash | string | Unique identifier for identical sequences | |
range repeated | range- | Range | Range | Series of intervals on the accessionVersion above |
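The seqId values in the sample report take one of two shapes: a bare accession such as NC_045512.2, or an accession followed by a single interval such as NC_045512.2:27394-27759. The Python sketch below splits those two shapes only. It is an assumption that no other forms need handling (for example, several intervals in one seqId), and `split_seq_id` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def split_seq_id(seq_id: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split 'ACCESSION' or 'ACCESSION:BEGIN-END' into its parts.

    Returns (accession, begin, end); begin and end are None when the seqId
    carries no interval or the interval is not in the BEGIN-END shape.
    """
    accession, colon, location = seq_id.partition(":")
    if not colon:
        return accession, None, None
    begin, dash, end = location.partition("-")
    if not dash or not begin.isdigit() or not end.isdigit():
        # Unrecognized location syntax: keep the accession, drop the interval.
        return accession, None, None
    return accession, int(begin), int(end)


whole_genome = split_seq_id("NC_045512.2")
orf7a = split_seq_id("NC_045512.2:27394-27759")
orf7a_protein = split_seq_id("YP_009724395.1:1-121")
```

Where a range list is present on the same object, reading begin and end from it directly is likely more robust than parsing the seqId string.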
VirusAnnotation Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
genes repeated | gene- | Gene | VirusGene |
VirusAssembly.CollectionLocation Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
geographicLocation | location | location | string | Country of virus specimen collection | USA France |
geographicRegion | region | region | string | Region of virus specimen collection | Asia North America |
VirusAssembly.Isolate Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | lineage | Lineage | string | BioSample harmonized attribute names https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/ | |
source | lineage-source | Lineage source | string | Source material from which the viral specimen was isolated | blood feces lung |
collectionDate | collection-date | Collection date | string | The collection date for the sample from which the viral nucleotide sequence was derived |
VirusGene Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
name | name | Name | string | ||
geneId | gene-id | NCBI GeneID | uint32 | ||
nucleotide | genomic- | Genomic | SeqRangeSetFasta | The interval on the genomic nucleotide record of the CDS feature. | |
cds repeated | cds- | CDS | VirusPeptide | Polyprotein or protein CDS |
VirusPeptide Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | accession | Accession | string | Protein accession and version | |
name | name | Name | string | Protein name | |
otherNames repeated | other-names | Other Names | string | Alternate names for this protein | |
nucleotide | nuc-fasta- | Nucleotide FASTA | SeqRangeSetFasta | The interval on the genomic nucleotide record of this mature-peptide feature | |
protein | protein-fasta- | Protein FASTA | SeqRangeSetFasta | The full polyprotein record or interval on the polyprotein for mature-peptide features | |
pdbIds repeated | pdb-ids | PDB IDs | string | PDB identifiers for this protein | |
cdd repeated | cdd- | CDD | ConservedDomain | Conserved Domains associated with this protein | |
uniProtKb | uniprot- | UniProt | VirusPeptide.UniProtId | UniProt identifier | |
maturePeptide repeated | mat-peptide- | Mature Peptide | VirusPeptide | Enzymatically processed products of a polyprotein | |
proteinCompleteness | prot-completeness | Protein Completeness | VirusAssembly.Completeness | Protein completeness |
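Since maturePeptide is itself a repeated VirusPeptide, the annotation forms a small tree: genes contain CDS entries, and a CDS may contain mature peptides. One way to flatten it is a recursive walk, sketched below in Python. The helper names are hypothetical. In the sample report, mature peptides carry a top-level accession while CDS entries expose it under protein.accessionVersion, so the sketch checks both; the gene and polyprotein names in the inline data are placeholders, and the remaining values are taken from the sample report.

```python
from typing import Iterator, List, Optional, Tuple


def iter_peptides(peptide: dict, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield a VirusPeptide and, recursively, its maturePeptide children."""
    yield depth, peptide
    for child in peptide.get("maturePeptide", []):
        yield from iter_peptides(child, depth + 1)


def protein_accession(peptide: dict) -> Optional[str]:
    """Prefer the peptide's own accession, else protein.accessionVersion."""
    return peptide.get("accession") or peptide.get("protein", {}).get("accessionVersion")


def list_products(record: dict) -> List[Tuple[Optional[str], int, Optional[str], Optional[str]]]:
    """Return (gene name, nesting depth, product name, protein accession) rows."""
    rows = []
    for gene in record.get("annotation", {}).get("genes", []):
        for cds in gene.get("cds", []):
            for depth, pep in iter_peptides(cds):
                rows.append((gene.get("name"), depth, pep.get("name"), protein_accession(pep)))
    return rows


record = {
    "annotation": {
        "genes": [
            {
                # Placeholder gene and CDS names, used only for illustration.
                "name": "example-polyprotein-gene",
                "cds": [
                    {
                        "name": "example polyprotein",
                        "maturePeptide": [
                            {"accession": "YP_009725297.1", "name": "leader protein"},
                            {"accession": "YP_009725298.1", "name": "nsp2"},
                        ],
                    }
                ],
            },
            {
                "name": "ORF7a",
                "cds": [
                    {
                        "name": "ORF7a protein",
                        "protein": {"accessionVersion": "YP_009724395.1"},
                    }
                ],
            },
        ]
    }
}

products = list_products(record)
```

Depth 0 rows are CDS-level products and depth 1 rows are mature peptides, so filtering on depth separates the two kinds of product.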
VirusPeptide.UniProtId Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
id | id | ID | string | UniProt ID | |
name | name | Name | string | UniProt name |
Orientation Enumeration
Name | Number | Description |
---|---|---|
none | 0 | |
plus | 1 | |
minus | 2 |
PurposeOfSampling Enumeration
Name | Number | Description |
---|---|---|
PURPOSE_OF_SAMPLING_UNKNOWN | 0 | |
PURPOSE_OF_SAMPLING_BASELINE_SURVEILLANCE | 1 |
VirusAssembly.Completeness Enumeration
Name | Number | Description |
---|---|---|
UNKNOWN | 0 | |
COMPLETE | 1 | |
PARTIAL | 2 |
Scalar Value Types
Protocol buffers type | Notes | C++ | Python | Java | Go |
---|---|---|---|---|---|
double | | double | float | double | float64 |
float | | float | float | float | float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
sfixed32 | Always four bytes. | int32 | int | int | int32 |
sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
bool | | bool | boolean | boolean | bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |