Download SARS-CoV-2 genomes
Download sequences for SARS-CoV-2 GenBank genomes by taxon or lineage
Download genome sequences, protein sequences, and annotation for SARS-CoV-2 GenBank genomes with the command-line tool or from a programming language. You can filter the selected genomes by host organism, lineage, and geographic location.
For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.
Download by SARS-CoV-2 lineage
Download SARS-CoV-2 GenBank genomes for specific lineages as classified by pangolin
datasets download virus genome taxon SARS2 --lineage P.1 --filename SARS-CoV-2-P.1.zip
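When the same download has to be repeated for several lineages, it can help to assemble the command in a script. The following is a minimal sketch, not part of the NCBI tooling: the helper name build_download_command is hypothetical, and the --host and --geo-location flag names are assumptions that should be checked against the output of datasets download virus genome taxon --help. Only --lineage and --filename appear in the command above.

```python
import shlex
from typing import List, Optional


def build_download_command(
    taxon: str,
    filename: str,
    lineage: Optional[str] = None,
    host: Optional[str] = None,
    geo_location: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `datasets download virus genome taxon`."""
    cmd = ["datasets", "download", "virus", "genome", "taxon", taxon]
    if lineage:
        cmd += ["--lineage", lineage]
    if host:
        # Assumed flag name; verify with the tool's --help output.
        cmd += ["--host", host]
    if geo_location:
        # Assumed flag name; verify with the tool's --help output.
        cmd += ["--geo-location", geo_location]
    cmd += ["--filename", filename]
    return cmd


cmd = build_download_command("SARS2", "SARS-CoV-2-P.1.zip", lineage="P.1")
# Print the command as a single shell-safe string for inspection.
print(shlex.join(cmd))
# To actually run it (requires the datasets tool on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a string avoids shell quoting problems when a value contains spaces, such as a geographic location name.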
To get started with the Python library, see the Datasets Python API reference documentation.
First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib. This function takes both the taxonomy name and an optional pangolin lineage.
Then, to access the results, open the zip file and print the catalog to show all of the included files using the VirusDataset class in ncbi.datasets.package.dataset.
from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi
from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"
pangolin_classification = "B.1.427"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            pangolin_classification=pangolin_classification,
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
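Because the data package is an ordinary zip archive, its contents can also be listed without ncbi-datasets-pylib. The sketch below uses only the Python standard library; the helper name list_package_files is hypothetical and is not part of the NCBI library.

```python
import zipfile
from typing import List


def list_package_files(zip_path: str) -> List[str]:
    """Return the names of all files stored in a downloaded data package."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so only real files are reported.
        return [info.filename for info in archive.infolist() if not info.is_dir()]
```

For example, calling list_package_files("sars_cov2_dataset.zip") after the download completes returns the archive's file names as a list of strings. Unlike the catalog printed by VirusDataset, this list carries no file-type information.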
Download by taxon
Download all SARS-CoV-2 GenBank genomes
datasets download virus genome taxon SARS-CoV-2
To get started with the Python library, see the Datasets Python API reference documentation.
First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib. The example that follows restricts the download to complete genomes from human hosts.
Then, to access the results, open the zip file and print the catalog to show all of the included files using the VirusDataset class in ncbi.datasets.package.dataset.
from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi
from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
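A quick sanity check after a large download is to count the sequence records in each FASTA file of the package. The following is a rough sketch using only the standard library. The helper name count_fasta_records is hypothetical, and the assumption that FASTA files in the package end in .fna or .faa should be confirmed against the printed catalog or the data package description.

```python
import io
import zipfile
from typing import Dict, Tuple


def count_fasta_records(
    zip_path: str,
    suffixes: Tuple[str, ...] = (".fna", ".faa"),  # assumed FASTA extensions
) -> Dict[str, int]:
    """Count FASTA header lines (starting with '>') per matching file."""
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(suffixes):
                continue
            # Stream the member line by line instead of loading it whole.
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8")
                counts[name] = sum(1 for line in text if line.startswith(">"))
    return counts
```

Calling count_fasta_records("sars_cov2_dataset.zip") returns a dictionary that maps each matching file name to its record count. Streaming keeps memory use low, which matters here because a download of all SARS-CoV-2 genomes can be very large.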
Download support for the R language is not yet available. Please check back for updates!