Download SARS-CoV-2 genomes

Download sequences for SARS-CoV-2 GenBank genomes by taxon or lineage

Get genome and protein sequences and annotation for SARS-CoV-2 GenBank genomes using the command-line tool or from a programming language. The selected genomes can be filtered by host organism, lineage, and geographic location.

For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.

Download by SARS-CoV-2 lineage

Download SARS-CoV-2 GenBank genomes for specific lineages as classified by pangolin

datasets download virus genome taxon SARS2 --lineage P.1 --filename SARS-CoV-2-P.1.zip
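
For scripted workflows, the same command can be assembled in Python and handed to a subprocess. The following is a minimal sketch, not part of the Datasets tooling: the helper name build_lineage_command is hypothetical, it uses only the flags shown in the command above, and running the result assumes the datasets executable is installed and on the PATH.

```python
import shlex


def build_lineage_command(lineage, filename, taxon="SARS2"):
    """Return the argument list for the lineage download command shown above.

    Hypothetical helper: it only reproduces the documented CLI call.
    """
    return [
        "datasets", "download", "virus", "genome", "taxon", taxon,
        "--lineage", lineage,
        "--filename", filename,
    ]


cmd = build_lineage_command("P.1", "SARS-CoV-2-P.1.zip")
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

To actually start the download, the list can be passed to subprocess.run(cmd, check=True), which raises an error if the command exits with a non-zero status.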

To get started with the Python library, see the Datasets Python API reference documentation.

First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib. This method takes the taxon name and, optionally, a pangolin lineage (the pangolin_classification argument). The example below also restricts the download to complete genomes and requests protein and CDS FASTA files. Then, to access the results, open the zip file with the VirusDataset class in ncbi.datasets.package.dataset and print the catalog, which lists all of the included files.

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi

from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"
pangolin_classification = "B.1.427"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            pangolin_classification=pangolin_classification,
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
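
The downloaded archive is an ordinary zip file, so its contents can also be checked with Python's standard zipfile module, independently of the VirusDataset class. This is a minimal sketch; the function name list_package_files is hypothetical, and nothing about the internal layout of the package is assumed.

```python
import zipfile


def list_package_files(zip_path):
    """Return (name, size in bytes) for every file in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

Calling list_package_files("sars_cov2_dataset.zip") after the download completes returns one tuple per file, which can be printed or compared against the catalog.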

Download by taxon

Download all SARS-CoV-2 GenBank genomes

datasets download virus genome taxon SARS-CoV-2

To get started with the Python library, see the Datasets Python API reference documentation.

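First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib. Note that, unlike the command above, which applies no filters, the example below restricts the download to complete genomes from human hosts and requests protein and CDS FASTA files. Then, to access the results, open the zip file with the VirusDataset class in ncbi.datasets.package.dataset and print the catalog, which lists all of the included files.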

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi

from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
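
Because the example requests protein and CDS FASTA files, a quick sanity check is to count the sequence records in each FASTA file of the archive. The sketch below uses only the standard library. The helper count_fasta_records is hypothetical, and the file extensions it looks for (.fna, .faa, .fasta) are an assumption that should be checked against the package catalog.

```python
import zipfile

# Assumed FASTA extensions; verify against the catalog of your package.
FASTA_SUFFIXES = (".fna", ".faa", ".fasta")


def count_fasta_records(zip_path):
    """Map each FASTA file in the archive to its number of '>' header lines."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(FASTA_SUFFIXES):
                continue
            with archive.open(name) as handle:
                # Each FASTA record starts with a line beginning with '>'.
                counts[name] = sum(1 for line in handle if line.startswith(b">"))
    return counts
```

Reading line by line keeps memory use low, which matters because a taxon-wide download can be very large.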

Download support for the R language is not yet available. Please check back for updates!
Generated November 25, 2024