Download SARS-CoV-2 protein sequences

Download sequences and metadata for selected SARS-CoV-2 proteins

Retrieve genome and protein sequences, along with metadata, for selected SARS-CoV-2 proteins using the command-line tool or a supported programming language. The selection can be filtered by host organism and geographic location.

For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.

Download Selected Proteins

Download all Spike protein sequences

datasets download virus protein S

For more information, see the Datasets Python API reference documentation.

This example uses the sars2_protein_download method from ncbi-datasets-pylib.

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi

# Name of the local file that will hold the downloaded data package
zipfile_name = "sars_cov2_protein_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus protein data package ...")
        # Request Spike proteins from complete genomes with a human host,
        # including protein and CDS FASTA files in the package
        virus_protein_ds_download = virus_api.sars2_protein_download(
            ["SPIKE"],
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        # Write the raw response bytes to disk as a zip archive
        with open(zipfile_name, "wb") as f:
            f.write(virus_protein_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling sars2_protein_download: {e}\n")
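Once the zip archive has been written, it can be inspected with the Python standard library alone. The following is a minimal sketch, not part of ncbi-datasets-pylib: the helper names list_package_members and read_fasta_headers are hypothetical, and the assumption that protein FASTA files in the package end in ".faa" should be checked against the actual archive listing (the data package description is the authoritative source for the layout).

```python
import io
import os
import zipfile


def list_package_members(zip_path):
    """Return the names of all files stored in the zip archive."""
    with zipfile.ZipFile(zip_path) as package:
        return package.namelist()


def read_fasta_headers(zip_path, suffix=".faa"):
    """Collect FASTA header lines (without the leading '>').

    Only archive members whose names end in `suffix` are read.
    The ".faa" default is an assumption about the package layout.
    """
    headers = []
    with zipfile.ZipFile(zip_path) as package:
        for name in package.namelist():
            if not name.endswith(suffix):
                continue
            # Stream the member as text instead of loading it all at once
            with package.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if line.startswith(">"):
                        headers.append(line[1:].strip())
    return headers


if __name__ == "__main__":
    zipfile_name = "sars_cov2_protein_dataset.zip"
    # Only inspect the package if the download step has produced it
    if os.path.exists(zipfile_name):
        for member in list_package_members(zipfile_name):
            print(member)
        print(f"Protein records found: {len(read_fasta_headers(zipfile_name))}")
```

Listing the members first is a cheap way to confirm which files the package actually contains before relying on any particular file name or extension.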

Download support for the R language is not yet available. Please check back for updates!

Generated November 25, 2024