Download SARS-CoV-2 protein sequences
Download sequences and metadata for selected SARS-CoV-2 proteins
Retrieve genome sequences, protein sequences, and metadata for selected SARS-CoV-2 proteins using the command-line tool or a programming language. You can filter the selected proteins by host organism and geographic location.
For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.
Download Selected Proteins
Download all Spike protein sequences
datasets download virus protein S
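The host filter mentioned above can also be applied when the command is assembled from a script. The following is a minimal sketch, not part of the official tooling: build_protein_download_command is a hypothetical helper, and the --host and --filename flags are assumptions that should be checked against the output of datasets download virus protein --help for the installed version.

```python
import shlex
from typing import List, Optional


def build_protein_download_command(
    protein: str,
    host: Optional[str] = None,
    filename: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a protein download (hypothetical helper)."""
    cmd = ["datasets", "download", "virus", "protein", protein]
    if host:
        # Assumed flag: restrict results to one host organism.
        cmd += ["--host", host]
    if filename:
        # Assumed flag: name of the output zip archive.
        cmd += ["--filename", filename]
    return cmd


cmd = build_protein_download_command("S", host="human", filename="spike_human.zip")
# Print the command for review; to execute it, pass the list to
# subprocess.run(cmd, check=True) on a machine with the datasets CLI installed.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a host name contains spaces.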
For more information, see the Datasets Python API reference documentation.
This example uses the sars2_protein_download method from ncbi-datasets-pylib to request Spike protein and CDS FASTA sequences, limited to complete records from human hosts.
from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets import VirusApi as DatasetsVirusApi

# Name of the zip archive to write in the current directory
zipfile_name = "sars_cov2_protein_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus protein data package ...")
        # _preload_content=False returns the raw response so the zip bytes
        # can be written to disk unchanged
        virus_protein_ds_download = virus_api.sars2_protein_download(
            ["SPIKE"],
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        # Save the data package as a zip archive
        with open(zipfile_name, "wb") as f:
            f.write(virus_protein_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling sars2_protein_download: {e}\n")
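Once the archive is on disk, it can be inspected without unpacking it. The sketch below uses only the Python standard library and counts the FASTA records in each sequence file inside the zip. The helper names (iter_fasta_records, summarize_fasta_in_zip) are hypothetical, and the file extensions it looks for (.faa, .fna, .fasta) are an assumption; see the data package description for the actual layout.

```python
import io
import zipfile
from typing import Dict, Iterable, Iterator, Tuple


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta_in_zip(zip_path: str) -> Dict[str, int]:
    """Map each FASTA-like member of the archive to its record count."""
    counts = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Assumed extensions for sequence files in the package.
            if name.endswith((".faa", ".fna", ".fasta")):
                with zf.open(name) as fh:
                    text = io.TextIOWrapper(fh, encoding="utf-8")
                    counts[name] = sum(1 for _ in iter_fasta_records(text))
    return counts
```

For the archive written by the example above, calling summarize_fasta_in_zip("sars_cov2_protein_dataset.zip") would return a dictionary of member names and record counts. Reading members as streams keeps memory use low for large downloads.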
Download support for the R language is not yet available. Please check back for updates!