Datasets downloads for Pathogen Detection isolates

The Datasets command-line tool (the datasets CLI) lets you download large amounts of data using its "dehydrated download" feature. To download large numbers of assemblies with this tool, first install the datasets CLI.
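Because every later step depends on the datasets CLI being installed, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the official instructions; the function name require_tool is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: succeed if the named program is on PATH,
# otherwise print a message to stderr and return a non-zero status.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; install it before continuing" >&2
  return 1
}

# Example use before starting the workflow:
#   require_tool datasets && require_tool unzip
```

The check covers unzip as well as datasets, since both are used in the steps below.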

Here's an example of how to use this feature to download all 68,000+ Listeria monocytogenes genomes available in Pathogen Detection and included in GenBank.

1. Select the isolates you want to download using the Isolates Browser

In this example we select all Listeria monocytogenes genomes by entering the query taxgroup_name:"Listeria monocytogenes" in the Isolates Browser, then click the Download link.

Click the Download button

2. Select Assembly accessions and click Download

Select the Assembly accessions option to download
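Before passing the accession list to the datasets CLI, it can help to tidy it up. The sketch below assumes the list is a plain text file with one assembly accession per line (the format the --inputfile option in step 3 reads); the function name clean_accessions and the output file name are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: normalize an accession list (one accession per line).
# Removes carriage returns, trims surrounding whitespace, drops blank
# lines, and removes duplicates (the output is sorted as a side effect).
clean_accessions() {
  tr -d '\r' < "$1" \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -v '^$' \
    | sort -u
}

# Example use:
#   clean_accessions accessions.txt > accessions.clean.txt
#   wc -l < accessions.clean.txt    # number of unique accessions
```

Comparing the line count with the number of isolates shown in the Isolates Browser is a simple way to confirm that the list was saved completely.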

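3. Download a "dehydrated data package" using the datasets CLI

A dehydrated package contains metadata and pointers to the data files rather than the files themselves. Here we request the assembly, the annotation GFF3 file, and the annotated proteins for every accession in accessions.txt, the accession list saved in step 2: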

datasets download genome accession --dehydrated --inputfile accessions.txt --include genome,gff3,protein

4. Unzip the downloaded zip archive to a directory

unzip ncbi_dataset.zip -d listeria_genomes

5. "Rehydrate" the extracted package to retrieve the data

datasets rehydrate --directory listeria_genomes
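Steps 3 to 5 can be chained so that each command runs only if the previous one succeeded. The following is a minimal sketch using exactly the commands shown above; the function names run and fetch_genomes and the DRY_RUN variable are hypothetical conveniences, not features of the datasets CLI.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: download, unzip, and rehydrate in order,
# stopping at the first command that fails.
fetch_genomes() {
  accessions=$1   # accession list from step 2
  outdir=$2       # directory to extract the package into
  run datasets download genome accession --dehydrated \
      --inputfile "$accessions" --include genome,gff3,protein &&
  run unzip ncbi_dataset.zip -d "$outdir" &&
  run datasets rehydrate --directory "$outdir"
}

# Example use:
#   DRY_RUN=1; fetch_genomes accessions.txt listeria_genomes   # preview only
#   DRY_RUN=0; fetch_genomes accessions.txt listeria_genomes   # run for real
```

Previewing with DRY_RUN=1 first makes it easy to confirm the file and directory names before starting a download of this size.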

For more information on the datasets CLI, see the datasets CLI documentation.