Pathogen Detection Resources at Google Cloud Platform
NCBI Pathogen Detection provides data at Google Cloud Platform in two places: the MicroBIGG-E and Isolates Browser tables are in Google BigQuery, and MicroBIGG-E sequences in FASTA format are in Google Cloud Storage.
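As a rough illustration of what querying the BigQuery tables can look like, here is a minimal Python sketch that counts MicroBIGG-E rows per organism for a gene-symbol prefix. The table path `ncbi-pathogen-detect.pdbrowser.microbigge` and the column names `scientific_name` and `element_symbol` are assumptions not stated on this page, so check them against the MicroBIGG-E table documentation before relying on them. The helper names `build_count_query` and `run_count_query` are hypothetical. Running the query requires the `google-cloud-bigquery` package and GCP credentials; building the SQL string does not.

```python
"""Sketch: count MicroBIGG-E rows per organism for a gene-symbol prefix."""

# Assumed table path (not given in the text above; verify before use).
MICROBIGGE_TABLE = "ncbi-pathogen-detect.pdbrowser.microbigge"


def build_count_query(table: str = MICROBIGGE_TABLE, limit: int = 20) -> str:
    """Return a parameterized SQL string; @prefix is bound at run time."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Column names scientific_name and element_symbol are assumptions.
    return (
        "SELECT scientific_name, element_symbol, COUNT(*) AS num\n"
        f"FROM `{table}`\n"
        "WHERE STARTS_WITH(element_symbol, @prefix)\n"
        "GROUP BY scientific_name, element_symbol\n"
        "ORDER BY num DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_count_query(prefix: str, limit: int = 20):
    """Run the query; needs google-cloud-bigquery and GCP credentials."""
    # Imported lazily so that build_count_query works without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("prefix", "STRING", prefix)
        ]
    )
    job = client.query(build_count_query(limit=limit), job_config=job_config)
    return [dict(row.items()) for row in job.result()]


if __name__ == "__main__":
    # Print the SQL only; call run_count_query("blaKPC") to execute it.
    print(build_count_query(limit=5))
```

Binding the prefix as a query parameter rather than formatting it into the SQL keeps user-supplied text out of the statement. The printed SQL can also be pasted into the BigQuery console with `@prefix` replaced by a quoted string.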
Pathogen Detection Resources available on the Google Cloud
- Pathogen Detection Resources at Google Cloud Platform
- Getting started with BigQuery
- MicroBIGG-E table in BigQuery
- MicroBIGG-E contig sequences in Google Storage buckets
- MicroBIGG-E protein sequences in Google Storage buckets
- Isolates Browser table in BigQuery
- Isolate Exceptions table in BigQuery
- BioProject Hierarchy in BigQuery
Documentation for Pathogen Detection resources at Google Cloud
- YouTube video - NCBI minute: Introduction to NCBI Pathogen Detection and antimicrobial resistance data in Google BigQuery, with links to materials and further information.
- Getting started with Google BigQuery - Information about how to use Google BigQuery for accessing NCBI Pathogen Detection data in BigQuery tables.
- Isolates browser at GCP - The data from the Isolates Browser
- MicroBIGG-E at GCP - The data from the Microbial Browser for the Identification of Genetic and Genomic Elements (MicroBIGG-E): information about, and contig and protein sequences for, genes and point mutations identified in Pathogen Detection isolates by AMRFinderPlus.
- BioProject Hierarchy at GCP - All data BioProjects for isolates in the Pathogen Detection browsers, together with any parent umbrella BioProjects they belong to.
- Email us with questions or feedback at [email protected]
ASM NGS 2022 Workshop
We participated in a workshop at ASM NGS 2022 that included projects demonstrating how to use our resources in the cloud. For examples of how to use NCBI Pathogen Detection data in the cloud, see the Workshop Setup; Project 1: Use BigQuery to search MicroBIGG-E and Isolates data; Project 2: Generate tree of KPC alleles to examine evolution of size variants; and Project 3: Selection analysis on 293-aa blaKPC genes.
Update frequency
Pathogen Detection data at Google Cloud Platform is updated daily, so results there may lag behind those presented in the browsers or on FTP by up to one day.
Additional related resources
Tools
- SRA in the cloud - SRA metadata in Google BigQuery and AWS Athena.
- GCP RAPT (Read assembly and annotation pipeline tool) - Cloud-ready tool to assemble and annotate microbial genomes. See also the main page for RAPT.
- ElasticBLAST - Parallelized BLAST for the cloud.
Google docs
Don't have a cloud account?
The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative has launched a new NIH Cloud Lab program that lets you experiment with using the cloud for your research. You can request a GCP or AWS account and will receive $500 in cloud credits for three months, along with access to biomedical tutorials that walk you through common cloud-based research use cases. The program is currently available to intramural researchers and is expected to open to extramural researchers in the coming months. Learn more at https://cloud.nih.gov/resources/cloudlab/
If you have any questions, issues, or requests, let us know at [email protected]