Using the Entrez Programming Utilities with dbSNP
This quick guide explains how to use the Entrez Programming Utilities (E-utilities) to query the NCBI dbSNP database. Comprehensive general documentation for E-utilities can be found in the “Entrez Programming Utilities Help” book. Briefly, E-utilities are an interface to the Entrez query and database system at NCBI. They can be used to search Entrez, to store search results temporarily on the History server, and to retrieve those results selectively. E-utilities can be accessed directly over HTTP, or from the command line through EDirect, a set of programs installed locally.
Searching dbSNP with ESearch
All E-utilities HTTP calls share the same base URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
For searching dbSNP, the base URL is followed by the name of the E-utility, esearch.fcgi, and a query string comprising the following parameters:
db
The database to search. Should be set to snp.
term
The search query, in URL-encoded format. The query syntax is described in Entrez Searching Options. An easy way to compose a query is to use the SNP Advanced Search Builder (see also the list of dbSNP-specific search terms). For example, to find the RSIDs of all pathogenic SNPs associated with the FLT3 gene, one could use the query (FLT3[Gene Name]) AND "pathogenic"[Clinical Significance].
After URL encoding, the complete minimal ESearch call for this query becomes:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=%28FLT3%5BGene%20Name%5D%29%20AND%20%22pathogenic%22%5BClinical%20Significance%5D
tool, email, api_key
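Should be supplied to comply with the E-utilities usage guidelines and requirements: tool and email identify the calling software and its developer, and api_key carries an NCBI API key.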
usehistory, WebEnv, query_key
Used to store search results on the History server and to retrieve them from it. ESearch returns only a list of RSIDs, or just their count, so retrieving further information about the listed items typically goes through the History server. Another reason to use the History server is to combine multiple search results before retrieving them.
retstart, retmax, rettype, retmode, sort, field
Optional parameters, described in the ESearch help.
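As a minimal sketch (not an NCBI-provided client), the Python function below builds an ESearch URL from the parameters listed above using only the standard library. The helper name esearch_url and its keyword arguments are hypothetical. The tool, email and api_key parameters are included only when the caller supplies them, and urllib.parse.quote is passed as quote_via so that spaces are encoded as %20 rather than +.

```python
from urllib.parse import urlencode, quote

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, usehistory=False, tool=None, email=None, api_key=None, **optional):
    """Build an ESearch URL for dbSNP (hypothetical helper, sketch only)."""
    params = {"db": "snp", "term": term}
    if usehistory:
        # Ask ESearch to store the result set on the History server.
        params["usehistory"] = "y"
    # Identification parameters from the usage guidelines; skipped when unset.
    for name, value in (("tool", tool), ("email", email), ("api_key", api_key)):
        if value is not None:
            params[name] = value
    # Any other optional parameters, e.g. retstart, retmax, sort.
    params.update(optional)
    # quote (not the default quote_plus) encodes spaces as %20.
    return BASE + "esearch.fcgi?" + urlencode(params, quote_via=quote)


query = '(FLT3[Gene Name]) AND "pathogenic"[Clinical Significance]'
print(esearch_url(query))
# → https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=%28FLT3%5BGene%20Name%5D%29%20AND%20%22pathogenic%22%5BClinical%20Significance%5D
```

The resulting URL can then be requested with any HTTP client, for example urllib.request.urlopen.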
Retrieving RefSNP JSON objects with EFetch
EFetch is now intended to be used to retrieve dbSNP’s new JSON objects, which contain the full RefSNP content. The objects’ schema is available in the refsnp_snapshot: section of var_service.yaml. To retrieve the JSON objects, the parameters rettype and retmode must be set to json and text, respectively.
The db parameter should be set to snp.
In accordance with the EFetch parameter specification, the set of RefSNP objects to retrieve may be specified either as a comma-separated list of numeric RSIDs (without the rs prefix) in the id parameter, or as a pair of WebEnv and query_key values obtained previously, for example from an ESearch call with usehistory=y.
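When ESearch is called with usehistory=y, its default XML response carries the WebEnv and QueryKey values alongside the first page of RSIDs in IdList. The sketch below pulls these out with the standard-library XML parser. The function name parse_esearch is hypothetical, and the element names assume the usual eSearchResult layout of the ESearch response.

```python
import xml.etree.ElementTree as ET


def parse_esearch(xml_text):
    """Extract count, RSIDs and History server keys from ESearch XML (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # accepts str or the raw response bytes
    # Count is the total number of hits; IdList holds at most retmax of them.
    count = int(root.findtext("Count", default="0"))
    ids = [elem.text for elem in root.findall("IdList/Id")]
    # WebEnv and QueryKey appear only when usehistory=y was requested;
    # findtext returns None when an element is absent.
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    return {"count": count, "ids": ids, "WebEnv": webenv, "query_key": query_key}
```

The WebEnv and query_key values returned here can be passed to EFetch in place of an id list.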
For example, this minimal EFetch call will return two RefSNP JSON objects, for rs268 and rs328:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=268,328&rettype=json&retmode=text
Other available optional parameters are retstart and retmax, also described in the EFetch parameter specification.
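The two ways of selecting RefSNP objects can be captured in one small URL builder. This is a sketch under the parameters described above; the helper name efetch_json_url is hypothetical. Passing safe="," keeps the comma separator of the id list unencoded, matching the example URL.

```python
from urllib.parse import urlencode, quote

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_json_url(ids=None, webenv=None, query_key=None, retstart=None, retmax=None):
    """Build an EFetch URL that returns RefSNP JSON (hypothetical helper, sketch only)."""
    params = {"db": "snp"}
    if ids is not None:
        # Numeric RSIDs without the "rs" prefix, joined by commas.
        params["id"] = ",".join(str(i) for i in ids)
    elif webenv is not None and query_key is not None:
        # A result set stored earlier on the History server.
        params["WebEnv"] = webenv
        params["query_key"] = query_key
    else:
        raise ValueError("give either ids or both webenv and query_key")
    params["rettype"] = "json"
    params["retmode"] = "text"
    # Optional paging through a stored result set.
    if retstart is not None:
        params["retstart"] = retstart
    if retmax is not None:
        params["retmax"] = retmax
    return BASE + "efetch.fcgi?" + urlencode(params, safe=",", quote_via=quote)


print(efetch_json_url(ids=[268, 328]))
# → https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=268,328&rettype=json&retmode=text
```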
Accessing dbSNP from the command line with EDirect
EDirect is a set of UNIX programs that bring the functionality of E-utilities to the command line. Once installed, EDirect can facilitate the construction of powerful queries and extraction pipelines with minimal coding.
The ESearch example given above can now be run in a UNIX terminal window as:
esearch -db snp -query '(FLT3[Gene Name]) AND "pathogenic"[Clinical Significance]' \
| efetch -format uid
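Outside EDirect, a large result set retrieved over plain HTTP may need to be fetched page by page with retstart and retmax, reusing the same WebEnv and query_key for every request. The hypothetical helper below only computes those (retstart, retmax) windows from the total Count reported by ESearch. The default batch size of 500 is an arbitrary assumption for illustration, not an NCBI limit.

```python
def page_windows(count, batch_size=500):
    """Yield (retstart, retmax) pairs covering `count` results (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, count, batch_size):
        # The last window may be shorter than batch_size.
        yield retstart, min(batch_size, count - retstart)


print(list(page_windows(1200)))
# → [(0, 500), (500, 500), (1000, 200)]
```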
To fetch the two RefSNP JSON objects for rs268 and rs328, do:
epost -db snp -id 268,328 \
| efetch -format json
To extract information from the returned objects inline, a JSON processing tool such as jq may be applied. For example, the following pipeline lists the PMIDs associated with each of the two RefSNPs:
epost -db snp -id 268,328 \
| efetch -format json \
| jq '{refsnp_id,citations}'
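For readers working in Python rather than jq, the sketch below mirrors the jq filter: it reads the EFetch output as a sequence of JSON values and keeps only refsnp_id and citations from each. It decodes the objects one after another with json.JSONDecoder.raw_decode, so it does not assume one object per line. The function name refsnp_citations is hypothetical. Like jq, it reports a key missing from an object as null (None).

```python
import json


def refsnp_citations(text):
    """Yield {refsnp_id, citations} from a stream of RefSNP JSON objects (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip newlines etc. here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        # Same projection as jq '{refsnp_id,citations}'; missing keys become None.
        yield {"refsnp_id": obj.get("refsnp_id"), "citations": obj.get("citations")}
```

Applied to the text produced by efetch -format json (for example read from sys.stdin), each yielded dictionary can be printed with json.dumps.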
More information on using EDirect is available in the “Entrez Direct: E-utilities on the UNIX Command Line” book.