DATATOOL - NCBI data conversion tool
Program Description
DATATOOL is a utility program that converts data specifications and
data between ASN.1 and XML. It can convert an ASN.1 specification into
an XML DTD or XML schema, a DTD into ASN.1 (with limitations), and a
DTD into an XML schema. Once the specification is known, DATATOOL can
also convert data from ASN.1 to XML, or from XML to ASN.1 format.
DATATOOL is part of the NCBI C++ Toolkit, which can be freely
downloaded from:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/
For more information please refer to:
http://ncbi.github.io/cxx-toolkit/pages/ch_app
Prebuilt DATATOOL binaries for some platforms can be found at:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/
Basic instructions
DATATOOL can be used to formally convert any ASN.1 or XML data or data
specification.
For the list of command line arguments please refer to
http://ncbi.github.io/cxx-toolkit/pages/ch_app
Important Note
DATATOOL performs only formal data conversion, so it cannot be used to
perform any additional processing on the converted data. If you need
additional data processing, you can either:
Example
Converting a GenBank ASN.1 data file to XML:
- Obtain a GenBank ASN.1 data file at:
ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/.
Here, the daily-nc directory contains individual files, in ASN.1
format, for each day's new or updated entries since close-of-data for
the last GenBank Release.
Additional documentation:
/ncbi-asn1/README.asn1
/ncbi-asn1/daily-nc/README.asn1.daily-nc
- Download the appropriate datatool binary for your platform:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/
- Download NCBI data specification file:
https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn
- Run the program:
./datatool -m NCBI_all.asn -d gbest225.aso -t Bioseq-set -px gbest225.xml
Here:
- NCBI_all.asn - is the name of the NCBI data specification file
- gbest225.aso - is the name of the source GenBank data file in binary
ASN.1 format
- Bioseq-set - is the name of the data type in the source file
- gbest225.xml - is the name of the output file in XML format
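The command above can also be driven from a script. The following is a
minimal sketch, not part of DATATOOL itself: the helper names
build_datatool_command and convert_to_xml are hypothetical, and the
sketch assumes the datatool binary sits in the current directory. It
uses only the four arguments shown in the example (-m, -d, -t, -px).

```python
import subprocess
from pathlib import Path


def build_datatool_command(spec, source, data_type, output,
                           datatool="./datatool"):
    """Build the argument list for a binary ASN.1 to XML conversion.

    Hypothetical helper: it only assembles the arguments used in the
    example command, in the same order.
    """
    return [
        datatool,
        "-m", str(spec),      # data specification file, e.g. NCBI_all.asn
        "-d", str(source),    # source data file in binary ASN.1 format
        "-t", data_type,      # data type in the source file
        "-px", str(output),   # output file in XML format
    ]


def convert_to_xml(spec, source, data_type, output, datatool="./datatool"):
    """Run datatool and return the output path.

    Raises subprocess.CalledProcessError if datatool exits with a
    non-zero status, and FileNotFoundError if the binary is missing.
    """
    cmd = build_datatool_command(spec, source, data_type, output, datatool)
    subprocess.run(cmd, check=True)
    return Path(output)
```

Calling convert_to_xml("NCBI_all.asn", "gbest225.aso", "Bioseq-set",
"gbest225.xml") would run the same command as the example. Passing the
arguments as a list avoids shell quoting problems with unusual file
names.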
PLEASE NOTE:
- The uncompressed XML file is about 10 times the size of the
compressed binary ASN.1 file, so it can be extremely big.
- The ASN.1 data usually contains more than the GenBank files: it
includes other databases like PDB and RefSeq, gaps of HTG records, and
the quality scores of HTG records. For further information, read:
ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1
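Because of the size note above, it may be worth checking free disk
space before converting. The sketch below is only an illustration: the
function names are hypothetical, and the factor of 10 is the rough
figure quoted above, not a guaranteed bound.

```python
import shutil

# Rough ratio of uncompressed XML size to compressed binary ASN.1 size,
# taken from the note above. Treat it as an approximation.
XML_TO_COMPRESSED_RATIO = 10


def estimated_xml_bytes(compressed_bytes, ratio=XML_TO_COMPRESSED_RATIO):
    """Estimate the XML output size from the compressed input size."""
    return compressed_bytes * ratio


def has_room_for_xml(compressed_bytes, target_dir=".",
                     ratio=XML_TO_COMPRESSED_RATIO):
    """Return True if target_dir has enough free space for the estimate."""
    free_bytes = shutil.disk_usage(target_dir).free
    return estimated_xml_bytes(compressed_bytes, ratio) <= free_bytes
```

For a compressed input of 200 MB, for example, the estimate is about
2 GB of XML, so has_room_for_xml would return False on a volume with
less free space than that.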
Please email questions to:
[email protected]
Last updated: Mar 30, 2006