How to Submit to dbSNP:
V. Formatting Submission Metadata (Meta)files
Version 4.1; December 11, 2015
There are two distinct files required for a dbSNP VCF submission:
1. Required metadata files or metafiles that include publication, method, population, and assay information associated with the submission. You can
submit these Meta files separately or combine them into a single text file for submission. Specifications for each Meta file are available below in the
dbSNP Metafile Specifications section of this document.
2. A Submission file for your data. We prefer that you submit your data in VCF format (see above), but dbSNP also accepts other formats.
Formatting Metafiles
What is a MetaFile?
A Metafile is a file you submit to dbSNP that contains the publication, method, population, and assay metadata associated with the data you are submitting. dbSNP requires Metafiles for each submission, and they should be submitted separately from your variation data. You can combine your Metafiles into a single text file for submission, or you can submit each Metafile type individually if you prefer.
dbSNP Metadata File Specifications
Publication
Below are the valid tags that can be used in the publication portion of the metadata files submitted to dbSNP as well as a brief description of the data required for each tag. Usage examples follow the tags and their descriptions.
Note: dbSNP requires the tags TYPE, TITLE, YEAR, and STATUS in all publication meta sections. The TYPE field is obligatory at the beginning of each entry, even if there are multiple entries of a given type in a file.
TYPE: | Must be "PUB" for publication entries. |
HANDLE: | Submission handle supplied by NCBI |
MEDUID: | Medline unique identifier. Not required, so include it only if you know it. |
PMID: | PubMed unique identifier. Not required, so include it only if you know it. |
TITLE: | Title of article. Insert your entry starting on the line below the tag and use multiple lines if necessary. |
AUTHORS: | Authors' names using this format: FamilyName AA, FamilyName BC, FamilyName DE. Insert your entries starting on the line below the tag and use multiple lines if necessary. |
JOURNAL: | Journal name |
VOLUME: | Volume number |
SUPPL: | Supplement number |
ISSUE: | Issue number |
I_SUPPL: | Issue supplement number |
PAGES: | Page, format: 123-9 |
YEAR: | Year of publication. |
STATUS: | Enter one of the following: 1=unpublished, 2=submitted, 3=in press, 4=published |
NOTE: The TITLE field is a free format string. dbSNP requires that you put an identical string in the CITATION field of the SNP assay or use section, since we will be matching that field automatically against the publications in the publication table and replacing the string with the publication id in the dbSNP table. In practice the handle and title, in combination, must be unique, so submitters may choose any title they wish, even for unpublished citations, as long as it is distinct from other titles that they have used.
Publication Tag Usage Example 1:
TYPE: PUB
HANDLE: KAMBOH
TITLE:
Human Chromosome 8
AUTHORS:
Kamboh MI
YEAR: 2014
STATUS: 1
Publication Tag Usage Example 2:
TYPE: PUB
HANDLE: KAMBOH
PMID: 24212298
TITLE:
Lipoprotein lipase gene sequencing and plasma lipid profile
AUTHORS:
Pirim D, Wang X, Radwan ZH, Niemsiri V, Hokanson JE, Hamman RF, Barmada MM, Demirci FY, Kamboh MI
JOURNAL: J Lipid Res.
VOLUME: 55
PAGES: 85-93
YEAR: 2014
STATUS: 4
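The tag layout above can be checked before submission with a short script. The following Python sketch is illustrative only and is not a dbSNP tool: the function names `parse_entries` and `validate_publication` are hypothetical, and the parsing rules (a tag is an upper-case name followed by a colon, and any other line continues the previous tag's value) are assumptions drawn from the examples above.

```python
import re

# Tags listed in the Publication table of this document.
PUB_TAGS = {"TYPE", "HANDLE", "MEDUID", "PMID", "TITLE", "AUTHORS", "JOURNAL",
            "VOLUME", "SUPPL", "ISSUE", "I_SUPPL", "PAGES", "YEAR", "STATUS"}
REQUIRED_PUB_TAGS = ("TYPE", "TITLE", "YEAR", "STATUS")
VALID_STATUS = {"1", "2", "3", "4"}

# Assumption: a tag line is an upper-case tag name, a colon, then an optional value.
TAG_LINE = re.compile(r"^([A-Z_]+):\s*(.*)$")


def parse_entries(text, known_tags):
    """Split metafile text into a list of {tag: value} dicts, one per TYPE line."""
    entries, current, tag = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match and match.group(1) in known_tags:
            tag = match.group(1)
            if tag == "TYPE":
                current = {}
                entries.append(current)
            elif current is None:
                raise ValueError("an entry must begin with a TYPE line")
            current[tag] = match.group(2)
        elif current is None:
            raise ValueError("an entry must begin with a TYPE line")
        else:
            # Continuation line of a multi-line value such as TITLE or AUTHORS.
            current[tag] = (current[tag] + " " + line).strip()
    return entries


def validate_publication(entry):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing required tag {t}" for t in REQUIRED_PUB_TAGS
                if not entry.get(t)]
    if entry.get("TYPE") and entry["TYPE"] != "PUB":
        problems.append('TYPE must be "PUB"')
    if entry.get("STATUS") and entry["STATUS"] not in VALID_STATUS:
        problems.append("STATUS must be 1, 2, 3, or 4")
    return problems


EXAMPLE = """\
TYPE: PUB
HANDLE: KAMBOH
TITLE:
Human Chromosome 8
AUTHORS:
Kamboh MI
YEAR: 2014
STATUS: 1
"""

if __name__ == "__main__":
    for entry in parse_entries(EXAMPLE, PUB_TAGS):
        print(entry["TITLE"], validate_publication(entry))
```

Restricting tag detection to a known tag set keeps a title or author line that happens to contain a colon from being mistaken for a new tag.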
Method
Below are the valid tags that can be used in the Method portion of the metadata files submitted to dbSNP as well as a brief description of the data required for each tag. Usage examples follow the tags and their descriptions.
NOTE: The TYPE field is required at the beginning of each Method entry, even if there are multiple entries of a given type in a file.
TYPE: | Entry type - must be "METHOD" for method entries. |
HANDLE:<handle> | Submission handle as supplied by NCBI |
ID: <local_method_ID> | Use the identifier your lab uses to refer to a method for assaying variation. However, if you are submitting, or have submitted, your sequences to SRA, use your SRA experiment accession (e.g. SRX1131768) as your method ID. |
METHOD_CLASS: |
1. To maintain consistency between databases, dbSNP and dbVar have adopted SRA Method_Class values. See TABLE 1 for a list of accepted values. If you are submitting, or have submitted, your sequences to SRA and have used your SRA experiment accession as your method ID, you do not have to provide a Method_Class value.
2. If you have employed library enrichment, screening and/or selection methods in your assay, include either free text describing your enrichment/selection method following your method_class value, or use the appropriate value from TABLE 2. |
SEQ_BOTH_STRANDS: <YES, NO, NA, UNKNOWN> | Were both strands sequenced? |
TEMPLATE_TYPE: <DIPLOID, CLONE, OTHER, UNKNOWN> | Was the template DNA used in the assay derived from a clone or from a diploid genomic DNA extraction? |
MULT_PCR_AMPLIFICATION: <YES, NO, NA, UNKNOWN> | Were independent PCR amplifications tested? |
MULT_CLONES_TESTED: <YES, NO, NA, UNKNOWN> | Were independent clones tested? |
METHOD: | A description of the method given in multiple lines of free text. Line breaks will be preserved. |
PARAMETER: | Provide the reaction parameters starting on the line below the tag and use multiple lines if necessary. |
Method Tag Usage Example:
TYPE: METHOD
HANDLE: WHOEVER
ID: My_Variatio_Seq_method
METHOD_CLASS: WGS Size Fractionation
SEQ_BOTH_STRANDS: YES
TEMPLATE_TYPE: DIPLOID
MULT_PCR_AMPLIFICATION: YES
MULT_CLONES_TESTED: NO
METHOD: PCR reactions were performed with genomic DNA and products were analysed by DNA sequencing.
PARAMETER:
Template: 50 ng genomic DNA
Primer: each 0.5 uM
dNTPs: each 0.2 mM
PCR Buffer: 5 ul (10X), Mg 2+ 1.5 mM, Taq Polymerase: 1.25units/ul
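The controlled values in the Method table lend themselves to a simple pre-submission check. The sketch below is a minimal illustration, not dbSNP's own validation: `check_method` is a hypothetical name, the accession pattern (SRX, ERX, or DRX followed by digits) is an assumption about SRA experiment accessions, and treating METHOD_CLASS as expected whenever the ID is not an SRA accession is one reading of the METHOD_CLASS description above.

```python
import re

YES_NO_VALUES = {"YES", "NO", "NA", "UNKNOWN"}

# Allowed values as listed in the Method tag table.
ALLOWED_VALUES = {
    "SEQ_BOTH_STRANDS": YES_NO_VALUES,
    "TEMPLATE_TYPE": {"DIPLOID", "CLONE", "OTHER", "UNKNOWN"},
    "MULT_PCR_AMPLIFICATION": YES_NO_VALUES,
    "MULT_CLONES_TESTED": YES_NO_VALUES,
}

# Assumption: SRA experiment accessions look like SRX1131768 (also ERX/DRX).
SRA_EXPERIMENT = re.compile(r"^[SED]RX\d+$")


def check_method(entry):
    """Return a list of problems found in one parsed Method entry (a dict)."""
    problems = []
    if entry.get("TYPE", "").upper() != "METHOD":
        problems.append("TYPE must be METHOD")
    for tag, allowed in ALLOWED_VALUES.items():
        value = entry.get(tag)
        if value is not None and value.upper() not in allowed:
            problems.append(f"{tag} has unexpected value {value!r}")
    uses_sra_id = bool(SRA_EXPERIMENT.match(entry.get("ID", "")))
    if not uses_sra_id and not entry.get("METHOD_CLASS"):
        problems.append("METHOD_CLASS expected unless ID is an SRA experiment accession")
    return problems


example = {
    "TYPE": "METHOD",
    "HANDLE": "WHOEVER",
    "ID": "My_Variatio_Seq_method",
    "METHOD_CLASS": "WGS Size Fractionation",
    "SEQ_BOTH_STRANDS": "YES",
    "TEMPLATE_TYPE": "DIPLOID",
    "MULT_PCR_AMPLIFICATION": "YES",
    "MULT_CLONES_TESTED": "NO",
}

if __name__ == "__main__":
    print(check_method(example))
```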
Population
Below are the valid tags that can be used in the Population portion of the metadata files submitted to dbSNP as well as a brief description of the data required for each tag. Usage examples follow the tags and their descriptions.
NOTE: The TYPE field is required at the beginning of each population entry, even if there are multiple entries of a given type in a file.
TYPE: | Entry type - the value placed here must be "POPULATION" for Population entries. |
HANDLE:<handle> | Submission handle as supplied by NCBI |
ID: <local population id> | The identifier you or your lab uses to refer to this population. |
MANDATORY: | This tag is for a mandatory free text comment that should be displayed if the sequence being submitted comes from a population whose consent form requires it. If the sample being submitted does not come from a population that requires a statement, you can skip this tag. |
POPULATION: |
This field contains multiple lines of free text to allow you to describe your population in greater detail. We encourage you to format your text in this field as PARAMETER:VALUE pairs whenever you can to maintain line breaks and allow your data to be more easily queried. |
Population Tag Usage Example:
TYPE:POPULATION
HANDLE:WHOEVER
ID:YOUR_POP
POP_CLASS: EUROPE
POPULATION:
Continent:Europe
Nation:Some Nation
Phenotype:You name it
Note: The tags "Continent", "Nation", and "Phenotype" used in the Population Tag Usage Example above are for illustrative purposes only. Choose tags for your submission that you think are meaningful for your particular population. You can also choose not to use tag:value pairs in the POPULATION field if it doesn't make sense to use them for your population.
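If population descriptions are kept in a spreadsheet or script, the entry can be generated rather than typed by hand. The following Python sketch assumes a hypothetical helper named `format_population`; it emits only the tags described in the Population table above, so it leaves out POP_CLASS, which appears in the usage example but is not described there.

```python
def format_population(handle, pop_id, details, mandatory=None):
    """Build the text of one Population entry.

    `details` maps free-form parameter names to values; each pair is written
    on its own line under POPULATION: so that line breaks are preserved.
    """
    lines = ["TYPE:POPULATION", f"HANDLE:{handle}", f"ID:{pop_id}"]
    if mandatory:
        # Only needed when the population's consent form requires a statement.
        lines.append(f"MANDATORY:{mandatory}")
    lines.append("POPULATION:")
    lines.extend(f"{name}:{value}" for name, value in details.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_population(
        "WHOEVER",
        "YOUR_POP",
        {"Continent": "Europe", "Nation": "Some Nation", "Phenotype": "You name it"},
    ))
```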
Assay
Below are the valid tags that can be used in the Assay portion of the metadata files submitted to dbSNP as well as a brief description of the data required for each tag. Usage examples follow the tags and their descriptions.
NOTE: Required fields in the Assay metadata section are HANDLE, BATCH, MOLTYPE, SAMPLESIZE, and METHOD. The additional tags in the header tag descriptions below are optional. If the ORGANISM tag is left out of the Assay metadata section, dbSNP will assume the organism is Homo sapiens.
TYPE: | Entry type; must be "SNPASSAY" for assay entries. |
HANDLE: <handle> | Required field. Submission handle as supplied by NCBI |
BATCH: <local_batch_ID> | Required field. A local_batch_ID is simply a name you give to the set of variation assays or experiments you are submitting. The local_batch_ID allows for clear reference to the submitted set in communication between NCBI and submitters. |
MOLTYPE: Genomic|cDNA|Mito|Chloro | Required field. As the molecule type can vary by method, it must be placed in the header. If you want to submit a mixture of molecule types, split your submission so that each submission contains variations assayed using a single moltype. |
METHOD: <local_method_id> | Required field. A local method id is the identifier you or your lab uses to refer to a method for assaying variation. However, if you are submitting, or have submitted, your sequences to SRA, use your SRA experiment accession (e.g. SRX1131768) as your method ID. |
METHOD_EX: free text | Free text for explaining the given method in detail. |
SUCCESS_RATE: 100% | Probability that the variant is real, based on validation. Defined as: 1 - false positive rate. |
SAMPLESIZE: <number> | Required field. The number of distinct chromosomes examined during the discovery of the variation. |
SYN NAMES: <name,[name, name, etc...]> |
Define the meaning of the synonyms presented on the "SYNONYM" lines allowed for each SNP assay in |
ORGANISM: SCIENTIFIC NAME | as per NCBI Taxonomy |
STRAIN: Strain or breed name | Provide the strain or breed if the sampled germplasm has distinctive properties (e.g. inbred mice, commercial livestock breeds, or pooled DNA sample for SNP discovery). Individuals with genotype data referencing variations in this batch may have different strain or breed attributes. Provide these data separately in the population and pedigree sections. |
CULTIVAR: cultivar name | Provide the cultivar name if the organism is a laboratory cultivar. |
POPULATION: <local population id> | The identifier you or your lab uses to refer to either (1) the population of individuals used to define a SNP assay, or (2) the population that was assayed for variants. Note: Some population strings will be predefined, or "globally" defined, and can be used by more than one submitter. The handle for these globally defined populations is 'NCBI'. To remove ambiguity, populations will always be used as <handle>|<population id>. |
CITATION: Title of publication | The title of the publication associated with the variants being submitted. Be sure that the title entered here matches the title of the entry in the publication section of this submission. This field may repeat. If this field is omitted and a single citation is included in the batch, the parser will associate the citation with the assay. |
LINKOUT_URL: free text | A free text (255 char max) URL that links to the submitter's local website. NCBI requests that links to data for individual SNP records be formed by the concatenation of this URL string with the local SNP ID. |
COMMENT: free text | Free text for public viewing. Anything written in this field will be shown with each SNP assay in this batch. |
PRIVATE: free text | Free text for you to comment to NCBI about the processing of the batch being submitted. |
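The LINKOUT_URL description above amounts to plain string concatenation with a length limit. As a minimal sketch, with a hypothetical function name, URL, and local SNP ID:

```python
MAX_LINKOUT_LENGTH = 255  # limit stated in the LINKOUT_URL description


def snp_link(linkout_url, local_snp_id):
    """Form the link to one SNP record by appending the local SNP ID."""
    if len(linkout_url) > MAX_LINKOUT_LENGTH:
        raise ValueError("LINKOUT_URL is longer than 255 characters")
    return linkout_url + local_snp_id


if __name__ == "__main__":
    # Hypothetical base URL and local SNP ID, for illustration only.
    print(snp_link("https://example.org/snp?id=", "local_snp_0001"))
```

Because the two strings are simply joined, the LINKOUT_URL value should end wherever the local SNP ID is expected to begin, for example with a trailing `=` or `/`.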
Assay Tag Usage Example:
Here is what a theoretical submission of a set of SNP assays from the Whitehead Institute (handle: 'WI') might look like:
TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT:
This is where you place a public comment that applies to the entire
batch of SNPs you are submitting.
PRIVATE:
This is where you place a note to NCBI regarding the processing
of the submission that will not be seen by the public.
An example of such a note might be:
Note: These are not exactly real variants, as
the data were modified.
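The required-field rule, the Homo sapiens default, and the CITATION-to-TITLE match described in this section can be expressed as a small check. This is a hedged sketch rather than the dbSNP parser: `check_assay` and `organism` are hypothetical names, the header values are invented for illustration, and exact, case-sensitive matching of MOLTYPE and titles is an assumption.

```python
REQUIRED_ASSAY_TAGS = ("HANDLE", "BATCH", "MOLTYPE", "SAMPLESIZE", "METHOD")
MOLTYPES = {"Genomic", "cDNA", "Mito", "Chloro"}


def check_assay(header, publication_titles):
    """Return a list of problems in a parsed assay header (a dict).

    `header["CITATION"]` is a list because the CITATION tag may repeat.
    `publication_titles` holds the TITLE strings from the publication entries.
    """
    problems = [f"missing required tag {t}" for t in REQUIRED_ASSAY_TAGS
                if not header.get(t)]
    moltype = header.get("MOLTYPE")
    if moltype and moltype not in MOLTYPES:
        problems.append(f"unexpected MOLTYPE {moltype!r}")
    for title in header.get("CITATION", []):
        # The CITATION string must be identical to a publication TITLE.
        if title not in publication_titles:
            problems.append(f"CITATION does not match any TITLE: {title!r}")
    return problems


def organism(header):
    """Organism for the batch; Homo sapiens is assumed when the tag is absent."""
    return header.get("ORGANISM") or "Homo sapiens"


# Hypothetical header values, for illustration only.
header = {
    "TYPE": "SNPASSAY",
    "HANDLE": "WHOEVER",
    "BATCH": "batch_01",
    "MOLTYPE": "Genomic",
    "METHOD": "My_method",
    "SAMPLESIZE": "96",
    "CITATION": ["Human Chromosome 8"],
}

if __name__ == "__main__":
    print(check_assay(header, {"Human Chromosome 8"}), organism(header))
```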
Appendix
Table 1: METHOD_CLASS Values for Sequencing Strategies
METHOD_CLASS Value | Description |
---|---|
WGS |
Whole Genome Sequencing - random sequencing of the whole genome (see pubmed 10731132 for details) |
WGA |
Whole Genome Amplification followed by random sequencing. (see pubmed 1631067, 8962113 for details) |
WXS |
Random sequencing of exonic regions selected from the genome. (see pubmed 20111037 for details) |
RNA-Seq | Random sequencing of whole transcriptome, also known as Whole Transcriptome Shotgun Sequencing (WTSS). (see pubmed 18611170 for details) |
ssRNA-seq |
Important for "strand-specific" RNA-seq experiments which have the advantage that the information about the |
miRNA-Seq |
Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements. (see pubmed 21787409 for details) |
ncRNA-Seq |
Capture of other non-coding RNA types, including post-translation modification types such as snRNA (small nuclear RNA) or snoRNA (small nucleolar RNA), or expression regulation types such as siRNA (small interfering RNA) or piRNA/piwi/RNA (piwi-interacting RNA). |
FL-cDNA | Full-length sequencing of cDNA templates |
EST | Single pass sequencing of cDNA templates |
WCS | Random sequencing of a whole chromosome or other replicon isolated from a genome. |
RAD-Seq | |
CLONE | Genomic clone based (hierarchical) sequencing. |
POOLCLONE | Shotgun of pooled clones (usually BACs and Fosmids). |
AMPLICON |
Sequencing of overlapping or distinct PCR or RT-PCR products. For example, metagenomic community profiling using SSU rRNA. |
CLONEEND | Clone end (5', 3', or both) sequencing. |
FINISHING | Sequencing intended to finish (close) gaps in existing coverage. |
ChIP-Seq |
chromatin immunoprecipitation. |
MNase-Seq | following MNase digestion. |
DNase-Hypersensitivity | Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI. |
Bisulfite-Seq |
MethylC-seq. Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status. |
CTS | Concatenated Tag Sequencing |
MRE-Seq | Methylation-Sensitive Restriction Enzyme Sequencing. |
MeDIP-Seq | Methylated DNA Immunoprecipitation Sequencing. |
MBD-Seq | Methyl CpG Binding Domain Sequencing. |
Tn-Seq |
Quantitatively determine fitness of bacterial genes based on how many times a purposely seeded transposon gets inserted into each gene of a colony after some time. |
VALIDATION | CGHub special request: Independent experiment to re-evaluate putative variants. |
FAIRE-seq | Formaldehyde Assisted Isolation of Regulatory Elements |
SELEX | Systematic Evolution of Ligands by Exponential enrichment |
RIP-Seq | Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP). |
ChIA-PET | Direct sequencing of proximity-ligated chromatin immunoprecipitates. |
Synthetic-Long-Read | Binning and barcoding of large DNA fragments to facilitate assembly of the fragment. |
Targeted-Capture | Enrichment of a targeted subset of loci; used for cancer gene panels, genetic testing panels, etc. Replaces VALIDATION. |
Tethered Chromatin Conformation Capture | Citation Needed |
OTHER | Library strategy not listed. |
Table 2: METHOD_CLASS Values for Library Enrichment, Screening and/or Selection Strategies
METHOD_CLASS | DESCRIPTION |
---|---|
RANDOM | No Selection or Random selection |
PCR | target enrichment via PCR |
RANDOM PCR | Source material was selected by randomly generated primers. |
RT-PCR | target enrichment via |
HMPR | Hypo-methylated partial restriction digest |
MF | Methyl Filtrated |
Repeat Fractionation |
Selection for less repetitive (and more gene rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics. |
Size Fractionation | Physical selection of size appropriate targets. |
MSLL | Methylation Spanning Linking Library |
cDNA | PolyA selection or enrichment for messenger RNA (mRNA); synonymize with PolyA |
cDNA_randomPriming | |
cDNA_oligo_dT | |
PolyA | PolyA selection or enrichment for messenger RNA (mRNA); should replace cDNA enumeration. |
Oligo-dT | enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT. |
Inverse rRNA | depletion of ribosomal RNA by oligo hybridization. |
Inverse rRNA selection | depletion of ribosomal RNA by inverse oligo hybridization. |
ChIP | Chromatin immunoprecipitation |
MNase |
Micrococcal Nuclease (MNase) digestion |
DNase | Deoxyribonuclease (DNase) digestion |
Hybrid Selection | Selection by hybridization in array or solution. |
Reduced Representation | Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling. |
Restriction Digest | DNA fractionation using restriction enzymes. |
5-methylcytidine antibody | Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C). |
MBD2 protein methyl-CpG binding domain | Enrichment by methyl-CpG binding domain. |
CAGE | Cap-analysis gene expression. |
RACE | Rapid Amplification of cDNA Ends. |
MDA |
Multiple Displacement Amplification, a non-PCR based DNA amplification technique that amplifies minute quantities of DNA to levels suitable for genomic analysis. |
Padlock probes capture method |
Targeted sequence capture protocol covering an arbitrary set of nonrepetitive genomic targets. An example is capture bisulfite sequencing using padlock probes (BSPP). |
Other | Other library enrichment, screening, or selection process. |
Unspecified | Library enrichment, screening, or selection is not specified. |
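A METHOD_CLASS value such as "WGS Size Fractionation" combines a Table 1 strategy with an optional enrichment or selection description. The Python sketch below shows one way to separate the two parts. It is an illustration under stated assumptions: `split_method_class` is a hypothetical name, matching is exact and case-sensitive, and only a subset of Table 2 is included because any remaining text may also be free text.

```python
# Sequencing strategies from Table 1.
STRATEGIES = (
    "WGS", "WGA", "WXS", "RNA-Seq", "ssRNA-seq", "miRNA-Seq", "ncRNA-Seq",
    "FL-cDNA", "EST", "WCS", "RAD-Seq", "CLONE", "POOLCLONE", "AMPLICON",
    "CLONEEND", "FINISHING", "ChIP-Seq", "MNase-Seq", "DNase-Hypersensitivity",
    "Bisulfite-Seq", "CTS", "MRE-Seq", "MeDIP-Seq", "MBD-Seq", "Tn-Seq",
    "VALIDATION", "FAIRE-seq", "SELEX", "RIP-Seq", "ChIA-PET",
    "Synthetic-Long-Read", "Targeted-Capture",
    "Tethered Chromatin Conformation Capture", "OTHER",
)

# Abbreviated subset of Table 2, for illustration only.
SELECTIONS_SUBSET = {
    "RANDOM", "PCR", "RT-PCR", "Size Fractionation", "Hybrid Selection",
    "Restriction Digest", "Other", "Unspecified",
}


def split_method_class(value):
    """Split a METHOD_CLASS value into (strategy, selection, is_listed).

    `selection` is None when nothing follows the strategy; `is_listed` says
    whether the selection text is one of the Table 2 values included above.
    """
    value = " ".join(value.split())  # collapse repeated whitespace
    # Try longer names first so multi-word strategies are not cut short.
    for strategy in sorted(STRATEGIES, key=len, reverse=True):
        if value == strategy or value.startswith(strategy + " "):
            rest = value[len(strategy):].strip()
            return strategy, rest or None, rest in SELECTIONS_SUBSET
    raise ValueError(f"no Table 1 strategy at the start of {value!r}")


if __name__ == "__main__":
    print(split_method_class("WGS Size Fractionation"))
```

Text that is not in the subset is returned unchanged, since the METHOD_CLASS description allows free text for the enrichment or selection method.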
Contact dbSNP
If you do not find the answer to your submission questions in the How to Submit to dbSNP document series, contact dbSNP submissions at [email protected], and we will do our best to answer your submission question or help you solve a difficult submission problem.
- Send submissions and submission questions to: [email protected]
- Send submission updates to: [email protected]
- Send general inquiries, etc. to: [email protected]
Other Titles in the How to Submit to dbSNP Series:
- I. Submission Quick Start
- II. Introduction and Submission Overview
- III. The dbSNP Pre-Submission Process
- IV. BioProject, BioSample and dbSNP
- V. Introduction to Submission Formatting and Formatting Metadata (Meta) files
- VI. Formatting Data for Submission
- VII. Formatting Data for Updates and Withdrawals