Validation and Discrepancy Report Error Explanations
Explanations for individual errors found during processing are listed below. Each entry includes suggestions for fixing the most common causes of the error. For genomes-related questions, write to: [email protected]. For GenBank-related questions, write to: [email protected]
Error List
- SEQ_FEAT_BadCharInAuthorLastName
- SEQ_FEAT_BadCharInAuthorName
- SEQ_FEAT_InternalStop
- GENERIC_AuthorListHasEtAl
- SEQ_DESCR_BadAltitude
- SEQ_DESCR_BadCollectionDate
- SEQ_DESCR_BadGeoLocNameCode
- SEQ_DESCR_BadPCRPrimerName
- SEQ_DESCR_BadPCRPrimerSequence
- SEQ_DESCR_BadVoucherID
- SEQ_DESCR_BioSourceMissing
- SEQ_DESCR_DuplicatePCRPrimerSequence
- SEQ_DESCR_IncorrectlyFormattedVoucherID
- SEQ_DESCR_LatLonGeoLocName
- SEQ_DESCR_LatLonFormat
- SEQ_DESCR_LatLonRange
- SEQ_DESCR_LatLonValue
- GENERIC_MissingPubRequirement
- GENERIC_MissingPubInfo
- SEQ_DESCR_UnstructuredVoucher
- SEQ_DESCR_WrongOrganismFor16SrRNA
- SEQ_DESCR_WrongVoucherType
- SEQ_DESCR_InvalidSexQualifier
- DISC_DUP_DEFLINE
- DISC_FEATURE_MOLTYPE_MISMATCH
- DISC_INFLUENZA_DATE_MISMATCH
- DISC_INFLUENZA_QUALS
- DISC_INFLUENZA_SEROTYPE
- DISC_INFLUNEZA_SEROTYPE_FORMAT
- DISC_MISSING_AFFIL
- DISC_NO_ANNOTATION
- DISC_REQUIRED_CLONE
- DISC_REQUIRED_STRAIN
SEQ_FEAT_BadCharInAuthorLastName
Explanation : An author name has illegal characters.
Suggestion : Check the last names (family names) in the sequence and publication references. Use only plain ASCII text for the names. The last name should NOT contain symbols, numbers, accents, umlauts, characters with diacritical marks, and should NOT end in punctuation. Note that names with internal punctuation such as "St. John" or "D'Abaco" will validate.
examples:
incorrect: Henry Jones., Carlos Méndez, Xu 1Weng
corrected: Henry Jones, Carlos Mendez, Xu Weng
The use of a terminal period, accent, and number in these family names causes an error. The error can be corrected by removing the symbols, characters with diacritical marks, numbers, or punctuation.
SEQ_FEAT_BadCharInAuthorName
Explanation : An author name has illegal characters.
Suggestion : Check the first names (given names) in the sequence and publication references. Use only plain ASCII text for the names. The names should NOT contain symbols, numbers, accents, umlauts, characters with diacritical marks, and should NOT end in punctuation. Note that names with internal punctuation such as "St. John" or "D'Abaco" or "Mei-Lai" are okay.
examples:
incorrect: J#ane Doe, José Perez, 1Xu Weng
corrected: Jane Doe, Jose Perez, Xu Weng
The use of a symbol, accent, and number in these given names causes an error. The error can be corrected by removing the symbols, characters with diacritical marks, numbers, or punctuation.
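The character rules for family and given names can be approximated with a small pre-submission check. This is a minimal sketch, not the validator's actual logic: the function name `looks_like_valid_name` and the exact set of allowed internal punctuation (space, period, apostrophe, hyphen) are assumptions drawn from the examples above.

```python
import re

# Assumed rule: plain ASCII letters only, with spaces, periods, apostrophes,
# or hyphens permitted between letters. The name must start and end with a
# letter, so leading digits and trailing punctuation are rejected.
_NAME_RE = re.compile(r"[A-Za-z]+(?:[ .'\-]+[A-Za-z]+)*")


def looks_like_valid_name(name: str) -> bool:
    """Return True if a single family or given name passes the assumed rule."""
    return _NAME_RE.fullmatch(name) is not None
```

Under these assumptions, "Jones.", "Méndez", "1Weng", and "J#ane" are rejected, while "St. John", "D'Abaco", and "Mei-Lai" pass.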
SEQ_FEAT_InternalStop
Explanation : The predicted coding region contains an internal stop codon. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends.
Suggestion : Review your sequence data. Trim low quality or questionable data from the sequences.
GENERIC_AuthorListHasEtAl
Explanation : Author list contains et al.
Suggestion : Check the names (family names) in the sequence and publication references. Use the full list of names for the authors. Do not use "et al." to represent an author list.
SEQ_DESCR_BadAltitude
Explanation : The altitude value is invalid. Altitude should be provided in meters.
Suggestion : Correct or remove the altitude source modifier value. Provide the altitude in meters above or below nominal sea level. The altitude represents the geographical altitude of the location from which the sequenced sample was collected.
examples:
1235 m
-20 m
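A quick way to screen altitude values before submission is a pattern match against the form shown in the examples. This sketch assumes the value is a number, a single space, and the unit "m"; whether decimal values are accepted by the validator is an assumption, and `is_valid_altitude` is a hypothetical helper name.

```python
import re

# Assumed form: optional minus sign, digits, optional decimal part,
# one space, then the unit "m" (meters).
_ALTITUDE_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)? m")


def is_valid_altitude(value: str) -> bool:
    """Return True if the value looks like an altitude in meters."""
    return _ALTITUDE_RE.fullmatch(value) is not None
```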
SEQ_DESCR_BadCollectionDate
Explanation : The collection date is not in the required format.
Suggestion : Correct the collection-date source modifier so the date is in the required format: DD-MMM-YYYY, where MMM is the three-letter English abbreviation of the month. The day, or both the day and the month, may be omitted. For genomes and biosample submissions, the ISO 8601 standard may be used; see descriptions and examples here.
examples of correctly formatted collection-dates:
01-Jul-1999
Nov-2010
2008
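The three accepted shapes (DD-MMM-YYYY, MMM-YYYY, YYYY) can be checked locally as follows. This is a sketch under stated assumptions: the helper name `is_valid_collection_date` is hypothetical, month abbreviations are assumed to be capitalized as in the examples, and ISO 8601 dates are deliberately not handled.

```python
import datetime
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Groups: optional two-digit day, optional month abbreviation, four-digit year.
_DATE_RE = re.compile(r"(?:(?:([0-9]{2})-)?([A-Z][a-z]{2})-)?([0-9]{4})")


def is_valid_collection_date(value: str) -> bool:
    """Return True for DD-MMM-YYYY, MMM-YYYY, or YYYY with a real calendar date."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        return False
    if day is not None:
        try:
            # Rejects impossible dates such as 31-Feb-2001.
            datetime.date(int(year), MONTHS.index(month) + 1, int(day))
        except ValueError:
            return False
    return True
```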
SEQ_DESCR_BadGeoLocNameCode
Explanation : The geographic location name code (up to the first colon) is not on the approved list of countries.
Suggestion : Correct the geographic location name source modifier with a location name on the approved geographic location name list and verify the value is correctly formatted. If you want to include more specific location information, you must place the approved geographic location name first, followed by a colon and then the additional information. The geographic location name has a specific format and must be formatted as follows:
<approved geographic location name>: <region or specific area>
examples:
Iceland
Canada: Vancouver
Atlantic Ocean: Charlie Gibbs Fracture Zone
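The "up to the first colon" rule can be sketched as below. The set `SAMPLE_APPROVED_NAMES` is an illustrative stand-in containing only the three names from the examples above; it is not the approved list, and `has_approved_geo_loc_name` is a hypothetical helper name.

```python
# Illustrative subset only; the real approved list is much longer.
SAMPLE_APPROVED_NAMES = frozenset({"Iceland", "Canada", "Atlantic Ocean"})


def has_approved_geo_loc_name(value: str, approved=SAMPLE_APPROVED_NAMES) -> bool:
    """Check the part of the value before the first colon against a list."""
    code = value.split(":", 1)[0].strip()
    return code in approved
```

With this check, "Canada: Vancouver" passes because "Canada" precedes the colon, while "Vancouver, Canada" fails because the whole string is treated as the location name.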
SEQ_DESCR_BadPCRPrimerName
Explanation : The PCR primer name appears to be a sequence instead of an identifying label.
Suggestion : The fwd-primer-name and rev-primer-name values should not be primer sequences. Correct this information in the source modifiers. If you intended to provide primer sequences, use the fwd-primer-sequence and rev-primer-sequence source modifiers.
SEQ_DESCR_BadPCRPrimerSequence
Explanation : The PCR primer sequence has illegal characters or non-IUPAC nucleotides.
Suggestion : PCR primer sequences must contain only the nucleotide sequence. Do not include any extra information such as primer names, 5'-, or 3'. Remove the extra information so the fwd-primer-sequence and rev-primer-sequence modifiers contain only nucleotides given in the IUPAC degenerate-base alphabet. If there is an inosine (i) in the primer sequence, flank the letter i with less-than and greater-than symbols, like this: <i>
For example:
incorrect: 5'-atggggaccc-3', 5'-ttkktcaiccgc-3'
corrected: atggggaccc, ttkktca<i>ccgc
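A primer sequence can be screened for stray characters with a single pattern. This sketch assumes the accepted alphabet is the IUPAC DNA codes (a, c, g, t plus the degenerate codes r, y, s, w, k, m, b, d, h, v, n) in either case, with inosine written as the letter i flanked by less-than and greater-than symbols; `is_valid_primer_sequence` is a hypothetical helper name.

```python
import re

# Assumed alphabet: IUPAC DNA codes, plus inosine written as "<i>".
# A bare "i" is not an IUPAC code and is therefore rejected.
_PRIMER_RE = re.compile(r"(?:[acgtryswkmbdhvn]|<i>)+", re.IGNORECASE)


def is_valid_primer_sequence(sequence: str) -> bool:
    """Return True if the sequence contains only the assumed allowed symbols."""
    return _PRIMER_RE.fullmatch(sequence) is not None
```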
SEQ_DESCR_BadVoucherID
Explanation : A specific identifier is missing from one of the following source modifiers: culture-collection, specimen-voucher, or bio-material.
Suggestion : Correct the format of the culture-collection, specimen-voucher, or bio-material source modifiers. The culture-collection, specimen-voucher, or bio-material is missing the identifier. Culture-collection should be used for microbial sequences, while specimen-voucher should be used for plants and animals only. Do not use specimen-voucher to describe host information for a microbial sequence submission.
The culture-collection must be formatted like this: "<institution-code>:[<collection-code>:]<culture id>". The institution code and culture ID are required; the collection-code is optional. The institution code must be valid. See the description for the proper format and list of allowed institutes.
An example culture-collection is: CBS:1234
In this example, CBS is the institution code and 1234 is the culture ID. There must be a colon between the institution code and the culture ID.
The specimen-voucher is not required to be structured in the format described above. You may remove the colon to leave the specimen-voucher unstructured.
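The structured form "<institution-code>:[<collection-code>:]<culture id>" can be split into its parts as sketched here. The `Voucher` type and `parse_structured_voucher` function are hypothetical names, the sketch assumes the culture ID itself contains no colon, and it does not check the institution code against the list of allowed institutes.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: str
    collection: Optional[str]
    identifier: str


def parse_structured_voucher(value: str) -> Optional[Voucher]:
    """Split a structured voucher; return None if a required part is missing."""
    parts = [part.strip() for part in value.split(":")]
    if len(parts) == 2:
        institution, identifier = parts
        collection = None
    elif len(parts) == 3:
        institution, collection, identifier = parts
    else:
        return None
    # Institution code and identifier are required; a collection code,
    # when present, must not be empty.
    if not institution or not identifier or collection == "":
        return None
    return Voucher(institution, collection, identifier)
```

For "CBS:1234" this yields institution "CBS", no collection code, and identifier "1234"; "CBS:" returns None because the identifier is missing.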
SEQ_DESCR_BioSourceMissing
Explanation : The biological source of this sequence has not been described correctly. A submission must have a source descriptor that covers the entire molecule. Please add the source information.
Suggestion : Provide an organism name for each sequence in your submission.
SEQ_DESCR_DuplicatePCRPrimerSequence
Explanation : The PCR primer sequence has duplicate subsequences.
Suggestion : There are multiple identical primer sequences in the source modifiers. Remove the duplicate sequences so there are only unique primer sequences.
SEQ_DESCR_IncorrectlyFormattedVoucherID
Explanation : A specific identifier is missing from one of the following source modifiers: culture-collection, specimen-voucher, or bio-material.
Suggestion : Correct the format of the culture-collection, specimen-voucher, or bio-material source modifiers. The culture-collection, specimen-voucher, or bio-material is missing the identifier. Culture-collection should be used for microbial sequences, while specimen-voucher should be used for plants and animals only. Do not use specimen-voucher to describe host information for a microbial sequence submission.
The culture-collection must be formatted like this: "<institution-code>:[<collection-code>:]<culture id>". The institution code and culture ID are required; the collection-code is optional. The institution code must be valid. See the description for the proper format and list of allowed institutes.
An example culture-collection is: CBS:1234
In this example, CBS is the institution code and 1234 is the culture ID. There must be a colon between the institution code and the culture ID.
The specimen-voucher is not required to be structured in the format described above. You may remove the colon to leave the specimen-voucher unstructured.
SEQ_DESCR_LatLonGeoLocName
Explanation : lat_lon and geographic location name disagree
Suggestion : The latitude-longitude (lat-lon) value provided does not map to the source location provided in geographic location name (geo_loc_name), so correct or remove the lat-lon values and/or geo_loc_name source modifiers. Provide lat-lon in decimal degrees with the compass direction (for example: 39.7 N 42.1 W) and check that the lat-lon coordinates map to the country you have provided.
SEQ_DESCR_LatLonFormat
Explanation : The format of lat-lon should be dd.dd N|S ddd.dd E|W.
Suggestion : Correct the latitude-longitude (lat-lon) source modifier with lat-lon coordinates in decimal degree format with the compass directions. For example: 39.7 N 42.1 W
SEQ_DESCR_LatLonRange
Explanation : Latitude or longitude is out of range.
Suggestion : Correct or remove the latitude-longitude (lat-lon) values in the source modifiers. Provide lat-lon in decimal degrees and include the compass direction (for example, 39.7 N 42.1 W). Longitude values range from 0 to 180 E or 0 to 180 W. Latitude values range from 0 to 90 N or 0 to 90 S. Numbers outside of these ranges will cause errors.
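The lat-lon format and range rules can be checked together. This is a minimal sketch assuming single spaces between the four fields and unsigned decimal numbers; `parse_lat_lon` is a hypothetical helper name, and it does not compare the coordinates against geo_loc_name.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<latitude> N|S <longitude> E|W", single-space separated.
_LAT_LON_RE = re.compile(
    r"([0-9]+(?:\.[0-9]+)?) ([NS]) ([0-9]+(?:\.[0-9]+)?) ([EW])"
)


def parse_lat_lon(value: str) -> Optional[Tuple[float, float]]:
    """Return signed (latitude, longitude), or None if format or range is bad."""
    match = _LAT_LON_RE.fullmatch(value)
    if match is None:
        return None
    lat_text, ns, lon_text, ew = match.groups()
    lat, lon = float(lat_text), float(lon_text)
    # Latitude runs 0 to 90, longitude 0 to 180; direction gives the sign.
    if lat > 90 or lon > 180:
        return None
    return (lat if ns == "N" else -lat, lon if ew == "E" else -lon)
```

Returning signed values (south and west negative) is a design choice that makes a later comparison against a country's bounding region easier.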
SEQ_DESCR_LatLonValue
Explanation : Latitude or longitude values appear to be in the wrong hemisphere or swapped.
Suggestion : Correct or remove the latitude-longitude (lat-lon) values in the source modifiers. The lat-lon value for the record does not agree with the source location provided in geo_loc_name. Based on the source location, the lat-lon value appears to have the incorrect hemisphere or is swapped. Check the coordinates and compass direction and provide the correct values.
GENERIC_MissingPubRequirement
Explanation : The publication is missing essential information, such as title or authors.
Suggestion : Check the references. Provide author names, a title, and select the publication status (unpublished, in press, or published). If the publication is published or in press, provide additional information including publication year, journal, volume, and pages, where applicable.
GENERIC_MissingPubInfo
Explanation : One of the publication requirements is missing.
Suggestion : If the publication is in press or published, check that the year, authors, title, and journal name have been added. For the submitter reference, check that the institution and country are present.
SEQ_DESCR_UnstructuredVoucher
Explanation : The culture-collection needs to be structured as "<institution-code>:[<collection-code>:]<culture id>".
Suggestion : Correct the format of the culture-collection source modifier. The institution code and culture ID are required; the collection-code is optional. Follow the formatting instruction in the explanation. The culture collection must have a valid institution code followed by a colon and the culture ID. See the list of allowed institutes.
For example: CBS:1234
In this example, CBS is the institution code and 1234 is the culture ID. There must be a colon between the institution code and the culture ID.
If the collection is not on the list of allowed institutes, please send an email to [email protected] with the following information: your GenBank SUB#, confirmation the collection is a curated specimen collection, a home page for the collection, and a name, phone and email for the curator.
SEQ_DESCR_WrongOrganismFor16SrRNA
Explanation : 16S ribosomal RNA is not present in eukaryotic nuclear ribosomes.
Suggestion : Check the organism names for the sequences you are submitting. If you are submitting prokaryotic 16S ribosomal RNA, the organism names should be prokaryotic names. Do not use the host as the organism name. For example, if bacterial 16S rRNA was sequenced from human samples, use uncultured bacterium as the organism name; do NOT use Homo sapiens as the organism name.
Check the tool you are using to submit. If you are using the prokaryotic 16S rRNA sequence submission wizard for sequences that are not prokaryotic 16S rRNA, you should use a different submission tool to submit the sequences.
SEQ_DESCR_WrongVoucherType
Explanation : The institution (or institution:collection) code normally uses a different voucher type (bio-material, culture-collection, or specimen-voucher).
Suggestion : In the source modifiers, use the source modifier "culture-collection" instead of "specimen-voucher" or vice versa. For example, if you provided the source modifiers in a tab-delimited table, edit the table so the column header "culture-collection" is used in place of "specimen-voucher" and upload the revised table.
Note that culture-collection should be used for microbial sequences, while specimen-voucher should be used for plants and animals only. Do not use specimen-voucher to describe host information for a microbial sequence submission.
SEQ_DESCR_InvalidSexQualifier
Explanation : The provided information is not on the approved list of values for this qualifier.
Suggestion : Use one of the terms listed below. If your term is not listed you may have used an incorrect qualifier. See descriptions and examples.
Values for Sex: female, male, hermaphrodite, unisexual, bisexual, asexual, monoecious [or monecious], dioecious [or diecious]
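The allowed values can be kept in a lookup set for a pre-submission check. In this sketch, the name `ALLOWED_SEX_VALUES` and the helper `is_valid_sex` are hypothetical, and exact lowercase matching is an assumption.

```python
# Values copied from the list above, including the alternate spellings.
ALLOWED_SEX_VALUES = frozenset({
    "female", "male", "hermaphrodite", "unisexual", "bisexual", "asexual",
    "monoecious", "monecious", "dioecious", "diecious",
})


def is_valid_sex(value: str) -> bool:
    """Return True if the value is one of the listed terms (exact match)."""
    return value in ALLOWED_SEX_VALUES
```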
DISC_DUP_DEFLINE
Explanation : Definition lines should be unique
Suggestion : Some of your records do not have unique information. Provide a combination of source information that is unique to each record, such as unique clone names. For example, two unique clone names are xyz1-123 and xyz2-567.
DISC_FEATURE_MOLTYPE_MISMATCH
Explanation : Sequences with rRNA or misc_RNA features should be genomic DNA
Suggestion : The molecule type should represent the type of molecule that was actually sequenced. In general, submissions of rRNA genes or rRNA/ITS regions are actually sequences of genomic DNA. Correct this information where you indicated the type of molecule that was sequenced. For example, if you indicated rRNA in your FASTA definition lines, remove this information or change it to genomic DNA. If you actually sequenced RNA, please write to [email protected] with your submission number.
DISC_INFLUENZA_DATE_MISMATCH
Explanation : There is a discrepancy between the year given in the strain and the collection_date
Suggestion : Please verify the correct date of collection of the virus in the field and update either the strain or collection_date.
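The year comparison can be sketched as follows. The layout of the strain name is an assumption not stated in this document: the sketch supposes the year is the last "/"-separated field of the strain, optionally followed by a parenthesized subtype, and that the collection_date ends in a four-digit year. The function names and the strain "A/duck/Example/1/2009" are hypothetical.

```python
import re
from typing import Optional


def _strain_year(strain: str) -> Optional[str]:
    # Assumed layout: ".../YYYY" or ".../YYYY(subtype)" at the end of the strain.
    match = re.search(r"/([0-9]{4})\s*(?:\([^)]*\))?$", strain)
    return match.group(1) if match else None


def _collection_year(collection_date: str) -> Optional[str]:
    # Works for DD-MMM-YYYY, MMM-YYYY, and YYYY, which all end in the year.
    match = re.search(r"([0-9]{4})$", collection_date)
    return match.group(1) if match else None


def years_agree(strain: str, collection_date: str) -> Optional[bool]:
    """Return True/False when both years are found, None when either is missing."""
    strain_year = _strain_year(strain)
    collection_year = _collection_year(collection_date)
    if strain_year is None or collection_year is None:
        return None
    return strain_year == collection_year
```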
DISC_INFLUENZA_QUALS
Explanation : Submissions of Influenza sequences require the following source information: strain, geo_loc_name, collection_date, host
Suggestion : Review the uploaded source tables and add the missing information.
DISC_INFLUENZA_SEROTYPE
Explanation : Submissions of Influenza A sequences require a serotype
Suggestion : Review the uploaded source tables and add a serotype for the Influenza A submissions.
DISC_INFLUNEZA_SEROTYPE_FORMAT
Explanation : Influenza A serotypes must be in the format HxNx, Hx, Nx or mixed, where x represents a number
Suggestion : Review the provided serotype information and adjust to the correct format.
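The four accepted serotype shapes translate directly into a pattern. This sketch assumes uppercase H and N, lowercase "mixed", and no separators; `is_valid_serotype` is a hypothetical helper name.

```python
import re

# HxNx, Hx, Nx, or the literal word "mixed", where x is a number.
_SEROTYPE_RE = re.compile(r"H[0-9]+N[0-9]+|H[0-9]+|N[0-9]+|mixed")


def is_valid_serotype(value: str) -> bool:
    """Return True if the serotype matches one of the four accepted shapes."""
    return _SEROTYPE_RE.fullmatch(value) is not None
```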
DISC_MISSING_AFFIL
Explanation : Missing affiliation
Suggestion : The name of the institution where the sequencing/analysis work was performed is missing from your submission. Provide this information where you provided your contact information.
DISC_NO_ANNOTATION
Explanation : Annotation could not be added to this sequence.
Suggestion : Verify that you have selected the correct submission type for your sequence. Additional errors may appear in conjunction with this error to clarify the issue.
DISC_REQUIRED_CLONE
Explanation : Uncultured or environmental sources should have clone
Suggestion : Provide unique (non-identical) clone names for your sequences. A clone name is typically an alphanumeric identifier used to track the sample in your laboratory. The clone name is not the organism name and it is not the name of the gene you are working on. For example, two unique clone names are xyz1-123 and xyz2-567.
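Repeated clone names can be found before submission by counting occurrences. This is a minimal sketch; `duplicated_values` is a hypothetical helper name, and the same approach applies to any source value that must be unique across records.

```python
from collections import Counter
from typing import Iterable, List


def duplicated_values(values: Iterable[str]) -> List[str]:
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)
```

An empty result means every clone name in the list is unique.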
DISC_REQUIRED_STRAIN
Explanation : Cultured prokaryotic sources should have strain
Suggestion : Provide unique (non-identical) strain names for your sequences. A strain name is an alphanumeric identifier that may be designated in any manner; for example, it may be based on the name of an individual or locality. As an example, for Escherichia coli K12, "K12" is the strain name or identifier.