GEO Metadata Validation Rules
To improve GEO's processing rate and maintain a high standard of metadata collection, GEO has implemented an automated pre-check of the metadata spreadsheet for completeness, formatting, and content. After the FTP transfer of raw and processed data files is complete, upload the completed metadata file on the Submit Metadata page.
Upon upload, the metadata file will be scanned and checked for formatting and content within seconds. For example, if a section (STUDY, SAMPLES, PROTOCOLS, PAIRED-END EXPERIMENTS) is missing, you will receive the error message "Uploaded file is missing mandatory section" and a table will appear with the name of the missing section. If you receive an error message, please correct the indicated fields of your metadata file and upload your file again. Uploading a complete metadata file will return the message "Your metadata file has been successfully uploaded". Successful uploading of the metadata file places your submission into GEO's processing queue and you will receive an email notification with your submission summary.
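
The section check described above can be approximated locally before uploading. The following is a minimal sketch, not GEO's implementation: it assumes the "Metadata" worksheet has already been read into a list of rows (each row a list of cell values) and that section titles appear in the first column. The function name `missing_sections` and the `paired_end` flag are hypothetical.

```python
# Sections that every metadata file must contain, per the rules in this document.
REQUIRED_SECTIONS = ("STUDY", "SAMPLES", "PROTOCOLS")
# Additionally required for paired-end sequencing studies.
PAIRED_END_SECTION = "PAIRED-END EXPERIMENTS"


def missing_sections(rows, paired_end=False):
    """Return the required section titles that do not appear in `rows`.

    Assumption: a section title occupies the first cell of its row.
    """
    found = set()
    for row in rows:
        if row and row[0] is not None:
            found.add(str(row[0]).strip().upper())

    required = list(REQUIRED_SECTIONS)
    if paired_end:
        required.append(PAIRED_END_SECTION)
    return [name for name in required if name not in found]


# Hypothetical worksheet content used only for illustration.
example_rows = [
    ["STUDY"],
    ["title", "Example study"],
    ["SAMPLES"],
    ["library name", "title"],
    ["PROTOCOLS"],
]
print(missing_sections(example_rows, paired_end=True))
```

An empty result suggests that the `missing_section` error is unlikely; it does not guarantee that the upload will pass the remaining checks.
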
error name | error message that you will receive | explanation and how to fix |
---|---|---|
excel_parse_failure | Uploaded file cannot be read. The file must be in Excel version 2007 or higher with .xlsx extension. | The file is not an Excel version 2007 or higher file with .xlsx extension. GEO cannot process metadata files submitted with extension .txt, .csv, or .tsv. Do not compress the metadata Excel spreadsheet. A compressed metadata Excel spreadsheet cannot be read. |
discontinued_template | It appears that you have used a discontinued version of the metadata spreadsheet. Please use the above link to download the newest version and resubmit. | Old versions of the metadata spreadsheet are not supported. Please download, complete, and submit the newest version of the metadata spreadsheet. |
missing_worksheet | Uploaded file is missing required worksheet named "Metadata". Please make sure you are using our newest metadata template. | The Excel tab (also called a worksheet) containing the metadata information must be named "Metadata" or "2. Metadata Template". Any other tab name will produce the "missing_worksheet" error. For example, do not rename the tab "RNAseq" or "ChIPseq". Do not include multiple tabs with metadata for separate studies in the same file. GEO needs one metadata file per study. |
missing_section | Uploaded file is missing mandatory section: | The metadata tab must have sections titled STUDY, SAMPLES and PROTOCOLS. If it is a paired-end sequencing study, the metadata file must also contain a PAIRED-END EXPERIMENTS section. |
empty_samples_section | SAMPLES section does not list any samples. Please make sure that library names do not start with "#" symbol since such lines are treated as comments and ignored. | Samples must be listed in the SAMPLES section. |
missing_mandatory_info | Uploaded file is missing mandatory information in the STUDY or PROTOCOLS sections: | Required fields in STUDY and PROTOCOLS sections are: title, summary (abstract), experimental design, extract protocol, library construction protocol, data processing description, assembly or genome build, and processed data files format and content. A table will be provided that lists the fields in STUDY and/or PROTOCOLS sections that are empty. |
missing_sample_header | SAMPLES section is missing required headers for the table: | Deleting columns from the metadata template in the SAMPLES section is not allowed and will produce the "missing_sample_header" error. A table will be provided which lists the missing headers in the SAMPLES section. You can add columns to the SAMPLES section for additional characteristics appropriate for your samples. For example, you could use the header "overall survival" and provide survival data for each sample. |
empty_library_name | At least one of the samples has empty library name. | In the SAMPLES section, at least one sample has an empty library name. This error is sometimes caused by non-empty cells in the SAMPLES section that are not associated with the included samples. |
missing_sample_info | SAMPLES section is missing required information: | Every sample in the SAMPLES section must include information for library name, title, organism, library strategy, molecule, single or paired-end, and instrument model. A table will be provided which lists the missing field for each library name. Valid entries for library strategy, molecule, single or paired-end, and instrument model are available from the drop-down list in each of these columns in the metadata template. |
duplicate_library_names | Identical library names were found. Library names must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical names are: | Every library name in the SAMPLES section must be unique. A table will be provided which lists the non-unique library name and the number of times (occurrences) it was found in the SAMPLES section. |
duplicate_sample_titles | Identical sample titles were found. Sample titles must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical titles are: | Every title in the SAMPLES section must be unique. A table will be provided which lists the non-unique title and the number of times (occurrences) it was found in the SAMPLES section. |
invalid_contributor_format | The contributor name is not correctly formatted. The format is: 'Firstname, I, Lastname' or 'Firstname, Lastname'. First (given) name must be at least one character long. 'I' represents middle name initial and must be exactly one character. Last (family) name must be at least two characters long. List only one contributor name per row. Examples and guidance for contributor name format are available in the metadata template. | Contributor names must be provided in the accepted format of First, Last or First, I, Last. I represents middle name initial, if present. A comma must separate the individual parts of the name. List one contributor name per row. You can add as many extra rows with field name "contributor" as you need. |
long_sample_title | Sample title is too long. Maximum length allowed is 120 characters. | Sample titles can be no longer than 120 characters. A short sample title of 3-5 words is easy to read and displays clearly on the website. |
empty_field_name | The following rows in STUDY and/or PROTOCOLS sections are missing the field name such as "contributor" or "data processing step". Add the correct field name in the cell to the left of the cell with text listed below. | |
out_of_bound_text | Extra text was found beyond the first two columns in STUDY and/or PROTOCOLS sections. Please remove it. If you need to include different protocols for subsets of samples, please add all PROTOCOLS fields (extract protocol, library protocol, data processing step, etc) to the SAMPLES section as additional columns. | |
raw_file_not_found | The metadata file lists raw files that are not found in your personalized upload space. Upload any missing files OR correct the metadata file by listing the exact file names (names are case-sensitive, cannot include paths, and must include file extensions such as ".gz" when compressed). The following raw files are not found in your personalized upload space: | |
no_paths_allowed | A directory path to a file name has been found in the metadata file. All raw data, processed data, and supplementary files must be listed without a path. For example, use "data_matrix.txt" instead of "/Home/RNAseq/Data/Processed/data_matrix.txt". Please remove paths and resubmit. | Inclusion of a path in a file name prevents file detection on GEO's server. List the file name without path. |
invalid_organism_name | Organism name(s) could not be resolved automatically in NCBI Taxonomy database. The name was either not found, or it returned multiple entries. Please check spelling of organism name. Make sure you have provided a valid scientific name at species level (or lower rank, such as subspecies), e.g., Mus musculus. Do not include taxonomic authority in the name such as L. for Linnaeus. If the organism name is valid but not yet included in NCBI Taxonomy database, contact GEO using the "email us" link located above this message. | Make sure that the 'organism' field contains the scientific name of the organism at species level or below. The organism name cannot include additional text such as tissue information e.g., Mus musculus heart. List one name per column. Add extra 'organism' columns if the sample includes material from more than one organism. |
missing_sample_column_name | Some columns in the SAMPLES section are not named. Add column names to the header row. | The header row in the SAMPLES section must have a name for each column for which there is sample information. Remove any unintentional text that you do not want on the sample record. |
duplicate_raw_file_names | Identical raw data file names have been found in the SAMPLES section. All samples must be associated with unique raw data files. Please check raw file names for typos or inadvertent copy/paste errors. For single-cell studies with multiplexed raw data, please see the metadata template worksheet "scMulti-omics seq EXAMPLE" for guidance. If you have questions or need help, contact GEO using the "email us" link located above this message. | Each sample must be associated with independent raw data files. If your single-cell samples have been multiplexed, create one sample per sequencing library and create separate samples for individual library types such as GEX, HTO, ADT, TCR, etc. |
processed_data_file_not_found | The metadata file lists processed data files that are not found in your personalized upload space. Upload any missing files OR correct the metadata file by listing the exact file names (names are case-sensitive, cannot include paths, and must include file extensions such as ".gz" when compressed). List one processed data file per "processed data file" column in the SAMPLES section or "supplementary file" field in the STUDY section. If a sample (for example, input) does not have any associated processed data, leave the "processed data file" cell empty for that sample. The following processed data files are not found in your personalized upload space: | |
processed_data_required | Your submission does not contain any processed data file(s). Include a processed data file that contains data for all samples as a "supplementary file" in the STUDY section or provide sample-specific processed data file(s) listed in the "processed data file" field of the SAMPLES section. You can add as many "processed data file" columns as you need. Enter only one file per spreadsheet cell. If some samples (such as input) do not have associated processed data, leave the field empty for those samples. | |
paired_end_section_invalid_header | The PAIRED-END EXPERIMENTS section header row is not formatted correctly. There should be up to 4 columns, named as "file name 1", "file name 2", "file name 3" and "file name 4". All columns with file names must include a header. | |
paired_end_section_column_limit | Each row of the PAIRED-END EXPERIMENTS section can include a maximum of four files. Each row should include paired-end files from one run. The following file names were found beyond the fourth column: | |
paired_end_section_raw_file_omitted | Paired-end raw files must be listed in both sections of the metadata file. List one set of paired-end raw files (R1, R2 or I1, R1, R2, for example) per row in the PAIRED-END EXPERIMENTS section. The following raw files from SAMPLES section are not found in the PAIRED-END EXPERIMENTS section: | |
paired_end_section_with_non_paired_end_file | PAIRED-END EXPERIMENTS section includes raw files that are marked as "single" in SAMPLES section or files that are not included in "raw file" columns in SAMPLES section: | |
paired_end_section_library_mismatch | PAIRED-END EXPERIMENTS section contains at least one row with files from different libraries or samples. | |
paired_end_section_duplicate_file_names | The PAIRED-END EXPERIMENTS section contains non-unique file names. Please correct all file names so that they are unique. | |
duplicate_section | Each section (SERIES, SAMPLES, PROTOCOLS, PAIRED-END EXPERIMENTS) can only occur once in the "Metadata" worksheet. Upload one metadata file per data type (e.g., ChIP-seq, RNA-seq). Some sections were found more than once. | |
invalid_molecule_value | Your submission contains an invalid value for "molecule". Choose an option from the dropdown list in the metadata template in the "molecule" column. | |
invalid_single_paired_end_value | Your submission contains an invalid value for "single or paired-end". The value of this field must be either "single" or "paired-end". | |
invalid_instrument_model_value | Your submission contains an invalid value for "instrument model". Choose an option from the dropdown list in the metadata template in the "instrument model" column. Only one instrument model may be included per cell. If needed, enter name of additional instrument models in the "description" column. | |
invalid_library_strategy_value | Your submission contains an invalid value for "library strategy". Choose an option from the dropdown list in the metadata template in the "library strategy" column. | |
insufficient_biological_information | The following samples are missing biological information. At least one of these fields is required: tissue, cell line or cell type. | |
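
Several of the rules in the table are simple enough to test locally before uploading. The sketch below illustrates a few of them under stated assumptions: it is not GEO's validator, all function names are hypothetical, and the values are assumed to have been extracted from the spreadsheet as plain strings. The contributor check follows the format given for `invalid_contributor_format`, the duplicate check mirrors the case-insensitive comparison described for `duplicate_library_names` and `duplicate_sample_titles`, and the remaining helpers cover `long_sample_title`, `no_paths_allowed`, and `invalid_single_paired_end_value`.

```python
from collections import Counter

MAX_TITLE_LENGTH = 120  # limit stated for long_sample_title


def valid_contributor(name):
    """Check 'Firstname, Lastname' or 'Firstname, I, Lastname'."""
    parts = [part.strip() for part in name.split(",")]
    if len(parts) == 2:
        first, last = parts
        middle = None
    elif len(parts) == 3:
        first, middle, last = parts
    else:
        return False
    # First name: at least 1 character; last name: at least 2 characters.
    if len(first) < 1 or len(last) < 2:
        return False
    # Middle initial, when present, must be exactly 1 character.
    return middle is None or len(middle) == 1


def case_insensitive_duplicates(values):
    """Map each repeated value (case-folded) to its number of occurrences."""
    counts = Counter(value.casefold() for value in values)
    return {value: n for value, n in counts.items() if n > 1}


def title_too_long(title):
    """True if a sample title exceeds the 120-character limit."""
    return len(title) > MAX_TITLE_LENGTH


def has_path(file_name):
    """True if a file name contains a directory separator."""
    return "/" in file_name or "\\" in file_name


def valid_single_or_paired(value):
    """True if the value is exactly 'single' or 'paired-end'."""
    return value in ("single", "paired-end")


print(valid_contributor("Jane, Q, Doe"))
print(case_insensitive_duplicates(["Control1", "control1", "Treated1"]))
print(has_path("/Home/RNAseq/Data/Processed/data_matrix.txt"))
```

These helpers only approximate the documented rules. Checks that depend on GEO's server, such as `raw_file_not_found` or `invalid_organism_name`, cannot be reproduced this way.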