C. acetobutylicum ATCC824 Transcriptional array v2 includes over 20K 60-mer oligonucleotide probes selected after experimental testing of our proof-of-concept platforms GPL4029 and GPL4030. The programs Comm_Oligo (Li, He et al. 2005), ROSO (Reymond, Charles et al. 2004), YODA (Nordberg 2005), ArrayOligoSelector (Bozdech, Zhu et al. 2003), OligoWiz 2.0 (Wernersson and Nielsen 2005) and Picky (Chou, Hsia et al. 2004) were used to generate several 60-mers for each Clostridium acetobutylicum ATCC 824 chromosome and pSOL1 megaplasmid ORF (Nölling, Breton et al. 2001). Whenever possible, the DNA sequences of the ribosomal RNAs, tRNAs and intergenic regions of the whole genome were used as a negative set (i.e. no match allowed). A maximum identity of 75-85% to any other sequence was allowed, and the other parameters were set to the program defaults. On average, 32 60-mers per ORF were generated.
Melting temperatures and DeltaG values for each generated oligomer and its complementary sequence were recalculated using Hybrid 2.5 (Markham and Zuker 2005) (included in (Rouillard, Zuker et al. 2003)). For each 60-mer, the best four non-specific matches against the Clostridium acetobutylicum ATCC 824 genome were determined using FASTA (Pearson and Lipman 1988; Pearson 1990). The melting temperatures of the heterodimers formed by a 60-mer and the complementary sequence of each of its non-specific matches were also calculated. The difference between the melting temperature of the 60-mer with its perfect complement and that of each heterodimer was calculated, and the minimal difference (minimal DeltaT) was recorded. The 60-mers targeting each ORF were ranked in descending order of this minimal DeltaT.
mRNAs from exponential-phase cultures of the wild-type (WT) and M5 C. acetobutylicum strains were hybridized onto two pairs of slides in a dye-swap configuration using Agilent and cDNA arrays, for a total of twelve arrays.
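The ranking of candidate 60-mers by minimal melting-temperature difference described above can be sketched as follows. This is only an illustration, not the code used to build the platform: the Probe class, its field names and the handling of probes without non-specific matches are assumptions, and the melting temperatures are taken as already computed (e.g. by Hybrid).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Probe:
    # All field names are hypothetical; they are not taken from the platform files.
    orf: str                     # target ORF, e.g. "CAC0001"
    probe_id: str                # arbitrary identifier of the candidate 60-mer
    tm_perfect: float            # Tm of the 60-mer with its perfect complement
    tm_nonspecific: List[float]  # Tm of heterodimers with its best non-specific matches


def min_delta_t(probe: Probe) -> float:
    """Smallest difference between the perfect-match Tm and any heterodimer Tm."""
    if not probe.tm_nonspecific:
        # Assumption: a probe with no non-specific match is treated as maximally specific.
        return float("inf")
    return min(probe.tm_perfect - tm for tm in probe.tm_nonspecific)


def rank_probes(probes: List[Probe]) -> Dict[str, List[Probe]]:
    """Group probes by ORF and sort each group by minimal DeltaT, largest first."""
    by_orf: Dict[str, List[Probe]] = {}
    for probe in probes:
        by_orf.setdefault(probe.orf, []).append(probe)
    return {
        orf: sorted(group, key=min_delta_t, reverse=True)
        for orf, group in by_orf.items()
    }


# Made-up example values, for illustration only.
candidates = [
    Probe("CAC0001", "a", 80.0, [60.0, 55.0, 70.0, 50.0]),
    Probe("CAC0001", "b", 78.0, [50.0, 40.0, 45.0, 30.0]),
]
ranked = rank_probes(candidates)
```

A larger minimal DeltaT means that the closest non-specific heterodimer melts well below the specific duplex, so the probes at the top of each list are the least likely to cross-hybridize.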
After background subtraction, the intensities of the probes on each channel were ranked independently and scaled to a maximum value of 100 on each slide. For each gene, four median ranks were calculated: two corresponding to the WT values (one for its probes in our previous cDNA platform GPL3820 and another for its probes in our proof-of-concept platforms GPL4029 and GPL4030) and two for the M5 values, obtained with the same procedure. To select the most representative pair of probes for each gene, we first selected the probe with a mean WT rank closest to the median rank of the Agilent probes for that gene. The same procedure was applied for the M5 ranks. If the probe selected for the WT and M5 samples was the same, we selected the second-closest probe to the median rank for the strain (WT or M5) with the higher median rank on the spotted arrays. We did so to avoid choosing probes with very low intensities in those cases where the mRNA is not expressed by one of the strains.
Control features are automatically included on the array by Agilent and follow their naming convention. The name of each custom 60-mer is composed of the ORF name (CACXXXX or CAPXXXX), the 60-mer number (1, 2 or 3), a character (d, e or f) indicating whether it is the first (d), second (e) or third (f) occurrence of this specific 60-mer on this platform, and a two-letter code (Ch, Co or Tr). A Ch 60-mer is a 60-mer located in the lower 50% or 500 bp (whichever is shorter) of the target ORF and has a rank of four (4) or greater. A Co 60-mer is a 60-mer located in the lower 50% or 500 bp (whichever is shorter) of the target ORF and has a rank of four (4) or smaller. A Tr 60-mer is any 60-mer that does not meet the requirements of a Ch or Co 60-mer regarding location and/or rank.
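The naming convention for custom 60-mers can be decoded with a short parser such as the sketch below. The exact layout of the name is an assumption: the description lists the four components but not how they are joined, so the pattern accepts an optional underscore, hyphen or space between them and assumes that the XXXX part of the ORF name is four digits. The function and field names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout: ORF name, 60-mer number, occurrence letter, two-letter code,
# optionally separated by "_", "-" or whitespace. The real separator is not
# stated in the platform description.
PROBE_NAME = re.compile(
    r"^(?P<orf>CA[CP]\d{4})[_\-\s]?"
    r"(?P<number>[123])[_\-\s]?"
    r"(?P<occurrence>[def])[_\-\s]?"
    r"(?P<code>Ch|Co|Tr)$"
)

# d, e and f mark the first, second and third occurrence on the platform.
OCCURRENCE_INDEX = {"d": 1, "e": 2, "f": 3}


def parse_probe_name(name: str) -> Optional[Dict[str, object]]:
    """Split a custom 60-mer name into its parts; return None if it does not match."""
    match = PROBE_NAME.match(name.strip())
    if match is None:
        # Agilent control features follow a different convention and end up here.
        return None
    return {
        "orf": match.group("orf"),
        "replicon": "chromosome" if match.group("orf").startswith("CAC") else "pSOL1",
        "number": int(match.group("number")),
        "occurrence": OCCURRENCE_INDEX[match.group("occurrence")],
        "code": match.group("code"),
    }


# Hypothetical name, for illustration only.
parsed = parse_probe_name("CAC0001_1_d_Ch")
```

Names that do not follow the assumed layout, including the Agilent control features, return None rather than raising an error, so the parser can be applied to a whole column of feature names.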
Orientation: Features are numbered Left-to-Right, Top-to-Bottom as scanned by an Agilent scanner (barcode on the left, DNA on the back surface, scanned through the glass), matching the FeatureNum output from Agilent's Feature Extraction software. The ID column represents the Agilent Feature Extraction feature number. Rows and columns are numbered as scanned by an Axon scanner (barcode on the bottom, DNA on the front surface).
Custom 60-mer and control features identification information.
OPEN_READING_FRAME
The ORF targeted by each 60-mer according to the NC_003030.1 (chromosome) and NC_001988.1 (pSOL1 plasmid) sequences and the original annotation files provided by Genome Therapeutics Corp. For those ORFs that have been deleted in successive versions of the genome annotation, the GI entry has been set to N/A and their GeneID can be found appended at the end of the annotation entry.
ORF
SPOT_ID
ANNOTATION
The original annotation provided by Genome Therapeutics Corp.