DATA SETS

Sets of extremely conserved noncoding regions (p-value <= 1e-40) identified by Gumby in a whole-genome alignment of human (hg16 assembly), mouse (mm4) and rat (rn3), at 6 different settings of the Gumby R-ratio parameter (R = 5, 15, 50, 200, 1000, 10000). The files contain the coordinates of the conserved elements in the human hg18 genome assembly. See the first reference below for a complete description of the method.

REFERENCES

These data sets are described in, and form the basis of: Visel A. et al., Nature Genetics 2008. (Note: the set with R=50 was the primary data set used in this paper.)

The whole-genome alignments were performed using the VISTA genome alignment pipeline [Frazer KA et al., Nucleic Acids Res. 2004;32:W273-9] and the MLAGAN global alignment program [Brudno M. et al., Genome Research 2003;13(4):721-31]. Genome sequences were downloaded from the UCSC Genome Bioinformatics website [http://genome.ucsc.edu/].

FORMAT

The files are in 4-column format:

chr start end log10(1/pvalue)

As in the UCSC BED format, the start coordinate is zero-based and the end coordinate is one-based. For example, chr10:10000-20000 would be written as

chr10 9999 20000

in this format. The conservation p-value was calculated by Gumby (Prabhakar S. et al., Genome Research 2006). The value in the 4th column can be thought of as a conservation score: the higher the value, the greater the degree of evolutionary conservation.

CAVEAT

These lists contain some false negatives, in regions where one of the compared genomes has an incomplete or incorrect assembly, or in regions that are not syntenically aligned in all three species by the genome alignment pipeline. Also, 1e-40 is an extremely strict p-value threshold, designed to select only the extreme tail of the distribution of constrained sequences in the human genome.
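
EXAMPLE

The 4-column layout described under FORMAT can be read with a few lines of Python. The following is a minimal sketch, not part of the original distribution. It assumes the columns are separated by whitespace (tabs or spaces), and the names ConservedElement, parse_line, to_region_string, element_length and pvalue are hypothetical helpers introduced here for illustration. The score of 40.0 in the sample line is made up and is not taken from the data files.

```python
from typing import NamedTuple


class ConservedElement(NamedTuple):
    chrom: str
    start: int    # zero-based start, exactly as stored in the file
    end: int      # one-based end, exactly as stored in the file
    score: float  # log10(1/pvalue), the conservation score


def parse_line(line: str) -> ConservedElement:
    """Parse one 'chr start end log10(1/pvalue)' record.

    Assumption: fields are whitespace-separated.
    """
    chrom, start, end, score = line.split()[:4]
    return ConservedElement(chrom, int(start), int(end), float(score))


def to_region_string(el: ConservedElement) -> str:
    """Convert back to the one-based chr:start-end notation."""
    return f"{el.chrom}:{el.start + 1}-{el.end}"


def element_length(el: ConservedElement) -> int:
    """Number of bases covered by the element."""
    return el.end - el.start


def pvalue(el: ConservedElement) -> float:
    """Recover the p-value from the score: p = 10 ** (-score)."""
    return 10.0 ** (-el.score)


if __name__ == "__main__":
    # Hypothetical record using the coordinates from the FORMAT example.
    el = parse_line("chr10\t9999\t20000\t40.0")
    print(to_region_string(el))  # chr10:10000-20000
    print(element_length(el))    # 10001
    print(pvalue(el))
```

Because the start is zero-based and the end is one-based, the element length is simply end minus start, and only the start needs adjusting when converting to chr:start-end notation.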