Here is a somewhat low-tech and convoluted way of doing what you want using Phastcons scores, as i have no idea what the paper you mentioned did.
1) Get PhastCons scores from UCSC:
There are various phastCons score files, depending on the genomes in the pileup. E.g.
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/
You can get them by ftp:
ftp hgdownload.cse.ucsc.edu
user name: anonymous
password: <your email address>
cd goldenPath/mm9/phastCons30way/vertebrate
mget -a
2) Convert wig to bigWig:
Each chromosome file of scores (wig files) can be converted into bigWig files. 64-bit binaries come from:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
The fetchChromSizes
script is required to get the chromosome length for the 'wigToBigWig' script:
fetchChromSizes mm9 > mm9.chrom.sizes
Run wigToBigWig on each chromosome file, e.g:
wigToBigWig -clip chr1.data mm9.chrom.sizes chr1.bw
3) Calculate mean phastCons score for the coordinates of interest:
Another script called bigWigSummary
can be obtained from UCSC, using the link above. It takes in a set of genome coordinate and will spit out the max or mean of values contained in the bigWig file, in our case phastCons scores.
Below is a Perl script i wrote (below) that retrieves max and mean scores. Paths to files and scripts will need editing.
This is at least a backup if you don't get anything else!
#!/usr/bin/perl -w
use strict;
#####
# Get highest and mean phastCons score for each binding region in BED file
# 10 Sept 2010. Ian Donaldson.
# 20 Oct 2010 modified to read BED format
#####
# Command line usage
if(@ARGV!=2) {
die ("USAGE: $0 | Regions BED file | Output\n");
}
open(BED, "<$ARGV[0]") or die("Could not open input BED file!\n");
open(OUTPUT, ">$ARGV[1]") or die("Could not open output file!\n");
# Work thru each line of BED file
while(defined(my $line = <BED>)) {
chomp($line);
my ($chr, $start, $end) = '';
# Skip line if line does not start with 'chr'
unless($line =~ /^chr/) { next }
# Remove EOL from each line
chomp($line);
# Split each line by TAB
my @line_bits = split(/\t/, $line);
$chr = $line_bits[0];
$start = $line_bits[1];
$end = $line_bits[2];
# Run bigWigSummary
system("/home/idonaldson/bin/UCSC/bigWigSummary -type=max /home/idonaldson/genomes/mm9_vertebrate_phastCons/bw/$chr.bw $chr $start $end 1 > $$.max_tmp 2>&1");
system("/home/idonaldson/bin/UCSC/bigWigSummary -type=mean /home/idonaldson/genomes/mm9_vertebrate_phastCons/bw/$chr.bw $chr $start $end 1 > $$.mean_tmp 2>&1");
# open temp file
open(MAX_TEMP, "<$$.max_tmp") or die("Cannot open max temp file!\n");
open(MEAN_TEMP, "<$$.mean_tmp") or die("Cannot open mean temp file!\n");
# MAX phastCons
my $max_temp_output = '';
while(<MAX_TEMP>) {
$max_temp_output .= $_;
chomp($max_temp_output);
}
# close max temp file
close(MAX_TEMP);
# remove blank lines s/^$//g
if($max_temp_output =~ /^no/) {
$max_temp_output = '';
}
# MEAN phastCons
my $mean_temp_output = '';
while(<MEAN_TEMP>) {
$mean_temp_output .= $_;
chomp($mean_temp_output);
}
# close max temp file
close(MEAN_TEMP);
# remove blank lines s/^$//g
if($mean_temp_output =~ /^no/) {
$mean_temp_output = '';
}
# save temp file variable to $ARGV[1]
print OUTPUT "$line_bits[0]\t$line_bits[1]\t$line_bits[2]\t$max_temp_output\t$mean_temp_output\n";
}
# remove temporary file to final output
system("rm $$.max_tmp");
system("rm $$.mean_tmp");
# close files
close(BED);
close(OUTPUT);
exit;
I think this code is very useful. Thanks.
great effort lan ! i used it and found interesting. but do you think mean and max phast score would work if we need to find conservastion of bit longer regions , say 500-3000 bps? chr7:21114483-21117423 (hg19) has phastcons max =1 and mean= 0.437672 when i check it in UCSC browser, its conserved down to fish ! where as chr7:20838708-20841649 (hg19) max=1 and mean =0.459534 while on browser its conserved to mammals only! why is this contradiction here. may be there is some misunderstanding in my concept about phastcons score. can you please clarify me ? i actually want to find conservation depth of many such regions, and based on 46 vertebrate multiple alignments phastcons score , i am trying to identify distant most specie to which an element/region is conserved.