Is there a simple way to extract 44-way or 29-mammals conservation levels for given genomic regions (in BED format) in the human genome?
44way conservation levels are available as a set of WIG/bigWig files at UCSC. For example, see the corresponding hg19 track (46-way, primates subset): ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/primates/
You can query and convert those WIG/bigWig files using wigToBigWig and bigWigToBedGraph, both available at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
================================================================
======== bigWigToBedGraph ====================================
================================================================
bigWigToBedGraph - Convert from bigWig to bedGraph format.
usage:
bigWigToBedGraph in.bigWig out.bedGraph
options:
-chrom=chr1 - if set restrict output to given chromosome
-start=N - if set, restrict output to only that over start
-end=N - if set, restrict output to only that under end
-udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
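To apply this to many BED regions, one option is to generate one bigWigToBedGraph call per region. The following is only a minimal sketch: it uses the -chrom, -start and -end options from the usage text above, while the function name, the bigWig file name and the output naming scheme are hypothetical. It builds the command lines and does not run them.

```python
import shlex


def bigwig_commands(bed_lines, bigwig_path, out_prefix="region"):
    """Build one bigWigToBedGraph command per BED region (nothing is executed).

    bed_lines: iterable of BED-formatted text lines (chrom, start, end, ...).
    bigwig_path: path to a bigWig file (hypothetical name in the example below).
    """
    commands = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # Hypothetical output naming scheme: one bedGraph file per region.
        out_path = f"{out_prefix}_{chrom}_{start}_{end}.bedGraph"
        args = [
            "bigWigToBedGraph",
            f"-chrom={chrom}",
            f"-start={int(start)}",
            f"-end={int(end)}",
            bigwig_path,
            out_path,
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    example = ["chr1\t100\t200\tpeak1"]
    for cmd in bigwig_commands(example, "phastCons.bw"):
        print(cmd)
```

The printed lines can be reviewed and then pasted into a shell, which keeps the sketch independent of where the UCSC binaries are installed.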
In the title you ask how to calculate the conservation score, and in the description just how to extract conservation score from multiple regions.
For extracting you might want to take a look at biopieces. It includes a program for exactly that purpose: http://code.google.com/p/biopieces/wiki/get_genome_phastcons
Of course, the genome and conservation scores have to be downloaded from UCSC and set up for biopieces first.
I found an easy way, using the CompleteMOTIFS tool. It gives the average PhastCons score of a given BED region. http://cmotifs.tchlab.org/cgi/pipeline_v3.cgi
The 29 mammals data is not there. You can find it here, already in BED format: http://www.broadinstitute.org/scientific-community/science/projects/mammals-models/29-mammals-project-supplementary-info
Extraction of regions can be done with bedtools, I think.
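For reference, the calculation behind "average score per region" can be sketched in plain Python. This is not how bedtools is implemented; it is an illustration that assumes the scores are already available as bedGraph-style intervals (chrom, start, end, score) with 0-based, half-open coordinates. The function name is hypothetical.

```python
def mean_scores(regions, scored_intervals):
    """Length-weighted mean score for each region.

    regions: list of (chrom, start, end), 0-based half-open as in BED.
    scored_intervals: list of (chrom, start, end, score), as in bedGraph.
    Returns one value per region; None where no scored base overlaps it.
    """
    # Group scored intervals by chromosome so each region only scans its own.
    by_chrom = {}
    for chrom, start, end, score in scored_intervals:
        by_chrom.setdefault(chrom, []).append((start, end, score))

    means = []
    for chrom, r_start, r_end in regions:
        total = 0.0
        covered = 0
        for s_start, s_end, score in by_chrom.get(chrom, []):
            # Number of bases shared by the region and the scored interval.
            overlap = min(r_end, s_end) - max(r_start, s_start)
            if overlap > 0:
                total += score * overlap
                covered += overlap
        means.append(total / covered if covered else None)
    return means


if __name__ == "__main__":
    regions = [("chr1", 10, 20)]
    scores = [("chr1", 0, 15, 0.2), ("chr1", 15, 30, 0.8)]
    print(mean_scores(regions, scores))
```

Bases without a score are ignored rather than counted as zero, which is a choice and not the only reasonable one. The scan is linear per region, so for genome-scale inputs a sorted or indexed approach (or bedtools itself) is likely the better fit.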
I removed "29 mammals".
I used these steps to get my PhastCons scores:
awk '{print $1 "\t" $2 "\t" $3 "\t" $4}'
bedtools intersect -a ChIP_peaks.bed -b phastCons.bed
Is this the best way to do this? My final goal is to plot PhastCons score against distance from the binding site. Please advise.
P.S.: I am working with an insect genome; most of the available tools seem to be for human genomes.
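One possible way to get from per-base scores to a score-versus-distance profile is sketched below. It does not depend on the genome, so it should apply to an insect assembly as well. The assumptions are mine: the binding site is taken to be the midpoint of each peak, strand is ignored, scores are bedGraph-style intervals (chrom, start, end, score), and the function name, flank size and bin size are hypothetical.

```python
def conservation_profile(peaks, scored_intervals, flank=100, bin_size=10):
    """Mean score per distance bin around peak midpoints.

    peaks: list of (chrom, start, end), 0-based half-open as in BED.
    scored_intervals: list of (chrom, start, end, score), as in bedGraph.
    Returns {bin_start_offset: mean_score}; negative offsets lie at lower
    coordinates than the midpoint. Bins with no scored bases are omitted.
    """
    by_chrom = {}
    for chrom, start, end, score in scored_intervals:
        by_chrom.setdefault(chrom, []).append((start, end, score))

    sums = {}
    counts = {}
    for chrom, p_start, p_end in peaks:
        center = (p_start + p_end) // 2  # assumed binding-site position
        win_start, win_end = center - flank, center + flank
        for s_start, s_end, score in by_chrom.get(chrom, []):
            # Clip the scored interval to the window around this peak.
            lo = max(win_start, s_start)
            hi = min(win_end, s_end)
            for pos in range(lo, hi):
                offset = pos - center
                # Floor division puts offsets -10..-1 in bin -10, 0..9 in bin 0.
                b = (offset // bin_size) * bin_size
                sums[b] = sums.get(b, 0.0) + score
                counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    peaks = [("chr1", 90, 110)]
    scores = [("chr1", 80, 100, 0.2), ("chr1", 100, 120, 0.8)]
    for offset, mean in conservation_profile(peaks, scores, flank=20).items():
        print(offset, mean)
```

The returned dictionary maps each bin's starting offset to the mean score over all peaks, so the keys and values can be passed directly to a plotting library as x and y.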