How to get conservation information for bed file using 100wayPhastCons
2
1
Entering edit mode
7.7 years ago
chrys ▴ 80

Hi,

I have several bed files for which I would like to get conservation information. I have downloaded the 100wayPhastCons.bw file from UCSC and my plan is to now get the conservation scores for my bed intervals from this file.

Is there a way to achieve this ?

Thanks,

genome bed conservation sequence • 3.3k views
ADD COMMENT
1
Entering edit mode

Not sure if one of these help (you may have to modify solutions to fit your needs): Working With Phastcons Files
Obtaining phastCons conservation score for every gene in the Human genome
Batch conservation analysis

ADD REPLY
0
Entering edit mode

Thanks, those are the ones I am going by right now. I was just hoping that somebody knew a tutorial or a site where a pipeline is described to get this kind of conservation information.

Quick Edit: I also was hoping to get conservation score per base. Most ways described are only returning a summary.

ADD REPLY
4
Entering edit mode
7.7 years ago

Here's a pipeline you can use:

  1. Convert the conservation bigWig (.bw) to Wiggle (.wig) with bigWigToWig, using /dev/fd/1 as a filename for standard output. This should allow you to pipe the result to BEDOPS wig2bed and get a sorted five-column BED file:

    $ bigWigToWig conservation.bw /dev/fd/1 | wig2bed - > conservation.bed5
    
  2. Split your intervals up into per-base elements with bedops --chop 1.

    $ bedops --chop 1 intervals.bed > perbase.bed3
    
  3. Run bedmap to map your per-base elements to the BED-formatted conservation signal. Normally, unmapped elements report no value. You can use the following command to replace unmapped bases with a score of 0:

    $ bedmap --count --echo --echo-map-score --delim '\t' perbase.bed3 conservation.bed5 | awk '{ if ($1=="0") { print $0"0.0"; } else { print $0; } }' | cut -f2- > answer.bed
    

    This will only make sense if a missing conservation score should be zero. You could alternatively replace 0 with NaN or a similar IEEE-ready placeholder for a non-number.

Finally, if you want to put everything into a one-liner, use file substitutions:

$ bedmap --count --echo --echo-map-score --delim '\t' <(bedops --chop 1 intervals.bed) <(bigWigToWig conservation.bw /dev/fd/1 | wig2bed - ) | awk '{ if ($1=="0") { print $0"0.0"; } else { print $0; } }' | cut -f2- > answer.bed

If you're going to do repeated queries with different sets of intervals, it probably makes sense to do the bigWig-to-BED5 conversion step only once, and drop that in as conservation.bed5 where you need it in mapping.

ADD COMMENT
2
Entering edit mode
7.7 years ago
genecats.ucsc ▴ 580

Hi chrys,

You will probably be interested in the following page, which details how to extract wiggle data: http://genomewiki.ucsc.edu/index.php/Using_hgWiggle_without_a_database

If you have any follow-up questions, it would be helpful if you could post them to our Google Groups forum: https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome, that way our whole team can see the question and help with an answer.

Thanks,

ChrisL from the UCSC Genome Browser

ADD COMMENT

Login before adding your answer.

Traffic: 2366 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6