Hey all,
I was doing a conservation analysis for some regions of interest and wanted to map phyloP scores onto the selected regions (in a .bed file). I got different results using different methods.
For example, for chr10:91853138-91853143, if I just use the UCSC Table Browser, I get phyloP scores as described here:
track type=wiggle_0 name="100 Vert. Cons" description="100 vertebrates Basewise Conservation by PhyloP"
91853138: -0.62474
91853139: -0.0926929
91853140: 0.27974
91853141: -0.145898
91853142: 0.332945
91853143: 0.758583
If I use the method described here, I get:
chr10 91853138 91853139 000048-000000|-0.051000
chr10 91853139 91853140 000048-000001|0.332000
chr10 91853140 91853141 000048-000002|-0.107000
chr10 91853141 91853142 000048-000003|0.357000
Notice that the second method only gives me the scores for 4 bases and they are different from the values acquired from the Table Browser.
I also tried the bigWigAverageOverBed utility and it produced the same results as the second method. Could anyone explain why there is a discrepancy? Did I overlook anything?
Thank you so much!
Which human genome build is the data from? The second method used hg19. Is this the same genome build as the first method?
Which conservation scores did you use? The first method uses 100-vertebrate basewise conservation. The second method uses 46-vertebrate basewise conservation.
Why did you get 6 values for the first method and 4 for the second? Wiggle files use 1-based counting, whereas bed files use 0-based counting. For more info on this, look at the way UCSC browser and bed files count nucleotides.
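To make the counting difference concrete, here is a minimal Python sketch of the conversion between a 1-based, inclusive range (as used in browser positions and wiggle output) and a 0-based, half-open BED interval. The function names are hypothetical and only for illustration; this is not taken from either method in the question.

```python
def one_based_to_bed(chrom, start, end):
    """Convert a 1-based, inclusive range to a 0-based, half-open BED interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the end stays the same because BED ends are exclusive.
    return chrom, start - 1, end


def bed_to_one_based(chrom, start, end):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive range."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return chrom, start + 1, end


def per_base_bed(chrom, start, end):
    """Split a 1-based, inclusive range into one BED interval per base."""
    return [(chrom, pos - 1, pos) for pos in range(start, end + 1)]


if __name__ == "__main__":
    # Region from the question, written as a 1-based browser position.
    region = ("chr10", 91853138, 91853143)
    print(one_based_to_bed(*region))
    for row in per_base_bed(*region):
        print("\t".join(str(field) for field in row))
```

With this convention, the six bases chr10:91853138-91853143 correspond to the BED interval chr10 91853137 91853143, whereas a BED interval written as chr10 91853138 91853142 covers only the four 1-based positions 91853139 through 91853142.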
Thank you for the answer! I used hg19 and the 100-vertebrate phyloP track for both methods and got different results. I got a response from the UCSC Genome Browser team (quoted below)
which is similar to what you said. Thank you!
What track(s) are you using?
My previous answer uses conservation signal for an older assembly.
If your UCSC Table Browser session is displaying a newer track for the current assembly (hg38), then you will almost certainly get a different signal over the same genomic range than what you would get by copying and pasting my older answer (which uses hg19 data).
It also looks like these are different tracks (100-way vs. 46-way comparison).
If you're looking to verify a procedure by using different methods, you'll need to start with the same input.