Hello! I am trying to calculate the conservation depth of multiple regions (100-3000 bp) in the human genome. Since there are more than 10,000 regions, it is not possible to check conservation manually. I downloaded the phastCons score files from UCSC (the 46way .wigFix files), but when I averaged the phastCons scores over my regions, the results were a bit confusing. For example, chr7:21114483-21117423 (hg19) has phastCons max = 1 and mean = 0.437672, and when I check it in the UCSC browser it is conserved down to fish! Whereas chr7:20838708-20841649 (hg19) has max = 1 and mean = 0.459534, while in the browser it is conserved in mammals only! If a region with a score of about 0.4 is conserved in Tetraodon (fish), then I would expect every other region with that score to be conserved down to fish as well. Why is there this contradiction? Kindly guide me, or suggest some other way of getting the proper conservation depth of these regions.
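For reference, the averaging described above can be sketched roughly as follows. This is only a minimal illustration, assuming the scores are in UCSC fixedStep wiggle format (a `fixedStep chrom=... start=... step=...` header followed by one score per line, with 1-based positions). The function names `parse_fixedstep` and `region_summary` are made up for this example and are not part of any UCSC tool.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_fixedstep(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, score) from fixedStep wiggle text (1-based positions)."""
    chrom, pos, step = None, 0, 1
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Header fields look like chrom=chr7 start=100 step=1
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        yield chrom, pos, float(line)
        pos += step


def region_summary(lines: Iterable[str], chrom: str, start: int, end: int) -> Optional[Dict[str, float]]:
    """Summarise scores in chrom:start-end (1-based, inclusive); None if no base is scored."""
    scores = [s for c, p, s in parse_fixedstep(lines) if c == chrom and start <= p <= end]
    if not scores:
        return None
    return {
        "n_scored": len(scores),
        "mean": sum(scores) / len(scores),
        "max": max(scores),
        # Fraction of the region that has any score at all (unscored bases are absent from the file).
        "scored_fraction": len(scores) / (end - start + 1),
    }
```

A compressed file can be read with `gzip.open(path, "rt")` from the standard library and passed in as `lines`. Note that this scans the whole file for every region, so for 10,000 regions it would make more sense to load each chromosome once.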
Another solution I saw was to find the most distant species that has at least 50% of its sequence conserved with the query sequence in the chain file. But when I look at the chain files on UCSC, they are split into blocks or patches according to conservation. I can't understand how to combine those blocks to search for my region, or how to then calculate the 50% conservation with the most distant species.
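One possible way to combine the blocks, sketched here under some assumptions: in the UCSC chain format, each `chain` header line gives the target name and target start (fields 3 and 6), and the following lines give an ungapped block `size` plus the gaps `dt` and `dq` to the next block. Assuming human (hg19) is the target and that "conserved" is approximated as "covered by an aligned block" (which is not the same as sequence identity), the covered fraction of a region could be computed like this. `chain_blocks` and `covered_fraction` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Tuple


def chain_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (target_name, block_start, block_end) for each ungapped block.

    Coordinates are 0-based, half-open, on the target sequence.
    """
    t_name, t_pos = None, 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name = fields[2]
            t_pos = int(fields[5])
            continue
        size = int(fields[0])
        yield t_name, t_pos, t_pos + size
        t_pos += size
        if len(fields) == 3:
            # "size dt dq": skip the gap on the target side before the next block.
            t_pos += int(fields[1])


def covered_fraction(lines: Iterable[str], chrom: str, start: int, end: int) -> float:
    """Fraction of chrom:[start, end) (0-based, half-open) covered by aligned blocks."""
    clipped = sorted(
        (max(s, start), min(e, end))
        for c, s, e in chain_blocks(lines)
        if c == chrom and s < end and e > start
    )
    covered, cursor = 0, start
    for s, e in clipped:
        s = max(s, cursor)  # do not count bases already covered by another chain
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)
```

Running this once per species chain file (for example human-to-mouse, human-to-Tetraodon) would give one covered fraction per species for each region, which can then be compared against the 50% cutoff.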
Perhaps you might investigate a per-base conservation/evolution signal, like phyloP.
Nope, not per base; I think that is the main issue. phyloP and phastCons both give me a per-base score, but I need to estimate the conservation depth of a whole patch/region. A very crude solution was to average the per-base scores across the region to get one mean score, but that is not working, as I explained in the details of my question. Thank you so much for answering; I am looking forward to your suggestions and comments. Actually, I just need to know the most distant organism in which my region is conserved, and there are over 10K such regions, which makes viewing them as a UCSC custom track impossible.
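To make "the most distant organism in which the region is conserved" concrete, here is a small sketch of the 50% rule mentioned in the question. It assumes a per-species fraction has already been computed for the region by some other means, and that the species are listed from closest to most distant from human. The species list, the numbers, and the function name `conservation_depth` are purely illustrative.

```python
from typing import Dict, Optional, Sequence


def conservation_depth(
    fraction_by_species: Dict[str, float],
    species_by_distance: Sequence[str],
    min_fraction: float = 0.5,
) -> Optional[str]:
    """Return the most distant species whose conserved fraction meets the cutoff.

    species_by_distance must be ordered from closest to most distant.
    Species missing from fraction_by_species are treated as not conserved.
    """
    deepest = None
    for species in species_by_distance:
        if fraction_by_species.get(species, 0.0) >= min_fraction:
            deepest = species
    return deepest


# Illustrative ordering and made-up fractions, not real measurements.
order = ["mouse", "chicken", "frog", "tetraodon"]
print(conservation_depth({"mouse": 0.9, "chicken": 0.7, "frog": 0.2, "tetraodon": 0.6}, order))
print(conservation_depth({"mouse": 0.9, "chicken": 0.3}, order))
```

With this rule a region counts as conserved down to a distant species even if an intermediate species falls below the cutoff (as with frog in the first call), which may or may not be what is wanted; requiring every closer species to pass as well would be a stricter alternative.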