Interpreting phastConsElements from UCSC Table Browser
3.2 years ago

I am trying to understand the information present in the phastCons elements BED files from the UCSC Table Browser. Following the information I found here, I managed to get a description of the column names, but not a description of what they are measuring. I checked other Biostars posts, and apparently phastCons scores range from 0 to 100, but this is not true according to the description from the Table Browser (see below):

[Image: screenshot of the Table Browser description]

I also checked the distribution of the "score" column, and in the file that is available to download, the smallest score is 117. So I am assuming the phastConsElements46wayPrimates BED file only contains the conserved regions. Is this correct? Also, where can I get a description of what the LOD score is measuring here?

phastcons conservation genomics ucsc • 1.3k views
3.2 years ago
Luis Nassar ▴ 670

Hello,

We offer more information in the track description (https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&c=chrX&g=cons46way).

There are four components to our conservation tracks:

  • A multiz alignment of the sequences
  • Basewise conservation as predicted by phyloP - this is displayed as a wiggle track
  • Element conservation as predicted by phastCons - this is also displayed as a wiggle, but at a lower resolution than phyloP
  • Conserved elements as predicted by phastCons with the --viterbi option (I believe this is now --most-conserved, it used to be viterbi: http://compgen.cshl.edu/phast/phastCons-tutorial.php) - this is discrete elements in bed format, which is the file you are looking at

Below is the description of that specific track. lod stands for "log-odds score", and the score is ultimately transformed to be between 0-1000 to optimize the Genome Browser's display.

The conserved elements were predicted by running phastCons with the --viterbi option. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full".
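To make the relationship between the two fields concrete, here is a minimal Python sketch that reads the raw log-odds value from the "name" column of a BED file and estimates the constants a and b by least squares. It assumes the name field looks like "lod=NNN" and that log is the natural logarithm; the sample rows, the function names, and the fitting approach are all made up for illustration and are not how UCSC computes the scores.

```python
import math
import re

# Hypothetical sample rows in BED5 form: chrom, start, end, name, score.
# The "lod=NNN" name format and all values here are assumptions for illustration.
SAMPLE_BED = (
    "chr1\t100\t150\tlod=20\t250\n"
    "chr1\t300\t420\tlod=80\t400\n"
    "chr1\t900\t1500\tlod=320\t550\n"
)

LOD_PATTERN = re.compile(r"^lod=(-?\d+(?:\.\d+)?)$")


def parse_elements(text):
    """Return a list of (raw_lod, score) pairs from BED-formatted text."""
    pairs = []
    for line in text.splitlines():
        # Skip blank lines and optional header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        match = LOD_PATTERN.match(fields[3])
        if match is None:
            continue  # name field is not in the assumed "lod=NNN" form
        pairs.append((float(match.group(1)), float(fields[4])))
    return pairs


def fit_log_transform(pairs):
    """Least-squares estimate of a and b in score = a * log(lod) + b."""
    usable = [(math.log(lod), score) for lod, score in pairs if lod > 0]
    n = len(usable)
    mean_x = sum(x for x, _ in usable) / n
    mean_y = sum(y for _, y in usable) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in usable)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in usable)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


if __name__ == "__main__":
    a, b = fit_log_transform(parse_elements(SAMPLE_BED))
    print(f"score ~= {a:.2f} * ln(lod) + {b:.2f}")
```

Running the same two functions on the real downloaded file (read with `open(path).read()`) would show whether the transformed scores follow the stated form. Because the file only lists predicted conserved elements, a minimum score well above 0, such as the 117 mentioned in the question, is not surprising.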

If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.

3.2 years ago

I don't recall what LOD refers to (level of detail?), but I think raw phastCons scores (where present) are usually between 0 and 1, being probability measures.

For visual presentation in a UCSC Genome Browser instance, the raw score may have a -log or other transform applied to it to convert it to a value between 0 and 1000. This value can then be used to draw the height of a bin in the browser.

It may help to take a look at the bigWig downloads available here, instead: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/

These are very large downloads, but they likely contain raw, untransformed data that is of more use to you.
