Question

Why Do Tools Use Probabilistic Models To Find Highly Conserved Elements?

11.3 years ago

vasilislenis ▴ 160

Hello everybody,

I have a question that you may find naive, but your answer would be really helpful to me.

I’m interested in the highly conserved elements among different genomes, and I’m using the phastCons software to find them. My question is this:

To find the conserved elements, why do I need a probabilistic model like the one phastCons uses, rather than simply identifying the identical parts of each alignment block in the MAF files?

I believe that I’m missing something really important here. Thank you very much in advance for your time and your patience.

Vasilis.

score 2 · Answer 1 · 2014-01-20

There are probably other technical advantages, but I think the strongest advantage of phastCons is its ease of use. Namely, most people should never have to run phastCons themselves: you can just view the annotations in the UCSC Genome Browser and/or download a track with the conserved locations.

http://genome.ucsc.edu/cgi-bin/hgGateway

http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=359897491

I think working with a track of locations for a standardized reference should be much easier than trying to parse and interpret blocks of a raw multi-species alignment file.

score 2 · Answer 2 · 2014-01-20

11.3 years ago

Istvan Albert 102k

The only addition that I would make to cwarden45's answer is that just "using sequence alignment" is not nearly as simple as it sounds.

Put another way, alignments are easy to define and use when the match is exact, but much harder to interpret when the matches are inexact; at that point they can become very sensitive to the scoring scheme. So saying "just use the alignments" tacitly assumes that the alignments are always correct.
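
To make the sensitivity to the scoring scheme concrete, here is a minimal sketch of a textbook global alignment (Needleman-Wunsch with a linear gap penalty). It is not how the MAF files were produced; the function names, the two toy sequences, and the score values are all assumptions chosen for illustration. The same pair of sequences is aligned twice, changing only the gap penalty, and the number of identical columns is counted each time.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy version)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def identical_columns(row_a, row_b):
    """Count alignment columns where both rows carry the same base (gaps excluded)."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")


if __name__ == "__main__":
    seq_a, seq_b = "ACGTACGT", "CGTACGTA"  # toy sequences, shifted by one base
    for gap_penalty in (-1, -10):
        row_a, row_b = global_align(seq_a, seq_b, gap=gap_penalty)
        print(f"gap penalty {gap_penalty}:")
        print(" ", row_a)
        print(" ", row_b)
        print("  identical columns:", identical_columns(row_a, row_b))
```

With a cheap gap the aligner shifts one sequence and most columns come out identical; with an expensive gap it refuses to open gaps and no column is identical. The "identical parts" you would extract depend on a parameter choice, not only on the biology.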

So the problem is that the alignment might not be accurate, and that's why we use probabilistic methods like phastCons? (I'm sorry for such childish questions, but I lack a biological background.)

Reply • 11.3 years ago by vasilislenis ▴ 160

Broadly speaking, species that are more diverged from each other should have fewer truly conserved sequences and lower-quality alignments.

I don't recall the details of the phastCons algorithm right now, but I would assume that one of the factors it considers is the likelihood of seeing a conserved region by chance (depending on things like the size of the conserved region, the divergence within it, etc.).
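
As a rough illustration of the "by chance" point, here is a toy calculation. It is not the phastCons algorithm (phastCons fits a phylogenetic hidden Markov model, which this does not attempt to reproduce). It assumes every alignment column is independently identical across all species with the same probability, and asks how many fully identical windows of a given length you would expect purely by chance. The function name and all numbers are assumptions for illustration only.

```python
def expected_identical_windows(p_identical, window, alignment_length):
    """Expected number of length-`window` stretches that are identical in every
    column by chance, assuming each column is independently identical with
    probability `p_identical` (a deliberately crude neutral model)."""
    n_windows = max(alignment_length - window + 1, 0)
    return n_windows * p_identical ** window


if __name__ == "__main__":
    alignment_length = 1_000_000  # hypothetical number of aligned columns
    window = 50                   # hypothetical size of a candidate element
    # Hypothetical per-column identity rates for closely related vs. diverged species
    for label, p in (("closely related", 0.95), ("diverged", 0.50)):
        expected = expected_identical_windows(p, window, alignment_length)
        print(f"{label}: p={p}, expected chance windows of {window} bp = {expected:.3g}")
```

Under these assumptions, a 50 bp identical stretch between close relatives is unremarkable, while the same stretch between diverged species is very unlikely to be chance. A plain "find the identical parts" scan cannot tell these cases apart; a probabilistic model can weigh element size against divergence.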