Why Do Tools Use Probabilistic Models To Find Highly Conserved Elements?
10.8 years ago
vasilislenis ▴ 160

Hello everybody,

I have a question that you may find naive, but your answer would really help me.

I'm interested in highly conserved elements among different genomes, and I'm using the phastCons software to find them. My question is this:

To find the conserved elements, why do I need a probabilistic model like the one phastCons uses, rather than simply identifying the identical parts of each alignment block in the MAF files?

I believe I'm missing something really important here. Thank you very much in advance for your time and patience.

Vasilis.

10.8 years ago

There are probably other technical advantages, but I think the strongest advantage to phastCons is the ease of use. Namely, most people should never have to run phastCons themselves - you can just view the annotations in the UCSC GenomeBrowser and/or download a track with the conserved locations.

http://genome.ucsc.edu/cgi-bin/hgGateway

http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=359897491

I think working with a track of locations for a standardized reference should be much easier than trying to parse and interpret blocks of a raw multi-species alignment file.
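To give a feel for how little code a downloaded track needs, here is a minimal sketch. It assumes the conserved elements were exported in BED layout (chrom, start, end, name, score); the rows below are invented for illustration and are not real coordinates, and the function names are hypothetical.

```python
import io

# Invented rows in BED layout (chrom, start, end, name, score); not real data.
EXAMPLE_BED = """\
chr1\t1000\t1050\tlod=35\t300
chr1\t2000\t2400\tlod=412\t620
chr2\t500\t520\tlod=18\t240
"""

def read_bed(handle):
    """Yield (chrom, start, end, extra_fields) for each BED record."""
    for line in handle:
        # Skip blank lines and the optional header lines a browser export may add.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, *rest = line.rstrip("\n").split("\t")
        yield chrom, int(start), int(end), rest

def elements_at_least(handle, min_length):
    """Keep elements spanning at least min_length bases (BED is 0-based, half-open)."""
    return [(c, s, e) for c, s, e, _ in read_bed(handle) if e - s >= min_length]

long_elements = elements_at_least(io.StringIO(EXAMPLE_BED), 100)
print(long_elements)
```

With a real file you would pass `open("your_track.bed")` instead of the in-memory example.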

10.8 years ago

The only addition that I would make to cwarden45's answer is that just "using sequence alignment" is not nearly as simple as it sounds.

Put another way, alignments are easy to define and use when the match is exact, and much harder to interpret when the matches are inexact. At that point they can become very sensitive to the scoring scheme. So saying "just use the alignments" tacitly assumes that the alignments are always correct.
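A small sketch of that sensitivity, assuming a plain Needleman-Wunsch global alignment with a linear gap penalty. This is a textbook toy, not the aligner used to build the MAF files, and the two sequences are made up so that the effect is easy to see.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

def identical_columns(x, y):
    """Count alignment columns where both rows carry the same base."""
    return sum(1 for p, q in zip(x, y) if p == q and p != "-")

seq_a, seq_b = "ACGTACGT", "CGTACGTA"
cheap_gaps = global_align(seq_a, seq_b, gap=-1)
costly_gaps = global_align(seq_a, seq_b, gap=-10)
print(cheap_gaps, identical_columns(*cheap_gaps[1:]))
print(costly_gaps, identical_columns(*costly_gaps[1:]))
```

With a cheap gap penalty the two sequences are shifted by one base and seven columns are identical; with an expensive gap penalty the same sequences align without gaps and no column is identical. The "identical parts" you would extract depend on a scoring choice, not only on the sequences.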


So the problem is that the alignment might not be accurate, and that's why we use probabilistic methods like phastCons? (I'm sorry for such childish questions, but I lack a biological background.)


Broadly speaking, species that are more diverged from each other should have fewer true conserved sequences and lower quality alignments.

I don't recall the details of the phastCons algorithm right now, but I would assume that one of the factors it considers is the likelihood of seeing a conserved region by chance (depending on things like the size of the conserved region, the divergence within it, etc.).
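To make the "by chance" idea concrete, here is a toy log-odds calculation. It is not the phastCons algorithm (which, as far as I know, fits a phylogenetic hidden Markov model over the species tree); it only assumes that each alignment column is identical across species with one probability in conserved sequence and another in neutral sequence. The function name and all probabilities are invented for illustration.

```python
import math

def log_odds_conserved(identical, length, p_conserved, p_neutral):
    """Natural-log odds that a window is 'conserved' rather than 'neutral'.

    p_conserved / p_neutral are assumed per-column probabilities that all
    species share the same base under each hypothesis.
    """
    different = length - identical
    return (identical * math.log(p_conserved / p_neutral)
            + different * math.log((1 - p_conserved) / (1 - p_neutral)))

# Ten perfectly identical columns between closely related species:
# neutral sequence is already ~90% identical, so this is weak evidence.
close_species = log_odds_conserved(10, 10, p_conserved=0.95, p_neutral=0.90)

# The same ten columns between distant species, where neutral sequence
# is only ~30% identical, are much stronger evidence.
distant_species = log_odds_conserved(10, 10, p_conserved=0.95, p_neutral=0.30)

# A longer identical run accumulates more evidence even for close species.
long_window = log_odds_conserved(100, 100, p_conserved=0.95, p_neutral=0.90)

print(close_species, distant_species, long_window)
```

Under these made-up numbers the three scores come out at roughly 0.5, 11.5 and 5.4. A simple "is this block identical?" test would treat all three windows the same, while a probabilistic score weighs both the length of the region and how diverged the species are.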


Thank you very much, I believe that I'm starting to understand...
