Question

What are some drawbacks in using highly conserved genes to determine genomic completeness?


8.6 years ago

Tom ▴ 20

In bacteria, for example, draft assemblies and draft genomes are often assessed for completeness based on how well the raw reads cover certain conserved genes that are assumed to be present in all bacteria. What are the limitations and weaknesses of this approach? I can only imagine that some regions are covered more than others, which would give false positives/negatives.

https://peerj.com/preprints/554.pdf

genome bacteria sequencing Assembly • 2.4k views

ADD COMMENT • link updated 8.6 years ago by Ibrahim Tanyalcin ★ 1.2k • written 8.6 years ago by Tom ▴ 20


I think it was someone from JGI who observed that many of these conserved genes are located close to each other, so if your assembly misses one such region, the completeness estimate can be way off.
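A toy sketch of that effect, assuming completeness is estimated simply as the fraction of marker genes found. The gene counts, marker positions, and the `marker_completeness` helper are all made up for illustration and are not taken from any real genome or tool:

```python
def marker_completeness(marker_positions, missing_start, missing_end):
    """Fraction of markers lying outside the unassembled region [start, end)."""
    found = [p for p in marker_positions
             if not (missing_start <= p < missing_end)]
    return len(found) / len(marker_positions)


# Hypothetical genome: 4000 genes indexed by their order on the chromosome.
GENOME_GENES = 4000

# Hypothetical assembly gap: genes 900..1099 are missing (5% of the genome).
MISSING_START, MISSING_END = 900, 1100
true_completeness = (GENOME_GENES - (MISSING_END - MISSING_START)) / GENOME_GENES

# Scenario A: 40 markers spread evenly along the chromosome.
spread_markers = [i * 100 for i in range(40)]

# Scenario B: 40 markers, half of them packed into one cluster (genes 1000..1019),
# the other half spread out.
clustered_markers = list(range(1000, 1020)) + [50 + i * 200 for i in range(20)]

spread_estimate = marker_completeness(spread_markers, MISSING_START, MISSING_END)
clustered_estimate = marker_completeness(clustered_markers, MISSING_START, MISSING_END)

print(f"true completeness:          {true_completeness:.3f}")
print(f"estimate, spread markers:   {spread_estimate:.3f}")
print(f"estimate, clustered markers: {clustered_estimate:.3f}")
```

With these made-up numbers the evenly spread markers recover the true value, while the clustered set reports less than half the genome as present because the gap happens to swallow the whole cluster. A gap elsewhere would push the clustered estimate the other way, toward overestimating completeness.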

ADD REPLY • link 8.6 years ago by 5heikki 11k


That's a really cool observation. Can I have a reference for it? I'm writing a discussion on this topic and it would be most helpful!

ADD REPLY • link 8.6 years ago by Tom ▴ 20


Do you have a name for this guy?

ADD REPLY • link 8.6 years ago by Tom ▴ 20


Just a guess. Perhaps Nikos C. Kyrpides.

ADD REPLY • link 8.6 years ago by GenoMax 148k


I think I heard it in person at the JGI, but it most definitely wasn't Kyrpides. Sorry, this was a few years ago.

ADD REPLY • link 8.6 years ago by 5heikki 11k


If not the person specifically, do you know of any papers that illustrate this? I'm unable to find anything.

ADD REPLY • link 8.6 years ago by Tom ▴ 20


Have a look for example here.

ADD REPLY • link 8.6 years ago by 5heikki 11k

score 0 · Answer 1 · 2016-05-26


8.6 years ago

Ibrahim Tanyalcin ★ 1.2k

I would assume that if you want to assess the quality of the mappings, you may want to use highly conserved genes (because you know there "should" not be a wrong call there). But if by genomic completeness you mean what percentage of the genome is covered, then I would say ubiquitously expressed genes might be more accurate. Most of the time one would assume that highly conserved genes are highly expressed, but this may not always be the case. Therefore the ideal gene set might differ from cell type to cell type. For the sake of applicability, I assume there is a consensus gene set that is more or less OK. In any case, you need to check what the state of the art is; my reasoning might be wrong.

ADD COMMENT • link 8.6 years ago by Ibrahim Tanyalcin ★ 1.2k


I am not sure I understand your point. If you're looking at genomes, expression doesn't matter; the genes just have to be present or absent.

ADD REPLY • link 8.6 years ago by Jean-Karim Heriche 27k


Dear Jean, I had just answered another question about gene expression before this post, and somehow I interpreted this question in the context of RNA-seq. So my point here does not make sense. I don't understand it myself either :). I will keep this post in case people ask a similar question regarding RNA-seq.

ADD REPLY • link 8.6 years ago by Ibrahim Tanyalcin ★ 1.2k