For bacteria, for example, draft assemblies and draft genomes are often assessed for completeness based on how well the raw reads cover certain conserved genes that are assumed to be present in all bacteria. What are some limitations and weaknesses of this approach? I can only imagine that some regions are covered better than others, which would give you false positives or negatives.
I would assume that if you want to assess the quality of the mappings, you would use highly conserved genes (because you know there 'should' not be a wrong call there). But if by genomic completeness you mean what percentage of the genome is covered, then I would say ubiquitously expressed genes might be more accurate. Most of the time one would assume highly conserved genes = highly expressed, but this may not always be the case. Therefore the ideal gene set might differ from cell type to cell type. For practical purposes, I assume there is a consensus gene set that is more or less OK. In any case you should check what the state of the art is; my reasoning might be wrong.
Dear Jean,
I had just answered another question about gene expression before this post, and somehow I interpreted this question in the context of RNA-seq. So my point here does not make sense. I don't understand it myself either :). I will keep this post in case people ask a similar question regarding RNA-seq.
I think it was someone from JGI who observed that many of these conserved genes are located close to each other, so if your assembly misses one such region, the completeness estimate can be way off.
That's a really cool observation. Can I have a reference for it? I'm writing a discussion on this topic and it would be most helpful!
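To make the effect concrete, here is a toy sketch in Python. The marker names and their grouping are made up for illustration, and averaging over co-located groups is just one possible way to dampen the bias, not a description of any particular tool. It compares a naive per-gene completeness with a per-group estimate when a single clustered region is missing from the assembly.

```python
# Toy illustration only: marker names and groupings are hypothetical.

def naive_completeness(found, marker_groups):
    """Fraction of all individual marker genes that were detected."""
    all_markers = set().union(*marker_groups)
    return len(all_markers & found) / len(all_markers)


def grouped_completeness(found, marker_groups):
    """Mean, over co-located groups, of the fraction of each group detected.

    Assumes the groups are disjoint, so each locus counts once regardless
    of how many marker genes it carries.
    """
    fractions = [len(group & found) / len(group) for group in marker_groups]
    return sum(fractions) / len(fractions)


# Six markers sit in one tight cluster; four others are scattered.
marker_groups = [
    {"m1", "m2", "m3", "m4", "m5", "m6"},
    {"m7"},
    {"m8"},
    {"m9"},
    {"m10"},
]

# The assembly misses the clustered region but recovers everything else.
found = {"m7", "m8", "m9", "m10"}

naive = naive_completeness(found, marker_groups)
grouped = grouped_completeness(found, marker_groups)
print(f"naive: {naive:.2f}, grouped: {grouped:.2f}")
```

In this made-up case a single missing locus pulls the per-gene estimate down to 40%, while the per-group estimate stays at 80%.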
Do you have a name for this guy?
Just a guess. Perhaps Nikos C. Kyrpides.
I think I heard it in person at the JGI, but it most definitely wasn't Kyrpides. Sorry, this was a few years ago.
If not the person specifically, do you know of any papers that illustrate this? I'm unable to find anything.
Have a look for example here.