I'm coming more from the computer science side of bioinformatics and have a biology-based question as I'm learning and preparing for my exam:
Why might a conserved gene be missing from my assembly?
And what could be the reasons a conserved gene appears two (or more) times in my assembly?
A gene might be missing simply because it was not sequenced (or not sequenced deeply enough to be assembled). Alternatively, it could have been sequenced but misassembled.
Possible reasons for a gene to appear multiple times in an assembly are (1) it really does have multiple copies in the genome (this would be tough to assemble with short reads; you'd need long reads or decent scaffolds) or (2) the assembler is representing the ambiguity of multiple paths through the assembly graph as multiple copies. There are likely additional reasons, but these are the first that come to mind.
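Since you're coming from the CS side, here is a rough Python sketch of how one could sort a gene into "missing", "single-copy", or "duplicated" for a given assembly. It is a toy under strong assumptions: it only finds exact matches on either strand, whereas real tools use alignment or profile searches, and the names `classify_gene`, `count_occurrences`, and the example sequences are made up for illustration. Note that it cannot tell a true duplication from an assembler artifact; that takes extra evidence such as read depth.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def classify_gene(contigs, gene):
    """Classify a gene by how often it occurs exactly in the contigs.

    contigs: dict mapping contig name -> sequence (hypothetical toy input).
    Both strands are searched; a set avoids double-counting a gene that
    equals its own reverse complement.
    """
    targets = {gene, reverse_complement(gene)}
    hits = sum(
        count_occurrences(seq, target)
        for seq in contigs.values()
        for target in targets
    )
    if hits == 0:
        return "missing"
    if hits == 1:
        return "single-copy"
    return "duplicated"


# Toy data, invented for this example.
gene = "ATGGCCTTGA"
assembly = {
    "contig_1": "TTT" + gene + "GGG",
    "contig_2": "AA" + reverse_complement(gene) + "CC",
}
print(classify_gene(assembly, gene))
```

In this toy assembly the gene sits on the forward strand of one contig and the reverse strand of another, so it is counted twice.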