I'm currently working on whole exome data, and I wanted to know whether it's better to map the reads directly against a reference genome. Or is it better to first run a de novo assembly, which would hopefully give me a number of contigs roughly equal to the number of sequenced exons (~200,000), and then map those contigs against the reference genome?
What do you think? Could it possibly improve the final mapping?
If so, which de novo assembly tool would you recommend?
It is generally better to align the reads to the reference genome unless no reference genome is available to you. De novo assembly is considerably more complicated and typically takes more time and computing power to produce usable output. If I were you, I would definitely align my reads to the reference genome rather than perform a de novo assembly.
De novo assembly is a very complicated process, and you would most likely not end up with a clean "reference exome" because of contamination, sequencing errors, low-complexity regions, transposons, etc. De novo assembly is mostly recommended when a reference genome is not available.
You should first map your reads against the reference genome; you can then extract the reads that map to exons using bedtools or a similar tool.
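The "extract reads mapped to exons" step is an interval-overlap test. With bedtools this is usually done with something like `bedtools intersect -a aligned.bam -b exons.bed -u`, where `-u` keeps each read once if it overlaps any exon. Below is a minimal pure-Python sketch of the same logic, assuming the alignments have already been reduced to hypothetical `(name, chrom, start, end)` tuples in 0-based, half-open coordinates (as BED uses). In a real pipeline you would read the BAM with a library such as pysam, or simply use bedtools.

```python
from bisect import bisect_left
from collections import defaultdict


def load_exons(bed_lines):
    """Parse BED lines into merged, sorted (starts, ends) lists per chromosome."""
    raw = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))

    exons = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching exons are merged into one interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # Merged intervals are disjoint, so both lists are sorted.
        exons[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return exons


def overlaps_exon(exons, chrom, start, end):
    """Return True if the half-open interval [start, end) overlaps any exon."""
    if chrom not in exons:
        return False
    starts, ends = exons[chrom]
    # Index of the last exon whose start lies before the read's end.
    idx = bisect_left(starts, end) - 1
    return idx >= 0 and ends[idx] > start


def reads_on_exons(reads, exons):
    """Keep the names of reads overlapping at least one exon (like bedtools -u)."""
    return [name for name, chrom, start, end in reads
            if overlaps_exon(exons, chrom, start, end)]


if __name__ == "__main__":
    bed = ["chr1\t100\t200\n", "chr1\t150\t300\n", "chr2\t50\t60\n"]
    reads = [("r1", "chr1", 90, 110), ("r2", "chr1", 300, 350),
             ("r3", "chr2", 55, 58), ("r4", "chr3", 0, 100)]
    print(reads_on_exons(reads, load_exons(bed)))
```

Merging the exon intervals first keeps the lookup correct when exons overlap, and lets each query run as a single binary search instead of a scan over every exon on the chromosome.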