I am doing some research on certain SNPs of importance found within metagenomes, for example in the human gut, using data provided by the Human Microbiome Project (HMP). My question is: how does the process of contig assembly affect SNPs? Will some SNPs be masked by the assembly process, either when overlapping contigs are joined to make a longer sequence or when these sequences are assigned a phylogeny based on a reference genome? HMP uses SOAPdenovo for assembly. I have tried to find the answer to this question, but it remains a little unclear. Does anyone have any knowledge of or experience with this? Also, if assembly is a problem with regard to SNPs, can someone suggest the best way to BLAST unassembled metagenomic data, and perhaps the best source of such data? Thanks in advance for any help.
Accurately assembling a single genome can be a challenge, and assembling metagenomes is even more challenging. What exactly happens when a metagenome is dotted with SNPs? This would be a fun project to simulate, and then post some in-silico results. I wish I had fewer responsibilities so that I could do it myself. Alas, career advancement usually means that we can work less and less on problems that are fun.
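For anyone who wants to try the simulation, here is a minimal toy sketch in Python of what the starting point might look like. It assumes that assembly behaves like a per-position majority vote over already-aligned reads from closely related strains. That is a deliberate simplification and not how SOAPdenovo (a de Bruijn graph assembler) actually works, and the function names, the example sequence, and the 20% minor-allele frequency are all made up for illustration.

```python
from collections import Counter

def simulate_pileup(ref, snp_pos, alt, minor_freq, depth):
    """Build `depth` aligned reads; a fixed fraction carries `alt` at `snp_pos`.

    Hypothetical helper: real reads are shorter than the genome and contain
    sequencing errors, neither of which is modelled here.
    """
    n_alt = round(depth * minor_freq)
    alt_read = ref[:snp_pos] + alt + ref[snp_pos + 1:]
    return [alt_read] * n_alt + [ref] * (depth - n_alt)

def majority_consensus(reads):
    """Collapse equal-length aligned reads into one sequence by majority vote.

    This stands in for the assembler's consensus step (an assumption).
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def allele_counts(reads, pos):
    """Count the bases observed at one position across all reads."""
    return Counter(read[pos] for read in reads)

ref = "ACGTACGTAC"  # made-up reference; position 4 is 'A'
reads = simulate_pileup(ref, snp_pos=4, alt="G", minor_freq=0.2, depth=200)
consensus = majority_consensus(reads)

print(consensus)                # ACGTACGTAC (the minor 'G' allele is gone)
print(allele_counts(reads, 4))  # Counter({'A': 160, 'G': 40})
```

In this toy model the consensus keeps only the majority base, while the read-level counts still show the minor allele. That suggests calling SNPs from reads mapped back to the contigs (or to a reference) rather than from the contig sequences alone, though a real experiment with an actual assembler would be needed to confirm how SOAPdenovo behaves.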