Hey all!
I have 2x251 bp MiSeq amplicon sequencing data of a gene from environmental metagenomes. The amplicon, a part of a bacterial topoisomerase gene, has a size of 350bp. After merging the reads, I want to map them to a reference sequence and check the mutation frequency for each base relative to the reference. My first idea would be to create a MSA from the reference and the reads and then just go through every position in each entry of the resulting alignment file. However, since I have around 1 million sequences per sample, I fear that both creating and parsing the alignment file would be really slow...Does someone have experience with this/knows useful tools or can think of another way of doing this?
Thanks for your input!
*Edit**
Sorry if I was unclear, I am quite new to metagenomics and my understanding/definition of terms is still developing. I have amplicon sequencing data from several E. coli dominated metagenomes. The amplicon is a subregion of a bacterial topoisomerase. I want to check the mutation frequency at different positions for each of the genomes, which have been subjected to different treatments. So I want to compare each read to the reference sequence and see how frequent mutations at the specific site are under the different conditions. I want to do this for nucleotide sequences first, later eventually translate the reads and do it for amino acid positions. Hope that clarifies it a bit, sorry again for being unclear.
Merge, scan/trim to remove adapter contamination, map with bbmap and the call variants using
callvariants.sh
. All using BBMap suite.Can't you just map the reads to the reference genome and make a vcf of it. You only need to know the differences between your sample and the references, right (not of every base I assume).