I have a gene sequence file about 500bp in length, which I mapped to a set of Illumina paired-end reads to extract the reads specific to this sequence. My objective is to determine the allele frequency of a particular SNP (located at codon 87) in this gene among the extracted reads. In the first step, BWA identified 300 reads that mapped to my gene. After that, I trimmed the gene sequence to a shorter 100bp length that contains the SNP and remapped this trimmed sequence to the 300 reads to further narrow down the candidates. As a result, I now have 35 sets of paired reads that map to the 100bp sequence.
When I use aligners like MUSCLE, the alignment algorithm generates more gaps than aligned reads, which is not providing the expected results. Going through each and every read by eye doesn’t make much sense. I’m sure, there is a better way to do this.
Has anyone done something similar? Any insights would be helpful. Thanks.