Question

Allele frequency in gene sequence

0

Entering edit mode

4 months ago

Elizabeth ▴ 30

I have a gene sequence file about 500bp in length, which I mapped to a set of Illumina paired-end reads to extract the reads specific to this sequence. My objective is to determine the allele frequency of a particular SNP (located at codon 87) in this gene among the extracted reads. In the first step, BWA identified 300 reads that mapped to my gene. After that, I trimmed the gene sequence to a shorter 100bp length that contains the SNP and remapped this trimmed sequence to the 300 reads to further narrow down the candidates. As a result, I now have 35 sets of paired reads that map to the 100bp sequence.

When I use aligners like MUSCLE, the alignment algorithm generates more gaps than aligned reads, which is not providing the expected results. Going through each and every read by eye doesn’t make much sense. I’m sure, there is a better way to do this.

Has anyone done something similar? Any insights would be helpful. Thanks.

Allele-frequency • 332 views

ADD COMMENT • link updated 4 months ago by Ram 44k • written 4 months ago by Elizabeth ▴ 30

score 0 · Answer 1 · 2024-07-19

Hi, I guess you are calling variants on a reference genome fragment of 500bp. I would recommend:

mapping your reads with bwa mem and produce a BAM file
stack mapped reads and get stats with samtools mpileup, see instructions at https://github.com/samtools/bcftools/wiki/HOWTOs#mpileup-calling. For each genomic position you will see how many reads map and which alleles can be found (where . is the reference allele).

Hope this helps