Hi all.
I am trying to understand the mutation frequency to annotate my exome sequence data.
I looked it up on the Internet, but I could not work out how to calculate it.
To make matters worse, I don't know the actual meaning of mutation frequency (the Wiki seems to give a definition, but I didn't get it; it is too formal) or why this information is important when analyzing mutations.
Could anyone tell me a little bit about its basic meaning and the equation to get it?
Thank you for your comments.
What I would like to calculate is the number of mutations per MB.
In this case, how do I calculate the MB?
Does it simply refer to the sum of the lengths of chromosomes 1 to 22, or only to the exome length?
In your case you would use the exome length. If you have a bed file for the captured regions, then this should be pretty easy. If not, you can compute the genome coverage from the bam file with bedtools and then add up the regions that have depth above a minimum threshold.
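For the case without a bed file, here is a rough Python sketch of the "add up the regions above a minimum depth" step. It assumes you have already written per-interval depth to a bedGraph-style text file (chrom, start, end, depth), for example with `bedtools genomecov -ibam your.bam -bg`. The function name `callable_bases` and the threshold of 10 are only placeholders for illustration; pick a cutoff that suits your data.

```python
import io

def callable_bases(bedgraph_lines, min_depth=10):
    """Sum the lengths of bedGraph intervals whose depth is >= min_depth.

    bedGraph intervals are 0-based, half-open, so length = end - start.
    min_depth=10 is an arbitrary illustrative default, not a recommendation.
    """
    total = 0
    for line in bedgraph_lines:
        fields = line.split()
        # Skip header/comment lines and anything too short to be a record.
        if len(fields) < 4 or line.startswith(("track", "browser", "#")):
            continue
        start, end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        if depth >= min_depth:
            total += end - start
    return total

# Tiny made-up example: only the first and third intervals pass the cutoff.
example = io.StringIO(
    "chr1\t100\t200\t25\n"
    "chr1\t200\t300\t3\n"
    "chr1\t300\t450\t12\n"
)
print(callable_bases(example, min_depth=10))
```

With a real file you would pass the open file handle instead of the `io.StringIO` example.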
Thanks for your reply.
Fortunately, I have a BED file for the exome sequencing.
My BED files are composed of four columns as follows:
As you mentioned, is it right to sum up (END-START) for each exon and then divide the number of mutations that I found by that total?
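That should be a good enough approximation, yes, assuming all the mutations called are within the exome regions. Just make sure you divide the number of mutations by the total length expressed in Mb (total bases / 1,000,000). Minor point: BED intervals are 0-based with an exclusive END, so the length of each region is END-START; you would only use END-START+1 for 1-based, inclusive coordinates (as in GFF).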
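In case it helps, here is a minimal Python sketch of that calculation. It assumes standard BED coordinates (0-based START, exclusive END), so each region contributes END-START bases, and it merges overlapping regions so shared bases are not counted twice. The function names `exome_length` and `mutations_per_mb`, and the fourth-column names in the toy data, are made up for illustration.

```python
def exome_length(bed_lines):
    """Total bases covered by BED intervals, counting overlapping bases once."""
    by_chrom = {}
    for line in bed_lines:
        fields = line.split()
        # Skip header/comment lines and anything too short to be a record.
        if len(fields) < 3 or line.startswith(("track", "browser", "#")):
            continue
        by_chrom.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:            # overlaps or touches: extend
                cur_end = max(cur_end, end)
            else:                           # gap: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total

def mutations_per_mb(n_mutations, length_bp):
    """Number of mutations per megabase of target sequence."""
    return n_mutations * 1e6 / length_bp

# Toy data: two overlapping regions on chr1 (1000 bp merged) plus 500 bp on chr2.
bed = [
    "chr1\t1000\t1500\texonA",
    "chr1\t1400\t2000\texonB",
    "chr2\t500\t1000\texonC",
]
length = exome_length(bed)
print(length, mutations_per_mb(3, length))
```

For a real file, pass the open BED file handle to `exome_length` and use your own mutation count. If your regions never overlap, the merged total is the same as the plain sum of END-START.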
Thanks!
Your comments will be very helpful in my research.
I will try it :)