How to get mutation frequency from exome sequencing data?
9.4 years ago
mangfu100 ▴ 810

Hi all.

I am trying to understand mutation frequency so that I can annotate my exome sequencing data.

I looked it up on the Internet, but I could not work out how to calculate it.

To make matters worse, I don't know what mutation frequency actually means (the Wiki definition is too formal for me to follow) or why this information matters when analyzing mutations.

Could anyone explain its basic meaning and the equation used to compute it?

sequencing sequence • 5.3k views
9.4 years ago
ethan.kaufman ▴ 380

Frequency is just a count. It is usually normalized to some fixed unit of time or space to enable comparison with other counts. Mutation frequency can conceivably refer to many things depending on the context:

  • Number of mutations per sample/per Mb/per gene, etc
  • Number of samples in which a particular mutation is observed
  • Percent of reads that support a particular mutation
  • Number of mutant alleles in an individual or population (usually called "allele frequency")

Really, you need to define for yourself what it is you want to calculate. The calculation itself should then be straightforward.
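To show how differently these definitions behave, here is a minimal Python sketch with one function per bullet above. The function names and all of the numbers are hypothetical and only for illustration; they do not come from any particular tool or dataset.

```python
def mutations_per_mb(n_mutations, region_length_bp):
    """Mutation count normalized to megabases of sequence examined."""
    return n_mutations / (region_length_bp / 1_000_000)


def sample_frequency(n_samples_with_mutation, n_samples_total):
    """Fraction of samples in which a particular mutation is observed."""
    return n_samples_with_mutation / n_samples_total


def read_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that support the mutation."""
    return alt_reads / total_reads


def allele_frequency(n_mutant_alleles, n_individuals, ploidy=2):
    """Mutant alleles divided by all alleles in the population."""
    return n_mutant_alleles / (n_individuals * ploidy)


if __name__ == "__main__":
    # All numbers below are invented for illustration only.
    print(mutations_per_mb(150, 30_000_000))  # 150 mutations over 30 Mb
    print(sample_frequency(12, 100))          # seen in 12 of 100 samples
    print(read_fraction(18, 60))              # 18 of 60 reads carry it
    print(allele_frequency(30, 100))          # 30 mutant alleles, 100 diploids
```

Each function answers a different question, which is why the definition has to be fixed before anything is computed.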


Thank you for your comments.

What I would like to calculate is the number of mutations per Mb.

In this case, how do I calculate the Mb?

Is it simply the sum of the lengths of chromosomes 1 to 22, or should I consider only the exome length?


In your case you would use the exome length. If you have a bed file for the captured regions, then this should be pretty easy. If not, you can compute the genome coverage from the bam file with bedtools and then add up the regions that have depth above a minimum threshold.
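For the second route, a rough sketch of the "add up the regions above a threshold" step might look like the following. It assumes the coverage was written in BedGraph form (chrom, start, end, depth, tab-separated), which as far as I know is what `bedtools genomecov -ibam sample.bam -bga` produces. The function name, the file name `sample.bam`, and the threshold of 10 are my own placeholders, not recommendations.

```python
def callable_length_bp(bedgraph_lines, min_depth=10):
    """Sum lengths of intervals whose depth is at least min_depth.

    Expects BedGraph-style lines: chrom, start, end, depth (tab-separated),
    with 0-based, end-exclusive coordinates, so length is end - start.
    min_depth=10 is an arbitrary placeholder threshold.
    """
    total = 0
    for line in bedgraph_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, depth = line.split("\t")[:4]
        if float(depth) >= min_depth:
            total += int(end) - int(start)
    return total


if __name__ == "__main__":
    # Toy input; in practice, iterate over the lines of the coverage file.
    example = [
        "chr1\t0\t100\t0",
        "chr1\t100\t250\t35",
        "chr1\t250\t300\t8",
        "chr2\t0\t500\t12",
    ]
    print(callable_length_bp(example, min_depth=10))
```

Only the second and fourth toy intervals pass the threshold, so they are the only ones that contribute to the total.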


Thanks for your reply.

Fortunately, I have a BED file for the exome sequencing.

My BED file has four columns, as follows:

GENE START END EXON_NAME

As you suggested, is it right to sum (END-START) over all exons and then divide the number of mutations I found by that sum?


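That should be a good enough approximation, yes, assuming all the called mutations fall within the exome regions. Convert the summed length from bp to Mb (divide by 1,000,000) before dividing the mutation count by it. Minor point: the length of each region depends on the coordinate convention. In standard BED (0-based start, end-exclusive) it is END-START; if your START and END are 1-based and inclusive, it is END-START+1.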
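Put together, the calculation could be sketched in Python roughly as below. This is only an illustration under stated assumptions: the function names are made up, the first column is used as a grouping key (a chromosome in standard BED), and overlapping regions with the same key are merged so shared bases are not counted twice. The `inclusive_end` switch is a hypothetical parameter covering both coordinate conventions.

```python
from collections import defaultdict


def total_target_bp(rows, inclusive_end=False):
    """Total non-overlapping length of (key, start, end) intervals.

    inclusive_end=False: BED-style, length = end - start.
    inclusive_end=True: 1-based closed intervals, length = end - start + 1.
    """
    by_key = defaultdict(list)
    for key, start, end in rows:
        # Convert to half-open form so both conventions share one code path.
        by_key[key].append((start - 1 if inclusive_end else start, end))

    total = 0
    for intervals in by_key.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or touching: extend
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def mutations_per_mb(n_mutations, target_bp):
    """Mutation count divided by target size expressed in megabases."""
    return n_mutations / (target_bp / 1_000_000)


if __name__ == "__main__":
    # Invented rows standing in for the first three columns of the file.
    rows = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
    size = total_target_bp(rows)
    print(size, mutations_per_mb(5, size))
```

Merging is worth the extra lines because exon annotations for different transcripts often overlap, and a plain sum would then overestimate the target size and underestimate the rate.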


Thanks!

Your comments will be very helpful in my research.

I will try it :)
