Question

How to get mutation frequency from exome sequencing data?

1

9.9 years ago

mangfu100 ▴ 810

Hi all.

I am trying to understand mutation frequency so that I can annotate my exome sequencing data.

I looked it up on the Internet, but I could not figure out how to calculate it.

To make matters worse, I don't know the actual meaning of mutation frequency (the Wiki explains the definition, but it is too formal and I didn't get it) or why this information is important when analyzing mutations.

Could anyone tell me a little bit about its basic meaning and the equation used to calculate it?

sequencing sequence • 5.6k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by mangfu100 ▴ 810

Ram · Accepted Answer · 2015-06-30

3

9.9 years ago

ethan.kaufman ▴ 380

Frequency is just a count. It is usually normalized to some fixed unit of time or space to enable comparison with other counts. Mutation frequency can conceivably refer to many things depending on the context:

Number of mutations per sample/per Mb/per gene, etc
Number of samples in which a particular mutation is observed
Percent of reads that support a particular mutation
Number of mutant alleles in an individual or population (usually called "allele frequency")

Really, you need to define for yourself what it is you want to calculate. The calculation itself should then be straightforward.
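
To make the difference between these definitions concrete, here is a minimal Python sketch on made-up data. The sample names, variant labels, and read counts are all hypothetical, and the function names are just ones I picked; treat it as an illustration of the bookkeeping rather than a standard tool.

```python
from collections import Counter

# Hypothetical variant calls: (sample, variant, alt_reads, total_reads)
calls = [
    ("S1", "chr1:100A>T", 12, 40),
    ("S1", "chr2:200G>C", 30, 60),
    ("S2", "chr1:100A>T", 5, 50),
]

def mutations_per_sample(calls):
    """Number of called mutations in each sample."""
    return Counter(sample for sample, _, _, _ in calls)

def samples_per_mutation(calls):
    """Number of distinct samples in which each mutation is observed."""
    seen = {(sample, variant) for sample, variant, _, _ in calls}
    return Counter(variant for _, variant in seen)

def read_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that support the mutation."""
    if total_reads == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total_reads
```

The same three calls give three different "frequencies" depending on which function you apply, which is why the definition has to be settled first.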

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by ethan.kaufman ▴ 380

0

Thank you for your comments.

What I would like to calculate is the number of mutations per Mb.

In this case, how do I calculate the Mb?

Does it simply refer to the sum of the lengths of chromosomes 1 to 22, or should I consider only the exome length?

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by mangfu100 ▴ 810

0

In your case you would use the exome length. If you have a bed file for the captured regions, then this should be pretty easy. If not, you can compute the genome coverage from the bam file with bedtools and then add up the regions that have depth above a minimum threshold.
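
The second option could look roughly like the sketch below. It assumes you have already run something like `bedtools genomecov -ibam sample.bam -bga > sample.bedgraph` on a position-sorted BAM, which writes four tab-separated columns (chromosome, 0-based start, end, depth). The file name, the function name `callable_bp`, and the threshold of 10 reads are placeholders chosen for illustration, not recommendations.

```python
def callable_bp(bedgraph_lines, min_depth=10):
    """Sum the lengths of bedGraph intervals whose depth is >= min_depth."""
    total = 0
    for line in bedgraph_lines:
        # Skip blank lines and optional track/comment lines.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        if float(depth) >= min_depth:
            # bedGraph intervals are 0-based and end-exclusive,
            # so the length is simply end - start.
            total += int(end) - int(start)
    return total

# Usage with a real file (hypothetical name):
#   with open("sample.bedgraph") as fh:
#       print(callable_bp(fh, min_depth=10) / 1e6, "Mb")
```

The result is the number of bases covered well enough to call a mutation, which is arguably a fairer denominator than the nominal target size when coverage is uneven.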

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by ethan.kaufman ▴ 380

0

Thanks for your reply.

Fortunately, I have a BED file for the exome sequencing.

My BED file is composed of four columns, as follows:

GENE START END EXON_NAME

As you suggested, is it right to sum (END-START) over all exons and then divide the number of mutations I found by that total?

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by mangfu100 ▴ 810

1

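That should be a good enough approximation, yes, assuming all the mutations called are within the exome regions. Minor point: the length of each region is END-START+1 if your coordinates are 1-based and inclusive. For standard BED coordinates (0-based start, exclusive end), it is simply END-START. To get mutations per Mb, divide the mutation count by the total length in bp and multiply by 1,000,000.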
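
Putting the thread together, a minimal sketch of the calculation might look like this. It assumes a whitespace-separated file with the four columns described above (GENE, START, END, EXON_NAME) and skips a header row if one is present. As a further assumption, it merges intervals that share the same first-column value so that overlapping exons are not counted twice. The coordinate convention is exposed as a flag because the thread does not say which one the file uses. The function names are my own.

```python
from collections import defaultdict

def target_bp(lines, one_based_inclusive=False):
    """Total non-overlapping length (bp) of regions in a 4-column file."""
    regions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip blank lines and a possible header row
        key, start, end = fields[0], int(fields[1]), int(fields[2])
        if one_based_inclusive:
            start -= 1  # convert to 0-based, end-exclusive
        regions[key].append((start, end))

    total = 0
    for intervals in regions.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total

def mutations_per_mb(n_mutations, region_bp):
    """Mutation count normalized to one megabase of target sequence."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return n_mutations / (region_bp / 1_000_000)
```

For example, 70 mutations over a 35,000,000 bp target works out to 2 mutations per Mb. Only count mutations that fall inside the regions you summed, otherwise the numerator and denominator do not describe the same territory.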

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.9 years ago by ethan.kaufman ▴ 380

0

Thanks!

Your comments will be very helpful in my research.

I will try it :)

ADD REPLY • link 9.9 years ago by mangfu100 ▴ 810