Question

CNVnator CNV-calling result interpretation

2

8.5 years ago

milk841103 ▴ 10

Hello all,

I am a newcomer to bioinformatics and I don't have a strong stats background. While using CNVnator v0.3.2 for a project, I ran into some questions when trying to make sense of the results from the CNV-calling step. I would really appreciate it if anyone could help me with them.

In many papers and older posts, people refer to the values produced by the CNV-calling step as p-values and filter their raw CNV calls using p = 0.05 as the cut-off. However, in the README of the newest version of CNVnator, these values are called e-values instead of p-values. Does anyone know what changed in the newest version of CNVnator? Are the e-vals converted from the p-vals (which are calculated from the t-test), and if so, how? Or can the e-vals and the p-vals in the output be treated interchangeably?
For e-val2, what does "the region to be in the tail of Gaussian distribution" mean? Can I interpret this value as the significance of the call being a CNV?
For e-val3 and e-val4, what does "for the middle of CNV" mean, and what is the purpose of looking specifically at the middle of the CNV?

Here is the information the CNVnator README gives about the output, for your reference:

normalized_RD -- normalized to 1.

e-val1 -- is calculated using t-test statistics.

e-val2 -- is from the probability of RD values within the region to be in the tails of a gaussian distribution describing frequencies of RD values in bins.

e-val3 -- same as e-val1 but for the middle of CNV

e-val4 -- same as e-val2 but for the middle of CNV

q0 -- fraction of reads mapped with q0 quality

https://github.com/abyzovlab/CNVnator

Thank you in advance for comments and help!

genome sequence CNVnator CNV • 7.4k views

updated 8.5 years ago by Eric T. ★ 2.8k • written 8.5 years ago by milk841103 ▴ 10

0

That explanation for e-val2 sounds like it's the same thing as a p-value. q0 is the fraction of reads with mapping quality zero; a high q0 indicates dubious mapping results in the region of the CNV.

8.5 years ago by Vivek ★ 2.7k

0

Yeah, I found that the explanations are basically the same as when the values were labelled as p-vals; only the label has changed from p-val to e-value. But in my results I can get really large e-vals (>>1), which is impossible for p-values, and that makes me wonder (otherwise why would they change p-vals to e-vals? To my understanding they represent different things in statistics). There is also no explanation of how the e-vals are generated. I also have a difficult time understanding what each of the four e-vals indicates. Do you by any chance have an idea of what each of the e-vals is testing for?

8.5 years ago by milk841103 ▴ 10

0

The e-val2 should never be > 1, as it is described specifically as a probability, so if you are getting values >> 1, you might want to do a bit of troubleshooting or e-mail the author.

8.5 years ago by Vivek ★ 2.7k

score 0 · Answer 1 · 2016-06-03

It looks like this e-value means the same thing as in BLAST statistics: the number of hits of this significance that we would expect to observe by chance in a genome or database of this size. For small values (e.g. below 0.05) the e-value and p-value converge on the same number, but as p-values approach 1.0, the e-values instead grow above 1. An e-value >>1 means something similar to a p-value with leading 9s, i.e. the call is almost certainly due to chance and is not significant under the null model.
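
To make that relationship concrete, here is a minimal sketch in Python. It assumes the BLAST-style relation p = 1 - exp(-E), i.e. chance hits follow a Poisson distribution with mean E. Whether CNVnator derives its e-values this way is an assumption on my part (the README does not say), and the function names are made up for illustration.

```python
import math

def evalue_to_pvalue(e):
    """Probability of at least one chance hit, assuming chance hits
    are Poisson-distributed with mean e (the BLAST-style relation)."""
    # -expm1(-e) equals 1 - exp(-e) but stays accurate for tiny e.
    return -math.expm1(-e)

def pvalue_to_evalue(p):
    """Inverse of the relation above; only defined for 0 <= p < 1."""
    # -log1p(-p) equals -log(1 - p) but stays accurate for tiny p.
    return -math.log1p(-p)

if __name__ == "__main__":
    # Small e-values are nearly identical to the p-value;
    # large e-values correspond to p-values that saturate near 1.
    for e in (1e-6, 0.01, 0.05, 1.0, 5.0, 50.0):
        print(f"E = {e:<8g} p = {evalue_to_pvalue(e):.6g}")
```

Under this assumption, a cut-off of E = 0.05 corresponds to p of about 0.0488, so filtering on e-val < 0.05 is practically the same as filtering on p < 0.05, while any e-value well above 1 maps to a p-value very close to 1.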