Hello all,
I am a newcomer to bioinformatics and don't have a strong stats background. Recently, while using CNVnator v0.3.2 for a project, I ran into some questions when trying to make sense of the results from the CNV calling step. I would really appreciate it if anyone could help me with them.
In many papers and older posts, people refer to the values reported by the CNV calling step as p-values and filter their raw CNV calls using p = 0.05 as the cut-off. However, in the README of the newest version of CNVnator, these values are referred to as e-values instead of p-values. Does anyone know what changed in the newest version of CNVnator? Are the e-vals converted from the p-vals (which are calculated from the t-test), and if so, how? Or can the e-vals and the p-vals in the output be treated interchangeably?
For e-val2, what is meant by "the region to be in the tail of Gaussian distribution"? Can I interpret this value as the significance of the call being a CNV?
For e-val3 and e-val4, what is meant by "for the middle of CNV", and what is the purpose of looking specifically at the middle of the CNV?
Here is the information given in the CNVnator README on the output fields, for reference:
normalized_RD -- normalized to 1.
e-val1 -- is calculated using t-test statistics.
e-val2 -- is from the probability of RD values within the region to be in the tails of a gaussian distribution describing frequencies of RD values in bins.
e-val3 -- same as e-val1 but for the middle of CNV
e-val4 -- same as e-val2 but for the middle of CNV
q0 -- fraction of reads mapped with q0 quality
https://github.com/abyzovlab/CNVnator
Thank you in advance for comments and help!
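For anyone wanting to look at these fields programmatically, here is a minimal sketch of reading call lines and applying the 0.05 cut-off mentioned above to e-val1. It assumes each call is one whitespace-separated line with nine columns in the order CNV_type, coordinates, CNV_size, normalized_RD, e-val1, e-val2, e-val3, e-val4, q0; please check that against your own output. The names `CnvCall`, `parse_call`, and `filter_calls` are hypothetical, and the two example lines are made up purely for illustration.

```python
from typing import Iterable, List, NamedTuple


class CnvCall(NamedTuple):
    """One call line; the column order is an assumption, verify it locally."""
    cnv_type: str
    coordinates: str
    cnv_size: int
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Split one whitespace-separated call line into typed fields."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    # Columns 3..9 are numeric: size, normalized_RD, four e-vals, q0.
    numbers = [float(x) for x in fields[2:9]]
    return CnvCall(fields[0], fields[1], int(numbers[0]), *numbers[1:])


def filter_calls(calls: Iterable[CnvCall], max_e_val1: float = 0.05) -> List[CnvCall]:
    """Keep calls whose e-val1 is at or below the chosen cut-off."""
    return [c for c in calls if c.e_val1 <= max_e_val1]


# Made-up example lines, not real CNVnator output.
EXAMPLE_LINES = [
    "deletion chr1:10001-20000 10000 0.25 1e-09 2e-12 3e-08 4e-10 0.02",
    "duplication chr2:50001-52000 2000 1.60 3.5 0.8 12.0 1.1 0.40",
]

calls = [parse_call(line) for line in EXAMPLE_LINES]
kept = filter_calls(calls)
print([c.coordinates for c in kept])
```

The second made-up line has an e-val1 above 1 on purpose, so it is dropped by the filter. Whether 0.05 is an appropriate threshold for an e-value rather than a p-value is exactly the open question in this post.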
That explanation for e-val2 sounds like it's the same thing as a p-value. q0 is the fraction of reads with mapping quality zero; a high value indicates dubious mapping in the region of the CNV.
Yeah, I found that the explanations are basically the same as when the values were labelled as p-vals; only the name has changed from p-value to e-value. But in my results I can get really large e-vals (>> 1), which is impossible for p-values, and that makes me wonder. Otherwise, why would they change p-vals to e-vals? To my understanding, the two represent different things in stats. There is also no explanation of how the e-vals are generated. I am also having a difficult time understanding what each of the four e-vals indicates. Do you by any chance have an idea of what each e-val is testing for?
e-val2 should never be > 1, since it is described specifically as a probability. If you are getting values >> 1, you might want to do a bit of troubleshooting or e-mail the author.
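One possible reading of values >> 1, offered as an assumption and not confirmed from the CNVnator README or source: in the general statistical sense (as used by BLAST, for example), an e-value is an expected count of chance hits, roughly a p-value multiplied by the number of tests, so it is not bounded by 1. The sketch below only illustrates that general relationship. The function names, the genome size, and the idea of counting tests as genome size divided by region size are all hypothetical choices for the example.

```python
def windows_in_genome(genome_size: float, region_size: float) -> float:
    """Hypothetical test count: how many regions of this size fit in the genome."""
    if genome_size <= 0 or region_size <= 0:
        raise ValueError("sizes must be positive")
    return genome_size / region_size


def e_value_from_p(p_value: float, n_tests: float) -> float:
    """Expected number of chance hits: p-value scaled by the number of tests."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be in [0, 1]")
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return p_value * n_tests


# Illustrative numbers only: a 3 Gb genome and a 10 kb region.
n_tests = windows_in_genome(3e9, 1e4)
e_value = e_value_from_p(1e-3, n_tests)
print(e_value > 1.0)  # a small p-value can still give an e-value above 1
```

If the reported values do follow this kind of scaling, a cut-off of 0.05 on an e-value would be much stricter than p = 0.05, because it would already account for the number of regions tested. Confirming the exact formula with the author would settle it.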