I have a whole file with CNVnator calls like the ones below:
deletion chr1:179135401-179150100 14700 0.156562 1.08417e-11 2.871e+09 1.2549e-11 2.871e+09 1
deletion chr1:179161601-179166400 4800 0.0137354 3.32026e-11 1.94083e-64 5.69188e-11 8.27154e-72 1
deletion chr1:179181001-179194400 13400 0.239849 1.18935e-11 1.70262e-08 1.398e-11 6.34306e-06 1
The definition of the columns is:
normalized_RD -- normalized to 1.
p-val1 -- is calculated using t-test statistics.
p-val2 -- is the probability of RD values within the region being in the tails of a Gaussian distribution describing the frequencies of RD values in bins.
p-val3 -- same as p-val1 but for the middle of CNV
p-val4 -- same as p-val2 but for the middle of CNV
q0 -- fraction of reads mapped with q0 quality
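To work with these columns programmatically, a minimal Python sketch for parsing one call line could look like the following. The field names `cnv_type`, `chrom`, `start`, `end` and `size` are my own labels for the first three columns, inferred from the example lines above rather than taken from the column definitions, and `parse_call` is a hypothetical helper, not part of CNVnator.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    # The first five names are assumptions based on the example lines;
    # the remaining ones follow the column definitions listed above.
    cnv_type: str
    chrom: str
    start: int
    end: int
    size: int
    normalized_rd: float
    p_val1: float
    p_val2: float
    p_val3: float
    p_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Parse one whitespace-separated CNVnator call line."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    cnv_type, region, size = fields[:3]
    # Split on the last ':' so contig names containing ':' still work.
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    values = [float(x) for x in fields[3:]]
    # int(float(...)) tolerates a size written in scientific notation.
    return CnvCall(cnv_type, chrom, int(start), int(end), int(float(size)), *values)


call = parse_call(
    "deletion chr1:179135401-179150100 14700 0.156562 "
    "1.08417e-11 2.871e+09 1.2549e-11 2.871e+09 1"
)
print(call.chrom, call.start, call.end, call.q0)
```

Applying `parse_call` to every line of the file and tallying the `q0` values is a quick way to see whether the last column really is 1 for every call.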
The last column indicates that all the reads on which the deletion calls are based have mapping quality 0. That is true for all 61850 calls, even for the ones that indicate regions where no reads are mapped, i.e. a copy number 1 -> 0 change. Is this normal?
A large part of the genome of many species consists of repetitive regions, for which by definition:
1) reads map with mapping quality 0
2) reads already span the repeat
So when CNV calls based on mapping quality zero reads are excluded, can the read depth method only be used to detect copy number 1 -> copy number 2 or copy number 1 -> copy number 0 events? Or should I also look into the deletion calls based on mapping quality zero reads?
I used CNVnator a while ago and encountered this, and I think it's most likely a bug. You could check your BAM for the specified region manually to make sure that's the case, if you haven't already. In my case, the mapping qualities in the BAM were alright.
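One way to do that manual check is to dump the reads for a region as SAM text and count how many have MAPQ 0. Below is a rough Python sketch, assuming tab-separated SAM records with MAPQ in the fifth column, as the SAM format specifies. `q0_fraction` is a hypothetical helper, and its simple read count may not match exactly how CNVnator computes q0.

```python
from typing import Iterable, Optional


def q0_fraction(sam_lines: Iterable[str]) -> Optional[float]:
    """Return the fraction of SAM records with MAPQ 0, or None if there are no records."""
    total = 0
    zero = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # MAPQ is the 5th mandatory SAM field
        total += 1
        if mapq == 0:
            zero += 1
    return zero / total if total else None


# Two made-up records for illustration only, not real data.
example = [
    "r1\t0\tchr1\t179161700\t0\t50M\t*\t0\t0\t*\t*\n",
    "r2\t16\tchr1\t179161800\t60\t50M\t*\t0\t0\t*\t*\n",
]
print(q0_fraction(example))
```

In practice the lines could come from something like `samtools view sample.bam chr1:179161601-179166400` (where `sample.bam` is a placeholder for your indexed BAM), piped into a script that calls `q0_fraction(sys.stdin)`. If the fraction is well below 1 for regions that CNVnator reports with q0 = 1, that would support the bug explanation.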
I'm afraid I don't quite understand what you're asking. Can you try rephrasing a little and giving a more thorough example? Defining the columns in the output you pasted would help as well.
My main question is whether it is correct to expect a lot of CNV calls in regions where most reads have mapping quality zero. My second question is whether I should drop the calls based on these regions or use them.