15 months ago
prasundutta87
Hi,
I was wondering whether anyone has seen a genome coverage histogram like the green one, and has an explanation for it. The plot was generated after aligning ONT long reads to the human genome.
Regards, Prasun
What is represented on X axis?
Depth of coverage.
I had actually added a legend, but it never appeared. I probably did it wrong.
Why is it plotted in that spiky way, whereas the other curves are smooth?
Exactly, that's my question. It is a MultiQC plot of multiple Qualimap reports created after minimap2 alignment of ONT reads.
There is something systematic here where odd numbers of depth are not found in any bins.
Could it be that, for some reason, maybe for simpler presentation, the Qualimap report only summarizes even coverage counts? There seem to be more bins with higher (>40) coverage than in any other histogram, so I could see the program simplifying the report by only reporting bin counts for read depths 0, 2, 4, 6, 8... MultiQC wouldn't know the difference, so it would report 0 bins for the missing depths, since it is not an XY plot.
I had a similar thing happen once in a different context (not long reads). I cannot remember exactly what the issue was, but I remember it was something simple like that.
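One way to test the "only even depths are reported" idea is to measure the odd/even pattern on the raw histogram numbers rather than on the plot. Below is a minimal sketch, assuming the histogram has already been parsed into a plain mapping of depth to number of bases. The function name `odd_depth_fraction` and the two example histograms are made up for illustration and are not taken from the actual reports.

```python
def odd_depth_fraction(hist, min_depth=1):
    """Return the fraction of bases that sit at an odd depth.

    hist: dict mapping depth -> number of bases (or bins) at that depth.
    Depths below min_depth (uncovered bases by default) are ignored.
    """
    total = sum(n for d, n in hist.items() if d >= min_depth)
    odd = sum(n for d, n in hist.items() if d >= min_depth and d % 2 == 1)
    return odd / total if total else 0.0


# Hypothetical numbers, for illustration only.
spiky = {0: 10, 1: 1, 2: 500, 3: 2, 4: 800, 5: 1, 6: 600}
smooth = {0: 10, 1: 100, 2: 300, 3: 500, 4: 600, 5: 500, 6: 300}

print("spiky sample :", odd_depth_fraction(spiky))
print("smooth sample:", odd_depth_fraction(smooth))
```

A value near 0.5 is what a smooth histogram would give. A value near 0 but not exactly 0 would suggest the odd depths are present in the report and genuinely rare, rather than being dropped by the reporting tool.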
I don't think that it is Qualimap's issue. Other samples look fine. Additionally, I have never seen this issue with ONT long reads for other samples. I am waiting for some other samples to check if it is related to the type of flowcell used for sequencing.
If you go directly to the qualimap report, does it show the same thing? Admittedly, I am naïve on the technical aspects, but I don't understand how the flow cell would affect read counts over genomic bins.
It's the same in the Qualimap report. Even Mosdepth shows the same issue: odd depths have near 0 coverage. The R9 and R10 ONT flowcells have different chemistries, and the aligner settings may need to be changed for proper mapping. This is just speculation; I am seeing this issue for the first time, so I don't have much background on it.
Very strange. So it seems like almost each fragment is counted twice. Sort of like PE data or some strange duplication.
I'd be interested in a follow-up if you find the source.
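To illustrate why double counting would give this exact pattern, here is a small toy simulation. It is only a sketch under stated assumptions: reads are random intervals on a made-up 10 kb reference, and the helper `depth_per_base` is hypothetical. It says nothing about what actually caused the duplication in the real data.

```python
import random


def depth_per_base(reads, genome_len):
    """Count how many (start, end) intervals cover each position."""
    depth = [0] * genome_len
    for start, end in reads:
        for pos in range(start, min(end, genome_len)):
            depth[pos] += 1
    return depth


random.seed(0)
genome_len = 10_000  # toy reference length, not a real genome

# Simulated long reads as half-open (start, end) intervals.
reads = []
for _ in range(200):
    start = random.randrange(genome_len)
    length = random.randint(500, 3000)
    reads.append((start, start + length))

single = depth_per_base(reads, genome_len)
# Every read present twice, as if each fragment were counted two times.
doubled = depth_per_base(reads + reads, genome_len)

odd_single = sum(1 for d in single if d % 2 == 1)
odd_doubled = sum(1 for d in doubled if d % 2 == 1)
print("positions at odd depth, reads counted once :", odd_single)
print("positions at odd depth, reads counted twice:", odd_doubled)
```

When every read is present exactly twice, the depth at each position is doubled, so no position can have an odd depth. The near-zero (rather than exactly zero) odd counts described above would then fit a situation where almost all, but not all, reads are duplicated.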
Just wanted to add here that there was no difference in flowcell chemistry between any of the samples. All used the R10 flowcell.