Hi,
After googling for a while and not finding any hint, I would like to ask the experts here for their opinion. I have a very strange result for the distribution of the depth of coverage (see picture). It looks as if there were two curves. Some samples more than others, some (not show. In total I have 20 samples; they are bee samples and were sequenced on Illumina HiSeq3000 (whole genome). I have worked with bee sequence data before and never saw this behaviour; normally the distribution is more or less smooth. I wonder if any of you has come across something similar or has an idea why the distribution is this irregular? I cannot think of anything that would explain it.
How to add images to a Biostars post
thanks for the hint ;-)
Were these libraries prepared in two batches/with different methods?
No, it should be just one library prep per sample (kit: NEBNext Ultra II). But I can double-check with the sequencing facility.
This looks like a histogram plotting issue where it is binning the depth values in a skewed way.
I also checked the numbers directly in the output of GATK's DepthOfCoverage, and it's already like this there, so it should not be related to the plotting itself.
Even if multiple samples were mixed together in the depth of cov reporting, it should not show this distribution. Multi-sample mix should show a smooth histogram with potentially multiple peaks. Maybe try using samtools stats' depth of coverage info for plotting and see if you get the same thing. Perhaps it is something to do with GATK.
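In case it helps with the cross-check, here is a minimal Python sketch that tallies a depth histogram from the plain-text output of `samtools depth` (tab-separated: reference name, position, depth). The function name `depth_histogram` and the example lines are made up for illustration, and it only reads the depth column of the first BAM. Keep in mind that `samtools depth` skips zero-coverage positions unless you pass `-a`.

```python
from collections import Counter


def depth_histogram(lines):
    """Count how many positions have each depth value.

    `lines` is any iterable of `samtools depth` output lines
    (tab-separated: reference name, position, depth).
    """
    hist = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        hist[int(fields[2])] += 1  # third column = depth in the first BAM
    return hist


# Hypothetical example input, just to show the shape of the data.
example = ["chr1\t1\t10\n", "chr1\t2\t10\n", "chr1\t3\t12\n"]
hist = depth_histogram(example)
for depth in sorted(hist):
    print(depth, hist[depth])
```

Plotting `hist` with one bar per integer depth (no re-binning) should make it clear whether the two "curves" are in the data itself.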
Hi, I recalculated the depth with samtools, and it was the same. But today it occurred to me why the coverage could look like this: I have overlapping paired-end sequencing reads! That would make sense, no?
Should be easy enough to check that theory. I recommend using bbmerge.sh from the BBMap suite to merge the reads. Do the merging on the raw (non-trimmed) data.
Yeah, that could be it. You can check this by looking at the insert size distribution of your PE reads. See if it is smaller than 2 * average read length. The 9th column of your .sam/.bam should be the insert size.
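A rough sketch of that insert-size check in Python, assuming you feed it SAM text lines (e.g. from `samtools view`). The function name `overlap_fraction` and the toy records are hypothetical. It takes the length of the SEQ field as the read length, so it will not work if SEQ is stored as `*`, and it counts each pair once by only looking at the mate with a positive value in column 9.

```python
def overlap_fraction(sam_lines):
    """Fraction of properly paired templates whose insert size (column 9)
    is smaller than twice the read length, i.e. whose mates likely overlap."""
    n_pairs = 0
    n_overlap = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a full alignment record
        flag = int(fields[1])
        tlen = int(fields[8])       # 9th column: template length / insert size
        read_len = len(fields[9])   # 10th column: read sequence
        # Keep properly paired (0x2) primary alignments (no 0x100 / 0x800),
        # and only the mate with positive TLEN so each pair is counted once.
        if not (flag & 0x2) or (flag & 0x900) or tlen <= 0:
            continue
        n_pairs += 1
        if tlen < 2 * read_len:
            n_overlap += 1
    return n_overlap / n_pairs if n_pairs else 0.0


# Toy records: one pair with a 250 bp insert, one with 400 bp, 150 bp reads.
seq, qual = "A" * 150, "I" * 150
toy = [
    "@HD\tVN:1.6",
    f"r1\t99\tchr1\t100\t60\t150M\t=\t200\t250\t{seq}\t{qual}",
    f"r1\t147\tchr1\t200\t60\t150M\t=\t100\t-250\t{seq}\t{qual}",
    f"r2\t99\tchr1\t500\t60\t150M\t=\t750\t400\t{seq}\t{qual}",
    f"r2\t147\tchr1\t750\t60\t150M\t=\t500\t-400\t{seq}\t{qual}",
]
print(overlap_fraction(toy))
```

If a large share of pairs comes out as overlapping, that would be consistent with the overlapping-mates explanation, since the overlapping stretch may be counted once per mate by the depth tools.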