Strange Depth of Coverage distribution

6.7 years ago

melaniep • 0

Hi,

After googling for a while without finding any hint, I would like to ask the experts here for their opinion. I have a very strange result for the distribution of the depth of coverage (see picture): it looks as if there were two curves. Some samples show this more than others (not all samples are shown). In total I have 20 samples; they are bee samples and were sequenced (whole genome) on an Illumina HiSeq 3000. I have worked with bee sequence data before and never saw this behaviour; normally the distribution is more or less smooth. Has anyone come across something similar, or does anyone have an idea why the distribution is so irregular? I cannot think of anything that would explain it.

[Image: depth of coverage distribution]

How to add images to a Biostars post

6.7 years ago by GenoMax 154k

Thanks for the hint ;-)

6.7 years ago by melaniep • 0

Were these libraries prepared in two batches/with different methods?

6.7 years ago by GenoMax 154k

No, there should be just one library prep per sample (NEBNext Ultra II kit). But I can double-check with the sequencing facility.

6.7 years ago by melaniep • 0

This looks like a histogram plotting issue, where the depth values are being binned in a skewed way.

6.7 years ago by Damian Kao 16k
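
The binning hypothesis above can be illustrated with a small sketch (the function `bin_integer_depths` is hypothetical, not part of any plotting library). Depth values are integers, so if they are binned with a bin width that is not a whole number, some bins collect two distinct depth values and others only one, and even a perfectly flat distribution comes out as two interleaved curves. This only illustrates the hypothesis; it cannot apply when the raw per-depth counts already show the pattern.

```python
from collections import Counter


def bin_integer_depths(depths, bin_width):
    """Count how many depth values fall into each equal-width bin starting at 0.

    Returns a Counter mapping bin index -> number of values in that bin.
    """
    counts = Counter()
    for d in depths:
        counts[int(d // bin_width)] += 1
    return counts


# A perfectly flat toy distribution: every integer depth 0..29 seen exactly once.
flat = list(range(30))

# Whole-number bin width: every bin holds exactly one depth value.
print(sorted(set(bin_integer_depths(flat, 1).values())))  # -> [1]

# Bin width 1.5: bins alternately hold two depth values and one,
# so the flat input looks like two interleaved curves.
print(sorted(bin_integer_depths(flat, 1.5).items())[:6])
# -> [(0, 2), (1, 1), (2, 2), (3, 1), (4, 2), (5, 1)]
```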

I also checked the numbers directly in the output of GATK's DepthOfCoverage, and the pattern is already there, so it should not be related to the plotting itself.

6.7 years ago by melaniep • 0

Even if multiple samples were mixed together in the depth-of-coverage report, it should not produce this distribution; a mix of samples should give a smooth histogram, potentially with multiple peaks. Maybe try plotting the depth-of-coverage information from samtools stats and see if you get the same thing. Perhaps it is something to do with GATK.

6.7 years ago by Damian Kao 16k
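
To follow the samtools stats suggestion, the coverage histogram can be read from the `COV` lines of the `samtools stats` report. A minimal sketch, assuming tab-separated lines made of `COV`, a bin label (e.g. `[5-5]`), the depth, and the number of bases at that depth; check this layout against the output of your samtools version. The helper names and the toy counts are invented for illustration. Comparing totals at even and odd depths is one quick, hypothetical check of whether the two "curves" are simply alternating depth values.

```python
def parse_samtools_cov(lines):
    """Return {depth: number_of_bases} from the COV lines of `samtools stats` output."""
    hist = {}
    for line in lines:
        if not line.startswith("COV\t"):
            continue  # skip comments and all other report sections
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: "COV", bin label such as "[12-12]", depth, base count
        hist[int(fields[2])] = int(fields[3])
    return hist


def even_odd_totals(hist):
    """Total number of bases at even depths and at odd depths (depth 0 excluded)."""
    even = sum(n for d, n in hist.items() if d > 0 and d % 2 == 0)
    odd = sum(n for d, n in hist.items() if d % 2 == 1)
    return even, odd


# Toy report lines with invented counts, only to show the assumed line shape.
toy_report = [
    "# a comment line",
    "SN\traw total sequences:\t1000",
    "COV\t[1-1]\t1\t50",
    "COV\t[2-2]\t2\t120",
    "COV\t[3-3]\t3\t60",
    "COV\t[4-4]\t4\t130",
]

hist = parse_samtools_cov(toy_report)
print(hist)                   # -> {1: 50, 2: 120, 3: 60, 4: 130}
print(even_odd_totals(hist))  # -> (250, 110)
```

On real data the lines would come from a report written with something like `samtools stats sample.bam > sample.stats` (file names are placeholders).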

Hi, I recalculated the depth with samtools and got the same result. But today it occurred to me why the coverage could look like this: I have overlapping paired-end reads! That would make sense, no?

6.6 years ago by melaniep • 0
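
The overlap theory can be checked on paper. If every read is counted, each base where the two mates of a pair overlap receives +2 from a single DNA fragment, and a pair whose fragment is no longer than the reads adds exactly 2 at every base it covers. A minimal sketch with toy coordinates (the function `depth_profile` and its `count_overlap_twice` switch are hypothetical) compares counting every read with counting each fragment once. Whether GATK DepthOfCoverage or a particular samtools command counts overlapping mates once or twice depends on the tool and its options, so check their documentation rather than relying on this sketch.

```python
def depth_profile(genome_len, pairs, count_overlap_twice=True):
    """Per-base depth for read pairs given as ((start1, end1), (start2, end2)).

    Coordinates are 0-based and half-open. With count_overlap_twice=True every
    read adds 1 to each base it covers; otherwise each fragment adds 1 to the
    union of its two mates.
    """
    depth = [0] * genome_len
    for (s1, e1), (s2, e2) in pairs:
        if count_overlap_twice:
            covered = list(range(s1, e1)) + list(range(s2, e2))
        else:
            covered = set(range(s1, e1)) | set(range(s2, e2))
        for pos in covered:
            depth[pos] += 1
    return depth


toy_pairs = [
    ((0, 6), (3, 9)),  # mates overlap on bases 3..5
    ((2, 8), (2, 8)),  # fragment no longer than the reads: full overlap
]

print(depth_profile(12, toy_pairs, count_overlap_twice=True))
# -> [1, 1, 3, 4, 4, 4, 3, 3, 1, 0, 0, 0]
print(depth_profile(12, toy_pairs, count_overlap_twice=False))
# -> [1, 1, 2, 2, 2, 2, 2, 2, 1, 0, 0, 0]
```

In the per-read count the fully overlapping pair shifts every base it touches by 2, which is why a library dominated by heavily overlapping pairs can pile up depth on even values.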

That should be easy enough to check. I recommend using bbmerge.sh from the BBMap suite to merge the reads. Do the merging on raw (untrimmed) data.

6.6 years ago by GenoMax 154k

Yeah, that could be it. You can check this by looking at the insert size distribution of your PE reads: see whether it is smaller than 2 × the average read length. The 9th column of your .sam/.bam (TLEN, the observed template length) gives the insert size.

6.6 years ago by Damian Kao 16k
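
The insert size check can be scripted directly on SAM text. A minimal sketch, assuming uncompressed SAM lines (for a BAM file these could come from `samtools view`); the helper names and the toy records are hypothetical. It reads TLEN (column 9, index 8 after splitting on tabs) and keeps only positive values so that each pair is counted once: per the SAM specification the leftmost mate carries the positive TLEN and 0 means the value is unavailable. It also skips secondary and supplementary alignments, then reports the share of pairs whose insert is shorter than twice the read length, i.e. whose mates overlap.

```python
def insert_sizes(sam_lines):
    """Collect positive TLEN values (SAM column 9), one per read pair."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        tlen = int(fields[8])
        if tlen > 0:  # leftmost mate only; 0 means TLEN is unavailable
            sizes.append(tlen)
    return sizes


def fraction_overlapping(sizes, read_len):
    """Share of pairs whose insert is shorter than two read lengths."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < 2 * read_len) / len(sizes)


# Toy records (invented) for two pairs of 150 bp reads.
toy_sam = [
    "@SQ\tSN:chr1\tLN:10000",
    "p1\t99\tchr1\t100\t60\t150M\t=\t200\t250\t*\t*",
    "p1\t147\tchr1\t200\t60\t150M\t=\t100\t-250\t*\t*",
    "p2\t99\tchr1\t500\t60\t150M\t=\t800\t450\t*\t*",
    "p2\t147\tchr1\t800\t60\t150M\t=\t500\t-450\t*\t*",
]

sizes = insert_sizes(toy_sam)
print(sizes)                             # -> [250, 450]
print(fraction_overlapping(sizes, 150))  # -> 0.5
```

On real data it is more informative to look at the whole distribution of `sizes` than at a single fraction.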
