Strange pattern of uneven read count distribution
5 weeks ago
BioStar22 • 0

Hi Biostars,

I have multiple murine whole genome sequencing samples that show a non-uniform read count distribution along the genome, all following the same pattern (example attached). Has anyone seen this pattern in read count distribution before, and does anyone have an idea what could be causing it? (I don't see any patterns in the standard QC parameters.)

Thanks for any input on this!

[Attached image: read count distribution along the genome]

Update on GC bias:

GC content was proposed as a reason, so I plotted the mean GC content of the reference genome alongside the read count distribution. There is no clear correlation: at most a mild tendency for high-GC regions to have lower coverage, and it is very subtle. So I guess GC bias is not the (major) cause of the fluctuating coverage. Does anyone have a different idea?
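
For anyone who wants to put a number on a comparison like this rather than judging it from a plot, here is a minimal sketch in plain Python. It assumes you already have, for each genomic bin, the reference sequence and a read count; the function names and the toy data are hypothetical and not taken from the original analysis. It uses a rank (Spearman) correlation so that a monotonic but non-linear GC effect would still show up.

```python
from math import sqrt

def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases; None if the bin is all N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else None

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def gc_coverage_correlation(bin_seqs, bin_counts):
    """Rank correlation between per-bin GC fraction and read count.

    Bins without any A/C/G/T base (e.g. assembly gaps) are skipped.
    """
    pairs = [(gc_fraction(s), c) for s, c in zip(bin_seqs, bin_counts)]
    pairs = [(g, c) for g, c in pairs if g is not None]
    gcs, counts = zip(*pairs)
    return spearman(list(gcs), list(counts))

# Toy data (made up): coverage drops as GC rises, one bin is a gap.
bins = ["ATATATAT", "ATGCATGC", "GGCCGGCC", "NNNNNNNN", "ATATGGAT"]
counts = [120, 100, 80, 0, 110]
rho = gc_coverage_correlation(bins, counts)
print(round(rho, 3))
```

A value close to 0 on real bins would support the "no clear correlation" reading; a clearly negative value would point back toward a GC effect at that bin size.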

[Attached image: mean reference GC content plotted alongside the read count distribution]

WGS fluctuating coverage • 500 views
5 weeks ago
shelkmike ★ 1.4k

Sequencing coverage by short reads depends on GC content (https://pubmed.ncbi.nlm.nih.gov/22323520/). Maybe this coverage distribution reflects the distribution of GC content along the chromosomes?

Also, if you removed reads with low mapping quality, this may have reduced coverage in repetitive regions.
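
One way to check the mapping-quality point is to count reads per bin with and without the filter and look at which bins lose the most. The sketch below is only an illustration: it assumes reads are already available as (chromosome, 0-based position, MAPQ) tuples, and the function names, the bin size and the MAPQ threshold of 20 are arbitrary choices, not values from this thread.

```python
from collections import Counter

def bin_counts(reads, bin_size, min_mapq=0):
    """Count reads per (chromosome, bin index), keeping reads with MAPQ >= min_mapq."""
    counts = Counter()
    for chrom, pos, mapq in reads:
        if mapq >= min_mapq:
            counts[(chrom, pos // bin_size)] += 1
    return counts

def retained_fraction(reads, bin_size, min_mapq):
    """Per bin, the fraction of reads that survive the MAPQ filter."""
    all_counts = bin_counts(reads, bin_size)
    kept = bin_counts(reads, bin_size, min_mapq)
    # Counter returns 0 for bins where no read passed the filter.
    return {b: kept[b] / n for b, n in all_counts.items()}

# Toy reads (made up): the second bin mimics a repetitive region with many MAPQ 0 reads.
reads = [
    ("chr1", 100, 60), ("chr1", 250, 60), ("chr1", 900, 60), ("chr1", 400, 42),
    ("chr1", 1100, 0), ("chr1", 1500, 0), ("chr1", 1700, 60), ("chr1", 1900, 3),
]
frac = retained_fraction(reads, bin_size=1000, min_mapq=20)
for b in sorted(frac):
    print(b, frac[b])
```

If the bins with a low retained fraction line up with the low-coverage stretches, the filter is a plausible contributor; if the retained fraction is flat along the chromosome, it is not.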


I had a quick look at overall GC content, which looked OK at first glance, but I'll dive deeper into GC. Thanks for the hint!

5 weeks ago
d-cameron ★ 2.9k

Megabase/arm-level GC biases do exist. Whilst some tools perform post-segmentation CN smoothing to handle exactly this sort of gradual change in CN without any explanatory SV, CN biases on this scale remain largely uninvestigated.


Thank you for your comment! I was using HMMcopy (https://bioconductor.org/packages/release/bioc/vignettes/HMMcopy/inst/doc/HMMcopy.pdf), which corrects for GC content and mappability, but I'll check anyway whether the pattern follows the GC content pattern.
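
To make concrete what a per-bin GC correction does, here is a deliberately simplified stand-in: each bin's read count is rescaled by the median count of bins with similar GC. This is not HMMcopy's actual procedure and none of the names below come from that package; the function name, the number of GC strata and the toy values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median

def gc_median_normalise(counts, gcs, n_strata=20):
    """Rescale each bin by the median count of its GC stratum.

    counts: read count per bin; gcs: GC fraction (0..1) per bin.
    Returns corrected counts, or None for bins whose stratum median is 0.
    """
    def stratum(g):
        # GC fraction 1.0 goes into the last stratum.
        return min(int(g * n_strata), n_strata - 1)

    by_stratum = defaultdict(list)
    for c, g in zip(counts, gcs):
        by_stratum[stratum(g)].append(c)
    stratum_median = {s: median(v) for s, v in by_stratum.items()}
    overall = median(counts)

    corrected = []
    for c, g in zip(counts, gcs):
        m = stratum_median[stratum(g)]
        corrected.append(c * overall / m if m > 0 else None)
    return corrected

# Toy bins (made up): high-GC bins have about half the coverage of low-GC bins.
gcs = [0.32, 0.33, 0.34, 0.52, 0.53, 0.54]
counts = [100, 110, 90, 50, 55, 45]
corrected = gc_median_normalise(counts, gcs, n_strata=10)
print(corrected)
```

Because the correction only depends on each bin's own GC value, it removes coverage differences that track GC bin by bin; a trend along a chromosome arm that is unrelated to per-bin GC would pass through it unchanged.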


GC normalisation is only for small-scale (~1 kbp) differences in GC content. GC bias does not explain arm-level copy number biases. The fact that you're seeing higher coverage toward all centromeres can't be explained by GC. The abrupt changes (e.g. chr12/chr13) can be explained by genomic rearrangements, but the gradual arm-level changes (e.g. chr2) look more like sequencing artifacts than real biology.
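
The distinction between abrupt and gradual changes can be roughed out numerically. The sketch below is a crude heuristic, not an established method: for the per-bin log2 ratios of one chromosome it compares the largest jump between adjacent windows with the total range of the signal. All names, the window size and the toy profiles are hypothetical, and on noisy real data the input would need smoothing first.

```python
from statistics import median

def linear_slope(y):
    """Least-squares slope of y against bin index (change per bin)."""
    n = len(y)
    mx = (n - 1) / 2
    my = sum(y) / n
    sxy = sum((i - mx) * (v - my) for i, v in enumerate(y))
    sxx = sum((i - mx) ** 2 for i in range(n))
    return sxy / sxx

def largest_step(y, window=5):
    """Largest difference between medians of adjacent, non-overlapping windows."""
    best = 0.0
    for i in range(window, len(y) - window + 1):
        left = median(y[i - window:i])
        right = median(y[i:i + window])
        best = max(best, abs(right - left))
    return best

def step_fraction(y, window=5):
    """Share of the total range explained by the single largest step.

    Near 1: one breakpoint accounts for the change (rearrangement-like).
    Well below 1: the change is spread along the chromosome (drift-like).
    """
    total = max(y) - min(y)
    return largest_step(y, window) / total if total > 0 else 0.0

# Toy log2-ratio profiles (made up), 30 bins each.
gradual = [0.30 - 0.02 * i for i in range(30)]  # steady decline along the arm
abrupt = [0.0] * 15 + [0.5] * 15                # single copy number breakpoint

print(linear_slope(gradual), step_fraction(gradual))
print(linear_slope(abrupt), step_fraction(abrupt))
```

On these toy profiles the abrupt case gets a step fraction of 1, while the gradual case stays well below that, which matches the visual impression of a breakpoint versus a slope.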


This agrees with my current picture of the problem. Thank you for your opinion!
