Intro: Yeast chromosome 12 contains a really long region of rDNA. There are roughly 150-200 or more copies of a 9.1 kb repeat, so about 1.5 Mb in total, which is the majority of the chromosome. The published yeast reference genome contains only 2 copies of the repeat.
My situation:
I just did whole-genome sequencing of some yeast clones. I have about 70X coverage across most of the genome. If you figure that there are 200 copies of the rDNA repeat on chromosome 12, but only 2 copies represented in the reference sequence, then in this region I should see a read depth of roughly 70*100 = 7000. And in fact, this is pretty much what I see; here's a plot of read depth in that area. To generate this plot I mapped reads with bwa
and then used samtools depth
and then plotted in R.
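To make the expected-depth arithmetic above explicit, here is a minimal Python sketch. The function name `expected_depth` is made up for illustration; the numbers are the ones quoted in the question (70X baseline, 150-200 true copies, 2 copies in the reference).

```python
def expected_depth(base_coverage, true_copies, reference_copies):
    """Expected read depth over a repeat that is collapsed in the reference.

    Reads from every true copy pile up on the few copies present in the
    reference, so depth scales by true_copies / reference_copies.
    """
    return base_coverage * true_copies / reference_copies


# Numbers from the question: 70X baseline, 2 copies in the reference.
print(expected_depth(70, 200, 2))  # 7000.0
print(expected_depth(70, 150, 2))  # 5250.0
```

So a depth somewhere around 5000-7000 is what the copy-number estimate predicts, and a slightly higher true copy number would push it past 8000.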
My question: The plot is obviously cut off artificially at a read depth of ~8000. Why? I didn't specify any limits when creating the plot. So where is the cutoff happening? Is it something samtools depth is doing? Something bwa did? Or something about how the reads are generated during Illumina sequencing?
One last thing to mention: the highest value is not exactly 8000; there are a few positions with depths of 8002, just to make things weirder. And I've shown just one example, but it happens in all the clones I sequenced.
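One way to check whether a plateau like this is a hard cap rather than biology is to count how many positions sit at or just below the maximum depth. The sketch below assumes the usual three-column, tab-separated output of samtools depth (reference name, position, depth). The function name `depth_plateau`, the `tolerance` parameter, and the example values are made up for illustration and are not my real data.

```python
def depth_plateau(lines, tolerance=5):
    """Return (max_depth, n_near_max) for samtools-depth-style lines.

    n_near_max counts positions whose depth is within `tolerance` of the
    maximum. Many positions piled up at the maximum suggest a cap.
    """
    depths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        return (0, 0)
    top = max(depths)
    near = sum(1 for d in depths if d >= top - tolerance)
    return (top, near)


# Made-up example lines in the format: chrom <TAB> position <TAB> depth
example_lines = [
    "chrXII\t1\t7100\n",
    "chrXII\t2\t8000\n",
    "chrXII\t3\t8000\n",
    "chrXII\t4\t8002\n",
    "chrXII\t5\t6500\n",
]
print(depth_plateau(example_lines))  # (8002, 3)
```

On a real file you could pass an open file handle instead of the list, since the function only iterates over lines.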
Thanks for any help!
Aha!
-d 0
worked! Thanks!
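For anyone landing here later: the `-d 0` above is the maximum-depth option of samtools depth. As I understand it, older samtools releases capped reported depth at 8000 by default and treated 0 as "no limit", while newer releases may no longer apply the cap, so check the manual for your version. A minimal Python sketch of the call follows; the BAM file name and the helper `depth_command` are hypothetical.

```python
def depth_command(bam_path, max_depth=0):
    """Build a samtools depth command line.

    max_depth=0 asks samtools depth not to cap the reported depth
    (in versions where the -d option applies).
    """
    return ["samtools", "depth", "-d", str(max_depth), bam_path]


# Hypothetical file name, for illustration only.
cmd = depth_command("clone1.sorted.bam")
print(" ".join(cmd))  # samtools depth -d 0 clone1.sorted.bam

# To actually run it (requires samtools on PATH):
# import subprocess
# with open("clone1.depth.txt", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```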