14 days ago
Lawrence
Hi All,
I am working on some eukaryotic samples for variant analysis and have found the same strange read-mapping pattern across multiple sequencing runs. As the number of reads increases, coverage rises in a few strongly biased locations rather than evenly across the genome. The reference genome is from the same species as the tissue the DNA was extracted from.
I have attached an IGV screenshot showing this pattern in a lower-coverage (top) and a higher-coverage (bottom) sequencing run of the same sample. Notably, the pattern holds across all of the samples.
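One way to make "biased locations" concrete is to bin depth into fixed windows and flag windows that sit far above the genome-wide median. A minimal sketch, assuming per-window mean depths have already been computed (for example with `mosdepth --by 1000`, or by averaging `samtools depth -a` output over windows); the function name `flag_peaks`, the tuple layout and the 5x threshold are illustrative assumptions, not part of the original analysis.

```python
from statistics import median


def flag_peaks(windows, fold=5.0):
    """Return windows whose mean depth exceeds `fold` times the median window depth.

    `windows` is a list of (chrom, start, end, mean_depth) tuples, e.g. parsed
    from a per-window depth BED file (assumed input format).
    """
    med = median(w[3] for w in windows)
    # Multiplying instead of dividing avoids a division by zero when med == 0.
    return [w for w in windows if w[3] > fold * med]


# Hypothetical 1 kb windows: uniform ~10x background with one pile-up.
windows = [
    ("chr1", 0, 1000, 10.0),
    ("chr1", 1000, 2000, 12.0),
    ("chr1", 2000, 3000, 95.0),  # resembles the peaks seen in IGV
    ("chr1", 3000, 4000, 11.0),
    ("chr1", 4000, 5000, 9.0),
]

for chrom, start, end, depth in flag_peaks(windows):
    print(f"{chrom}:{start}-{end}\tdepth={depth}")
```

The flagged intervals can then be intersected with a repeat annotation or inspected in IGV.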
Are you downsampling the data, or are you sequencing more from the same libraries? Either way, this may be a characteristic of the libraries themselves. What kind of data is this? Was any enrichment done?
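If the peaks are a property of the libraries (or of the reference) rather than of sequencing depth, the shallow and deep runs should over-represent the same windows. A minimal sketch, assuming per-window depths for both runs over identical windows; the example values are made up. Pearson correlation is scale-invariant, so raw depths can be compared directly without normalising for total read count.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of window depths."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two windows")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-window depths for the same five windows in two runs.
shallow = [2, 3, 20, 2, 3]
deep = [10, 14, 98, 11, 12]

r = pearson(shallow, deep)
print(f"r = {r:.3f}")
```

A correlation close to 1 means the same windows dominate in both runs, which points to something systematic (library or reference) rather than a depth artefact.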
The original libraries were sequenced more deeply. These are KAPA DNA libraries with some size selection but no specific enrichment. I should also add that the libraries were remade from scratch for a third sequencing run, and the result was the same.
Did you check what kind of sequence these peaks overlap, and which species is this? These may be repetitive or low-complexity regions. This is a likely explanation if you have a draft reference that captures only a small fraction of the repeat copies: reads from transposable elements (TEs), which have much higher copy numbers, have "nowhere else to go" and pile up on the few copies that are present in the reference.
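The "nowhere else to go" argument can be put in rough numbers: if a repeat family has N copies in the genome but only m are represented in the draft reference, reads from all N copies land on those m loci, so each receives roughly N/m times the single-copy depth. Sequence complexity of the peak regions is a complementary check. A minimal sketch; the copy numbers are hypothetical, and `kmer_complexity` (fraction of distinct k-mers) is one simple measure chosen for illustration, not the method used by any particular masking tool.

```python
def expected_collapse_depth(single_copy_depth, genome_copies, reference_copies):
    """Approximate depth at each reference copy of a repeat when reads from all
    genomic copies are forced onto the copies present in the reference.

    Assumes uniform sampling and that reads split evenly across reference copies.
    """
    if reference_copies < 1:
        raise ValueError("at least one copy must be present in the reference")
    return single_copy_depth * genome_copies / reference_copies


def kmer_complexity(seq, k=3):
    """Fraction of distinct k-mers among all k-mers in `seq`.

    1.0 means no k-mer is repeated; low values indicate simple or
    low-complexity sequence.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    return len(set(kmers)) / len(kmers)


# Hypothetical: 30x genome, a TE family with 200 genomic copies, 4 in the draft assembly.
print(expected_collapse_depth(30, 200, 4))  # 1500.0

print(round(kmer_complexity("ATATATATATATATAT"), 3))  # dinucleotide repeat
print(round(kmer_complexity("ACGTTGCAAGGCTTAC"), 3))  # mixed sequence
```

The dinucleotide repeat scores far lower than the mixed sequence, which is the kind of contrast one would look for in the peak regions.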