I have several questions about adapters that I haven't been able to answer, and some of them arise in relation to the QC. When looking at the plot of base composition per position in the read, the first n bases are non-uniform, and the distribution then becomes uniform toward the other end of the read. Why does this happen at all? Why does it happen only on one end?
My first assumption was those are adapters, but they should be at the end of the read, right? And also, there is very little soft-clipping by the aligner (over 99.6% of reads are all matches/mismatches), so there can't be any adapters there, right?
I also thought it might be sequence bias at the fragmentation breakpoints, but I've found no support for that, and the effect seems too pronounced to be explained by it.
Thanks in advance!
Regarding the non-uniform base distribution in the FastQC report, it has been stated in many threads here that this is normal and that the classification as a failure is misleading. I think it has confused many beginners, including myself. The reason is that FastQC's evaluation is not based on empirical data, yet it is presented through a suggestive "traffic light system" (fail, warning, pass), which is in my opinion not a good thing to do. I wish there was a new QC tool that would incorporate all the experience we have gained since then.
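If you want to look at the numbers behind that plot rather than the traffic light, the per-position base fractions are easy to compute yourself. Below is a minimal sketch in plain Python, not FastQC's actual implementation; the function name `per_position_base_fractions` and the toy reads are made up for illustration, and in practice you would feed it the sequence lines of your FASTQ file.

```python
from collections import Counter

def per_position_base_fractions(reads):
    """For each read position, return the fraction of A, C, G and T.

    Bases other than A/C/G/T (e.g. N) are counted but left out of the
    denominator, so the four fractions sum to 1 wherever any A/C/G/T occurs.
    """
    max_len = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions

# Toy reads (hypothetical): fully biased at position 0, mixed afterwards.
toy_reads = ["GACT", "GCAT", "GTGA", "GGTC"]
toy_fractions = per_position_base_fractions(toy_reads)
for pos, frac in enumerate(toy_fractions):
    print(pos, frac)
```

With real data, a bias confined to the first several positions of the read is the pattern discussed in this thread.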
Thank you. Sorry for missing those and asking again, googling and searching here didn't help me initially!
I think the question is whether your library prep is based on random priming. If so, that could explain it. If your reads align at such high rates, I would say forget about the bias; everything is fine.
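To double-check the "almost no soft-clipping" observation, you can count how many aligned reads have a CIGAR string without any clipping operation. This is only a sketch that works on CIGAR strings you have already extracted (column 6 of a SAM file); the helper names `is_unclipped` and `fraction_unclipped` and the example CIGARs are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def is_unclipped(cigar):
    """True if the CIGAR has at least one operation and none is S or H."""
    ops = CIGAR_OP.findall(cigar)
    return bool(ops) and all(op not in "SH" for _, op in ops)

def fraction_unclipped(cigars):
    """Fraction of aligned reads (CIGAR != '*') without soft or hard clipping."""
    aligned = [c for c in cigars if c != "*"]
    if not aligned:
        return 0.0
    return sum(is_unclipped(c) for c in aligned) / len(aligned)

# Made-up example: two clean alignments, one soft-clipped read, one unaligned read.
example_cigars = ["100M", "95M5S", "100M", "*"]
print(fraction_unclipped(example_cigars))
```

A value close to 1 supports the view that there is little adapter sequence left at the read ends for the aligner to clip.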
Thank you. Unfortunately, I am having trouble confirming what random priming really is. Is this an example of the method, introduction of random primers for PCR? http://www.biotechniques.com/multimedia/archive/00009/03354st06_9908a.pdf
In my (incomplete) understanding of lab processes, random priming is PCR using all possible primer sequences in a mix. Imagine you generate all 4^6 = 4096 hexamers as oligo primers and use them in a single PCR reaction: any DNA sequence would be amplified.
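To make the "all possible hexamers" idea concrete, here is a small sketch that enumerates them in Python. It only illustrates the combinatorics, not any lab protocol.

```python
from itertools import product

# Every sequence of length 6 over the four DNA bases: 4**6 = 4096 hexamers.
hexamers = ["".join(p) for p in product("ACGT", repeat=6)]

print(len(hexamers))   # 4096
print(hexamers[:3])    # ['AAAAAA', 'AAAAAC', 'AAAAAG']
```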