I see poly-A trimming listed as a recommended step in many RNA-Seq protocols. However, I can imagine that trimming, e.g., all As from the ends of a read could lead to false results if what is being trimmed is not a poly-A tail but an A-rich repeat within the mRNA. This might cause reads to align non-uniquely, or shorten reads to a length that gets filtered out before alignment. Trimming every terminal A is not the typical approach I've seen, but the same problem could in theory occur when trimming with A k-mers. I couldn't find a research paper where this was tested.
What is your opinion on this?
Is there maybe a k-mer length where poly-A trimming specificity is optimal?
Thanks for any input!
Scan the data with fastqc to see whether poly-A contamination is an issue. If it is not, i.e. poly-A does not come up as an overrepresented sequence, don't do any trimming. The same goes for adapter contamination: only act if it is an issue, otherwise leave the data as is.
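To make the k-mer question concrete, here is a minimal Python sketch of length-thresholded trimming: a 3' run of As is removed only if it is at least `min_run` bases long, and a helper reports what fraction of reads would be affected. The function names and the default of 10 are hypothetical, picked for illustration only, not a recommended or tested cutoff. It also assumes uppercase reads and exact A runs with no sequencing errors.

```python
def trailing_a_run(seq: str) -> int:
    """Length of the uninterrupted run of 'A' at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("A"))


def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove the 3' A-run only if it is at least min_run bases long.

    min_run=10 is an arbitrary illustrative value, not a validated cutoff.
    """
    run = trailing_a_run(seq)
    return seq[: len(seq) - run] if run >= min_run else seq


def fraction_with_poly_a(reads, min_run: int = 10) -> float:
    """Fraction of reads whose 3' A-run reaches min_run."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if trailing_a_run(r) >= min_run)
    return hits / len(reads)


if __name__ == "__main__":
    reads = ["ACGT" + "A" * 12, "ACGTAA", "GGCCTTAAAA"]
    print(fraction_with_poly_a(reads))      # share of reads that would be trimmed
    print([trim_poly_a(r) for r in reads])  # short A-runs are left untouched
```

Running `fraction_with_poly_a` over a sample of reads at a few different `min_run` values gives a rough idea of how sensitive the trimming is to the threshold. It cannot tell a genuine poly-A tail from an internal A-rich stretch that happens to sit at the end of a read.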