I am working on a large RNA-Seq dataset. In half of my samples, approximately 2-5% of reads are duplicates of this sequence: "AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTT". I understand this sequence is a primer from the cDNA step of RNA-Seq data preparation. It creates a clear divide between my samples: in the MDS space, one axis separates samples by type and the other separates them by whether or not they have this artifact.
How should I deal with this? Can I completely remove this sequence from my files? How should I go about that?
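
To make "removing this sequence" concrete, here is a minimal sketch of one possible approach: dropping every read that contains the non-poly-T part of the primer. It assumes uncompressed, well-formed four-line FASTQ records. The names `read_fastq` and `drop_primer_reads` are hypothetical helpers written for this illustration, not part of any library, and the exact-match test ignores sequencing errors, so it should be read as a starting point rather than a recommended pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]

# First 25 bases of the sequence quoted above (everything before the poly-T run).
PRIMER = "AAGCAGTGGTATCAACGCAGAGTAC"


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line FASTQ records (assumes a well-formed file)."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes four lines per record.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def drop_primer_reads(
    records: Iterable[Record], primer: str = PRIMER
) -> Tuple[List[Record], int]:
    """Return the reads without an exact primer match, plus the number dropped."""
    kept: List[Record] = []
    dropped = 0
    for record in records:
        if primer in record[1]:
            dropped += 1
        else:
            kept.append(record)
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory example instead of a real file.
    example = [
        "@read1\n", PRIMER + "TTTTTTTTTT\n", "+\n", "I" * 35 + "\n",
        "@read2\n", "ACGTACGTACGT\n", "+\n", "IIIIIIIIIIII\n",
    ]
    kept, dropped = drop_primer_reads(read_fastq(example))
    print(f"kept {len(kept)} reads, dropped {dropped}")
```

For a real file, the same functions could be fed an open file handle, for example `drop_primer_reads(read_fastq(open(path)))`, where `path` is a placeholder. Whether to drop such reads outright or trim the primer and keep the remainder is a separate decision, and dedicated adapter trimmers such as cutadapt handle mismatches and partial matches that this sketch does not.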