Question

Issue with cDNA PCR artifact in RNA-Seq dataset


18 months ago

metzgeryonatan • 0

I am working on a large RNA-Seq dataset. In half of my samples, approximately 2-5% of reads are duplicates of the sequence "AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTT". I understand this sequence is a primer from the cDNA step of RNA-Seq library preparation. This leads to a clear divide between my samples: in MDS space, one axis separates samples by type and the other separates them by whether or not they have this artifact.

How should I deal with this? Can I completely remove this sequence from my files? How should I go about that?

PCR RNA-seq artifact cDNA • 670 views


score 0 · Answer 1 · 2024-02-14


18 months ago

Matthias Zepper 5.1k

Several tools can remove such sequences, BBDuk for example. However, I do not think the divide in your MDS plot emerges just because of these reads. Rather, the affected samples may be low-input or low-complexity samples, which happen to suffer from this artifact the most, in which case the divide reflects the samples themselves rather than the primer reads.
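To make the removal step concrete, here is a minimal Python sketch of what such filtering does. It is not BBDuk and not a substitute for it: it only does exact substring matching, whereas dedicated tools match k-mers and can tolerate mismatches. The function names, the choice to match only the non-poly-T part of the primer, and the reverse-complement check are all assumptions made for illustration. It also assumes plain 4-line FASTQ records and single-end handling (for paired-end data, both mates would need to be dropped together).

```python
import io

# Primer quoted in the question, with the trailing poly-T run dropped
# (assumption: matching the core avoids missing reads with a shorter T stretch).
PRIMER_CORE = "AAGCAGTGGTATCAACGCAGAGTAC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_primer_reads(in_handle, out_handle, primer=PRIMER_CORE):
    """Write reads that lack the primer (either strand) to out_handle.

    Returns (kept, removed) so the artifact fraction per sample can be checked.
    """
    primer_rc = reverse_complement(primer)
    kept = removed = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        upper = seq.upper()
        if primer in upper or primer_rc in upper:
            removed += 1
        else:
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept, removed


# Tiny in-memory demonstration with two made-up reads.
demo_in = io.StringIO(
    "@read1\n" + PRIMER_CORE + "TTTTTTTT\n+\n" + "I" * 33 + "\n"
    "@read2\nACGTACGTACGT\n+\n" + "I" * 12 + "\n"
)
demo_out = io.StringIO()
demo_counts = filter_primer_reads(demo_in, demo_out)
```

For real files, the same function could be given handles from `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. Comparing `removed / (kept + removed)` across samples before and after filtering, and then redoing the MDS plot, would show whether the divide really depends on these reads.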
