Hello,
We sent our samples off for RNA exome sequencing. We normally do mRNA sequencing but our RNA was degraded so RNA exome sequencing was recommended. We used the TruSeq RNA Exome kit. I am encountering issues I haven't experienced before when trimming reads.
Below is the adapter content from 2 reads (R1 and R2) before trimming. Color key: purple = R2 PolyG, gray = R1 PolyA, orange = R2 PolyA, blue = R1 Illumina Universal Adapter, green = R2 Illumina Universal Adapter.
Normally, I trim using the bbduk.sh recommended parameters (below). My adapter file contains the universal, read 1, and read 2 sequences.
bbduk.sh -Xmx3g in1={input.R1} in2={input.R2} out1={output.trim_R1} out2={output.trim_R2} ref={params.adapters} ktrim=r k=23 mink=11 hdist=1 tpe tbo threads={params.threads}
However, after trimming I still see the PolyA and PolyG in my FastQC: Adapter Content report. I have read that artificial PolyGs can occur due to 2-color chemistry. I have confirmed that we used the NovaSeq 6000 instrument which has Red-Green 2-Channel SBS chemistry. Bbduk has a trimpolyg argument I can change. I haven't seen PolyA show up before in my other sequencing runs. What would explain this?
I adjusted my trimming parameters to include trimpolya=5 and trimpolyg=5. The percentage is reduced, but the problem doesn't completely go away.
Color key: gray = R2 PolyA, green = R2 PolyG, blue = R1 PolyA
Should I make the filter more stringent? As long as these percentages are low, is this even a big issue when mapping?
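One way to put a number on "low" is to count how many reads still end in a homopolymer run after trimming. The following is a minimal sketch, not part of bbduk or FastQC: the function name `tail_fraction`, the file name in the comment, and the `min_len` threshold are illustrative assumptions, and it assumes plain four-line FASTQ records.

```python
from itertools import islice


def tail_fraction(fastq_lines, base, min_len=10):
    """Fraction of reads whose 3' end is a run of `base` at least `min_len` long.

    `fastq_lines` is any iterable of FASTQ lines (for example an open file).
    Assumes four lines per record, with the sequence on the second line.
    """
    total = 0
    tailed = 0
    # Lines 1, 5, 9, ... (0-based) are the sequence lines of each record.
    for line in islice(fastq_lines, 1, None, 4):
        seq = line.strip()
        total += 1
        tail_len = len(seq) - len(seq.rstrip(base))
        if tail_len >= min_len:
            tailed += 1
    return tailed / total if total else 0.0


# Hypothetical usage on an uncompressed trimmed file:
# with open("trimmed_R2.fastq") as handle:
#     print(tail_fraction(handle, "G"))
```

Running it once per read file and per base (A and G) gives a per-file percentage that can be compared before and after changing the trimming parameters.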
Poly-G can be explained, but poly-A is odd. You could simply filter out these odd sequences first and then trim the remaining reads in a second step (not tested, but try it out).
GenoMax, wouldn't that filter the middle of the read too? When I tested this I lost the majority of my reads and had fewer than 5% of reads left over. Wouldn't I only want to trim from the ends by using the trimpolyg and trimpolya arguments?
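The difference between the two approaches can be illustrated with a toy example. This is a simplified sketch using exact string matching, not how bbduk implements either operation; the function names and the three reads are made up for illustration.

```python
def trim_3prime_homopolymer(seq, base, min_len=5):
    """Remove a run of `base` from the 3' end if it is at least `min_len` long."""
    stripped = seq.rstrip(base)
    run_len = len(seq) - len(stripped)
    return stripped if run_len >= min_len else seq


def contains_homopolymer(seq, base, min_len=5):
    """True if a run of `base` of length `min_len` occurs anywhere in the read."""
    return base * min_len in seq


reads = [
    "ACGTACGTTAGCGGGGGGGGGG",  # poly-G tail at the 3' end
    "ACGTAAAAAACGTTAGCCTAGC",  # internal A run, no tail
    "ACGTTAGCCTAGCATCGATCGA",  # no homopolymer run
]

# End trimming: only the tail of the first read is removed; all reads survive.
trimmed = [trim_3prime_homopolymer(r, "G") for r in reads]
trimmed = [trim_3prime_homopolymer(r, "A") for r in trimmed]

# Whole-read filtering: any read with a run anywhere is discarded.
kept = [
    r for r in reads
    if not contains_homopolymer(r, "G") and not contains_homopolymer(r, "A")
]
```

Here end trimming keeps all three reads and shortens only the first, while filtering keeps only the third, which is consistent with losing many reads when short A or G runs are common inside real transcripts.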
Above was an example. I thought you had reads that were just poly-A/T, but if that is not the case then you can ignore my comment.
Ideally. But I thought you said you did that and the results still showed reads with poly-G/T? You can increase the length of trimpolyN=NN if you are still seeing the bases. At the end of the day none of this should matter, since the aligner will soft-clip parts of the reads that do not align (as long as you are aligning to a good reference).
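Whether soft-clipping is actually absorbing the leftover tails can be checked after alignment. The sketch below assumes the aligner reports clipped bases as `S` operations in the SAM CIGAR string; the helper name `soft_clipped_bases` is hypothetical, and it only parses a CIGAR string rather than reading a BAM file.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped base counts for one CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) sit outside soft clips, so drop them before checking the ends.
    ops = [item for item in ops if item[1] != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right
```

For example, `soft_clipped_bases("5S90M6S")` gives `(5, 6)`. Summing these counts over a sample of alignments shows how many bases the aligner is discarding at read ends.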
I have sometimes encountered the same issue and got good trimming results using fastp. It has a feature to activate poly-G trimming by specifying the argument -g (look at the section polyG tail trimming). Hope it helps!