PolyA and PolyG sequences in FastQC/MultiQC report
15 months ago
kenneditodd ▴ 50

Hello,

We sent our samples off for RNA exome sequencing. We normally do mRNA sequencing but our RNA was degraded so RNA exome sequencing was recommended. We used the TruSeq RNA Exome kit. I am encountering issues I haven't experienced before when trimming reads.

Below is the adapter content for both reads (R1 and R2) before trimming. Color key: purple = R2 PolyG, gray = R1 PolyA, orange = R2 PolyA, blue = R1 Illumina Universal Adapter, green = R2 Illumina Universal Adapter.

[Figure: Raw Reads MultiQC adapter content]

Normally, I trim using the bbduk.sh recommended parameters (below). My adapter file contains the universal, read 1, and read 2 sequences.

bbduk.sh -Xmx3g in1={input.R1} in2={input.R2} out1={output.trim_R1} out2={output.trim_R2} ref={params.adapters} ktrim=r k=23 mink=11 hdist=1 tpe tbo threads={params.threads}

However, after trimming I still see PolyA and PolyG in the FastQC Adapter Content report. I have read that artificial PolyG runs can occur with 2-color chemistry, and I have confirmed that we used the NovaSeq 6000 instrument, which has Red-Green 2-Channel SBS chemistry. BBDuk has a trimpolyg argument I can set. I haven't seen PolyA show up in my other sequencing runs, though. What would explain this?

I adjusted my trimming parameters to include trimpolya=5 and trimpolyg=5. The percentage is reduced, but the problem doesn't completely go away. Color key: gray = R2 PolyA, green = R2 PolyG, blue = R1 PolyA.

[Figure: Trimmed read adapter QC]
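For reference, the adjusted run described above can be sketched as a small shell wrapper. This is only an illustration: the function name trim_with_polytails and its argument order are hypothetical, the BBDuk options are the ones already quoted in this post plus trimpolya and trimpolyg, and threads is left out so BBDuk falls back to its default.

```shell
# Hypothetical wrapper (name and argument order are assumptions):
# adapter trimming with the parameters from the post, plus poly-A and
# poly-G tail trimming at a chosen minimum length.
# Arguments: R1 in, R2 in, R1 out, R2 out, adapter FASTA, minimum tail length.
trim_with_polytails() {
    bbduk.sh -Xmx3g \
        in1="$1" in2="$2" out1="$3" out2="$4" \
        ref="$5" ktrim=r k=23 mink=11 hdist=1 tpe tbo \
        trimpolya="$6" trimpolyg="$6"
}
```

A call such as `trim_with_polytails raw_R1.fq.gz raw_R2.fq.gz trim_R1.fq.gz trim_R2.fq.gz adapters.fa 5` (file names are placeholders) reproduces the trimpolya=5 trimpolyg=5 setting.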

Should I make the filter more stringent? As long as these percentages are low, is this even a big issue when mapping?

multiqc fastqc • 2.9k views

Poly-G can be explained, but poly-A is odd. You could simply filter out these odd sequences first and then trim the remaining reads in a second step (not tested, but try it out):

bbduk.sh -Xmx4g in1=file.R1.fq.gz in2=file.R2.fq.gz out=stdout.fq literal=AAAAAAAA,TTTTTTT k=5 | bbduk.sh -Xmx3g in=stdin.fq out1={output.trim_R1} out2={output.trim_R2} ref={params.adapters} ktrim=r k=23 mink=11 hdist=1 tpe tbo threads=NN

GenoMax, wouldn't that filter on matches in the middle of the read too? When I tested this I lost the majority of my reads and had < 5% of reads left over. Wouldn't I only want to trim from the ends by using the trimpolyg and trimpolya arguments?
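The gap between the two approaches can be checked directly on a FASTQ file. The sketch below is a hypothetical helper (anywhere_vs_tail is not part of BBTools) that counts reads containing a short homopolymer run anywhere versus reads that end in one. It assumes an uncompressed, four-line-per-record FASTQ on stdin and only inspects the 3' end.

```shell
# Hypothetical helper, not part of BBTools: reads an uncompressed FASTQ on
# stdin and counts reads that contain a run of MIN copies of BASE anywhere
# versus reads that end in such a run.
# Usage: anywhere_vs_tail BASE MIN < reads.fastq
anywhere_vs_tail() {
    awk -v base="$1" -v min="$2" '
        # Build the homopolymer run, e.g. AAAAA for base=A, min=5.
        BEGIN { for (i = 0; i < min; i++) run = run base }
        NR % 4 == 2 {   # sequence lines only
            if (index($0, run) > 0) anywhere++
            if (length($0) >= min && substr($0, length($0) - min + 1) == run) tail++
        }
        END { printf "anywhere=%d tail=%d\n", anywhere, tail }
    '
}
```

For gzipped input, something like `gzip -dc sample_R1.fastq.gz | anywhere_vs_tail A 5` should work (the file name is a placeholder). If the "anywhere" count is far larger than the "tail" count, a k-mer filter with k=5 would be expected to discard many reads that end trimming would keep, which is consistent with the read loss described above.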


The above was just an example. I thought you had reads that were entirely poly-A/T, but if that is not the case then you can ignore my comment.

"Wouldn't I only want to trim from the ends by using the trimpolyg and trimpolya arguments?"

Ideally. But I thought you said you did that and the results still showed reads with poly-G/T? You can increase the length of trimpolyN=NN if you are still seeing the bases.

"As long as these percentages are low, is this even a big issue when mapping?"

At the end of the day none of this should matter since the aligner will soft-clip parts of the reads that do not align (as long as you are aligning to a good reference).


I have sometimes encountered the same issue and got good trimming results using fastp. It can trim poly-G tails when you specify the -g argument (see the "polyG tail trimming" section of its documentation).
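A minimal sketch of such a fastp call for paired-end data follows. The wrapper name fastp_polytails and its argument order are hypothetical. Besides -g (--trim_poly_g), it also turns on fastp's generic --trim_poly_x option, on the assumption that this is what you would want for the poly-A signal in this thread, which has not been tested on this data.

```shell
# Hypothetical wrapper (name and argument order are assumptions).
# Arguments: R1 in, R2 in, R1 out, R2 out, minimum homopolymer tail length.
fastp_polytails() {
    # --trim_poly_g is the long form of -g; --trim_poly_x trims other
    # homopolymer tails (such as poly-A) at the 3' end.
    fastp \
        -i "$1" -I "$2" -o "$3" -O "$4" \
        --detect_adapter_for_pe \
        --trim_poly_g --poly_g_min_len "$5" \
        --trim_poly_x --poly_x_min_len "$5"
}
```

For example, `fastp_polytails raw_R1.fq.gz raw_R2.fq.gz trim_R1.fq.gz trim_R2.fq.gz 10` (placeholder file names) uses a minimum tail length of 10, which as far as I know matches the fastp default for both options.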

Hope it helps!
