Paired-end RNA-seq (NextSeq2000): 1-3% of R2 reads contain only G bases
13 months ago
Jurgen • 0

Dear Biostars,

I have been performing paired-end RNA-seq on the NextSeq2000. When running FastQC, I noticed an overrepresented poly-G sequence consisting of 59 G bases (my read length is also 59 bp). The overrepresentation was between 1% and 3% and, interestingly, was present only in R2 (in all my samples, N=35).

I read in some other posts that G can indicate “no signal”. This would mean that, for 1-3% of my reads, the sequencing of read 2 failed. I am quite surprised by this. Any ideas why this is happening?

What is the best thing to do with these reads? Should I remove them? And would it still be possible to map R1 alone for this 1-3%, while mapping all the other reads as pairs?

I hope you can help me!

Best,
Jurgen

RNA-seq • 1.3k views
13 months ago
GenoMax 147k

Are multiple samples showing this, or just some? It is difficult to divine a specific reason, but it is possible that some of the library fragments do not have a functional adapter on the second end and thus failed to prime and generate a sequencing signal.

As long as you have enough data, you should simply ignore these errant reads (and their R1 counterparts) and move on with the analysis using the rest of the data. Most aligners do not allow you to mix single- and paired-end reads while aligning (BBMap does), and discarding those 3% of reads is not going to make a big difference.
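If you do want to drop the errant pairs explicitly, here is a minimal sketch of that step in plain Python. The function names `fastq_records` and `drop_poly_g_pairs` and the 0.9 G-fraction cutoff are illustrative assumptions, not part of any tool. It also assumes standard four-line FASTQ records and that R1 and R2 list the reads in the same order.

```python
from itertools import islice


def fastq_records(lines):
    """Yield four-line FASTQ records as lists from any iterable of lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (an incomplete trailing record is ignored)
        yield record


def drop_poly_g_pairs(r1_lines, r2_lines, min_fraction=0.9):
    """Yield (r1_record, r2_record) pairs whose R2 is not dominated by G.

    min_fraction is an assumed cutoff: a pair is dropped when at least
    this fraction of the R2 bases are G.
    """
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        seq2 = rec2[1].strip().upper()
        if seq2 and seq2.count("G") / len(seq2) >= min_fraction:
            continue  # drop both mates so R1 and R2 stay in sync
        yield rec1, rec2


# Usage sketch with hypothetical file names:
#   import gzip
#   with gzip.open("S1_R1.fastq.gz", "rt") as f1, \
#        gzip.open("S1_R2.fastq.gz", "rt") as f2, \
#        open("S1_R1.filtered.fastq", "w") as o1, \
#        open("S1_R2.filtered.fastq", "w") as o2:
#       for rec1, rec2 in drop_poly_g_pairs(f1, f2):
#           o1.writelines(rec1)
#           o2.writelines(rec2)
```

Both mates are removed together so that the two output files stay paired. Note that `zip` stops at the shorter input, so this sketch does not detect R1/R2 files of unequal length.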


Thank you for your response. I have 35 samples and they all show this.


GenoMax, would you recommend removing these reads before mapping with STAR? I think these reads wouldn’t map anyway, so maybe I don’t need to bother removing them?


Correct. For alignments it should be fine.


Thank you for confirming.

13 months ago

The NextSeq2000 is an Illumina machine that uses two-color (two-channel) chemistry.

With two-color chemistries, G is called when there is no fluorescence signal. It appears that the second (R2) primer might not have bound.
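Because a cycle with no signal is reported as G, a read that produced no signal at any cycle shows up as a run of G over the full read length. A minimal Python sketch for measuring how many R2 reads look like that is below. The function names `is_poly_g` and `poly_g_fraction` are made up for illustration, and the sketch assumes standard four-line FASTQ records.

```python
def is_poly_g(seq, min_fraction=1.0):
    """Return True if at least min_fraction of the bases in seq are G."""
    if not seq:
        return False
    return seq.upper().count("G") / len(seq) >= min_fraction


def poly_g_fraction(fastq_lines, min_fraction=1.0):
    """Fraction of reads in a FASTQ stream that look like no-signal reads.

    fastq_lines is any iterable of lines (for example an open file handle)
    in four-line FASTQ layout. With the default min_fraction of 1.0 only
    reads made up entirely of G are counted.
    """
    total = 0
    poly_g = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            total += 1
            if is_poly_g(line.strip(), min_fraction):
                poly_g += 1
    return poly_g / total if total else 0.0


# Usage sketch with a hypothetical file name:
#   import gzip
#   with gzip.open("S1_R2.fastq.gz", "rt") as handle:
#       print(poly_g_fraction(handle))
```

Running this on both R1 and R2 of the same sample would show whether the all-G reads are confined to R2, as FastQC suggested.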

source: https://www.reddit.com/r/bioinformatics/comments/16n116z/r2_novaseq_full_of_gs/


Thank you for helping out!
