Question

Paired-end RNA-seq (NextSeq 2000): 1-3% of R2 reads contain only G bases

22 months ago

Jurgen • 0

Dear Biostars,

I have been performing paired-end RNA-seq on the NextSeq 2000. When running FastQC, I noticed an overrepresented poly-G sequence consisting of 59 G bases (my read length is also 59 bp). It accounted for 1-3% of reads and, interestingly, was present only in R2 (in all my samples, N=35).
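To check this figure outside FastQC, the all-G reads can be counted directly in the R2 FASTQ. This is only a minimal sketch: it assumes standard four-line (unwrapped) FASTQ records, and the function names and the file name in the usage comment are made up for illustration.

```python
import io
from itertools import islice


def iter_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record (assumes unwrapped FASTQ)."""
    # The sequence is the second line (index 1) of every 4-line record.
    for seq in islice(handle, 1, None, 4):
        yield seq.strip()


def polyg_fraction(handle):
    """Return the fraction of reads whose sequence consists only of G bases."""
    total = 0
    polyg = 0
    for seq in iter_fastq_sequences(handle):
        total += 1
        if seq and set(seq) == {"G"}:
            polyg += 1
    return polyg / total if total else 0.0


# Tiny in-memory demo with two reads, one of them all G.
demo = io.StringIO("@r1\nGGGG\n+\nIIII\n@r2\nACGT\n+\nIIII\n")
print(polyg_fraction(demo))

# On real data (hypothetical file name):
#   import gzip
#   with gzip.open("sample_R2.fastq.gz", "rt") as fh:
#       print(polyg_fraction(fh))
```

Running the same function on the matching R1 file should give a value close to zero if the problem is really limited to read 2.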

I read in some other posts that G can indicate that there was "no signal". This would mean that for 1-3% of my reads, the sequencing of read 2 failed. I am quite surprised by this. Any ideas why this is happening?

What is the best thing to do with these reads? Remove them? And would it be possible to still map R1 alone for this 1-3%, while mapping all the other reads as pairs?

I hope you can help me!

Best,
Jurgen

RNA-seq • 2.3k views

score 2 · Accepted Answer · 2023-10-27

22 months ago

GenoMax 153k

Are multiple samples showing this or just some? It is difficult to divine a specific reason, but it is possible that some of the library fragments do not have a functional adapter on the second end and thus failed to prime and generate a sequencing signal.

As long as you have enough data, you should simply ignore these errant reads (and their R1 counterparts) and move on with the analysis using the rest of the data. Most aligners do not allow you to mix single- and paired-end reads while aligning (bbmap does), and discarding those 3% of reads is not going to make a big difference.
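If you do want to drop these pairs explicitly, the idea can be sketched as follows. This is an illustrative sketch rather than an established tool: it assumes R1 and R2 are unwrapped four-line FASTQ files with records in the same order, and all function names are hypothetical. Pairs whose R2 is entirely G are left out of the paired output, and the matching R1 reads can optionally be written to a separate file in case you want to align them single-end later.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from an unwrapped FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def is_poly_g(seq):
    """True if the read is non-empty and made up only of G bases."""
    return bool(seq) and set(seq) == {"G"}


def filter_pairs(r1_in, r2_in, r1_out, r2_out, r1_orphans=None):
    """Stream paired FASTQ handles and drop pairs whose R2 is all G.

    Returns (kept, dropped). If r1_orphans is given, the R1 mate of each
    dropped pair is written there instead of being discarded.
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if is_poly_g(rec2[1]):
            dropped += 1
            if r1_orphans is not None:
                r1_orphans.write("\n".join(rec1) + "\n")
        else:
            kept += 1
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
    return kept, dropped
```

The function takes open text handles, so it works with plain files or with `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. Filtering on whole pairs keeps R1 and R2 in sync, which paired-end aligners require.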

Thank you for your response. I have 35 samples and they all show this.

22 months ago by Jurgen • 0

GenoMax, would you recommend removing these reads before mapping with STAR? I think these reads wouldn't map anyway, so maybe I don't need to bother removing them?

22 months ago by Jurgen • 0

Correct. For alignments it should be fine.

22 months ago by GenoMax 153k

Thank you for confirming.

21 months ago by Jurgen • 0

score 2 · Accepted Answer · 2023-10-27

22 months ago

benformatics 4.1k

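The NextSeq 2000 is an Illumina instrument that uses two-color (two-channel) chemistry.

With two-color chemistries, G is called when there is no fluorescence signal, so a read made up entirely of G means no signal was detected for that read. It appears that the primer for the second read might not have bound.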

source: https://www.reddit.com/r/bioinformatics/comments/16n116z/r2_novaseq_full_of_gs/
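
The base-calling logic behind this can be illustrated with a toy lookup table. This is a deliberately simplified sketch: the assignment of A, C and T to the two channels below is an illustrative assumption (it differs between instruments), and `call_base` and `simulate_failed_read` are made-up names. The only part taken from the answer above is that no signal in either channel is called as G.

```python
def call_base(channel_1, channel_2):
    """Toy two-channel base call from two on/off fluorescence signals."""
    calls = {
        (True, True): "A",    # illustrative assignment, instrument-dependent
        (True, False): "C",   # illustrative assignment, instrument-dependent
        (False, True): "T",   # illustrative assignment, instrument-dependent
        (False, False): "G",  # no signal in either channel is called as G
    }
    return calls[(channel_1, channel_2)]


def simulate_failed_read(cycles):
    """A read with no signal in any cycle comes out as a run of G."""
    return "".join(call_base(False, False) for _ in range(cycles))


print(simulate_failed_read(59))
```

With a read length of 59 cycles, a cluster that gives no signal at all during read 2 produces exactly the 59-base poly-G sequence described in the question.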