Dear Biostars,
I have been performing paired-end RNA-seq on the NextSeq 2000. When running FastQC, I noticed an overrepresented poly-G sequence consisting of 59 G bases (my read length is also 59 bp). It accounted for 1-3% of reads and, interestingly, was present only in R2 (in all my samples, N=35).
I read in some other posts that a G call can indicate “no signal”. This would mean that for 1-3% of my reads, the sequencing of read 2 failed. I am quite surprised by this. Any ideas why this is happening?
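For anyone wanting to check the same thing outside FastQC, here is a minimal sketch of how the poly-G fraction in an R2 file could be counted. The function names and the `min_fraction` threshold are illustrative assumptions, not part of any existing tool.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip(), seq, qual


def is_poly_g(seq: str, min_fraction: float = 1.0) -> bool:
    """True if at least min_fraction of the bases are G.

    min_fraction is a hypothetical knob: 1.0 means the read is entirely G.
    """
    return len(seq) > 0 and seq.upper().count("G") / len(seq) >= min_fraction


def poly_g_fraction(lines: Iterable[str], min_fraction: float = 1.0) -> float:
    """Return the fraction of reads classified as poly-G."""
    total = 0
    hits = 0
    for _, seq, _ in read_fastq(lines):
        total += 1
        if is_poly_g(seq, min_fraction):
            hits += 1
    return hits / total if total else 0.0


# Example usage (file name is a placeholder):
#   import gzip
#   with gzip.open("sample_R2.fastq.gz", "rt") as fh:
#       print(poly_g_fraction(fh))
```

Running this on R1 and R2 separately should show whether the poly-G reads really are confined to R2.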
What is the best thing to do with these reads? Remove them? And would it still be possible to map R1 alone for this 1-3%, while mapping all the other reads as pairs?
I hope you can help me!
Best,
Jurgen
Thank you for your response. I have 35 samples and they all show this.
GenoMax, would you recommend removing these reads before mapping with STAR? I think these reads wouldn’t map anyway, so maybe I don’t need to bother removing them?
Correct. For alignments it should be fine.
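If someone did want to set these pairs aside and keep the R1 mate, a rough sketch of the split could look like the following. It assumes R1 and R2 records are already parsed into (header, sequence, quality) tuples and appear in the same order in both files; `split_pairs` is a made-up helper name, not a function from STAR or any other tool.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def split_pairs(
    r1: Iterable[Record], r2: Iterable[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Separate intact pairs from pairs whose R2 is entirely G.

    Assumes r1 and r2 are in the same order, as in standard paired FASTQ files.
    Returns (paired, r1_only).
    """
    paired: List[Tuple[Record, Record]] = []
    r1_only: List[Record] = []
    for rec1, rec2 in zip(r1, r2):
        seq2 = rec2[1].upper()
        if seq2 and set(seq2) == {"G"}:
            r1_only.append(rec1)  # R2 carries no information; keep R1 alone
        else:
            paired.append((rec1, rec2))
    return paired, r1_only
```

The `r1_only` reads would then probably need to go through a separate single-end alignment run, with the counts combined afterwards if that is wanted.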
Thank you for confirming.