Question

Illumina RNA-seq: Overrepresented sequences are mostly in the R2 reads, not R1

1

5 months ago

Umberto ▴ 10

Hi everyone,

I'm testing a universal rRNA depletion protocol followed by Illumina sequencing. I have sequenced my RNA (from an insect species) on an Illumina NextSeq2500, 2x150bp, about 30 million reads per sample. I don't expect my protocol to completely deplete the rRNA, and when I check the overrepresented sequences in FastQC I see something interesting:

The R1 reads always have about 2-3% overrepresented sequences, while the R2 reads have about 7-8%. In both cases these sequences are rRNA, but I don't understand why they would be more abundant in R2 than in R1, since both reads are sequenced from the same fragment (note that fragments are often shorter than the read length in this case, but I don't think that explains it).

Has anybody seen a similar pattern before? Thanks!
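One way to compare the two files outside FastQC is a plain tally of read prefixes. The sketch below is a simplification, not FastQC's actual procedure: as far as the FastQC documentation describes it, the module only tracks sequences first seen in the first 100,000 reads, truncates reads longer than 75 bp to 50 bp, and reports sequences above 0.1% of the total. Here every read passed in is counted exactly, and `prefix_len` and `threshold` are illustrative parameters chosen to echo those documented defaults.

```python
from collections import Counter


def read_sequences(fastq_lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(seqs, prefix_len=50, threshold=0.001):
    """Tally read prefixes and report those above `threshold` of all reads.

    Returns (hits, total_fraction): `hits` maps each overrepresented prefix
    to its share of reads, `total_fraction` is the sum of those shares.
    This is an exact tally, not a reimplementation of FastQC.
    """
    counts = Counter(s[:prefix_len] for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    hits = {s: n / total for s, n in counts.items() if n / total > threshold}
    return hits, sum(hits.values())
```

Running this on the same number of reads from the R1 and R2 files (for example via `gzip.open(path, "rt")` on each, with `path` being whatever the files are called) gives two fractions computed in exactly the same way, which makes it easier to tell a real difference from a tool artifact.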

illumina rRNA RNAseq RNA • 567 views

updated 5 months ago by GenoMax 149k • written 5 months ago by Umberto ▴ 10

0

why they would be more abundant in the R2 than the R1 since they are sequenced from the same fragment (note that fragments are often smaller than reads size in this case, but I don't think it explains it).

You could try to merge your R1/R2 reads and then scan/trim/QC the merged reads. Having inserts shorter than the read length does not bode well for the quality of the library.

Trying to reason out every FastQC graph is not essential.
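To illustrate what merging does when the insert is shorter than the read, here is a minimal sketch in plain Python. It is not how any particular merger works internally; dedicated tools such as BBMerge, FLASH, or fastp also use base qualities and are far more robust. The function names, `min_overlap`, and `max_mismatch_frac` are all hypothetical, and the sketch assumes R1 and R2 have the same length.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge an overlapping read pair into one insert sequence.

    Returns the merged sequence, or None if no overlap passes the
    mismatch threshold. Assumes both reads have the same length.
    """
    r2rc = reverse_complement(r2)
    best = None  # (number of matching bases, offset of r2rc relative to r1)
    for offset in range(-(len(r2rc) - min_overlap), len(r1) - min_overlap + 1):
        start = max(0, offset)
        end = min(len(r1), offset + len(r2rc))
        overlap = end - start
        if overlap < min_overlap:
            continue
        mismatches = sum(
            a != b
            for a, b in zip(r1[start:end], r2rc[start - offset:end - offset])
        )
        if mismatches > max_mismatch_frac * overlap:
            continue
        matches = overlap - mismatches
        if best is None or matches > best[0]:
            best = (matches, offset)
    if best is None:
        return None
    offset = best[1]
    if offset < 0:
        # Insert shorter than the read: both reads ran into adapter,
        # so keep only the part of R1 that R2 also covers.
        return r1[:offset + len(r2rc)]
    # Insert at least as long as the read: R1 prefix plus all of R2 (rev-comp).
    return r1[:offset] + r2rc
```

With a short insert, the merged read is just the insert with the adapter read-through dropped from both mates, so QC on merged reads sees each fragment once rather than twice.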

5 months ago by GenoMax 149k

score 0 · Answer 1 · 2024-10-11

0

5 months ago

noodle ▴ 640

Sounds like an artifact of whatever QC program you're using. For a better look at these numbers, check an aligned BAM, and maybe try Picard's CollectRnaSeqMetrics: https://gatk.broadinstitute.org/hc/en-us/articles/360037057492-CollectRnaSeqMetrics-Picard
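As a rough complement to that suggestion, the rRNA share of each mate can be counted directly from alignments. The sketch below is an assumption-laden illustration, not part of Picard: it reads SAM text lines (such as the output of `samtools view` on the BAM), uses the standard SAM FLAG bits to tell R1 from R2, and assumes the rRNA sequences are present in the reference under contig names you supply in `rrna_refs` (the names used in the test are made up).

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def rrna_fraction_per_mate(sam_lines, rrna_refs):
    """Return {"R1": fraction, "R2": fraction} of reads aligned to rRNA.

    `rrna_refs` is a set of reference names treated as rRNA (an assumption
    about how the reference was built). Each read is counted once: secondary
    and supplementary alignments are skipped, unmapped reads count in the
    denominator only.
    """
    totals = Counter()
    rrna = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & READ1:
            mate = "R1"
        elif flag & READ2:
            mate = "R2"
        else:
            continue  # not flagged as either mate of a pair
        totals[mate] += 1
        if not flag & UNMAPPED and rname in rrna_refs:
            rrna[mate] += 1
    return {mate: rrna[mate] / totals[mate] for mate in totals}
```

If the two fractions come out close to each other, the R1/R2 gap in the FastQC report is more likely down to how that module counts than to a real difference in rRNA content.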
