Hi everyone,
I'm testing a universal rRNA depletion protocol followed by Illumina sequencing. I have sequenced my RNA (from an insect species) on an Illumina NextSeq2500, 2x150bp, about 30 million reads per sample. I don't expect my protocol to completely deplete the rRNA, and when I check the overrepresented sequences in FastQC I see something interesting:
R1 always has about 2-3% overrepresented sequences, while R2 has about 7-8%. In both cases these sequences are rRNA, but I don't understand why they would be more abundant in R2 than in R1, since both reads are sequenced from the same fragment (note that fragments are often shorter than the read length in this case, but I don't think that explains it).
Has anybody seen a similar pattern before? Thanks!
You could try merging your R1/R2 reads and then scanning/trimming/QCing the merged reads. Having inserts shorter than the read length does not bode well for the quality of the library.
Trying to reason out every FastQC graph is not essential.
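To make the merging suggestion concrete, here is a rough Python sketch of what merging a read pair means, including the case where the insert is shorter than the read length and both reads run into adapter. It is only an illustration: the function names, `min_overlap`, and `max_mismatch_rate` are made up for this example, it ignores base qualities, and it assumes both reads have the same length. For real data a dedicated merger such as BBMerge or FLASH is the better choice.

```python
# Illustrative sketch only: names and parameters are hypothetical,
# and base qualities are ignored.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_rate=0.0):
    """Merge one read pair by sliding revcomp(R2) along R1.

    shift >= 0: ordinary overlap (insert at least as long as the read).
    shift <  0: read-through (insert shorter than the read), where the
                merged sequence is just the insert and the adapter
                tails are dropped.
    Returns the merged sequence for the longest acceptable overlap,
    or None if no overlap passes the thresholds.
    """
    rc = reverse_complement(r2)
    best = None  # (overlap_length, merged_sequence)
    for shift in range(-(len(rc) - min_overlap), len(r1) - min_overlap + 1):
        if shift >= 0:
            a, b = r1[shift:], rc
        else:
            a, b = r1, rc[-shift:]
        n = min(len(a), len(b))
        if n < min_overlap:
            continue
        mismatches = sum(x != y for x, y in zip(a[:n], b[:n]))
        if mismatches > max_mismatch_rate * n:
            continue
        if shift >= 0:
            merged = r1[:shift] + rc + r1[shift + len(rc):]
        else:
            merged = a[:n]
        if best is None or n > best[0]:
            best = (n, merged)
    return None if best is None else best[1]


if __name__ == "__main__":
    # Toy read-through pair: a 12 bp insert read with 20 bp reads,
    # so each read ends in 8 bp of adapter sequence.
    insert = "ACGTTGCAAGGC"
    adapter = "AGATCGGA"
    r1 = insert + adapter
    r2 = reverse_complement(insert) + adapter
    print(merge_pair(r1, r2))
```

Once pairs are merged, each fragment is counted once, so you can check whether the rRNA fraction of the merged reads looks more like your R1 or your R2 numbers.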