Question

Poor alignment post-SortMeRNA (rRNA removal)

10 months ago

Fossil ▴ 30

Hello,

I checked the quality of my fastq files (MGI Tech; paired-end bulk RNA-seq) with FastQC, and one sample failed the 'per sequence GC content' module (huge secondary peak, see below). Most of the overrepresented sequences are rRNA from Rattus norvegicus, my model organism. So I ran SortMeRNA on fileR1.fq and fileR2.fq separately, took the non-rRNA read outputs, and aligned them with STAR. The uniquely mapped read rate is 1.88%.

If I do not filter the rRNA and instead align the original fastq files with STAR, the rate is 76% (all my other samples are ~88-97%). On the PCA plot, this 'problematic' sample clusters tightly with the rest of its group.
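One thing worth ruling out before blaming the sample, offered as an assumption rather than a diagnosis: filtering R1 and R2 independently can leave the two files with different sets of reads, and a paired-end aligner expects mate N in R1 to match mate N in R2. A minimal Python sketch of such a check follows. The function names are hypothetical, and it assumes plain 4-line FASTQ records whose IDs differ at most by a trailing `/1` or `/2`.

```python
import gzip
import itertools


def read_ids(path):
    """Yield read IDs from a 4-line-per-record FASTQ file (optionally gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The header is the first line of every group of four lines.
        for header in itertools.islice(handle, 0, None, 4):
            if not header.strip():
                continue  # tolerate a trailing blank line
            read_id = header[1:].split()[0]
            # Drop a trailing /1 or /2 mate suffix so mates compare equal.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def pairs_in_sync(r1_path, r2_path):
    """Return (in_sync, n_matching): do both files list the same IDs in the same order?"""
    n_matching = 0
    pairs = itertools.zip_longest(read_ids(r1_path), read_ids(r2_path))
    for id1, id2 in pairs:
        if id1 != id2:  # also triggers when one file is shorter (None vs ID)
            return False, n_matching
        n_matching += 1
    return True, n_matching
```

Calling `pairs_in_sync` on the two non-rRNA output files would return `False` together with the number of leading records that still matched if the mates have drifted apart. In that case, re-running the filter in a paired-aware way, or re-pairing the reads afterwards, would be the next thing to try.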

I am not sure how to proceed. Is there another way to check for rRNA further downstream, for example in the count matrix?

Is it okay to continue with the downstream analysis (featureCounts, then DESeq2 or limma) without filtering out the rRNA, i.e. using the raw fastq files? Or should I remove this sample from my analysis?

Thanks in advance for the help!

[Figure: per-sequence GC content for one of the fastq files of sample X]

STAR rRNA RNAseq SortMeRNA • 549 views

updated 10 months ago by andres.firrincieli 3.9k • written 10 months ago by Fossil ▴ 30

If you have enough replicates, I would simply trash the sample. Even if you remove the rRNA genes from the count matrix, you will end up with an expression matrix of mostly low-count genes.
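To see how much of each sample would be left, the rRNA share can be estimated directly from the count matrix. The Python sketch below is only an illustration under stated assumptions: it expects a simplified tab-separated matrix with gene IDs in the first column and one column per sample (raw featureCounts output carries extra annotation columns that would need dropping first). The function name, gene IDs and toy numbers are all made up. In practice the set of rRNA gene IDs would come from the annotation, for example genes whose biotype is rRNA.

```python
import csv
import io


def rrna_summary(count_file, rrna_gene_ids):
    """Return {sample: (total_counts, rrna_counts, rrna_fraction)}."""
    reader = csv.reader(count_file, delimiter="\t")
    samples = next(reader)[1:]  # header row: gene column, then sample names
    total = [0] * len(samples)
    rrna = [0] * len(samples)
    for row in reader:
        gene_id = row[0]
        is_rrna = gene_id in rrna_gene_ids
        for i, value in enumerate(row[1:]):
            count = int(value)
            total[i] += count
            if is_rrna:
                rrna[i] += count
    return {
        sample: (total[i], rrna[i], rrna[i] / total[i] if total[i] else 0.0)
        for i, sample in enumerate(samples)
    }


# Toy matrix with invented numbers, purely to show the call.
toy = io.StringIO(
    "gene\tsampleA\tsampleX\n"
    "rRNA_gene_1\t50\t700\n"
    "gene_2\t450\t200\n"
    "gene_3\t500\t100\n"
)
for sample, (total, rrna, frac) in rrna_summary(toy, {"rRNA_gene_1"}).items():
    print(f"{sample}: {rrna}/{total} counts in rRNA genes ({frac:.1%}), "
          f"{total - rrna} counts left")
```

Reads that map to several rRNA copies are often not counted by default, so a fraction computed this way may understate the true rRNA content. The number of counts left after excluding rRNA genes is the more relevant figure for deciding whether the sample is still usable.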

10 months ago by andres.firrincieli 3.9k