Hello,
I searched the forum but did not find a clear answer.
I have a number of bacterial RNA samples that were sequenced with an Illumina paired-end approach. Library construction (Illumina total RNA + ribodepletion) was performed by the sequencing provider. Using FastQC and SortMeRNA, I found that all datasets are contaminated with rRNA to varying degrees: some have 1%, others 10% or 20%, and some even 50%! Clearly something went wrong with the rRNA depletion. Still, I have more than 20M reads mapping to transcripts, and in principle 5M reads are sufficient for DEG analysis in bacteria. My questions are:
Can these datasets still be used for DEG analysis if I remove the rRNA reads? In general, how much rRNA contamination is tolerable? Has anything been published on how rRNA contamination affects DEG analysis?
Thank you very much.
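There is no need to remove the rRNA reads beforehand.
Map your reads against the reference genome and build a gene count table.
Remove the rRNA genes from the gene count table. The rRNA reads end up counted on the rRNA genes, so dropping those rows excludes them from normalization and from the differential expression test.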
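To illustrate the last step, here is a minimal Python sketch, assuming the count table is already loaded as a dict mapping gene ID to a list of per-sample counts, and that you have a set of rRNA gene IDs (for example, taken from the `rRNA` features in your genome annotation). The function name `drop_rrna` and the gene IDs in the example are hypothetical. Besides filtering, it reports the per-sample fraction of counted reads that fell on rRNA genes, which is a quick way to see how many usable reads each library really has.

```python
from typing import Dict, List, Set, Tuple


def drop_rrna(
    counts: Dict[str, List[int]], rrna_ids: Set[str]
) -> Tuple[Dict[str, List[int]], List[float]]:
    """Remove rRNA genes from a gene x sample count table.

    Returns the filtered table (input order preserved) and, for each
    sample, the fraction of counted reads assigned to rRNA genes.
    """
    if not counts:
        return {}, []
    n_samples = len(next(iter(counts.values())))
    total = [0] * n_samples  # all counted reads per sample
    rrna = [0] * n_samples   # reads counted on rRNA genes per sample
    kept: Dict[str, List[int]] = {}
    for gene, row in counts.items():
        is_rrna = gene in rrna_ids
        for i, c in enumerate(row):
            total[i] += c
            if is_rrna:
                rrna[i] += c
        if not is_rrna:
            kept[gene] = row  # keep only non-rRNA genes
    frac = [r / t if t else 0.0 for r, t in zip(rrna, total)]
    return kept, frac


if __name__ == "__main__":
    # Hypothetical toy table: two samples, two mRNA genes, two rRNA genes.
    counts = {
        "geneA": [100, 50],
        "geneB": [300, 50],
        "rRNA_16S": [400, 900],
        "rRNA_23S": [200, 0],
    }
    kept, frac = drop_rrna(counts, {"rRNA_16S", "rRNA_23S"})
    print(kept)  # {'geneA': [100, 50], 'geneB': [300, 50]}
    print(frac)  # [0.6, 0.9]
```

The filtered table can then go into your DEG tool of choice; tools that estimate library size factors from the count table will base them on the remaining non-rRNA genes only.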