How to remove rRNA, bacterial RNA and polyA contamination from RNA-seq data (FASTQ format)?
7.5 years ago
Megan ▴ 50

Hi all,

I am trying to do some QC on RNA-seq raw reads. According to FastQC results, there is some rRNA, bacterial RNA and polyA contamination. But here are my problems.

  1. I have no idea how serious the contamination is. How can I tell from the FastQC results?
  2. Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?
  3. How can I remove these contaminants?

    • How can I remove polyA and bacterial RNA contamination?
    • For rRNA, I have tried the following: (1) download the Mt_rRNA, rRNA and Mt_tRNA sequences from Ensembl BioMart; (2) use bowtie2 for rRNA + tRNA removal.

      step 1: create index
              bowtie2-build rRNA.fasta rRNA.index
      step 2: align to the rRNA index in order to get an rRNA-free fastq file.
              bowtie2 -x rRNA.index -1 sampleA.1.fq -2 sampleA.2.fq --phred33 -N 0 \
              --un-conc sampleA-filter.fq --al-conc rRNA.fq -p 8
      

      Is this correct?

Thank you very much!

RNA-Seq sequencing • 6.3k views

Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?

It depends on your downstream analyses - what do you want to do?

Are you sure you have polyA contamination? What kind of libraries do you have? The most common Illumina RNAseq library is mRNA with polyA capture.

How did FastQC tell you that you had bacterial contamination? If I am not mistaken, FastQC does not screen for bacterial contamination by default.


Hi,

For downstream analysis, I am going to do differential expression analysis (DEA), transcriptome reconstruction, etc.

The mRNA is enriched using polyA capture. It is possible to have polyA contamination.

How can I tell if there is contamination? Primarily, I looked at the 'per sequence GC content' and 'overrepresented sequences' sections of the FastQC report, and I checked those overrepresented sequences with BLAST. The overrepresented sequences show polyA and adenovirus contamination.
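
One rough way to put a number on the polyA signal is to count the reads that contain a long A or T homopolymer. The sketch below is only illustrative: `polya_fraction` is a hypothetical helper name, the default threshold of 20 bases is an arbitrary assumption, and it assumes plain 4-line FASTQ records.

```shell
# Sketch: report how many reads contain a poly(A) or poly(T) run.
# Assumptions: 4-line FASTQ records; the 20-base default is arbitrary.
# Prints: <reads_with_run> <total_reads> <percent>
polya_fraction() {
    local fastq="$1" min_run="${2:-20}"
    awk -v n="$min_run" '
        BEGIN {
            # Build homopolymer strings of length n to search for.
            for (i = 0; i < n; i++) { a = a "A"; t = t "T" }
        }
        NR % 4 == 2 {                 # second line of each record is the sequence
            total++
            if (index($0, a) || index($0, t)) hits++
        }
        END {
            if (total == 0) { print "0 0 0.00"; exit }
            printf "%d %d %.2f\n", hits, total, 100 * hits / total
        }
    ' "$fastq"
}
```

For example, `polya_fraction sampleA.1.fq 20` prints the hit count, the read count and the percentage. For a gzipped file, something like `gzip -dc sampleA.1.fq.gz | polya_fraction -` should work, since awk reads standard input when the file operand is `-`.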

7.5 years ago

SortMeRNA could help you. It was developed to filter ribosomal RNA. In addition, it gives you an idea of how serious the contamination is, because it reports the percentage of reads that align to ribosomal RNA.
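
The same kind of percentage can be estimated by hand from any filtering step that writes the rRNA-matching reads to a separate FASTQ file, such as the `--al-conc` output of the bowtie2 command in the question. This is a minimal sketch, not a replacement for the tool's own report: `count_reads` and `rrna_percent` are hypothetical helper names, and both files are assumed to be uncompressed 4-line FASTQ.

```shell
# Sketch: percentage of reads flagged as rRNA, from two FASTQ files.
# Assumption: uncompressed FASTQ with exactly 4 lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Usage: rrna_percent <rRNA_reads.fq> <all_reads.fq>
rrna_percent() {
    local rrna total
    rrna=$(count_reads "$1")
    total=$(count_reads "$2")
    awk -v r="$rrna" -v t="$total" \
        'BEGIN { if (t == 0) print "NA"; else printf "%.2f\n", 100 * r / t }'
}
```

For paired-end data, comparing the mate 1 files is enough, because each pair contributes one record to each mate file. If I read the bowtie2 manual correctly, `--al-conc rRNA.fq` writes `rRNA.1.fq` and `rRNA.2.fq`, so the call would look like `rrna_percent rRNA.1.fq sampleA.1.fq`.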


In addition to SortMeRNA, BBDuk can also do this (see this thread for how). My impression is that SortMeRNA is slightly more precise, but BBDuk is much faster.

7.4 years ago

I don't know if you have already done it, but you can use fastq_screen to check for cross-species or other (e.g. adapter) contamination. You can also load your aligned SAM/BAM files into SeqMonk for an RNA-seq QC report. SeqMonk is GUI-based and hence user friendly. After that, as suggested above, you can use SortMeRNA to remove rRNA reads.

2.5 years ago
Dreamer ▴ 40

Removing rRNA reads with short-read aligners (which rely on short exact-match seeds) can also remove some mRNA reads that share partial sequence similarity with rRNA. A number of tools designed specifically for rRNA read removal are available. We recently developed rRNA read detection software named RiboDetector (https://github.com/hzi-bifo/RiboDetector). The benchmark at https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkac112/6533611 reports RiboDetector as the most computationally efficient and most accurate of the tools tested for rRNA read removal.

RiboDetector can be used out-of-the-box without any database:

  • GPU mode:

    ribodetector -t 20 \
    -l 100 \
    -i inputs/reads.1.fq.gz inputs/reads.2.fq.gz \
    -m 10 \
    -e rrna \
    --chunk_size 256 \
    -o outputs/reads.nonrrna.1.fq outputs/reads.nonrrna.2.fq
    
  • CPU mode:

    ribodetector_cpu -t 20 \
    -l 100 \
    -i inputs/reads.1.fq.gz inputs/reads.2.fq.gz \
    -e rrna \
    --chunk_size 256 \
    -o outputs/reads.nonrrna.1.fq outputs/reads.nonrrna.2.fq
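
The `-l 100` in both commands is the read length. If the reads have been trimmed and the length is no longer uniform, a rounded mean is a reasonable value to pass, on the assumption that `-l` accepts the mean length for variable-length reads (check the RiboDetector README to confirm). The helper below is a hypothetical sketch for uncompressed 4-line FASTQ:

```shell
# Sketch: mean read length of a FASTQ file, rounded to the nearest integer.
# Assumption: uncompressed FASTQ with exactly 4 lines per record.
mean_read_length() {
    awk 'NR % 4 == 2 { sum += length($0); n++ }   # sequence lines only
         END {
             if (n == 0) print 0
             else printf "%d\n", sum / n + 0.5    # add 0.5, then truncate
         }' "$1"
}
```

For gzipped input, something like `gzip -dc inputs/reads.1.fq.gz | mean_read_length -` should work.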
    