Hi all,
I am trying to do QC on raw RNA-seq reads. According to the FastQC results, there is some rRNA, bacterial RNA and poly-A contamination. Here are my questions:
- I have no idea how serious the contamination is. How can I tell from the FastQC results?
- Is it necessary to remove the contamination at all, or is there a cutoff above which I should remove it?
- How can I remove this contamination? In particular, how do I remove the poly-A and bacterial RNA contamination?
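For the poly-A part, one simple option is to trim the trailing run of A from each read. Below is a minimal bash/awk sketch, assuming uncompressed FASTQ with exactly 4 lines per record; `trim_polya` and its default threshold of 10 are hypothetical choices, not an existing tool. It trims the quality string by the same number of characters so each record stays valid.

```bash
#!/usr/bin/env bash
# Sketch only: trim_polya is a hypothetical helper, not an existing tool.
# Usage: trim_polya MIN_LEN < reads.fq > reads.trimmed.fq
# Assumes uncompressed FASTQ with exactly 4 lines per record.
trim_polya() {
  local min_len="${1:-10}"
  awk -v min="$min_len" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      # Count the run of A characters at the end of the read
      n = 0
      i = length(seq)
      while (i > 0 && substr(seq, i, 1) == "A") { n++; i-- }
      # Trim sequence and quality by the same amount so the record stays valid
      if (n >= min) {
        seq = substr(seq, 1, length(seq) - n)
        qual = substr(qual, 1, length(qual) - n)
      }
      print header; print seq; print plus; print qual
    }
  '
}
```

For example: `trim_polya 10 < sampleA.1.fq > sampleA.1.trimmed.fq`. Reads made entirely of A end up empty, and each mate file is processed on its own, so for paired-end data a dedicated trimmer such as cutadapt (whose documentation shows a poly-A "adapter" written as `-a "A{100}"`) is probably the safer choice.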
For rRNA, I have tried the following: (1) download the Mt_rRNA, rRNA and Mt_tRNA sequences from Ensembl BioMart; (2) remove rRNA and tRNA reads with bowtie2.
Step 1, create the index:
bowtie2-build rRNA.fasta rRNA.index
Step 2, align to the rRNA index in order to get an rRNA-free FASTQ file:
bowtie2 -x rRNA.index -1 sampleA.1.fq -2 sampleA.2.fq --phred33 -N 0 --un-conc sampleA-filter.fq --al-conc rRNA.fq -p 8
Is this correct?
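A possible cleaned-up version of these two steps, as a sketch only: the function names and output file names are hypothetical, and it assumes `bowtie2` and `bowtie2-build` are on PATH. Compared with the command above, it writes gzipped per-mate output (bowtie2 replaces `%` in the `--un-conc-gz`/`--al-conc-gz` path with 1 or 2), sends the SAM output to `/dev/null` instead of the terminal, and drops `--phred33` and `-N 0`, which are already bowtie2's defaults.

```bash
#!/usr/bin/env bash
# Sketch only: function and output file names are hypothetical.
# Assumes bowtie2 and bowtie2-build are on PATH.

# Step 1: concatenate the BioMart FASTA files (e.g. Mt_rRNA, rRNA, Mt_tRNA)
# and build a bowtie2 index from them.
build_rrna_index() {
  local index_prefix="$1"
  shift
  cat "$@" > "${index_prefix}.fasta"
  bowtie2-build "${index_prefix}.fasta" "$index_prefix"
}

# Step 2: align read pairs to the rRNA/tRNA index.
#   --un-conc-gz : pairs that do NOT align concordantly ("rRNA-free" reads)
#   --al-conc-gz : pairs that align concordantly (rRNA/tRNA reads)
# bowtie2 replaces % with 1 or 2 to name the per-mate files.
# The SAM output is not needed, so it goes to /dev/null; the alignment
# summary that bowtie2 prints to stderr is kept in a log file.
filter_rrna() {
  local index_prefix="$1" r1="$2" r2="$3" sample="$4" threads="${5:-8}"
  bowtie2 -p "$threads" -x "$index_prefix" -1 "$r1" -2 "$r2" \
    --un-conc-gz "${sample}.rRNA_free.%.fq.gz" \
    --al-conc-gz "${sample}.rRNA.%.fq.gz" \
    -S /dev/null 2> "${sample}.rRNA_bowtie2.log"
}
```

For example: `build_rrna_index rRNA.index Mt_rRNA.fa rRNA.fa Mt_tRNA.fa`, then `filter_rrna rRNA.index sampleA.1.fq sampleA.2.fq sampleA`. The "overall alignment rate" line in `sampleA.rRNA_bowtie2.log` gives a rough rRNA/tRNA fraction for the sample. One caveat: `--un-conc` keeps every pair that does not align concordantly, so a pair in which only one mate hits rRNA still ends up in the rRNA-free files.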
Thank you very much!
It depends on your downstream analyses. What do you want to do?
Are you sure you have poly-A contamination? What kind of library do you have? The most common Illumina RNA-seq library is mRNA with poly-A capture.
How did FastQC tell you that you had bacterial contamination? If I am not mistaken, FastQC does not screen for bacterial contamination by default.
Hi,
Downstream, I plan to do differential expression analysis (DEA), transcriptome reconstruction, etc.
The mRNA is enriched by poly-A capture, so it is possible to have poly-A contamination.
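To judge whether poly-A reads are actually a problem rather than an expected by-product of poly-A capture, one rough measure is the fraction of reads that end in a long run of A. A minimal bash/awk sketch, assuming 4-line FASTQ records; the function name `polya_fraction` and the default 20-base threshold are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch only: polya_fraction is a hypothetical helper.
# Usage: polya_fraction MIN_LEN < reads.fq
# Counts reads (line 2 of each 4-line FASTQ record) ending in >= MIN_LEN As.
polya_fraction() {
  local min_len="${1:-20}"
  awk -v min="$min_len" '
    NR % 4 == 2 {
      total++
      # match() sets RLENGTH to the length of the trailing run of A
      if (match($0, /A+$/) && RLENGTH >= min) hits++
    }
    END {
      if (total == 0) { print "no reads found"; exit 1 }
      printf "%d of %d reads (%.2f%%) end in a run of >= %d A\n", hits, total, 100 * hits / total, min
    }
  '
}
```

For example: `polya_fraction 20 < sampleA.1.fq`. Reads from the opposite strand may instead start with a run of T, which this sketch does not count.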
How did I tell there is contamination? Mainly from the 'Per sequence GC content' and 'Overrepresented sequences' sections of the FastQC report. I then checked those overrepresented sequences with BLAST, which showed poly-A and adenovirus contamination.
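FastQC also writes the 'Overrepresented sequences' table to `fastqc_data.txt` inside each `*_fastqc` output, so the overrepresented fraction can be summed per sample to get a number to compare across samples. A minimal sketch, assuming the standard tab-separated module layout (a `>>Overrepresented sequences` header, a `#Sequence Count Percentage Possible Source` line, then `>>END_MODULE`); `overrep_summary` is a hypothetical name. Since FastQC only lists sequences above 0.1% of the total, and poly-A reads of different lengths are not exact duplicates, the total is best read as a lower bound.

```bash
#!/usr/bin/env bash
# Sketch only: overrep_summary is a hypothetical helper.
# Usage: overrep_summary sampleA_fastqc/fastqc_data.txt
# Sums the Percentage column (3rd tab-separated field) of the
# ">>Overrepresented sequences" module in a FastQC data file.
overrep_summary() {
  awk -F '\t' '
    /^>>Overrepresented sequences/ { inmod = 1; next }
    inmod && /^>>END_MODULE/ { inmod = 0; next }
    # Skip the "#Sequence Count Percentage Possible Source" header line
    inmod && !/^#/ && NF >= 3 { n++; total += $3 }
    END {
      printf "%d overrepresented sequences, %.2f%% of reads in total\n", n, total
    }
  ' "$1"
}
```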