Many overrepresented sequences in RNA-seq data: should they be trimmed out?
9.2 years ago
dgtiezzi ▴ 10

I downloaded data from the SRA database, and FastQC shows many overrepresented sequences with no hits. I BLASTed some of these sequences and they match rRNA and mtDNA. The per-sequence GC content looks abnormal because of these contaminants. Should I trim them out before alignment, or should I ignore them? I believe they will not align to the reference genome; is that right?

RNA-Seq • 6.6k views
9.2 years ago

RNA-seq is expected to show duplication when a gene is highly expressed. FastQC reports "No Hit" because it only searches a small list of known contaminants (adapters, primers and similar artifacts). Chances are this is a real, spliced transcript that FastQC doesn't know about.
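For anyone wanting to check what is being counted: a rough sketch of the idea is to count identical read sequences and flag any that make up more than 0.1% of all reads, roughly mirroring FastQC's warning threshold. This is a simplified stand-in, not FastQC's actual implementation; the function names and the `min_fraction` default are my own assumptions. It only finds the sequences; you would still BLAST them, as you did, to see what they are.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (the 2nd of every 4 lines) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(
    seqs: Iterable[str], min_fraction: float = 0.001
) -> list[tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences above min_fraction of all reads.

    min_fraction=0.001 (0.1%) is an assumed default, chosen to resemble FastQC.
    """
    counts = Counter(seqs)
    total = sum(counts.values())
    hits = [(s, c, c / total) for s, c in counts.items() if c / total > min_fraction]
    # Most abundant sequences first.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Usage, with a hypothetical file name: `with open("reads.fastq") as fh: top = overrepresented(read_fastq_sequences(fh))`. Counting exact duplicates is the simplest possible approach; it will split a single contaminant into several entries if the reads differ by sequencing errors.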

9.2 years ago
Chirag Nepal ★ 2.4k

If it is rRNA, it should align to the genome. You could first map the reads to rRNA sequences and keep only the unmapped reads. Check the quality of the unmapped reads and of the overrepresented sequences. If they look OK, you can then map those unmapped reads to the genome.
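The "map to rRNA first, keep the unmapped reads" step is normally done with an aligner or a dedicated rRNA filter. As a self-contained illustration of the same idea, here is a k-mer matching sketch (a swapped-in technique, not alignment): a read is called rRNA-like if it shares at least `min_hits` k-mers of length `k` with the rRNA reference on either strand. The reference sequences, `k=31` and `min_hits=1` are assumptions you would need to tune; reads shorter than `k` can never match and always end up in the remaining set.

```python
from typing import Iterable


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_kmer_index(references: Iterable[str], k: int = 31) -> set[str]:
    """Collect every k-mer from the rRNA reference sequences, on both strands."""
    index: set[str] = set()
    for ref in references:
        for strand in (ref.upper(), reverse_complement(ref)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def split_reads(
    reads: Iterable[str], index: set[str], k: int = 31, min_hits: int = 1
) -> tuple[list[str], list[str]]:
    """Partition reads into (rRNA-like, remaining) by counting shared k-mers."""
    rrna: list[str] = []
    rest: list[str] = []
    for read in reads:
        r = read.upper()
        hits = sum(1 for i in range(len(r) - k + 1) if r[i:i + k] in index)
        (rrna if hits >= min_hits else rest).append(read)
    return rrna, rest
```

Only the `rest` list would go on to genome alignment. Indexing both strands means reads from either orientation of the rRNA are caught without reverse-complementing each read.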

The read quality is good throughout. My objective is to analyze gene expression, and I'm going to align the reads to gene regions. So could those rRNA sequences interfere with my analysis? They shouldn't align to genes, right?

They will align to the rRNA genes when aligned to the genome. That will probably skew quantification, so remove the rRNA reads and map the remaining ones.
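To see one way leftover rRNA can skew quantification, here is a toy calculation with simple counts-per-million (CPM) normalization. The gene names and counts are hypothetical: two samples have identical gene counts but different amounts of rRNA, so their CPM values differ until the rRNA counts are dropped. Real pipelines often use more robust normalization than plain CPM, so treat this only as an illustration of the library-size effect.

```python
def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def drop_rrna(
    counts: dict[str, int], rrna_ids: frozenset[str] = frozenset({"RRNA"})
) -> dict[str, int]:
    """Remove rRNA features (hypothetical IDs) before normalizing."""
    return {gene: count for gene, count in counts.items() if gene not in rrna_ids}


# Hypothetical samples: identical gene counts, different rRNA contamination.
sample1 = {"GENE_A": 500, "GENE_B": 1500, "RRNA": 8000}
sample2 = {"GENE_A": 500, "GENE_B": 1500, "RRNA": 2000}

print(cpm(sample1)["GENE_A"], cpm(sample2)["GENE_A"])  # 50000.0 125000.0
print(cpm(drop_rrna(sample1))["GENE_A"], cpm(drop_rrna(sample2))["GENE_A"])  # 250000.0 250000.0
```

With rRNA included, GENE_A looks 2.5-fold higher in sample2 even though its raw count is the same; once the rRNA counts are removed, the two samples agree.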

I think how well they align (for human) depends on whether your reference includes the unassembled contigs. The rRNA genes are actually spread over several chromosomes and MT, but most of my rRNA reads align to one of those extra contigs. Not everyone uses a reference version that includes all the extra contigs at the end, so if you have a lot of rRNA reads you may see different alignment rates with a less complete reference.
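If you want to check where your reads land with a given reference, one option is to summarize `samtools idxstats` output for a sorted, indexed BAM. It prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads), plus a final `*` line for unmapped reads with no assigned position. The sketch below splits mapped reads between primary chromosomes and everything else; the regular expression for "primary" is an assumption covering UCSC-style (`chr1`, `chrM`) and Ensembl-style (`1`, `MT`) names and may need adjusting for your reference.

```python
import re

# Assumed naming: chr1-chr22, chrX, chrY, chrM (UCSC) or 1-22, X, Y, MT (Ensembl).
PRIMARY = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def summarize_idxstats(text: str) -> dict[str, int]:
    """Sum mapped reads on primary chromosomes vs. extra contigs from idxstats text."""
    summary = {"primary": 0, "extra_contigs": 0, "unplaced_unmapped": 0}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, unmapped = line.split("\t")
        if name == "*":
            # Unmapped reads with no reference position at all.
            summary["unplaced_unmapped"] += int(unmapped)
        elif PRIMARY.match(name):
            summary["primary"] += int(mapped)
        else:
            # Unplaced/unlocalized contigs, alt haplotypes, decoys, etc.
            summary["extra_contigs"] += int(mapped)
    return summary
```

Usage, with hypothetical file names: run `samtools idxstats aligned.bam > idxstats.txt`, then `summarize_idxstats(open("idxstats.txt").read())`. Comparing this summary across two references would show whether reads that land on extra contigs in one become unmapped in the other. The per-contig unmapped column (reads whose mate mapped there) is ignored here.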
