Not removing biological contaminants from transcriptome - consequences?
4.5 years ago
tanya_fiskur ▴ 70

Hello! I have the following issue: in the first steps of my transcriptome analysis I didn't remove biological contaminants. I was completely new to all of this. Could that affect the differential expression analysis? I guess those reads were mapped to the genome and then not recognized by the annotation file during counting. Could it affect the power of the analysis?

next-gen • 808 views

Not sure what you mean by "biological contaminant". A whole-body sample from a free-living multicellular animal or plant rarely consists of a single species only, so some transcript sequences from symbionts are to be expected. That shouldn't be a problem in the first place. If anything, you gain information about the host-symbiont relationship.


It was a whole-brain sample, and the contaminants were things like ribosomal RNAs and mitochondrial RNAs of the same species. I identified them with BLAST but failed to remove them, so they were retained.


How are they contaminants? You may not be interested in them specifically but they are part of normal cells.

> I guess, they were mapped to the genome, and then not recognized by the annotation file while doing counts. Could it affect the power of analysis?

Potentially. If these contaminants were not present at the same level across your sample set then the number of reads that were used for counting may have been different. If the difference is drastic then you would need to account for that in your analysis.
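One way to check whether the difference is drastic is to compare, per sample, the fraction of mapped reads that actually ended up being counted. Below is a minimal sketch of that check. The sample names, the read numbers, and the 0.10 tolerance are all made up for illustration; in practice you would take the numbers from the summary output of your own counting tool.

```python
from statistics import median


def counted_fractions(mapped, counted):
    """Fraction of mapped reads that were assigned to annotated features."""
    return {sample: counted[sample] / mapped[sample] for sample in mapped}


def flag_outliers(fractions, tolerance=0.10):
    """Samples whose counted fraction differs from the median by more than tolerance."""
    mid = median(fractions.values())
    return sorted(s for s, f in fractions.items() if abs(f - mid) > tolerance)


# Hypothetical numbers, NOT from this thread.
mapped = {"ctrl_1": 20_000_000, "ctrl_2": 21_000_000,
          "treat_1": 19_000_000, "treat_2": 20_000_000}
counted = {"ctrl_1": 15_000_000, "ctrl_2": 15_750_000,
           "treat_1": 14_250_000, "treat_2": 9_000_000}

fractions = counted_fractions(mapped, counted)
flagged = flag_outliers(fractions)

for sample, fraction in fractions.items():
    print(f"{sample}: {fraction:.2f} of mapped reads counted")
print("Samples to look at more closely:", flagged)
```

A flagged sample is not necessarily unusable. It only means fewer of its reads went into the counts, so it is worth checking that the normalization step is based on the counted reads rather than on the raw sequencing depth.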


Well, I read that they are considered contaminants and show up among the overexpressed sequences: https://wiki.bits.vib.be/index.php/RNA-Seq_analysis_for_differential_expression

But anyway, if those reads map to regions that are not present in the .gtf annotation file I use for counting, can they still affect the counts?


They make up a large fraction of the total cellular RNA (e.g. rRNA can be upwards of 95% of total), so unless you are studying rRNA they are not of direct interest. That is why people try to eliminate them when possible. Their presence will affect counts indirectly, depending on what fraction of the total reads they consume. Consider the following toy example.

Type                 Sample 1    Sample 2
mRNA                 150         105
rRNA/contaminants    10          75
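
The arithmetic behind the toy example can be sketched in a few lines of Python. The read numbers are the ones from the table; the gene that makes up 10% of the mRNA pool in both samples is an assumption added purely for illustration. It is not differentially expressed, yet it looks different if you scale by all reads instead of by the mRNA reads.

```python
# Toy numbers from the table above (reads per category).
samples = {
    "Sample 1": {"mRNA": 150, "rRNA/contaminants": 10},
    "Sample 2": {"mRNA": 105, "rRNA/contaminants": 75},
}

# Assumption for illustration only: a gene that accounts for 10% of the
# mRNA reads in both samples, i.e. it is NOT differentially expressed.
GENE_SHARE_OF_MRNA = 0.10


def summarize(counts, gene_share=GENE_SHARE_OF_MRNA):
    """Return the mRNA fraction and the gene's share under two scalings."""
    total = sum(counts.values())
    mrna = counts["mRNA"]
    gene_reads = gene_share * mrna
    return {
        "total_reads": total,
        "mrna_fraction": mrna / total,
        "gene_per_total": gene_reads / total,  # scaled by all reads
        "gene_per_mrna": gene_reads / mrna,    # scaled by mRNA reads only
    }


results = {name: summarize(counts) for name, counts in samples.items()}

for name, r in results.items():
    print(
        f"{name}: total={r['total_reads']}, "
        f"mRNA fraction={r['mrna_fraction']:.3f}, "
        f"gene/total={r['gene_per_total']:.4f}, "
        f"gene/mRNA={r['gene_per_mrna']:.4f}"
    )
```

Scaled by mRNA reads, the gene has the same share in both samples. Scaled by total reads, it appears lower in Sample 2 only because rRNA/contaminant reads took up more of that library.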