4.5 years ago
tanya_fiskur
Hello! I have the following issue: in the first steps of my transcriptome analysis I didn't remove biological contaminants, because I was completely new to all of this. Could that affect the differential expression analysis? I guess the contaminant reads were mapped to the genome and then not recognized by the annotation file during counting. Could this affect the power of the analysis?
Not sure what you mean by "biological contaminant". A whole-body sample from a free-living multicellular animal or plant rarely consists of a single species. So some transcript sequences from symbionts are to be expected. That should not be a problem in itself; if anything, you gain information about the host-symbiont relationship.
It was a whole-brain sample, and the contaminants were ribosomal RNAs and mitochondrial RNAs of the same species. I identified them with BLAST but failed to remove them, so they were retained.
How are they contaminants? You may not be interested in them specifically, but they are part of normal cells.
Potentially. If these contaminants were not present at the same level across your sample set, then the number of reads used for counting may have differed between samples. If the difference is drastic, you would need to account for it in your analysis.
Well, I read that they are considered contaminants and show up among the overrepresented sequences: https://wiki.bits.vib.be/index.php/RNA-Seq_analysis_for_differential_expression
But anyway, if these RNAs are not present in the .gtf annotation file that I use for counting, can their reads affect the counts?
They make up a large fraction of the total cellular RNA (e.g. rRNA can be upwards of 95% of the total), so unless you are studying rRNA they are not of direct interest. That is why people try to eliminate them when possible. Their presence will affect counts indirectly, depending on what fraction of the total reads they consume. Consider the following toy example.
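A minimal toy example along those lines, written in Python with made-up numbers (the sample names, gene names, read depth and contaminant fractions are all hypothetical). Two samples are sequenced to the same depth and express three annotated genes identically; they differ only in how many reads map to rRNA/mtRNA that is absent from the GTF, so those reads are never assigned to a gene.

```python
"""Toy example: how reads outside the GTF (e.g. rRNA, mtRNA) shift raw gene counts.

All numbers are hypothetical. Both samples are sequenced to the same depth and
have identical expression of three annotated genes; they differ only in the
fraction of reads that map to features missing from the annotation.
"""

TOTAL_READS = 10_000_000

# Hypothetical fraction of reads that map outside annotated genes.
UNANNOTATED_FRACTION = {"sample_A": 0.10, "sample_B": 0.50}

# True relative abundance of the annotated genes (same in both samples).
GENE_SHARE = {"gene1": 0.5, "gene2": 0.3, "gene3": 0.2}


def raw_counts(sample: str) -> dict[str, int]:
    """Counts a gene-level counter would assign: only reads hitting annotated genes."""
    assigned = TOTAL_READS * (1 - UNANNOTATED_FRACTION[sample])
    return {gene: round(assigned * share) for gene, share in GENE_SHARE.items()}


def cpm(counts: dict[str, int], library_size: float) -> dict[str, float]:
    """Counts per million, using whatever library size is passed in."""
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


counts = {s: raw_counts(s) for s in UNANNOTATED_FRACTION}

for sample, c in counts.items():
    assigned = sum(c.values())
    print(f"{sample}: assigned {assigned:,} of {TOTAL_READS:,} reads "
          f"({assigned / TOTAL_READS:.0%})")

# Comparing raw counts: gene1 looks about 1.8-fold lower in sample_B.
fc_raw = counts["sample_B"]["gene1"] / counts["sample_A"]["gene1"]
print(f"gene1 B/A, raw counts: {fc_raw:.2f}")

# Scaling by total sequenced reads (contaminants included) does not fix it...
fc_total = (cpm(counts["sample_B"], TOTAL_READS)["gene1"]
            / cpm(counts["sample_A"], TOTAL_READS)["gene1"])
print(f"gene1 B/A, CPM on total reads: {fc_total:.2f}")

# ...but scaling by the reads actually assigned to genes does.
fc_assigned = (cpm(counts["sample_B"], sum(counts["sample_B"].values()))["gene1"]
               / cpm(counts["sample_A"], sum(counts["sample_A"].values()))["gene1"])
print(f"gene1 B/A, CPM on assigned reads: {fc_assigned:.2f}")
```

Running it prints 0.56 for the raw and total-read comparisons and 1.00 once the assigned reads are used as the library size. Count-based tools such as DESeq2 (median-of-ratios size factors) and edgeR (library sizes from the column sums of the count matrix) work from the counted reads only, so a differing contaminant fraction is largely absorbed by normalization. What normalization cannot recover is depth: in this sketch sample_B contributes only half as many informative reads, which is where the loss of power comes from. If you counted with featureCounts, the per-sample "Unassigned_NoFeatures" line in its .summary file is a quick way to see how large that fraction is in each sample.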