Hi guys,
I'm finding lots of viruses in human brain RNA-seq samples. I'm not sure whether these viruses come from sample contamination or from a genuine viral infection. How can I remove the contaminants? Any ideas are appreciated.
Here are the top 10 viruses, ranked by the number of samples in which each was detected:
Shamonda virus
Equine infectious anemia virus
Choristoneura occidentalis granulovirus
Murine leukemia virus
Cafeteria roenbergensis virus
White spot syndrome virus
Murine osteosarcoma virus
Tomato mosaic virus
Human herpesvirus 5
Autographa californica multiple nucleopolyhedrovirus
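To make a ranking like this explicit, here is a minimal Python sketch that counts, for each virus, the number of samples in which it has at least `min_reads` assigned reads. The input layout (a dict of per-sample read counts, e.g. parsed from a classifier report), the function name and the threshold are all hypothetical, and the sample data is made up purely for illustration.

```python
from collections import Counter


def rank_by_prevalence(hits, min_reads=1):
    """Rank taxa by the number of samples in which they were detected.

    hits: dict mapping sample ID -> dict mapping taxon name -> read count
          (hypothetical layout, e.g. built from a classifier report).
    min_reads: hypothetical cutoff; a taxon counts as present in a sample
               only if it has at least this many reads there.
    """
    prevalence = Counter()
    for taxa in hits.values():
        for taxon, n_reads in taxa.items():
            if n_reads >= min_reads:
                prevalence[taxon] += 1
    # Sorted by sample count, descending; ties keep first-seen order.
    return prevalence.most_common()


# Made-up example data, not real results.
hits = {
    "sample_1": {"virus_A": 3, "virus_B": 120},
    "sample_2": {"virus_A": 2},
    "sample_3": {"virus_A": 5, "virus_C": 40},
}
print(rank_by_prevalence(hits))
print(rank_by_prevalence(hits, min_reads=10))
```

In this toy case `virus_A` tops the list with the default cutoff but disappears entirely at `min_reads=10`, because it is seen in many samples but only with a handful of reads each. That pattern is worth checking before reading much into prevalence alone.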
Hmm, interesting. Could you share what you did and how you ended up with these viruses from human brain RNA-seq data?
What do you mean by "find"?
For example, I use DIAMOND/Kraken a lot on my metagenomic data sets, and I always get a number of reads hitting specific viruses; Shamonda virus is one of them. If you then look at the reads that produce those hits, they tend to be low-complexity reads, containing stretches like AAAAAAAAAAAAAAAC, and it just so happens that that k-mer is found in that virus. These are erroneous hits, since that is the only part of the virus that is found, and is
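One way to check whether the reads behind a hit look like this is to score their sequence complexity. Below is a minimal Python sketch using the Shannon entropy of each read's k-mer composition; it is a simple stand-in, not the DUST algorithm and not what DIAMOND or Kraken do internally, and the function names, `k=3` and the 2.0-bit cutoff are hypothetical choices you would need to tune on your own data.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer composition of a read."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_low_complexity(seq, k=3, min_entropy=2.0):
    """Flag a read whose k-mer entropy falls below a (hypothetical) cutoff."""
    return kmer_entropy(seq, k) < min_entropy


reads = [
    "AAAAAAAAAAAAAAAC",      # homopolymer-like, as in the example above
    "ATATATATATATATAT",      # dinucleotide repeat
    "ACGTTGCAAGCTTACGGATC",  # mixed sequence
]
for read in reads:
    print(read, round(kmer_entropy(read), 2), is_low_complexity(read))
```

The first two reads are flagged and the third is kept. In practice you might drop flagged reads before classification, or simply inspect the reads assigned to a suspicious virus and see what fraction of them are flagged.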
If, however, you mean that you find full genomes or large contigs of these viruses in your sample, check that reads map back onto them; even then, there is no easy way to distinguish between contamination and a real infection. I would be surprised if you got good contigs of all those viruses from a human brain sample.
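When checking reads mapped back onto a viral reference, one rough heuristic is breadth of coverage: the fraction of the genome covered by at least one read. Reads piled onto one short stretch (like the low-complexity hits described above) suggest spurious hits, while broad coverage suggests the viral sequence really is in the library, though it still cannot tell contamination from infection. This minimal Python sketch assumes you have already extracted 0-based, half-open (start, end) alignment coordinates from your alignments (e.g. a BAM file); the function name and inputs are hypothetical.

```python
import math


def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one read.

    intervals: iterable of (start, end) alignment coordinates on the viral
               reference, 0-based and half-open (assumed input format).
    genome_length: length of the viral reference in bases.
    """
    if genome_length <= 0:
        return 0.0
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip alignments to the reference bounds and skip empty ones.
        start, end = max(0, start), min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this read: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Reads stacked on one ~60 bp region of a 10 kb genome vs. reads spanning it.
stacked = [(100, 150), (110, 160), (100, 150)]
spread = [(0, 5000), (4000, 10000)]
print(breadth_of_coverage(stacked, 10_000))
print(breadth_of_coverage(spread, 10_000))
```

Here the stacked reads cover well under 1% of the genome while the spread reads cover all of it; any cutoff between "stacked" and "broad" is a judgment call that depends on read depth and genome size.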