How to tell whether viruses/bacteria found in human brain RNA-seq samples are intrinsic or contamination?
8.7 years ago
Tao ▴ 540

Hi guys,

I find lots of viruses in human brain RNA-seq samples. I'm not sure whether these viruses come from sample contamination or from a genuine viral infection. How can I remove contamination? Any ideas are appreciated.

Here are the top 10 viruses, ranked by how many samples contain each virus:

Shamonda virus
Equine infectious anemia virus
Choristoneura occidentalis granulovirus
Murine leukemia virus
Cafeteria roenbergensis virus
White spot syndrome virus
Murine osteosarcoma virus
Tomato mosaic virus
Human herpesvirus 5
Autographa californica multiple nucleopolyhedrovirus

RNA-Seq virus bacteria contamination human • 3.6k views

Hmm, interesting. Could you share what you did and how you ended up with these viruses from human brain RNA-seq data?


What do you mean by "find"?

For example, I use DIAMOND/KRAKEN a lot on my metagenomic data sets, and I always get a number of reads hitting specific viruses, one of which is Shamonda. If you look at the reads giving those hits, they tend to be low-complexity reads: the read contains something like AAAAAAAAAAAAAAAC, and it just so happens that this k-mer is found in that virus. These are erroneous hits, as that is the only part of the virus that is found.

If, however, you mean that you find full genomes or large contigs of these viruses in your sample, check that reads map back onto them. Even then, there is no easy way to distinguish contamination from a real infection. I would be surprised if you got good contigs of all those viruses in a human brain sample.
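To make the low-complexity check above concrete, here is a minimal sketch that scores a read by the Shannon entropy of its k-mers and flags reads below a cutoff. The function names, the choice of k = 3, and the cutoff of 2.5 bits are assumptions made for illustration, not settings taken from DIAMOND or Kraken; dedicated low-complexity filters exist and would normally be preferable.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the k-mer distribution within one read."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        # Read shorter than k: treat as having no complexity at all.
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_low_complexity(seq: str, k: int = 3, min_entropy: float = 2.5) -> bool:
    """Flag a read whose k-mer entropy falls below min_entropy.

    min_entropy is an arbitrary illustrative cutoff, not a published threshold.
    """
    return kmer_entropy(seq, k) < min_entropy


# Hypothetical reads: a poly-A-like read and a more varied one.
reads = {
    "read1": "AAAAAAAAAAAAAAAC",
    "read2": "ACGTTGCATGCCGATAGCTAGGATCCTAGA",
}
flagged = [name for name, seq in reads.items() if is_low_complexity(seq)]
print(flagged)
```

If most of the reads assigned to a virus end up flagged this way, the hit is more likely a k-mer coincidence than evidence that the virus is present.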

8.7 years ago
GenoMax 147k

It is unlikely that a normal human brain sample would have all those viruses (unless you are picking up some fragment(s) common to all these viruses that match something in the human genome).
That said, you can use BBSplit from BBMap to bin reads that map to these viral genomes.


Thanks for your advice. I will try.

8.7 years ago

One possibility is that you are mapping to some type of vector. These are sometimes derived from viruses. You could try to filter against UniVec or some other vector database. You could (although it is dull) go through some of these alignments and see whether they all stack up on one region of the genome, or whether they are just hitting the poly-A tail or something similar. These things can be quite tricky.
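One way to make the "stacking up" check less dull is to compute the breadth of coverage: the fraction of the viral genome touched by at least one alignment. The sketch below assumes the alignment coordinates have already been extracted (for example from a BAM file) as 0-based, end-exclusive `(start, end)` pairs. The function name, the 10,000 bp genome length, and the example intervals are hypothetical.

```python
def breadth_of_coverage(alignments, genome_length):
    """Fraction of genome positions covered by at least one alignment.

    alignments: iterable of (start, end) pairs, 0-based, end-exclusive.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(alignments):
        # Clamp to the genome bounds and skip empty intervals.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


genome_length = 10_000  # hypothetical viral genome size

# 200 reads piled onto the same 100 bp window.
piled = [(5000, 5100)] * 200
# 50 reads spread evenly along the genome.
spread = [(i * 100, i * 100 + 100) for i in range(0, 100, 2)]

print(breadth_of_coverage(piled, genome_length))
print(breadth_of_coverage(spread, genome_length))
```

Many reads with a very low breadth suggests a pile-up on one region (a vector-derived segment or a poly-A stretch), whereas reads spread across the genome are harder to dismiss.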


Excellent! I will try filtering against a vector database. Thanks so much.
