Question

Where does my RNASeq contamination fit on the Tree of Life?

0


7.8 years ago

Biomonika (Noolean) 3.2k

I have RNASeq data from algal cultures, and my samples are not really axenic, so the presence of some bacteria or fungi is to be expected. I assembled my reads with Trinity and now I would like to estimate the origin of each individual contig. Ideally, I would like to visualize where the contamination is coming from on the Tree of Life, for two purposes:

as a quality metric, to check that the origin of my contamination makes sense (and that I see what I expect to see for algal cultures)
to remove contaminants and "clean" the assembly

Is there any tool that could do this for me?

I started by automatically outputting the "best" BLAST hit for each contig, but I am getting a large variety of hits and I am not sure how to summarize them or assign them properly to a place in the phylogeny.

Thanks for your help.

RNA-Seq rna-seq contamination species algae • 1.9k views


2


NCBI has a new pre-made BLAST database, ref_prok_rep (representative prokaryotic genomes). Since you have assembled sequences, you could run a quick BLAST search against it to see whether you can pick off any low-hanging fruit in terms of identification.
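
One way to summarize the results of such a search is to keep only the top-scoring hit per contig and then count how often each subject sequence comes up. The sketch below assumes the search was run with BLAST's 12-column tabular output (`-outfmt 6`), where column 1 is the query (contig) ID, column 2 the subject ID, and column 12 the bit score. The function name `best_hit_per_contig` and the sample rows are made up for illustration and are not part of BLAST.

```python
from collections import Counter


def best_hit_per_contig(rows):
    """Return {contig_id: (subject_id, bitscore)} keeping the highest bit score.

    `rows` is an iterable of 12-field records, as produced by splitting
    each line of BLAST tabular output (-outfmt 6) on tabs.
    """
    best = {}
    for row in rows:
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        contig, subject, bitscore = row[0], row[1], float(row[11])
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (subject, bitscore)
    return best


# Hypothetical example records; real ones would come from something like
#   rows = (line.rstrip("\n").split("\t") for line in open("hits.tsv"))
example_rows = [
    ["contig1", "subjA", "98.0", "200", "4", "0", "1", "200", "10", "209", "1e-90", "350"],
    ["contig1", "subjB", "90.0", "180", "18", "0", "1", "180", "5", "184", "1e-50", "210"],
    ["contig2", "subjB", "85.0", "150", "22", "1", "1", "150", "7", "156", "1e-30", "120"],
]

best = best_hit_per_contig(example_rows)

# Count how many contigs have each subject as their best hit.
subject_counts = Counter(subject for subject, _ in best.values())
for subject, n_contigs in subject_counts.most_common():
    print(f"{n_contigs}\t{subject}")
```

Contigs with no hit at all do not appear in the tabular output, so they are simply absent from the result. Mapping subject IDs to taxa is a separate step that this sketch does not cover.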

7.8 years ago by GenoMax 147k

0


Why not start with filtering out reads that can be mapped to known bacterial/fungal species?

7.8 years ago by pld 5.1k

0


Can you point me to such a list/database?

7.8 years ago by Biomonika (Noolean) 3.2k

0


I'm a big fan of Kraken for contamination screening. The program assigns a taxid to each read, and with a little legwork you can filter on that. If you use the kraken-translate tool, you should be able to get the full taxonomy for each read and filter on keywords. For example, get a list of reads with the word "bacteria" in their kraken-translate entry, then remove all of those reads from your dataset.

https://ccb.jhu.edu/software/kraken/MANUAL.html#output-format
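
The keyword filtering described above could look roughly like the sketch below. It assumes kraken-translate output with two tab-separated columns (read ID, then a taxonomy string such as `root;cellular organisms;Bacteria;...`) and plain four-line FASTQ records. The function names, keywords, and sample data are illustrative only and are not part of Kraken.

```python
def contaminant_ids(translate_lines, keywords=("bacteria", "fungi")):
    """Collect read IDs whose taxonomy string contains any keyword.

    Matching is a case-insensitive substring test, so "bacteria" also
    matches lineages such as "Cyanobacteria".
    """
    ids = set()
    for line in translate_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        read_id, _, taxonomy = line.partition("\t")
        lowered = taxonomy.lower()
        if any(keyword in lowered for keyword in keywords):
            ids.add(read_id)
    return ids


def filter_fastq(fastq_lines, drop_ids):
    """Yield four-line FASTQ records whose read ID is not in drop_ids."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Header looks like "@read_id optional description".
            read_id = record[0][1:].split()[0]
            if read_id not in drop_ids:
                yield record
            record = []


# Hypothetical example data; real input would be read from files.
translated = [
    "read1\troot;cellular organisms;Bacteria;Proteobacteria\n",
    "read2\troot;cellular organisms;Eukaryota;Viridiplantae;Chlorophyta\n",
]
fastq = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "IIII\n",
]

drop = contaminant_ids(translated)
kept = list(filter_fastq(fastq, drop))
print(f"dropped {len(drop)} read(s), kept {len(kept)}")
```

Reads that are absent from the kraken-translate file are kept by this filter. For paired-end data, the same set of IDs would need to be applied to both files so that mates stay in sync, and any `/1` or `/2` suffix on read names would have to be handled first.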

I am honestly not sure which is better: clean before assembly or after assembly.

7.8 years ago by pld 5.1k