Question

RNA viruses - analysis of assembled sequences?

6.4 years ago

Guillermo D. Huerta ▴ 10

Hi all,

This one is, I guess, a tricky question.

RNA virus discovery from metagenome/metatranscriptome datasets (especially from environmental samples) is particularly difficult because the viral genome sequences are VERY DIVERGENT and show little similarity to what is available in reference sequence databases.

Can you recommend a "typical" protocol for this?

I have found two "versions" so far:

#FIRST PROTOCOL#
- Assemble reads with Trinity or metaSPAdes.
- Do tBLASTn with the generated contigs/scaffolds against a database made of RNA virus proteins (ssRNA and dsRNA viruses). Use an e-value cutoff of <= 10^-3.
- All candidate contigs screened by the previous step are queried against NCBI RefSeq db using BLASTx.
- Only contigs with topmost hits to viruses are kept.
- Binning to distinct viral groups according to their best BLAST hits.
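The filtering and binning steps of the first protocol could be sketched roughly as follows. This is only an illustration, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`), where column 11 is the e-value and column 12 the bit score. The function names and the `subject_to_group` lookup (subject accession to viral group) are hypothetical and would have to be built from your own taxonomy data.

```python
from collections import defaultdict


def best_hits(blast_lines, max_evalue=1e-3):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit per query.

    Assumes tab-separated BLAST output with the 12 standard columns.
    Hits with an e-value above max_evalue are ignored.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def bin_viral_contigs(best, subject_to_group):
    """Group contigs by the viral group of their best hit.

    subject_to_group is a hypothetical dict (subject accession -> group name).
    Contigs whose best hit is not in the dict are treated as non-viral and dropped.
    """
    bins = defaultdict(list)
    for query, (subject, _evalue, _bitscore) in best.items():
        group = subject_to_group.get(subject)
        if group is not None:
            bins[group].append(query)
    return dict(bins)
```

Picking the best hit by bit score rather than by file order is a design choice here; with unsorted or merged result files the first line for a contig is not guaranteed to be its strongest hit.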

#SECOND PROTOCOL#
- Assemble reads with Trinity or metaSPAdes.
- Do BLASTx with generated contigs/scaffolds against a database made of RNA virus proteins (ssRNA and dsRNA viruses). Use an e-value cutoff of <= 10^-5.
- All candidate contigs are converted into proteins with Prodigal.
- The proteins are queried against CDD blast (0.01 cutoff) to look for conserved domains.
- Keep the contigs containing domains of RNA-dependent RNA-polymerases or reverse-transcriptases.
- Contigs containing those domains are queried against NCBI nr db using BLASTx to discard "false-positives". Only contigs with hits to viruses are kept.
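The domain-filtering step of the second protocol might look something like the sketch below. It assumes the CDD results have already been parsed into `(protein_id, domain_description, evalue)` tuples, and that the protein IDs follow Prodigal's convention of appending `_<number>` to the contig ID. The keyword list and the function name are assumptions made for illustration; a real run would more likely match against a curated list of CDD/Pfam accessions.

```python
# Hypothetical keyword list; matching is case-insensitive substring search.
RDRP_RT_KEYWORDS = (
    "rna-dependent rna polymerase",
    "rdrp",
    "reverse transcriptase",
)


def contigs_with_polymerase_domains(domain_hits, max_evalue=0.01,
                                    keywords=RDRP_RT_KEYWORDS):
    """Return the set of contig IDs carrying an RdRp- or RT-like domain hit.

    domain_hits: iterable of (protein_id, domain_description, evalue) tuples.
    Protein IDs are assumed to be '<contig_id>_<n>' as written by Prodigal.
    """
    keep = set()
    for protein_id, description, evalue in domain_hits:
        if evalue > max_evalue:
            continue  # hit is weaker than the chosen cutoff
        desc = description.lower()
        if any(keyword in desc for keyword in keywords):
            # Strip the trailing '_<n>' to recover the contig ID.
            keep.add(protein_id.rsplit("_", 1)[0])
    return keep
```

The contigs returned by this step would then go into the final BLASTx against nr to weed out false positives.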

Thoughts?

Thanks very much in advance!

blast RNA virus environment next-gen discovery • 1.5k views

I am currently trying to clean the set of reads prior to assembly (with Trinity, also trying Oases). I run Centrifuge against nt and keep only reads that are either classified as viral (very few), unclassified, or not classified as the host. I have found a couple of viral transcripts via BLASTx against nr, but the sequence divergence is a pain in the ass.

6.4 years ago by cschu181 ★ 2.8k
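The read clean-up described in the reply above could be sketched along these lines. This is an illustration only, assuming Centrifuge's default tab-separated classification output (header row starting with `readID`, taxonomy ID in the third column, and taxID 0 for unclassified reads). The function name and the `host_taxids` set are hypothetical, and dropping a read as soon as any of its assignments points to the host is a choice made here, not something stated in the thread.

```python
def reads_to_keep(centrifuge_lines, host_taxids):
    """Return the IDs of reads that have no assignment to a host taxon.

    centrifuge_lines: iterable of tab-separated classification lines
                      (readID, seqID, taxID, ...).
    host_taxids: set of integer taxonomy IDs considered 'host' (assumption:
                 supplied by the user, e.g. the host species and its lineage).
    A read may appear on several lines; it is dropped if any line is host.
    """
    seen = set()
    host_reads = set()
    for line in centrifuge_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "readID":
            continue  # skip the header row and malformed lines
        read_id, taxid = fields[0], int(fields[2])
        seen.add(read_id)
        if taxid in host_taxids:
            host_reads.add(read_id)
    return seen - host_reads
```

The returned set keeps viral, unclassified, and other non-host reads together; the IDs can then be used to subset the FASTQ files before assembly.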