Hello all,
I have a viral genomics question for you! I am analyzing an RNA-Seq library composed of pooled RNA samples from bumble bees across the United States, in order to quantify the diversity of viruses that infect these populations. I assembled the RNA-Seq reads into contigs using the de novo assembly option in CLC Workbench and, after searching for viral contigs using BLAST, found one novel virus candidate. From the BLAST search, I know that the candidate is closely related to a family of mosquito viruses. Based on an alignment and a search of the NCBI conserved protein domain database, I have a rough idea of the expected genome size and of the protein families the virus is likely to encode. However, because it is a new virus, I have no reference against which to check whether I've obtained the full genome. When I align the novel virus contig with its close relatives, it is roughly a third of the length of the related viral genomes, which suggests that this contig probably does not represent the complete genome.
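For anyone who wants to reproduce the length comparison, it can be sketched roughly as below. This is only a minimal illustration in plain Python: the record names (`novel_contig`, `relative_A`, `relative_B`), the toy sequences, and both helper functions are made-up placeholders, not my actual data or pipeline.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def length_fraction(contig_seq, relative_seqs):
    """Return the contig length divided by the mean length of the relatives."""
    mean_len = sum(len(s) for s in relative_seqs) / len(relative_seqs)
    return len(contig_seq) / mean_len


# Toy placeholder sequences (hypothetical): a 40 nt contig and two 120 nt relatives.
contig = read_fasta(">novel_contig\n" + "ACGU" * 10)["novel_contig"]
relatives = read_fasta(
    ">relative_A\n" + "ACGU" * 30 + "\n>relative_B\n" + "ACGU" * 30
)

fraction = length_fraction(contig, list(relatives.values()))
print(f"Contig covers about {fraction:.0%} of the mean relative genome length")
```

A fraction well below 1 only hints that the contig is incomplete, since related viruses can differ in genome length or be split across several segments.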
To solve this issue, I figure that I need to redo the assembly with a pipeline that is better at recovering viral genomes than CLC Workbench. However, I'm not sure what the best way to proceed is. Is there a particularly good de novo assembler for obtaining complete viral genomes?
Assuming my phylogenetic analysis is correct, I approximately know the size of the novel virus, its close relatives, and the conserved proteins it should have. Given that, is there a way to obtain its complete genome from the RNA-Seq reads? And is there a way to map the reads onto a close relative, even though I don't have an exact reference to use?
Any suggestions on how to proceed would be greatly appreciated! Thank you in advance, Brianna
What all is expected to be in the sample you sequenced? Bee RNA + RNA viruses + ?
Yes, we expect to find bee host RNA, plant RNA and RNA viruses (typically from pollen, though some plant viruses do infect bee hosts), and insect-specific RNA viruses.