Hello everyone,
I'm new to this type of analysis, so please bear with me. I'm currently conducting research to detect a specific virus, which I'll refer to as virus X, in bulk RNA-seq data obtained from human tissue samples. Virus X is known to cause latent infections in humans, so I expect very few viral reads in the data, and infected individuals typically do not exhibit any symptoms.
I'm considering various approaches to determine the presence of virus X in my samples. At the moment, I'm leaning towards aligning the reads to the human reference genome first, extracting the unaligned reads, and then aligning those unaligned reads to the reference genome of virus X. Would this approach be more effective than aligning my reads directly to the virus X reference, or than aligning them to a combined human and virus reference? Additionally, how should I analyze the reads that do map to the virus genome to confirm that they really originate from virus X?
Thanks!
There are already published pipelines that can help:
VirDetect: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986043/
VIRTUS: https://academic.oup.com/bioinformatics/article/37/10/1465/5918022
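
On the second part of the question (how to judge the reads that do map to the virus), one common sanity check is to look at how much of the viral genome the reads cover, not only how many reads there are. A handful of reads stacked on one short region is more likely to be a low-complexity or human-homologous artifact than reads spread along the genome. Below is a minimal Python sketch of that idea. The function names and the input format (0-based, half-open start/end intervals that you would extract yourself from the viral alignments) are my own assumptions for illustration and are not taken from VirDetect or VIRTUS; any cutoff you apply to these numbers needs to be chosen and validated for your own data.

```python
from typing import Iterable, Tuple


def coverage_breadth(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the viral genome covered by at least one aligned read.

    `intervals` are hypothetical (start, end) alignment coordinates,
    0-based and half-open, e.g. parsed from the alignments to virus X.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip to the genome boundaries and skip empty intervals.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def viral_reads_per_million(n_viral_reads: int, n_total_reads: int) -> float:
    """Viral read count normalised to sequencing depth (reads per million)."""
    if n_total_reads <= 0:
        raise ValueError("n_total_reads must be positive")
    return 1e6 * n_viral_reads / n_total_reads
```

For example, 50 reads that all fall on the same 100 bases of a 10,000-base genome give a breadth of only 0.01, whereas the same number of reads scattered along the genome would give a much higher value. Comparing both numbers against samples you expect to be negative is a reasonable way to decide what counts as a real signal.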