4.1 years ago
Mania
Hello biostars!
I'm working on a viral metagenomics classification project and would like to ask what cutoff to use: what is the minimum number of reads assigned to a virus that should be present in a sample in order to report it as positive? I've been searching the literature, but I haven't found anything that helps so far. Still working on it. Thanks
This question has no single correct answer. If you are working with short reads, they may align to multiple viral genomes because of sequence similarity, and that similarity may be purely by chance.
That said, here is a paper that used one definition: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6483-6
You can use that definition. The problem is that viruses are very diverse, so it will give a lot of false negatives and a lot of false positives too. Set the threshold quite high to be sure that your annotation is correct, and if you want to go deeper you will probably have to assemble the reads and classify the contigs using existing tools.
Thanks for your quick response! Let's say I have downloaded a fastq file of a sample that is positive for Enterovirus A by qPCR. When I run a metagenomics classification workflow such as Kraken, the tool successfully classifies the virus as Enterovirus A, but it reports a very low number of reads (<10, sometimes even 1 read). Of course, this is not a result you can trust on its own, yet it is consistent with the qPCR outcome. If I didn't know the qPCR outcome, I would have considered the metagenomics output a false positive. There must be a range of viral read counts that we can rely on to decide whether the virus is truly present in the sample.
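To make the read-count cutoff discussed above concrete, here is a minimal Python sketch of applying one to a classifier report. It assumes a standard six-column, tab-separated Kraken-style report (percentage, clade reads, direct reads, rank code, taxid, name). The function name `taxa_above_cutoff`, the default `min_reads=10`, and every number in the example report are made up for illustration; none of them is a recommended threshold or real data.

```python
def taxa_above_cutoff(report_lines, min_reads=10, rank="S"):
    """Return {taxon name: clade read count} for taxa at `rank`
    whose clade read count is at least `min_reads`.

    Assumes six tab-separated columns per line:
    percent, clade reads, direct reads, rank code, taxid, name.
    """
    hits = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank_code = fields[3].strip()
        name = fields[5].strip()  # names are indented in the report
        if rank_code == rank and clade_reads >= min_reads:
            hits[name] = clade_reads
    return hits


# Illustrative report only: counts and taxids are invented for this example.
example_report = "\n".join([
    "99.00\t99000\t99000\tU\t0\tunclassified",
    "1.00\t1000\t0\tR\t1\troot",
    "0.25\t250\t250\tS\t129951\t      Human mastadenovirus C",
    "0.00\t3\t3\tS\t138948\t      Enterovirus A",
])

strict = taxa_above_cutoff(example_report.splitlines(), min_reads=10)
lenient = taxa_above_cutoff(example_report.splitlines(), min_reads=1)
print(strict)
print(lenient)
```

With the invented counts above, the strict cutoff drops the 3-read Enterovirus A hit while the lenient one keeps it, which is exactly the trade-off described in this thread: a qPCR-confirmed virus with very few reads is lost at a high threshold, and a low threshold lets chance matches through.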