So, I have been reading a lot about host removal with viral data. More than one group simply BLASTs their reads against their specific host genome to remove possible contamination.
How does that work? I know how to download a genome from NCBI. I have the NCBI tools on my machines and can create a custom db using only my host(s) of choice. However, I'm fuzzy on how that actually removes the host reads from the read pool.
Can BLAST take all of your reads and output only the reads that have no match, or is it more a matter of writing the results to a file and then removing everything that has a good match via a script?
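To make the second option concrete, here is a rough sketch of the kind of script I have in mind. It assumes the reads were converted to FASTA and run through blastn with tabular output (`-outfmt 6`, default columns), and that the original reads are in plain 4-line FASTQ. The function names and the identity/length/e-value cutoffs are my own placeholders for what counts as a "good match", not anything a published pipeline prescribes.

```python
import io


def host_read_ids(blast_tsv, min_pident=90.0, min_length=50, max_evalue=1e-5):
    """Collect IDs of reads with at least one 'good' hit in BLAST -outfmt 6 output.

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. The cutoffs are assumptions.
    """
    hits = set()
    for line in blast_tsv:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        qseqid = cols[0]
        pident, length, evalue = float(cols[2]), int(cols[3]), float(cols[10])
        if pident >= min_pident and length >= min_length and evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def filter_fastq(fastq_in, fastq_out, remove_ids):
    """Copy 4-line FASTQ records whose ID is not in remove_ids; return count kept."""
    kept = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = fastq_in.readline()
        plus = fastq_in.readline()
        qual = fastq_in.readline()
        # BLAST reports the first whitespace-delimited token of the read name.
        read_id = header[1:].split()[0]
        if read_id not in remove_ids:
            fastq_out.write(header + seq + plus + qual)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny made-up example: read1 hits the host, read2 has no hit.
    blast = io.StringIO("read1\thost_chr1\t98.5\t100\t1\t0\t1\t100\t500\t599\t1e-40\t180\n")
    reads = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n")
    cleaned = io.StringIO()
    filter_fastq(reads, cleaned, host_read_ids(blast))
    print(cleaned.getvalue(), end="")
```

With real data the `StringIO` objects would just be opened files. Reads with no hit at all never appear in the tabular output, so they are kept by default.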
I understand BLAST is a slower way of doing this, so what would be its advantage over, say, BBSplit (from BBMap), which can map to multiple references at once? Or over just concatenating all the host/viral DNA/RNA into one file and mapping to that?
Thanks, that looks interesting :)