I want to use an R script to BLAST sequences against the online NCBI databases.
This post already explored the subject, but all the tools proposed (rBLAST, BLASTr, and metablast) use local databases to run BLAST.
So it does not look possible to run BLAST against the online NCBI databases from an R script, or I have not found how to do it. Has anyone come across a way to do this?
Keep in mind that "remote blast" is not supposed to be used as a replacement for local blast for large numbers of sequences. You may get errors or, at worst, you may get IP banned.
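If you do send a handful of sequences remotely, spacing the requests out is one way to stay polite to the server. Here is a minimal sketch in R. `run_throttled` is a hypothetical helper, and the 10-second default delay is an assumption, so check NCBI's current usage guidelines for the actual limits.

```r
# Hypothetical helper: apply `fun` to each input with a pause between calls.
# The default delay is an assumption, not an official NCBI value.
run_throttled <- function(inputs, fun, delay_sec = 10) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    # Wait before every call except the first one.
    if (i > 1) Sys.sleep(delay_sec)
    # Keep going if one query fails; the error object is stored in its slot.
    # Assigning via list() keeps the slot even when `fun` returns NULL.
    results[i] <- list(tryCatch(fun(inputs[[i]]), error = function(e) e))
  }
  results
}
```

Here `fun` would be whatever function submits one query, so a failed search leaves an error object in the result list instead of stopping the whole run.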
At least BLASTr seems to be quite comprehensive from looking at the code. It supports the remote option. I haven't tried it but it looks like you can call blast like in the usage example:
blastn(query, db = "nt", out = NULL, outfmt = "xml", max_hits = 20, evalue = 10, remote = TRUE, ...)
You just need a local installation of the command-line BLAST tools. If you want to define your workflow in R, it might be worth installing the package and trying it.
The only problem is that the author didn't assign an open-source license or any proper terms at all. So, in principle, you cannot use or modify it for anything public.
EDIT: On reflection, the lack of proper license terms is a real problem here, although the DESCRIPTION file does mention MIT. I would still ask the author for clarification if the package is useful and you want to use the code in a project.
Thanks very much for showing me this option, which I had missed, and for noticing the lack of clarity about the license. It is a problem and I will indeed ask the author for clarification.
I only looked at the documentation, so there might be a discrepancy between it and how the code actually behaves.
How did you invoke blastn?
You can also always invoke blastn via system(), though. However, I think that managing your workflow through R has considerable downsides, as there will be significant overhead for controlling environments, dependencies, resources, and re-entrancy (re-running failed or outdated processing steps). I suggest you implement your workflow in Snakemake instead, using conda environments to install blast and possibly the databases locally, and then let the R script become a single step of your workflow.
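For what it's worth, a minimal sketch of the system-call route in R could look like the following. It assumes the BLAST+ `blastn` binary is on the PATH. `build_blastn_args` and `run_remote_blastn` are hypothetical helper names, and the default e-value and hit count are placeholders rather than recommendations.

```r
# Column names of BLAST tabular output (-outfmt 6, default fields).
blast_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Hypothetical helper: build the blastn argument vector (no I/O).
build_blastn_args <- function(query_file, out_file, db = "nt",
                              evalue = 1e-5, max_hits = 20) {
  c("-query", shQuote(query_file),
    "-out", shQuote(out_file),
    "-db", db,
    "-remote",  # run the search on NCBI's servers instead of a local database
    "-outfmt", "6",
    "-evalue", format(evalue, scientific = TRUE),
    "-max_target_seqs", as.character(max_hits))
}

# Hypothetical helper: run blastn remotely and return the hits as a data frame.
run_remote_blastn <- function(query_file, ...) {
  blastn <- unname(Sys.which("blastn"))
  if (!nzchar(blastn)) stop("blastn not found on PATH; install BLAST+ first")
  out_file <- tempfile(fileext = ".tsv")
  on.exit(unlink(out_file))
  # system2() returns the exit status when stdout is not captured.
  status <- system2(blastn, build_blastn_args(query_file, out_file, ...))
  if (status != 0) stop("blastn exited with status ", status)
  # No hits: return an empty data frame with the expected columns.
  if (!file.exists(out_file) || file.size(out_file) == 0) {
    empty <- data.frame(matrix(nrow = 0, ncol = length(blast_cols)))
    return(setNames(empty, blast_cols))
  }
  read.table(out_file, sep = "\t", col.names = blast_cols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}
```

A call would then be something like `hits <- run_remote_blastn("query.fasta")`, where `query.fasta` is a placeholder file name. Writing to a temporary file rather than capturing stdout keeps BLAST warnings on stderr out of the parsed table.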
Thank you for the advice @michael. I do not get to choose the tool in this case, and my colleagues have requested a single R script. I wish I could use Snakemake.
However, I can follow your and GenoMax's suggestion and use local databases. I had many issues with the -remote flag and will not use it anymore.
It's possible to send BLAST commands remotely and receive the results with SequenceServer BLAST. We've made examples for blasting from within Python, and for remote blasting from the Unix command line.
The same concept (i.e. server-side API) should be accessible from R too. But we have yet to make example code for R.