Hi everyone: I'd like to know if there is any way to run a BLAST analysis over the internet, using UniRef90 (UniProt) as the database, from Biopython. Thanks!
The official BLAST against UniRef and all UniProt data is hosted by the EBI; this is actually the API that UniProt.org uses as well.
You can find the documentation in the EBI services pages. If you do wish to run 7000 jobs please be considerate and run one after the other.
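As a rough sketch of what "one after the other" could look like in Python: the loop below only starts a job once the previous one has finished, with a short pause in between. The `submit_and_wait` callable is a hypothetical placeholder for whatever client code you write against the EBI service (it is not part of any real library), and the one-second pause is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def run_sequentially(
    queries: Iterable[Tuple[str, str]],
    submit_and_wait: Callable[[str], str],
    pause_seconds: float = 1.0,
) -> Dict[str, str]:
    """Run one job at a time so that two jobs are never in flight at once."""
    results = {}
    for name, sequence in queries:
        # submit_and_wait (hypothetical) must block until the job is done.
        results[name] = submit_and_wait(sequence)
        # Assumed courtesy delay between jobs; the value is arbitrary.
        time.sleep(pause_seconds)
    return results
```

You would call it as `run_sequentially(list_of_name_sequence_pairs, my_client_function)`, where `my_client_function` is your own code.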
This is quite old, but if time / speed is an issue, Diamond is an accelerated BLAST aligner that is very fast. I've used it with a downloaded UniRef90 database and it works both for local testing & development (2015 Macbook Pro) as well as on the production servers. I can share code with you if you want.
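For anyone who wants a starting point, here is a minimal sketch of driving Diamond from Python. It is not the code I use in production, just an illustration. The file names (`uniref90.fasta`, `queries.fasta`, `matches.tsv`) and the thread count are assumptions, and `diamond` has to be on your PATH for `run_diamond` to work.

```python
import subprocess
from typing import List


def diamond_makedb_cmd(fasta: str, db: str) -> List[str]:
    # Command that converts a protein FASTA file into a Diamond database.
    return ["diamond", "makedb", "--in", fasta, "--db", db]


def diamond_blastp_cmd(db: str, query: str, out: str, threads: int = 4) -> List[str]:
    # Command for a protein search with tabular (BLAST format 6) output.
    return [
        "diamond", "blastp",
        "--db", db,
        "--query", query,
        "--out", out,
        "--outfmt", "6",
        "--threads", str(threads),
    ]


def run_diamond(cmd: List[str]) -> None:
    # Raises CalledProcessError if diamond exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Typical use would be `run_diamond(diamond_makedb_cmd("uniref90.fasta", "uniref90"))` once, then `run_diamond(diamond_blastp_cmd("uniref90", "queries.fasta", "matches.tsv"))` for each batch of queries.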
UniProt offer a webform for BLAST here, http://www.uniprot.org/blast/ - but this is as far as I know for use 'by hand' only rather than scripting from something like Biopython. I'm not aware of anyone else hosting a free to use BLAST service against UniProt.
The alternative would be to download the UniProt database (in FASTA format), make a local BLAST database with makeblastdb, and then search it locally with blastp (from the NCBI BLAST+ suite). Your local Linux systems administrator may already have local copies of important BLAST databases like the NCBI NR database, and could add the UniProt databases for you perhaps?
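A minimal sketch of that local route, calling the BLAST+ tools from Python with `subprocess`. The file names, the E-value cutoff and the thread count are assumptions for illustration only, and `makeblastdb` and `blastp` must already be installed and on your PATH for `run_blast` to work.

```python
import subprocess
from typing import List


def makeblastdb_cmd(fasta: str, db_name: str) -> List[str]:
    # Command that builds a protein BLAST database from a FASTA file.
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query: str, db_name: str, out: str,
               evalue: float = 1e-5, threads: int = 4) -> List[str]:
    # Command for a protein search with tabular output (-outfmt 6).
    return [
        "blastp",
        "-query", query,
        "-db", db_name,
        "-out", out,
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def run_blast(cmd: List[str]) -> None:
    # Raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Build the database once with `run_blast(makeblastdb_cmd("uniref90.fasta", "uniref90"))`, then search with `run_blast(blastp_cmd("queries.fasta", "uniref90", "results.tsv"))`. Putting all the queries in one FASTA file avoids reloading the database for every sequence.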
How many query sequences do you have? Even the NCBI online BLAST service is not suitable for large numbers of queries; that kind of job is best done locally (on your cluster if needed).
Thanks Peter. Actually, I have downloaded the UniRef90 database (more than 7 GB unzipped) to my computer and created the BLAST database, but it takes too long for my computer to run BLAST for even a single query. So I was wondering how I can do this using the UniProt server from Biopython. I have already done this analysis against the nr database from NCBI for more than 7,000 sequences, using Biopython.
I doubt the NCBI would be very happy with you for BLAST'ing 7000 sequences against NR like that, but at least they do officially expose this service. I've asked UniProt via Twitter if they have an official API, https://twitter.com/pjacock/status/383191702322020352
Jerven from UniProt has replied (as an answer to your question), https://twitter.com/jervenbolleman/status/383239462475399168
If it helps, UniProt BLAST has an undocumented API through GET requests, and it is actually quite stable. Here is a sample URL: http://www.uniprot.org/blast/?query=TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN&dataset=uniprotkb&threshold=10&matrix=&filter=false&gapped=true&numal=250 This link starts a job, and the server sends you back its ID (20130926517MACV51A). The job status is then available at a URL like this: http://www.uniprot.org/blast/uniprot/20130926517MACV51A.stat.
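A small sketch of how those two URLs could be put together in Python. It only builds the strings and does not contact the server. The parameter names and values are copied from the sample URL above; since the interface is undocumented, treat them as assumptions that may stop working at any time, and note that I have not checked which other `dataset` values (UniRef90, for example) are accepted.

```python
from urllib.parse import urlencode

# Base address taken from the sample URL; undocumented, so it may change.
BLAST_URL = "http://www.uniprot.org/blast/"


def build_blast_url(sequence: str, dataset: str = "uniprotkb",
                    threshold: int = 10, numal: int = 250) -> str:
    # Parameter names and defaults mirror the sample URL.
    params = {
        "query": sequence,
        "dataset": dataset,
        "threshold": threshold,
        "matrix": "",
        "filter": "false",
        "gapped": "true",
        "numal": numal,
    }
    return BLAST_URL + "?" + urlencode(params)


def build_status_url(job_id: str) -> str:
    # Status address pattern, as seen for job 20130926517MACV51A.
    return f"{BLAST_URL}uniprot/{job_id}.stat"
```

You could then fetch these with `urllib.request.urlopen` or similar, one job at a time.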
Thanks very much. I'll be considerate.