I am trying to BLAST about 50 RNA sequences against the genomes of various organisms (all large genomes: vertebrates and other animals). I am trying to do this locally because it takes an enormous amount of time otherwise.
I am doing this by running update_blastdb.pl (a script provided with BLAST+ to download databases locally via FTP).
The problem is that this is taking a very long time, and the files I am downloading take up a lot of space. I'm wondering if there is a more sensible way to do this, because the databases are taking up about 100 GB on my computer right now.
Here's how I am running it (if I don't adjust the timeout, it usually doesn't finish):
update_blastdb.pl --timeout 800 nt
I am also downloading:
refseq_genomic
nr
And I have completed:
refseq_rna
I will want to perform these BLAST searches for multiple organisms, but each individual search will be against one organism.
On a side note, I've noticed that some downloads only finish if I interrupt them with ^C and restart them; otherwise they never complete.
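Since each search is against a single organism, one option is to build a small nucleotide database per genome instead of downloading nt or refseq_genomic in full. The following is only a sketch of that idea: it assembles the makeblastdb and blastn command lines and prints them without running anything. The organism names, file paths, directory layout and the 1e-5 cutoff are all hypothetical placeholders, not values from this thread.

```python
import shlex


def makeblastdb_cmd(genome_fasta, db_name):
    """Return argv for building a nucleotide BLAST database from one genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta, db_name, out_tsv, evalue=1e-5):
    """Return argv for a blastn search writing tabular (-outfmt 6) output."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),  # assumed cutoff; adjust to taste
        "-outfmt", "6",
        "-out", out_tsv,
    ]


# Hypothetical organism -> genome FASTA mapping; replace with real paths.
GENOMES = {
    "mouse": "genomes/mouse.fa",
    "zebrafish": "genomes/zebrafish.fa",
}


def plan(query_fasta, genomes):
    """Yield the commands for every organism without executing anything."""
    for organism, fasta in genomes.items():
        db = f"db/{organism}"
        yield makeblastdb_cmd(fasta, db)
        yield blastn_cmd(query_fasta, db, f"hits/{organism}.tsv")


if __name__ == "__main__":
    # Dry run: print the commands so they can be inspected first.
    for cmd in plan("human_rnas.fa", GENOMES):
        print(shlex.join(cmd))
```

To actually run a command, each list can be passed to subprocess.run(cmd, check=True) once BLAST+ is on the PATH and the db/ and hits/ directories exist. For cross-species comparisons it may be worth adding "-task", "blastn" (or another task), since blastn's default megablast mode is tuned for highly similar sequences.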
What is the reason for doing that? What kind of RNA sequences are these? Are you trying to identify what genome those 50 sequences are from or the actual identity of the genes?
The genes are all human. I'm trying to determine whether each gene has a significant match in a number of other organisms, as judged by the E-value.
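As a rough illustration of the "significant match by E-value" step, here is a small sketch that reads BLAST+ tabular output (-outfmt 6 with the default 12 columns) and keeps the best hit per query at or below a cutoff. The function name best_hits and the 1e-5 threshold are assumptions made for the example; what counts as significant depends on the database size and the question being asked.

```python
# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5):
    """Return {query id: (subject id, E-value)} for the lowest-E-value hit
    of each query, keeping only hits with E-value <= max_evalue."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or evalue < current[1]:
            best[row["qseqid"]] = (row["sseqid"], evalue)
    return best
```

With a results file from one organism (hypothetical name), this could be called as best_hits(open("hits/mouse.tsv")); queries missing from the returned dictionary had no hit under the cutoff in that genome.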
If you want to do this as practice then great, but NCBI/EBI has likely done this work for you. You can check NCBI's HomoloGene section for multiple alignments, or use the alignments available as a track in the UCSC Genome Browser.
Thanks, yeah, it is basically for practice right now. I'm going to be making adjustments in the future, though, so I need to do it this way.
Depending on your goal, there may be faster solutions than BLAST. BBMap's SendSketch tool can taxonomically classify an organism in a few seconds, depending on the genome size; it can compare your data to nt, RefSeq, and Silva for that purpose. You don't need to download any big files.
As Genomax asked... what are you trying to accomplish? Also, what kind of data do you have?