You can use the -remote flag to run the query against the nt database on NCBI's servers without having to download it. Otherwise you have to download it, or at least a portion of it.
Yes, you should have a FASTA file of the sequences you intend to search with. BLAST will query with all of the sequences in the input file.
(written 8.9 years ago by pld; updated 4.9 years ago by Ram)
You can also download subsets of nt, for instance if you only need certain species. Just be aware that this can affect your e-values, as they are based on database size. But if you are going to be doing a lot of high-throughput BLAST analyses you really should just download the database. It will go much, much faster than using the remote option. Not sure what the size of nt is these days, but it has always been worth it in my experience. All through my PhD we maintained our own local versions of BLAST databases, including some custom subsets, because it was just so much faster than anything else.
Yes. We always found it useful (at the time) to just feed arbitrarily large database sizes (we all used the same number at the time), because we frequently needed to do cross database comparisons.
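A rough sketch of what fixing the database size could look like, assuming BLAST+ is installed. The -dbsize option sets the effective database length used for the e-value calculation (-searchsp sets the effective search space directly). The database names, query file and size value here are made up for illustration.

```shell
#!/bin/sh
# Hypothetical names and value; pick your own and use the same number everywhere.
query="queries.fasta"
dbsize=1000000000   # arbitrary effective database length shared by all searches

for db in subset_a subset_b; do
    if command -v blastn >/dev/null 2>&1 && [ -s "$query" ]; then
        # Same -dbsize for every database, so e-values are computed
        # against the same effective size and can be compared.
        blastn -db "$db" -query "$query" -dbsize "$dbsize" \
               -outfmt 6 -out "hits_${db}.tsv"
    else
        echo "skipping $db: blastn not found or $query missing" >&2
    fi
done
```

Because the e-value scales with the search space, hits from subset_a and subset_b are only comparable when both searches use the same number.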
(written 8.9 years ago by DG; updated 4.9 years ago by Ram)
A small note about using local FASTA files: the databases generated from them can live either in the working directory or in a separate directory that the BLASTDB environment variable is set to include.
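As a minimal sketch of that setup, assuming BLAST+ is installed and the sequences are nucleotide (all file, directory and database names below are hypothetical):

```shell
#!/bin/sh
# Hypothetical names; replace with your own files.
fasta="my_sequences.fasta"    # sequences to turn into a database
query="queries.fasta"         # sequences to search with
dbname="my_db"                # database name, i.e. the prefix of the database files
dbdir="$PWD/blastdb_local"    # separate directory holding the database files
mkdir -p "$dbdir"

if command -v makeblastdb >/dev/null 2>&1 && [ -s "$fasta" ] && [ -s "$query" ]; then
    # Build a nucleotide database; its files are written as $dbdir/my_db.*
    makeblastdb -in "$fasta" -dbtype nucl -out "$dbdir/$dbname"
    # Tell BLAST where to look, then refer to the database by name only.
    export BLASTDB="$dbdir"
    blastn -db "$dbname" -query "$query" -outfmt 6 -out hits_local.tsv
else
    echo "makeblastdb not found or input files missing; nothing was run" >&2
fi
```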
(written 8.9 years ago by novice; updated 4.9 years ago by Ram)
Sure, you can specify the path to the file, just like the vast majority of programs.
I just edited my comment. After -db you need to specify a database name, not a specific file name. The database name is the shared prefix of the several database files. These files are looked for in the directory given by the BLASTDB environment variable. If BLASTDB is not set, they are looked for in the directory you ran the blast command from. You cannot provide the path to the database on the command line, as you can with the majority of other programs.
(written 8.9 years ago by novice; updated 4.9 years ago by Ram)
Yes, you can. As you said, just the name of the database, not any specific file comprising the database.
For reference: I was wrong about this. I said you can't because I tried it and it gave me an error. Turns out the error was due to a space in the name of a directory (at least I know blast won't let you escape these). I tried changing the directory name and it worked.
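For anyone hitting the same error, here is a small sketch of a check for spaces before passing a path to -db. It is based only on the behaviour reported above and may differ between BLAST versions. The helper name db_path_ok and the example paths are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the database path contains no space.
db_path_ok() {
    case "$1" in
        *" "*) return 1 ;;   # a space somewhere in the path
        *)     return 0 ;;
    esac
}

# Example paths (hypothetical): directory plus database name, no file extension.
for db in "/data/blastdb/my_db" "/data/blast dbs/my_db"; do
    if db_path_ok "$db"; then
        echo "ok to pass to -db: $db"
    else
        echo "rename the directory first, path contains a space: $db"
    fi
done
```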
Sorry, but I don't understand very well! This is my first time with BLAST and bioinformatics!
I have to BLAST about 1000 sequences, so it is not a big dataset, but I don't have a cluster or a server, so I don't know if the nt database is too heavy for my computer.
Can I give -db the name of a BLAST database and have the tool search it over the internet automatically? Or do I have to set BLASTDB so that it runs against the whole nt database? Or is downloading the database indispensable?
BLAST isn't that resource heavy, especially for 1000 input sequences. If you're only doing this once, then running with the remote option might be a good idea and saves you work on your end, but it will be slower. If you are going to start doing BLAST analyses fairly routinely you really should just download the database for local use. You'll find it much less of a headache. I don't know how big nt is these days in terms of file size, but you don't need a very fast machine to run BLAST.
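If you do go the local route, a sketch of the download step might look like this, assuming the update_blastdb.pl script that ships with BLAST+ is on your PATH. The target directory and the DOWNLOAD_NT switch are hypothetical; the switch is only there so that a very large download is never started by accident.

```shell
#!/bin/sh
dbdir="blastdb"   # hypothetical target directory; nt needs a lot of free disk space
mkdir -p "$dbdir"

# DOWNLOAD_NT is a made-up safety switch for this sketch, not a BLAST setting.
if [ "${DOWNLOAD_NT:-no}" = "yes" ] && command -v update_blastdb.pl >/dev/null 2>&1; then
    # --decompress unpacks each downloaded archive into BLAST database files.
    (cd "$dbdir" && update_blastdb.pl --decompress nt)
else
    echo "Set DOWNLOAD_NT=yes (with BLAST+ installed) to download nt into $dbdir" >&2
fi
```

Running update_blastdb.pl --showall lists the preformatted databases NCBI offers, which helps if a smaller database is enough.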
You can also manually adjust the search space size in cases where the database size has changed.
Thank you very much
I don't have to do this analysis routinely, so the remote option is best for me, but... how can I run BLAST remotely? That is the question!
Thanks
One of the command-line options is -remote
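A minimal sketch of such a run, assuming BLAST+ is installed and the queries are nucleotide sequences in a single FASTA file (the file names are hypothetical):

```shell
#!/bin/sh
# Hypothetical file names; adjust to your own data.
query="queries.fasta"       # FASTA file holding all query sequences
out="results_remote.tsv"    # hits are written here

if command -v blastn >/dev/null 2>&1 && [ -s "$query" ]; then
    # -remote sends the search to NCBI's servers, so no local copy of nt is needed.
    # -outfmt 6 asks for tab-separated output, one hit per line.
    blastn -db nt -query "$query" -remote -outfmt 6 -out "$out"
else
    echo "blastn not found or $query missing; nothing was run" >&2
fi
```

Note that -remote cannot be combined with -num_threads, since the search does not run on your machine.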
Thank you very much! I'll try it.