Hi,
I'm thinking of splitting the database into smaller chunks and BLASTing my sequences against each chunk in a separate process. My only concern is the results, which I will merge later.
Would the resulting E-values be affected by the database content when smaller subsets are used? I have a hunch that it would not matter once all the subset results are concatenated. Please correct me if I'm wrong.
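The E-value does depend on the database size. Roughly, for a hit with bit score S', E = m · n · 2^(−S'), where m is the query length and n is the total number of letters in the database. A search against one chunk therefore reports an E-value that is smaller than the full-database value by the factor n_chunk / n_total, even though the alignment and bit score are identical. The sketch below illustrates this with made-up numbers and ignores the edge-length correction; the function names are hypothetical.

```python
def evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value, E = m * n * 2**(-S'), ignoring the length adjustment."""
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale(e_chunk: float, chunk_len: int, total_len: int) -> float:
    """Convert an E-value computed against one chunk to the full-database scale."""
    return e_chunk * total_len / chunk_len


# Hypothetical numbers: a 300-residue query and a 1e9-letter database split into 4 chunks.
m, total = 300, 1_000_000_000
chunk = total // 4
s = 50.0  # same alignment, same bit score, whichever chunk the subject lands in

e_full = evalue(s, m, total)
e_chunk = evalue(s, m, chunk)
print(f"full: {e_full:.3g}  chunk: {e_chunk:.3g}  rescaled: {rescale(e_chunk, chunk, total):.3g}")
```

The chunk E-value comes out four times too optimistic; multiplying by n_total / n_chunk (or telling BLAST the full size up front) restores the full-database value.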
Totally agree with the answer, but you can set the database size manually with the "-z" parameter. That way, you can split the db file into smaller pieces, run your queries, and then merge the results.
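Merging could look roughly like the sketch below, assuming each chunk was searched with the same "-z" value (in BLAST+ the corresponding option is "-dbsize") and written in the standard 12-column tabular format ("-m 8" in legacy blastall, "-outfmt 6" in BLAST+). The function names and the max_hits cutoff are hypothetical choices for illustration, not part of BLAST.

```python
from collections import defaultdict

# Column order of BLAST tabular output (-m 8 in blastall, -outfmt 6 in BLAST+).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(lines):
    """Yield one dict per hit from BLAST tabular output lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (-m 9 / -outfmt 7)
        row = dict(zip(FIELDS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        yield row


def merge_chunks(chunk_outputs, max_hits=500):
    """Merge per-chunk hits per query, best first, keeping at most max_hits (arbitrary)."""
    by_query = defaultdict(list)
    for lines in chunk_outputs:
        for hit in parse_tabular(lines):
            by_query[hit["qseqid"]].append(hit)
    merged = {}
    for query, hits in by_query.items():
        # Bit scores do not depend on database size, so rank on them first;
        # E-values are only comparable across chunks if every run used the same -z.
        hits.sort(key=lambda h: (-h["bitscore"], h["evalue"]))
        merged[query] = hits[:max_hits]
    return merged


# Made-up output lines from two chunks.
chunk_a = ["q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190.0"]
chunk_b = ["q1\ts7\t60.0\t100\t40\t1\t1\t100\t5\t104\t2e-10\t60.0"]
merged = merge_chunks([chunk_a, chunk_b])
print([h["sseqid"] for h in merged["q1"]])  # ['s1', 's7']
```

Ranking on bit score first keeps the order sensible even if a chunk was accidentally run without "-z".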
I think you also need to set the number of sequences in the database (N) to calculate the edge-effect correction, i.e. the length adjustment ℓ (or "ell"). The adjustment is done for you if you use NOBLAST.
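To see why N matters: the length adjustment trims ℓ letters from the query and from every database sequence, so the effective search space is (m − ℓ)(n − N·ℓ), and ℓ itself is commonly defined by the fixed-point equation ℓ = ln(K·(m − ℓ)·(n − N·ℓ)) / H. The sketch below solves that equation by simple iteration. It is a simplified illustration, not NCBI's actual routine, and the K and H values are assumed, roughly those for BLOSUM62 with gap costs 11/1.

```python
import math


def length_adjustment(m, n, num_seqs, K, H, max_iter=50, tol=1e-6):
    """Solve ell = ln(K * (m - ell) * (n - N*ell)) / H by fixed-point iteration.

    m: query length, n: total database letters, num_seqs: N, number of sequences.
    Simplified sketch; real BLAST implementations use their own, more careful routine.
    """
    ell = 0.0
    for _ in range(max_iter):
        m_eff = m - ell
        n_eff = n - num_seqs * ell
        if m_eff <= 0 or n_eff <= 0:
            raise ValueError("length adjustment exceeds sequence lengths")
        new_ell = math.log(K * m_eff * n_eff) / H
        if abs(new_ell - ell) < tol:
            return new_ell
        ell = new_ell
    return ell


# Illustrative values only (hypothetical query and database sizes).
m, n, N = 250, 100_000_000, 300_000
K, H = 0.041, 0.14
ell = length_adjustment(m, n, N, K, H)
search_space = (m - ell) * (n - N * ell)
print(f"ell = {ell:.1f}, effective search space = {search_space:.3g}")
```

With the same total letters but more sequences, n − N·ℓ shrinks, so two databases of equal size in letters can still give different E-values.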
Try grep -v '^>' something.fasta | grep -o '[ACGTNacgtn]' | wc -l on FASTA files before building the database.
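The same count, plus the number of sequences (N), can be taken in one pass with a small script. This is a sketch with a hypothetical function name: with an alphabet it mirrors the grep one-liner, and without one it counts every letter, which is what a protein database such as nr needs (a nucleotide filter would silently count only the A, C, G, T and N residues of proteins). For a database that is already formatted, BLAST+ "blastdbcmd -db <name> -info" should also report the sequence and total-letter counts.

```python
def fasta_stats(lines, alphabet=None):
    """Return (number of sequences, number of residue letters) for FASTA lines.

    If alphabet is given (e.g. "ACGTNacgtn"), only those characters are counted;
    otherwise every alphabetic character on a non-header line counts.
    """
    allowed = set(alphabet) if alphabet else None
    num_seqs = 0
    num_letters = 0
    for line in lines:
        if line.startswith(">"):
            num_seqs += 1  # each header line starts a new sequence
            continue
        seq = line.strip()
        if allowed is None:
            num_letters += sum(ch.isalpha() for ch in seq)
        else:
            num_letters += sum(ch in allowed for ch in seq)
    return num_seqs, num_letters


example = """>seq1 example
ACGTACGTNN
acgt
>seq2
MKVLA*
""".splitlines()
print(fasta_stats(example))                # (2, 19)
print(fasta_stats(example, "ACGTNacgtn"))  # (2, 15)
```

For a real file, pass the open file handle directly, e.g. fasta_stats(open("something.fasta")); the function streams line by line, so it works on large databases.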
Thanks, much appreciated.
Oops, my hunch was wrong. Anyway, is there an easy way to count the total number of letters (N) in a database?
Could someone validate whether the nr database is currently 5784003470 letters in size?
When I said "database size", I was referring to the total number of sequences in your database, not the total number of residues in it.