How to perform BLAST against a segmented database
8.2 years ago
User 6777 ▴ 20

Hi all,

Sorry for the long question, but I am facing this issue because of my hardware limitations (I am using a 32-bit Windows 7 machine with 4 GB of RAM).

I have an arbitrary number of .fa files (with arbitrary names) in a folder named 'seq', each containing a single protein sequence in FASTA format, for example:

NP_4500.1.fa
NP_4568.1.fa
NP_45981.3.fa
XM_we679.fa
36498746.fa

In another folder named 'db', I built a database split into 200 segments (because of my computational limitations), named as follows:

hg.part-001.db
hg.part-002.db
hg.part-003.db
..
..
hg.part-200.db

Now I want to run usearch for each sequence against every database segment and generate one result file per segment. For one .fa file (NP_4500.1.fa), for example:

usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-001.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-001.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-002.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-002.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-003.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-003.out
...
...
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-200.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-200.out

After that, I want to merge the results into a single file, something like:

join NP_4500.1_part-001.out NP_4500.1_part-002.out .. NP_4500.1_part-200.out > NP_4500.1.out
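
Since the per-segment files are plain tab-separated text, the merge step may be as simple as concatenating them (note that the Unix `join` command performs a field-based join rather than a concatenation). Below is a minimal Python 3 sketch that also ranks the combined hits. It assumes the standard 12-column blast6out layout with the bit score in the last column. The function name `merge_blast6` is hypothetical. Ranking by bit score rather than E-value is a deliberate choice here: E-values usually depend on the size of the database searched, so values from different segments may not be directly comparable, although this should be checked for usearch.

```python
import glob


def merge_blast6(part_files, merged_path):
    """Combine blast6out files and rank all hits by bit score, best first.

    Assumption: each line has the usual 12 tab-separated blast6 columns,
    with the bit score in column 12 (index 11).
    """
    hits = []
    for path in part_files:
        with open(path) as fh:
            for line in fh:
                fields = line.rstrip("\r\n").split("\t")
                if len(fields) >= 12:  # skip blank or truncated lines
                    hits.append(fields)
    # Highest bit score first; Python's sort is stable, so ties keep file order.
    hits.sort(key=lambda f: float(f[11]), reverse=True)
    with open(merged_path, "w") as out:
        for fields in hits:
            out.write("\t".join(fields) + "\n")
    return len(hits)
```

For the example above, this could be called as `merge_blast6(sorted(glob.glob("NP_4500.1_part-*.out")), "NP_4500.1.out")`. The merged file name does not match the `_part-*` pattern, so it is not picked up as an input on a second run.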

and similarly for the next sequence:

NP_4568.1.fa

...

Now, I can run a cmd script over each FASTA file, as:

for %%F in ("*.fa") do usearch -ublast ./seq/%%F .......

But my question is: how can I combine this loop with each of the database segments and merge the .out files into one result per sequence before proceeding to the next sequence?

I can use a cmd, Perl, or Python script. Thanks for your consideration.
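
One way the nested loop could be written is sketched below in Python 3. It is an outline under stated assumptions, not a tested pipeline: it assumes `usearch` is on the PATH, that the folders are laid out as `seq` and `db` as described above, and that the usearch options are exactly those shown in the question. The function names (`build_command`, `part_label`, `search_one`, `main`) and the `out` folder are hypothetical.

```python
import glob
import os
import subprocess


def build_command(query, db_part, out_path, usearch="usearch"):
    """Return the argument list for one search, mirroring the command in the question."""
    return [usearch, "-ublast", query, "-db", db_part,
            "-evalue", "1e-10", "-accel", "0.5", "-blast6out", out_path]


def part_label(db_part):
    """Map 'db/hg.part-001.db' to 'part-001' (assumes the naming shown above)."""
    return os.path.basename(db_part).split(".")[-2]


def search_one(query, db_parts, out_dir, runner=subprocess.check_call):
    """Search one query against every segment, then merge into <stem>.out."""
    stem = os.path.splitext(os.path.basename(query))[0]  # e.g. NP_4500.1
    merged = os.path.join(out_dir, stem + ".out")
    with open(merged, "wb") as dst:
        for db in db_parts:
            part_out = os.path.join(out_dir, "%s_%s.out" % (stem, part_label(db)))
            runner(build_command(query, db, part_out))  # raises if usearch fails
            if os.path.exists(part_out):
                with open(part_out, "rb") as src:
                    dst.write(src.read())  # append this segment's hits
                os.remove(part_out)        # keep only the merged file
    return merged


def main(seq_dir="seq", db_dir="db", out_dir="out"):
    """Process each .fa file completely before moving on to the next one."""
    os.makedirs(out_dir, exist_ok=True)
    db_parts = sorted(glob.glob(os.path.join(db_dir, "hg.part-*.db")))
    for query in sorted(glob.glob(os.path.join(seq_dir, "*.fa"))):
        search_one(query, db_parts, out_dir)
```

Calling `main()` from the folder that contains `seq` and `db` would process the queries one at a time. The `runner` parameter exists only so the loop can be dry-run with a stand-in function before launching 200 real searches per sequence.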

cmd perl python • 2.1k views

Apart from the original problem, which should be solvable with a batch script, I would consider simplifying your life. I suggest you could spare yourself a lot of hassle by upgrading to a better computer. A few aspects make your setup much more difficult than it has to be:

  • Windows and cmd: cmd is not particularly powerful or easy to use for scripting (compared with bash).
  • Using the free 32-bit version of usearch (instead of NCBI BLAST+), which requires splitting the database. I do not fully understand why you have to split the database or what you are trying to search; is the data really that big? I think that with NCBI BLAST you don't need much more than 4 GB of RAM for a search, even against NR.

Thanks for the reply. I'll upgrade my machine soon, but for now I need to split the database, because -makedb in 32-bit usearch can't handle my UniRef database (20 GB). And I am avoiding NCBI BLAST simply because it is too slow for my requirements (compared with ublast).
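
For anyone reproducing this setup, the splitting step might look roughly like the Python 3 sketch below. It assumes the source database is a plain FASTA file and cuts it only at record boundaries, so no sequence is divided between two parts. The function name `split_fasta`, the `.fasta` extension and the size threshold are hypothetical. Each part would still have to be converted into a usearch database afterwards, which is not shown.

```python
def split_fasta(fasta_path, max_bytes, prefix):
    """Write consecutive records to prefix-001.fasta, prefix-002.fasta, ...

    A new part is started at the first record header ('>') seen after the
    current part has reached max_bytes, so parts can slightly exceed it.
    Only one output file is open at any time.
    """
    parts = []
    out = None
    written = 0
    with open(fasta_path, "rb") as src:
        for line in src:
            new_record = line.startswith(b">")
            if out is None or (new_record and written >= max_bytes):
                if out is not None:
                    out.close()
                name = "%s-%03d.fasta" % (prefix, len(parts) + 1)
                parts.append(name)
                out = open(name, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

With a 20 GB input, a threshold of about 100 MB (`max_bytes=100 * 1024 * 1024`) and `prefix="hg.part"` would give on the order of 200 parts named `hg.part-001.fasta` onwards.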
