I would like to implement BLAST module into my bash-based pipeline.
Based on multi-fasta query and a genome of interest I run BLAST (managed with makeblastdb for genome and then blastn with output as a table). The next step is to extract hits with 100bp upstream and downstream and save them as fasta file. As far, I rearrange the output table from BLAST to get new sstart and send positions, as well as the information about strand orientation (3 last columns).
After rearrangement the table looks as follow:
queryid subjectid identity sstart send strand new_sstart new_send
MRP CM008973.1 100 6219401 6219677 plus 6219301 6219777
SRP CM008972.1 100 1943089 1942802 minus 1943189 1942702
As @lieven.sterck suggested I may use now blastdbcmd
.
If I understand correctly, it is possible to fetch the table with entry_batch
I modified the table to get format as required by command:
ID range strand
CM008973.1 6219301-6219777 plus
CM008972.1 1943189-1942702 minus
But when I use the blastdbcmd -db ./database/database.fna \
-entry_batch ./test.txt
I get error:
Error: MRP: OID not found
Error: SRP: OID not found
blastdbcmd -entry_batch
is not compatible withrange
. Which is what you are looking to do by providing a start and stop. You will need to extract these entries individually.Okey, so I understand I would need to put the
blastdbcmd
command into a loop over the rows of a table? Am I right?Then my table should looks like this?
BTW Primarly I was suggested by this:
And that's why I modified the table to have those 3 columns: ID, range and strand.
A command like following should work and would be straight forward :
Edit: Based on @Lieven's comment below (and program message you posted above)
-entry_batch
may be supported in spite of what the in-line help says. Your input file does not seem to be in the right format though.as @genomax already mentions, not sure if this will works as you hope.
in any case, your lines need to start with the hit ID (subject ID). The error you report does not fit with the file you described as input (the MRP and SRP seem to be your query IDs, not your hit IDs)