Hey all,
I was wondering if there is a way to use an alignment as a local BLAST database. My problem is as follows: I have an old alignment of concatenated protein sequences, with CHARSET definitions at the bottom of the NEXUS file marking where each protein begins and ends. Some folks doing method development created a method that produces a different sort of alignment, and they also removed some of the sequences, which changed the overall alignment length. Thus, the CHARSET definitions no longer delimit the genes, but I need them to for a downstream step. I have all the protein sequences, both as they were in the old alignment and as plain unaligned sequences. My natural thought was to make a BLAST database out of the alignment and query the individual sequences against it to recover the boundaries. But I don't seem to be able to make such a local database the old-fashioned way. Is there a way with BLAST?
Barring that, I think MAFFT has experimental functionality for adding sequences and fragments to an existing alignment (the --add and --addfragments options). But they are terribly slow and I don't know if that's appropriate for my goal. Anybody have any insight? Is MAFFT a reasonable tool for this? Is there a BLAST method? Or would a more traditional computer-science approach, such as string-based distance minimization, work better? I really appreciate any help/insight you all can offer. Ideally, I'd do the alignment traditionally, but, like I said, this is downstream of some folks working on algorithmic method development, and they haven't implemented a way to keep track of this stuff quite yet.
Best!
makeblastdb is the current way of making local BLAST databases (run the command with the -help flag to get inline help). If you are familiar with BLAST you should be able to pick up the differences and run with BLAST+ easily.
Well, that is what I consider the old-fashioned way. But in any case, you've completely missed the spirit of my question. Can't make a BLAST database with an alignment that way; the help flag doesn't specify such a method, at least.
https://ibb.co/cSLNKF https://ibb.co/kmcf6v
If you want to make a BLAST database out of an alignment you will need to take out the gaps from that FASTA file. You could easily use sed to replace the - with nothing.
If you wish to preserve the alignment then you will have to use the add sequence/add fragment method with a multiple sequence alignment program as you noted above.
While I thank you for your response, you again are missing the spirit of the question. The gaps are necessary. It would not be an alignment if I removed the gaps, nor would it inform me of the delineations of the protein sequences in the alignment.
See the modified comment above. Someone else may be along with a new comment/answer.
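One way to reconcile the two positions in this thread: removing the gaps only loses the alignment coordinates if you throw them away. If you record which alignment column each residue came from while stripping gaps, you can find each protein in the ungapped row and translate the hit back into columns. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the new method kept each residue unchanged and in the original gene order for at least one retained taxon, and that gaps are written as - or ?. The function names (column_map, charset_ranges) and the toy data are made up for illustration.

```python
def column_map(aligned_row, gap_chars="-?"):
    """Strip gaps from one aligned row.

    Returns (ungapped_sequence, cols), where cols[i] is the 1-based
    alignment column that holds residue i of the ungapped sequence.
    """
    residues, cols = [], []
    for col, char in enumerate(aligned_row, start=1):
        if char not in gap_chars:
            residues.append(char)
            cols.append(col)
    return "".join(residues), cols


def charset_ranges(aligned_row, proteins):
    """Locate each protein in one taxon's aligned, concatenated row.

    proteins: list of (name, ungapped_sequence) in concatenation order.
    Returns a list of (name, first_column, last_column), 1-based inclusive.
    """
    ungapped, cols = column_map(aligned_row)
    ranges, search_from = [], 0
    for name, seq in proteins:
        if not seq:
            raise ValueError(f"{name} has an empty sequence")
        # Exact substring search, starting after the previous protein.
        start = ungapped.find(seq, search_from)
        if start == -1:
            raise ValueError(f"{name} not found in the aligned row")
        end = start + len(seq) - 1
        ranges.append((name, cols[start], cols[end]))
        search_from = end + 1
    return ranges


# Toy data, invented for illustration only.
row = "MK-TAY--IAKQR-GLV"
proteins = [("geneA", "MKTAY"), ("geneB", "IAKQRGLV")]
for name, first, last in charset_ranges(row, proteins):
    print(f"CHARSET {name} = {first}-{last};")
```

Note that the boundaries come from a single taxon, so all-gap columns between two genes in that row (columns 7-8 in the toy data) are left unassigned, and a taxon with leading or trailing gaps in a gene will give a narrower range. Running this over several taxa and taking the smallest start and largest end per gene is one way to firm up the ranges. If the new method can alter residues, the exact substring search will fail and you would need an approximate match instead (which is where BLAST against the ungapped rows, or MAFFT, comes back in).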