Hi,
I have developed software for finding local alignments of a sequence in a larger database. It is like BLAST but uses a different technology.
It appears to have some advantages over BLAST in some cases. It also has some disadvantages, given that it is still less developed and is not running on the best hardware.
Some apparent advantages are:
- It finds different results than BLAST. Combining BLAST and this tool, you can get a more complete set of results.
- The results often get a better score than BLAST results. Sometimes BLAST finds subsequences that partially match the query sequence but agree very poorly over the rest of the sequence.
- It seems to me better suited for aligning homologous or related sequences, given that it doesn't focus primarily on partial results but on the best local alignments for the full sequence.
For example, this is an alignment of a query sequence (1st), the best result found by my software (2nd) and the best BLAST result (3rd):
GGCCGGGC-G-CGGTGGCTCACGCC-TGTAATCCCAGCACTTTGGGAGGC-CGAGGC-GGGCGGA--TC-ACGAG-GTCAGGAGATCGAGACCA-TCCTGG-C-CAACACG-GTGA
G-CCGGGCGGT-GGTGGCTCACGCCTT-TAATCCCAGCACTT-GGGAGGCA-GAGGCAGG-CGGATTTCT--GAGT-TCA--AG---G---CCAG-CCTGGTCT-A-CA-GAGTGA
GGCCGGGC-G-TGGTGGCGCACGCC-TTTAATCCCAGCAC-TTGGGAGGC-AGAGGC-AGGCGGA--T---------TTCTGAGTTCGAGGCCA-GCCTGG-T-CTACAAA-GTGA
Clearly the first result (the one found with this tool) is a better match, and you would miss it if you only used BLAST.
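One way to make a claim like this checkable is to tally matching, mismatching and gapped columns between the query row and each hit row of the alignment. The following is only a minimal sketch, not the scoring used by BLAST or by the tool described here; the function name `column_stats` is hypothetical, and it assumes the pasted rows are padded to the same length.

```python
def column_stats(row_a, row_b, gap="-"):
    """Count (matches, mismatches, gapped columns) between two aligned rows.

    Columns where both rows carry a gap are skipped: they only exist
    because of a third row in the multiple alignment.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # column is empty for this pair
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def percent_identity(row_a, row_b, gap="-"):
    """Identity over all compared columns (gapped columns count against it)."""
    matches, mismatches, gaps = column_stats(row_a, row_b, gap)
    total = matches + mismatches + gaps
    return 100.0 * matches / total if total else 0.0
```

Running `column_stats` on rows 1 and 2, and then on rows 1 and 3, gives a like-for-like tally for the two hits. Note that the denominator used for identity is a choice, and different tools make different ones.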
Another example: if you search for a homolog of the human BRCA2 gene in the chimpanzee genome, you will find the following with BLAST (although it does not always return the same results):
Chromosome 13 Range: 13386740 to 13413300
Chromosome 9 Range: 109512074 to 109513130
Chromosome 15 Range: 29133481 to 29134527
Meanwhile, with my algorithm you will find these results in chromosome 13. The first one corresponds to the official homolog. The first result from BLAST also seems to match the right sequence, but the sequence position does not match the reference genome. The second result from my algorithm is a better alignment than the alternative ones found by BLAST. There are indeed many other better results in other chromosomes. For example here.
My question is: is this software worth continuing to develop? Is there a need for alternative BLAST results? Is it really better in some cases, or am I missing some BLAST parameters that would improve the results? Is there a need for a tool that finds complete local alignments (not only subsequence alignments, as BLAST does)?
Thanks.
BLAST is not a global alignment heuristic (it's in the name: Basic Local Alignment Search Tool), so if that's what you're trying to do, it is the wrong tool to compare against. For a fair comparison, each implementation you compare should use the best settings for the case at hand. If you optimize the parameters for your tool but choose bad ones for the others, then the comparison is worthless. You may also want to compare with other tools that implement other approaches, such as exonerate, and with baseline algorithms like Needleman-Wunsch.
I'm not trying to do a global alignment. You can see that I copy-pasted a local alignment. But I am looking for local alignments of the full sequence, not partial local alignments. Unfortunately, I have not compared it against all possible BLAST parameters, nor have I tried all possible parameters of my algorithm. That could take me months. I also don't know all the ways people use BLAST. Maybe it's useful for some people but not for me. Anyway, after a filter phase the alignment is done using Needleman-Wunsch. That's not the important point.
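"Local alignments of the full sequence" reads like what is often called semi-global (or "fitting") alignment: the query is aligned end to end, while the database sequence may start and end anywhere at no cost. The sketch below illustrates that idea only. It is not the poster's implementation; the name `fit_align_score` and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def fit_align_score(query, subject, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of `query` to ANY substring of `subject`.

    Returns (score, end), where `end` is the exclusive end position of the
    matched region in `subject`. Scoring values are illustrative assumptions.
    """
    # Row 0 is all zeros: skipping a prefix of the subject is free.
    prev = [0] * (len(subject) + 1)
    for i, q in enumerate(query, 1):
        # Column 0: the first i query bases are aligned against gaps.
        curr = [i * gap]
        for j, s in enumerate(subject, 1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # query base against a gap
            left = curr[j - 1] + gap  # subject base against a gap
            curr.append(max(diag, up, left))
        prev = curr
    # The maximum over the last row makes the subject suffix free as well.
    best_end = max(range(len(prev)), key=prev.__getitem__)
    return prev[best_end], best_end


score, end = fit_align_score("ACGT", "TTTACGTTTT")
print(score, end)
```

This differs from Smith-Waterman, which may drop poorly matching ends of the query, and from Needleman-Wunsch, which also charges for the unaligned ends of the subject.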
No, you cannot see that. If someone really had to guess what kind of alignment it is, I think most people would say a multiple sequence alignment, because it has more than one sequence. Maybe your tool works well, but you are explaining it badly right now.
You correctly understood what I was saying... And a multiple alignment is still a LOCAL alignment. That cannot in any way be a global alignment to a reference genome. Come on...
"Come on..." I am not going to start a discussion, but you really need to look up what the difference between local and global is. I think you also need to look up the difference between pairwise and multiple alignment. And you are the one who posted this on this forum, giving little and odd-sounding information. And in every reply you are acting pretty rude.
EDIT:
Just saw your reply explaining local to someone else and saying that BLAST is not really local. So yes, my reply stands and you need to look up some stuff.
I don't know if I understand, but the first sequence was your input (query), the second is the best hit of your tool and the third is the best hit of BLAST?
So "the first result" is the second sequence. I don't see why this hit is clearly better; can you explain that?
And it's not completely fair (apples and oranges), but if I do a global alignment, the BLAST hit has a higher identity.
That's right.
I say that the second sequence is a better alignment to the first one because with any scoring method that you use it gets a better score. The Levenshtein distance of the first match is 34 and for the second it's 40.
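For anyone who wants to reproduce distances like these, the Levenshtein distance can be computed directly on the ungapped sequences. This is a minimal sketch assuming unit cost for substitutions, insertions and deletions; `ungap` and `levenshtein` are illustrative names. Whether gap characters were removed before the distances quoted above were computed is not stated, so that step is an assumption.

```python
def ungap(row, gap="-"):
    """Remove alignment gap characters to recover the raw sequence."""
    return row.replace(gap, "")


def levenshtein(s, t):
    """Edit distance with unit costs, using two rows of the DP table."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(levenshtein(ungap("AC-GT"), ungap("ACG-A")))
```

Because this measures the distance between whole sequences, the result depends on exactly where each hit is trimmed, so the start and end coordinates of the compared regions should be reported along with the number.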
Sorry, I don't get what you mean by your last sentence...
I don't get those scores and I used this website: https://planetcalc.com/1721/ and this one http://www.unit-conversion.info/texttools/levenshtein-distance/
I also don't think this is the right way to test something like this. Also, if you want to publish something, which is what this looks like, you need to give much more information. A key thing is that others need to be able to reproduce it. If this is not "publishing", maybe you need to add to your post what the goal is. It also does not look like a question.
This is also confusing, this almost sounds like you are talking about a mapping tool.
I'm with gb on this. I don't see why 'your' hit is a better one than the BLAST one?
Of course, if you use some weird/wrong/other/... scoring scheme you will see score differences. The ones BLAST uses are, on the other hand, well established and based on empirical observations.
I would say yes (though being a big blast believer as well) but mainly on the speed side of things, not really on the quality of results returned by blast.
I think it might also be worth pointing out that BLAST is a search tool (cf. "Google for sequences"), NOT an alignment tool! In that sense there are others that do a much better job of creating the best (or a good) alignment, but those come at a tremendous 'cost': they are much slower than BLAST.