Standalone Blast Options
14.1 years ago
Carol ▴ 130

Hi all, I'm using local BLAST to retrieve the single top hit from local database sequences, using the "-v" option of blastall/RPSBLAST. I get only the top hit in most cases, but in some cases more than one HSP is returned, and these are mostly repeats of the same query sequence. Can anybody suggest a BLASTALL/RPSBLAST option to restrict this redundancy? Thanks in advance.

blast • 9.9k views
14.1 years ago

If you're using the XML output for BLAST, the following XSLT stylesheet only prints the first hit.

usage:

    xsltproc --novalid firsthit.xsl blast.xml

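Of course, you can easily modify the rule 'select="Hit[1]"' to match your needs, e.g. to keep every hit that has at least one HSP scoring above 100 (in the BLAST XML output, Hsp_score is nested under Hit/Hit_hsps/Hsp):

    select="Hit[Hit_hsps/Hsp/Hsp_score > 100]"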
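
If xsltproc is not available, roughly the same selection can be done with a short script. The following is only a sketch, not the stylesheet referred to above: it assumes the standard BLAST XML layout (Iteration / Iteration_hits / Hit / Hit_hsps / Hsp), and the function name `first_hits` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def first_hits(xml_source):
    """Yield (query, hit_id, hit_def, first_hsp_score) for the first Hit of each query.

    xml_source may be a file name or an open file object containing BLAST XML.
    Queries with no hits are skipped.
    """
    tree = ET.parse(xml_source)
    for iteration in tree.getroot().iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        # find() returns only the first matching element, like Hit[1] in XSLT.
        hit = iteration.find("Iteration_hits/Hit")
        if hit is None:
            continue
        # Only the first HSP of that hit is reported; further HSPs are ignored.
        hsp = hit.find("Hit_hsps/Hsp")
        score = hsp.findtext("Hsp_score") if hsp is not None else None
        yield query, hit.findtext("Hit_id"), hit.findtext("Hit_def"), score
```

Calling `first_hits("blast.xml")` and printing each tuple joined by tabs gives one line per query, however many HSPs the top hit has.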
14.1 years ago
Neilfws 49k

I don't think there is an option to blastall/rpsblast which will fix this issue. This usage guide states, for the -b flag, that "This is not the number of alignment segments or HSPs, since a given domain may have more than one portion aligned to the query."

You could get a list with only the top hit, ignoring the composite HSPs, by parsing the BLAST output. Using the SearchIO library from BioPerl, something like this should work:

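    #!/usr/bin/perl -w

    use strict;
    use Bio::SearchIO;

    # Read a standard (text) BLAST report; "myblastfile" is a placeholder name.
    my $searchio = Bio::SearchIO->new(-file => "myblastfile", -format => "blast");

    # One result per query sequence in the report.
    while(my $result = $searchio->next_result) {
      # One line per hit, however many HSPs the hit contains.
      # Add "last;" after the print to keep only the first hit of each query.
      while(my $hit = $result->next_hit) {
        my @output = ($result->query_name, $hit->name, $hit->raw_score,
                      $hit->bits, $hit->significance);
        print join("\t", @output), "\n";
      }
    }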

This is just an example with some selected BLAST statistics (raw score, bit score etc.); see the documentation for how to access other parts of the BLAST report.
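
For comparison, a similar reduction can be sketched without BioPerl if the search is run with tabular output (-m 8 in blastall, which writes one row per HSP). This is only an illustration under that assumption: it relies on the usual 12-column layout with the query ID in the first column and the bit score in the last, and the function name `best_hsp_per_query` is hypothetical.

```python
import csv


def best_hsp_per_query(lines):
    """Return {query_id: row}, keeping the row with the highest bit score per query.

    lines is any iterable of tab-separated tabular BLAST lines (e.g. an open file).
    Blank lines and '#' comment lines are skipped. On ties the earlier row wins.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, bits = row[0], float(row[11])  # column 12 is the bit score
        if query not in best or bits > float(best[query][11]):
            best[query] = row
    return best
```

Passing an open file, e.g. `best_hsp_per_query(open("myblastfile.tsv"))`, returns one row per query, so repeated HSPs for the same query collapse to the best-scoring one.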


It's not really a bug. If there is no "best" HSP (since they're identical) then in effect, they are all the "top hit". If the raw output isn't what you want, the solution is to parse it.


Thanks for the advice. The same output can be retrieved using the blastall option -m 7, which generates output in XML format that can be viewed in MS Excel. However, the redundancy still remains. I think this is a bug in BLAST and should be fixed.


I agree with Pierre and Neilfws that parsing is the only option for removing composite HSPs. Thanks for all your suggestions.

14.1 years ago

If you want to reduce the redundancy of your hits, you can pre-process your target database with CD-HIT. For example, you could cluster at a 40% threshold, so that no two sequences in your dataset are more than 40% similar. Depending on your needs, you can use a stringent threshold (<=40%) or a lenient one (>=40%). For the statistical details of the CD-HIT algorithm, refer to the following papers (1, 2 and 3).
