Convert Blast Table With Hsps To Alignment
3
0
Entering edit mode
13.7 years ago
Empyrean ▴ 170

Hello

I have the following BLAST table format output:

GRMZM2G5555   GRMZM2G5555   100.00  5197    0       0       2886    8082    2886    8082    0.0     9880
GRMZM2G5555   GRMZM2G5555   100.00  1418    0       0       1       1418    1       1418    0.0     2769
GRMZM2G5555   GRMZM2G5555   100.00  1377    0       0       1445    2821    1445    2821    0.0     2688
GRMZM2G5555   GRMZM2G3333   99.13   1268    9       2       1445    2711    759     2025    0.0     2369
GRMZM2G5555   GRMZM2G3333   99.17   724     6       0       687     1410    1       724     0.0     1388
GRMZM2G5555   GRMZM2G2222   95.73   1310    23      8       4045    5333    647     1944    0.0     2068
GRMZM2G5555   GRMZM2G1111   96.41   668     15      5       4677    5342    8063    7403    0.0     1031
GRMZM2G5555   GRMZM2G1111   95.52   67      3       0       5428    5494    3703    3637    1e-21    109

I ran BLAST on a FASTA file against the same FASTA file. So the first hit will be itself. Now the BLAST output is providing several different HSPs. But I wanted all the HSPs to be combined and display as one, so that I can find the whole coverage of subject on query. Can anyone suggest me how to achieve this or a place to find the scripts for this problem.

thanks

blast format alignment • 4.9k views
ADD COMMENT
1
Entering edit mode

futher give example of hsp output you want!

ADD REPLY
1
Entering edit mode
13.7 years ago

BLAST stands for Basic Local Alignment Search Tool, which is exactly what it does: searching for local alignments between two sequences.

What you seem to want is a global alignment. There are many tools available that do that for you, however, BLAST is not one of them.

I suggest you get the sequence IDs from your file and use e.g. CLUSTALW, MUSCLE, or T-COFFEE to globally align them.

ADD COMMENT
1
Entering edit mode
13.7 years ago

To answer your question (but not your title) I suggest using BLAT, which automatically chains HSPs from the same target as part of its PSL format.

ADD COMMENT
1
Entering edit mode
13.7 years ago
Marina Manrique ★ 1.3k

Maybe you could try with the pairwise format instead (flag -m 0 I think). I think that in this format all the HSPs that make up a hit are all represented together so you'd need only to look at the query start position and end position to get the whole coverage.

Anyway, if you don't mind to use Java and XML format, I'd recommend to take a look at this Java project: Era7BioinfoXML. It's got a set of XML wrapper classes based on Jdom library. In particular the class Hsp.java is a model of the HSP element, using the methods getQueryFrom and getQueryTo you can get the positions and thus calculate the coverage.

Here you are a sample of the pairwise format

BLASTP 2.0.9 [May-07-1999]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Query= X52524 LOCUS       PFEBA175_1
         (501 letters)
Database: swissprot
           80,000 sequences; 29,085,965 total letters
Searching.................................................done
                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value
SWISSPROT:EBA1_PLAFC P19214 plasmodium falciparum (isolate camp...   950  0.0
SWISSPROT:PVDB_PLAKN P50493 plasmodium knowlesi. duffy receptor...    96  1e-19
SWISSPROT:PVDR_PLAVI P22290 plasmodium vivax. duffy receptor pr...    96  1e-19
SWISSPROT:PVDG_PLAKN P50494 plasmodium knowlesi. duffy receptor...    95  2e-19
SWISSPROT:PVDA_PLAKN P22545 plasmodium knowlesi. duffy receptor...    83  8e-16
SWISSPROT:SCP1_RAT Q03410 rattus norvegicus (rat). synaptonemal...    46  1e-04
SWISSPROT:SCP1_MOUSE Q62209 mus musculus (mouse). synaptonemal ...    43  0.001
SWISSPROT:EBA1_PLAFC P19214 plasmodium falciparum (isolate camp /
            malaysia). erythrocyte-binding antigen eba-175. 2/1996
            Length = 1435
Score =  950 bits (2430), Expect = 0.0
 Identities = 461/501 (92%), Positives = 461/501 (92%)
Query: 1    NIDRIYDKNLLMIKEHILAIAIYESRILKRKYKNKDDKEVCKIINKTFADIRDIIGGTDY 60
            NIDRIYDKNLLMIKEHILAIAIYESRILKRKYKNKDDKEVCKIINKTFADIRDIIGGTDY
Sbjct: 500  NIDRIYDKNLLMIKEHILAIAIYESRILKRKYKNKDDKEVCKIINKTFADIRDIIGGTDY 559
Query: 61   WNDLSNRKLVGKINTNSKYVHRNKKNDKLFRDEWWKVIKKDVWNVISWVFKDKTVCKEDD 120
            WNDLSNRKLVGKINTNSKYVHRNKKNDKLFRDEWWKVIKKDVWNVISWVFKDKTVCKEDD
Sbjct: 560  WNDLSNRKLVGKINTNSKYVHRNKKNDKLFRDEWWKVIKKDVWNVISWVFKDKTVCKEDD 619
Query: 121  IENIPQFFRWFSEWGDDYCQDKTKMIETLKVECKEKPCEDDNCKSKCNSYKEWISKKKEE 180
            IENIPQFFRWFSEWGDDYCQDKTKMIETLKVECKEKPCEDDNCKSKCNSYKEWISKKKEE
Sbjct: 620  IENIPQFFRWFSEWGDDYCQDKTKMIETLKVECKEKPCEDDNCKSKCNSYKEWISKKKEE 679
Query: 181  YNKQAKQYQEYQKGNNYKMYSEFKSIKPEVYLKKYSEKCSNLNFEDEFKEELHSDYKNKC 240
            YNKQAKQYQEYQKGNNYKMYSEFKSIKPEVYLKKYSEKCSNLNFEDEFKEELHSDYKNKC
Sbjct: 680  YNKQAKQYQEYQKGNNYKMYSEFKSIKPEVYLKKYSEKCSNLNFEDEFKEELHSDYKNKC 739
Query: 241  TMCPEVKDVPISIIRNNEQTSQEAVPEENTEIAHRTETPSISEGPKGNEQKERDDDSLSK 300
            TMCPEVKDVPISIIRNNEQTSQEAVPEENTEIAHRTETPSISEGPKGNEQKERDDDSLSK
Sbjct: 740  TMCPEVKDVPISIIRNNEQTSQEAVPEENTEIAHRTETPSISEGPKGNEQKERDDDSLSK 799
Query: 301  ISVSPENSRPETDAKDTSNLLKLKGDVDISMPKAVIGSSPNDNINVTEQGDNISGVNSKP 360
            ISVSPENSRPETDAKDTSNLLKLKGDVDISMPKAVIGSSPNDNINVTEQGDNISGVNSKP
Sbjct: 800  ISVSPENSRPETDAKDTSNLLKLKGDVDISMPKAVIGSSPNDNINVTEQGDNISGVNSKP 859
Query: 361  LSDDVRPDKKELEDQNSDESEETVVNHISKSPSIXXXXXXXXXXXXXXXXXXXXXXXXID 420
            LSDDVRPDKKELEDQNSDESEETVVNHISKSPSI                        ID
Sbjct: 860  LSDDVRPDKKELEDQNSDESEETVVNHISKSPSINNGDDSGSGSATVSESSSSNTGLSID 919
Query: 421  DDRNGDTFVRTQDTANTEDVIRXXXXXXXXXXXXXXXXRHSTSESLSSPEEKMLTDNEGG 480
            DDRNGDTFVRTQDTANTEDVIR                RHSTSESLSSPEEKMLTDNEGG
Sbjct: 920  DDRNGDTFVRTQDTANTEDVIRKENADKDEDEKGADEERHSTSESLSSPEEKMLTDNEGG 979
Query: 481  NSLNHEEVKEHTSNSDNVQQS 501
            NSLNHEEVKEHTSNSDNVQQS
Sbjct: 980  NSLNHEEVKEHTSNSDNVQQS 1000

HTH,
Marina

ADD COMMENT

Login before adding your answer.

Traffic: 1916 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6