I have a blast output file output.txt
with multiple hits in it.
Score = 885 bits (479), Expect = 0.0
Identities = 613/676 (91%), Gaps = 16/676 (2%)
Strand=Plus/Minus
Query 1 ATGATAATTGATACGACAGAAGTACAAACTATCAATTCTTTTTCTATATTAGAATCCTTA 60
|||||||||||||||||||||||||||||||| |||||||||||| |||||||||||||
Sbjct 116345 ATGATAATTGATACGACAGAAGTACAAACTATTAATTCTTTTTCTGGATTAGAATCCTTA 116286
Query 61 AAAGAAGTCTATGGACTCATATGGATTTTTGTCCCCATTTTCACCCTTGTCTTAGGAATC 120
|||||||| |||||||||||||||||||||||||||||||| ||||||||||||||||||
Sbjct 116285 AAAGAAGTATATGGACTCATATGGATTTTTGTCCCCATTTTAACCCTTGTCTTAGGAATC 116226
Query 121 ACAATGGGGGTATTAGTAATTGTGTGGTTAGAAAGAAAAATATCCGCAGCAATACAACAA 180
||||||||| ||||||| |||||||| ||||||||||||||||| ||| ||| |||||||
Sbjct 116225 ACAATGGGG-TATTAGTCATTGTGTGATTAGAAAGAAAAATATCTGCAACAACACAACAA 116167
Query 181 CGTATTGGACCTGAATATGCCGGCCCATTAGGAATTCTTCAAGCTTTAGCGGATGGGACC 240
|||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct 116166 TGTATTGGACCTGAATAGGCCGGCCCATTAGGAATTCTTCAAGCTTTAGCGGATGGGACG 116107
Query 241 AAACTATTTTTGAAGGAGGATCTTCTTCCTTCTAGAGGGAATATTCGTTTGTTTAGCGTC 300
|||||| ||||||||||| || ||||||| |||||||| |||||||| || |||| |||
Sbjct 116106 AAACTACTTTTGAAGGAGAATATTCTTCCGTCTAGAGGTAATATTCGCTTATTTAAGGTC 116047
Query 301 GGACCTTCTATAGGGGTTATATCAATTCTACTAAGTTATTTAGTAATTCCTTTTGGATAT 360
||||| ||||||||| |||||||||| |||||||||| |||||||||||||
Sbjct 116046 GGACCCTCTATAGGGTTTATATCAATCCTACTAAGTT----------TCCTTTTGGATAT 115997
Query 361 CACCTTGTTTTAGCTGATCTCAGTATAGGTGtttttttATGGATTGCCATTTCAAGTATT 420
|||||||||||||||||| ||||||||||||||||||||||||||||| |||||||||||
Sbjct 115996 CACCTTGTTTTAGCTGATTTCAGTATAGGTGTTTTTTTATGGATTGCCTTTTCAAGTATT 115937
Query 421 GTCCCCATTGGTCTTCTTATGTCAGGATATGGATCAAATAATAAGTATTCCTTTTCAGGC 480
||||| ||||||||||||||||||||||||| ||||||||||||||||||||||||||||
Sbjct 115936 GTCCCTATTGGTCTTCTTATGTCAGGATATGAATCAAATAATAAGTATTCCTTTTCAGGC 115877
Query 481 GGTCTACGAGCTGCAGCTCAATCGATTAGTTATGAAATACCATTAACTCTATGTGTGTTA 540
||||||||||||| || ||||| ||||||||||||||||||||||||||||||||||||
Sbjct 115876 GGTCTACGAGCTGTAGATCAATAAATTAGTTATGAAATACCATTAACTCTATGTGTGTTA 115817
Query 541 GCAATATCTCTACGTGCGATTCGTTTGAACATGAACtttttttCTCTATTTTCTAGAAAA 600
|||||||||||||||| |||||||| ||||||||||| |||| ||| ||||||||||||
Sbjct 115816 GCAATATCTCTACGTGTGATTCGTTAGAACATGAACTCTTTT-CTC--TTTTCTAGAAAA 115760
Query 601 GAgaaaagaaatgaattgaaatttcaatacaatataaatagaattcaatatgtaaatatg 660
|||| |||||||||||| |||| ||||| |||| | ||| ||| |||||||||||||||
Sbjct 115759 GAGATAAGAAATGAATTTAAATA-CAATAAAATAGAGATATAATGCAATATGTAAATATG 115701
Query 661 aa-ataaaaaaaaaGA 675
|| || || ||||||
Sbjct 115700 AATATGAACGAAAAGA 115685
Score = 845 bits (457), Expect = 0.0
Identities = 603/670 (90%), Gaps = 24/670 (4%)
Strand=Plus/Minus
Query 678 ttttttATTCAACATTTCAGTTCGATGAGTTAAACCAGATAGTTATATGAGTGAAA-CAA 736
||||||||| || |||| ||||||||||||||||||||| |||||||||||||||| ||
Sbjct 115566 TTTTTTATTAAAAATTTTAGTTCGATGAGTTAAACCAGAGAGTTATATGAGTGAAAAAAA 115507
Query 737 AACTGCTCCTCAATTTGCAGTAAAACAAGAAAAATCTCATTCCCTAGGTACAAGAATGAA 796
|||||||||||||||||||||||||||| ||||||||||||||||||||||||||| |||
Sbjct 115506 AACTGCTCCTCAATTTGCAGTAAAACAATAAAAATCTCATTCCCTAGGTACAAGAA-GAA 115448
Query 797 A-TTGAAGTAAACATAAGTTGTTTACCCCAAGATTGAGATTCTTTGATTAGTCGTCATAT 855
| ||||||||||||||||||||||||||||| ||||||||| ||| |||||||||||||
Sbjct 115447 ATTTGAAGTAAACATAAGTTGTTTACCCCAATATTGAGATTATTTTCTTAGTCGTCATAT 115388
Query 856 CTTGAAGCGGATGCAAAAGATCAACTGTATTTATTACTATACTGGGGATCAATCAAAAAG 915
|||||||||||||||||||||| ||| |||||||||||||||||| ||||||||||||||
Sbjct 115387 CTTGAAGCGGATGCAAAAGATCCACTTTATTTATTACTATACTGGAGATCAATCAAAAAG 115328
Query 916 AAGTGGGTAGTTAGGAACACCAAAGTACACAAAGGATGAGTAATGGAAATAATGTAAGGT 975
||||| |||||||||||||||||||| |||||||||||||||| |||||||||||| |
Sbjct 115327 AAGTGAC-AGTTAGGAACACCAAAGTACGCAAAGGATGAGTAATGAAAATAATGTAAGAT 115269
Query 976 ATCaaa-a-aa-aGGG---GTT-TTTG--CATAAAACTTTGCATAAAACGAATCATAAT- 1025
|||||| | || | ||| ||| ||||||||||| ||||||||||||| ||||
Sbjct 115268 ATCAAAGATAACAAAAAAAGTTATTTTTTCATAAAACTTTCCATAAAACGAATCCTAATT 115209
Query 1026 AAGGGCTTGAAGTTGGTAGAAATGATCAAGCAGTACTTCCCCACGATTCCAATCTAGAGT 1085
||||||||| | |||||||||||||||||||||||||| ||||||||| | |||||||||
Sbjct 115208 AAGGGCTTGTAATTGGTAGAAATGATCAAGCAGTACTTTCCCACGATTACGATCTAGAGT 115149
Query 1086 ATGCTACTATTCGCTGATTAAAGAAATGACTATCAAGAACGAATTAATCCTTTATTTTAT 1145
|||||||||||||||||||||| ||||||||||||||||| ||| ||||||||||||| |
Sbjct 115148 ATGCTACTATTCGCTGATTAAATAAATGACTATCAAGAACAAATGAATCCTTTATTTTCT 115089
Query 1146 TTCCtttttttttttAGTTTTCagaaagaagaacaggaacaagacaaatagaatgcaata 1205
| ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Sbjct 115088 TGACTTTTTTTTTTTAGTTTTCAGAAGAAAGAACAGGAACAAGACAAATAGAATGCAATA 115029
Query 1206 caataatagaataaaa--aagaataaaacgggaataataagaaaataTTTAGTTCTTCGT 1263
||||||| ||||||| |||||||||||||||||||||||||||||||| ||| | | |
Sbjct 115028 TAATAATATAATAAAATAAAGAATAAAACGGGAATAATAAGAAAATATTT-GTT-T-C-T 114973
Query 1264 TTCTTCATACATATGCATATGGGAATTCTTATCATGATTCATTAACTAATGCCCAATTCT 1323
| |||||||||||||||| |||| ||||||||||||||||||||||| ||||||||
Sbjct 114972 TA----ATACATATGCATATGGAAATTTTTATCATGATTCATTAACTAATGTCCAATTCT 114917
Query 1324 TTTTATTTAT 1333
||||||||||
Sbjct 114916 TTTTATTTAT 114907
Score = 721 bits (390), Expect = 0.0
Identities = 446/473 (94%), Gaps = 4/473 (1%)
Strand=Plus/Minus
Query 1737 GTGGCGTCAGCCCATAGGGTTTCTAGTTTTTCTAATGTCTTCTCTAGCAGAATGTGAAAG 1796
||||||||| |||||||||||||||||| |||||||||||||||||||||||||||| ||
Sbjct 114885 GTGGCGTCAACCCATAGGGTTTCTAGTTCTTCTAATGTCTTCTCTAGCAGAATGTGAGAG 114826
Query 1797 ATTACCCTTTGATTTACCGGAAGCAGAGGAGGAATTAGTAGCAGGTTATCAAACCGAATA 1856
|||||| ||| ||||||| |||| ||||||| |||||||||||||||||||||| ||||
Sbjct 114825 ATTACCTTTTAATTTACCAGAAGTAGAGGAGATATTAGTAGCAGGTTATCAAACCAAATA 114766
Query 1857 TTCAGGTATAAAATACGGGTTATTTTATCTTGCTTCTTACCTAAATCTATTAGTTTCTTC 1916
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 114765 TTCAGGTATAAAATACGGGTTATTTTATCTTGCTTCTTACCTAAATCTATTAGTTTCTTC 114706
Query 1917 ATT----ATTTGTAACAGTTCTTTACTTAGGTGGGTGGAATTTTTCTATTCCGTACATAT 1972
||| ||||||||||||||||||||||||| |||||||||| ||||||||||||||||
Sbjct 114705 ATTCATTATTTGTAACAGTTCTTTACTTAGGTAGGTGGAATTTCTCTATTCCGTACATAT 114646
Query 1973 CTATTACTGAACTTTTTGGAATAAATAAAATGTTTAGAGTCTTTGTAATAGCAATTGGTA 2032
|| ||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||
Sbjct 114645 CTCTTACTGAACTTTTTGGAATAAATAAAATGTTGAGAGTCTTTGTAATAGCAATTAATA 114586
Query 2033 TCTTTATTACATTAGCTAAAGCTTATTTGTTTCTGTTCATTCCTATCACAACAAGATGGA 2092
||||||| |||||||||||||||||||||||||||||||||||||| |||||||||||||
Sbjct 114585 TCTTTATCACATTAGCTAAAGCTTATTTGTTTCTGTTCATTCCTATAACAACAAGATGGA 114526
Query 2093 CTTTACCTAGGATGAGAATGGATCAGTTATTAAATCTTGGATGGAAATTTCTTTTACCTA 2152
||||||||||||||||||||||||| ||||||||||||||||| ||||||| ||||||||
Sbjct 114525 CTTTACCTAGGATGAGAATGGATCAATTATTAAATCTTGGATGAAAATTTCCTTTACCTA 114466
Query 2153 TTTCTCTAGGTAATCTATTATTGACAACTTCTTCTCAACTTGTTTCACTATAA 2205
|||||||||||||||||||||| |||||||||| |||||||||||||||||||
Sbjct 114465 TTTCTCTAGGTAATCTATTATTAACAACTTCTTTTCAACTTGTTTCACTATAA 114413
The hits from the subject sequence are quite close together, so I would like to join the hits from the subject sequences together into one long sequence from 116345 to 114413 bp in the subject sequence together, with the gaps in between the hips marked by n
or -
characters
Does anyone know of a method to concatenate multiple blast hits into one long sequence?
If you are using a spliced gene to identify its location in genome then using GMAP (LINK) may be a better option.
That said you could simply use
blast -outfmt 6/7
to get coordinates in result output. Take outer bound coordinates and pull out the relevant sequences from the database.I am not sure why you want to do that when there is real sequence in the genome there.
There might be tools to do this. But here is the quick fix for OP data (please check for 1 off coordinates in the code):
can be fed to TPA: