How To Merge Contiguous BLAST HSPs (-m 8 Tab)
11.5 years ago
xiongtl2013 ▴ 40

Hi, guys!

I ran blastx (-m 8) with a query file containing many sequences. For each target sequence, the output contains many significant, fragmentary HSPs, which may or may not overlap in position.

So, how can I merge closely spaced HSPs into one by setting a flanking value (e.g. <300 bp) when these HSPs match different regions of the same subject?

In the following figure, I want to transform the upper results into the lower ones:

http://www.imagebam.com/image/0c13b2256142668

Thanks in advance!

Transform

scaffold16:1661-2239(+)       gi|471236998|ref|YP_007641386.1|  50.00  122  52  3  **225  578**  603  719  2e-53   126                   
scaffold16:1661-2239(+)       gi|471236998|ref|YP_007641386.1|  75.00  76   19  0  **1    228**  528  603  2e-53   108
scaffold16:1661-2239(+)       gi|333951646|gb|AEG25349.1|       52.10  119  54  2  **225  578**  604  720  7e-53   124
scaffold16:1661-2239(+)       gi|333951646|gb|AEG25349.1|       77.63  76   17  0  **1    228**  529  604  7e-53   109
scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1|       70.18  57   17  0  **173  343**  554  610  3e-30   90.5
scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1|       72.22  54   15  0  **1    162**  497  550  3e-30   67.0

To

scaffold16:1661-2239(+)       gi|471236998|ref|YP_007641386.1|   .      .    .  .  **1    578**   .    .   2e-53   **234**
scaffold16:1661-2239(+)       gi|333951646|gb|AEG25349.1|        .      .    .  .  **1    578**   .    .   7e-53   **233**
scaffold28:2776872-2777385(-) gi|327335359|gb|AEA49877.1|        .      .    .  .  **1    343**   .    .   3e-30   **157.5**
merge blast • 4.4k views
11.5 years ago
brentp 24k

You can get most of the way there doing this:

awk 'BEGIN{FS=OFS="\t"}{ a=$0; gsub(/\t/, "ZZZ", a); print $1,$7,$8,a }' blast.txt \
         | sort -k1,1 -k2,2n \
         | bedtools merge -nms > out.bed

with your input in blast.txt. Each output line will carry the original lines above (with tabs replaced by "ZZZ"); you'll just have to do a bit of parsing to split on "ZZZ" and put the appropriate start (2nd column) and end (3rd column) into the right places.
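The parsing step could be sketched roughly like this in Python. This is only an illustration, not part of the pipeline above: the function names `unpack_merged_line` and `to_blast_row` are made up, and it assumes the 4th column of out.bed holds the "ZZZ"-joined original lines separated by ";" (which is how `-nms` is documented to join names; newer bedtools releases may use a different flag and delimiter, so adjust `name_delim` if needed).

```python
def unpack_merged_line(line, field_sep="ZZZ", name_delim=";"):
    """Split one merged BED line into (query, start, end, original HSP rows).

    Assumes column 4 holds the original BLAST lines, each with its tabs
    replaced by field_sep, joined together by name_delim.
    """
    chrom, start, end, names = line.rstrip("\n").split("\t")[:4]
    hsps = [name.split(field_sep) for name in names.split(name_delim)]
    return chrom, int(start), int(end), hsps


def to_blast_row(line, field_sep="ZZZ", name_delim=";"):
    """Rebuild one -m 8 style row from a merged BED line.

    The first HSP in the group is used as a template, and its query
    start/end (columns 7 and 8) are overwritten with the merged span.
    The remaining columns still describe only that first HSP.
    """
    _, start, end, hsps = unpack_merged_line(line, field_sep, name_delim)
    row = list(hsps[0])
    row[6], row[7] = str(start), str(end)
    return "\t".join(row)
```

Running `to_blast_row` over every line of out.bed gives one row per merged interval; what to do with the other columns (identity, subject coordinates, scores) is left open here.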

What a clever trick! Thank you very much. It solves the problem, but only to a certain extent, because the script merges all HSPs that share the same query even when they hit different targets; that is to say, it merges all of the first four records listed above. What I want to merge are HSPs with the same query and the same target, but at different positions.

Hoping for some more help! Any ideas are welcome...
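One way to get the per-target behaviour described here is to group on query and subject together before merging. Below is a minimal Python sketch under some assumptions: the input has the 12 standard -m 8 columns, the function name `merge_hsps` and the `max_gap` parameter are hypothetical, the merged record keeps the smallest e-value, and bit scores are simply summed to mirror the example table (that sum is not a statistically meaningful combined score).

```python
from itertools import groupby


def merge_hsps(lines, max_gap=300):
    """Merge HSPs that share query AND subject and whose query intervals
    overlap or lie within max_gap of each other.

    Returns tuples (query, subject, q_start, q_end, min_evalue, summed_bits).
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or incomplete lines
        # Query coordinates can be reversed for minus-frame hits, so order them.
        q_start, q_end = sorted((int(fields[6]), int(fields[7])))
        rows.append((fields[0], fields[1], q_start, q_end,
                     float(fields[10]), float(fields[11])))

    # Sort so each (query, subject) pair is contiguous and ordered by start.
    rows.sort(key=lambda r: (r[0], r[1], r[2]))

    merged = []
    for (query, subject), group in groupby(rows, key=lambda r: (r[0], r[1])):
        current = None
        for _, _, start, end, evalue, bits in group:
            if current is not None and start - current[3] <= max_gap:
                # Close enough to the running interval: extend it.
                current[3] = max(current[3], end)
                current[4] = min(current[4], evalue)
                current[5] += bits
            else:
                if current is not None:
                    merged.append(tuple(current))
                current = [query, subject, start, end, evalue, bits]
        merged.append(tuple(current))
    return merged
```

For example, `merge_hsps(open("blast.txt"), max_gap=300)` would yield one record per (query, subject) cluster. Only the query coordinates are checked for proximity; if the subject coordinates should also be collinear, that test would have to be added.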
