Hey all, I'm getting blastp results as JSON and want to turn them into a readable format to display on a website.
To formulate this as a general question about pairwise alignment: I'm looking for a function (it doesn't matter which programming language) that takes the following arguments: the positions of the alignment on sequence_a (QUERY_FROM, QUERY_TO); the positions of the alignment on sequence_b (HIT_FROM, HIT_TO); the alignment length (ALIGN_LEN); the aligned sequence_a and sequence_b (QSEQ, HSEQ); and the middle line (MIDLINE).
It should return a long string with line breaks that is a readable representation of the alignment, with positions on the sequences.
Sample input:
{
"query_from": 4,
"query_to": 96,
"hit_from": 1,
"hit_to": 99,
"align_len": 100,
"qseq": "MSDYSTMSSGYCSLEVELEDCFFTAK----RNLQSKQPTKNLCKAVEETWHPPTIQEIKQKIDSY---EKFCLGMKLSEDGYYTGFIKVVGLKLRRPVTV",
"hseq": "MTVDSSMSSGYCSLDEELEDCFFTAKTTFFRNLQSKQPSKNVCKAVEETQHPPTIQEIKQKIDSYNSREKHCLGMKLSEDGTYTGFIK-VHLKLRRPVTV",
"midline": "M+  S+MSSGYCSL+ ELEDCFFTAK    RNLQSKQP+KN+CKAVEET HPPTIQEIKQKIDSY   EK CLGMKLSEDG YTGFIK V LKLRRPVTV"
}
Requested output:
Query  4   MSDYSTMSSGYCSLEVELEDCFFTAK----RNLQSKQPTKNLCKAVEETWHPPTIQEIKQ  59
           M+  S+MSSGYCSL+ ELEDCFFTAK    RNLQSKQP+KN+CKAVEET HPPTIQEIKQ
Sbjct  1   MTVDSSMSSGYCSLDEELEDCFFTAKTTFFRNLQSKQPSKNVCKAVEETQHPPTIQEIKQ  60

Query  60  KIDSY---EKFCLGMKLSEDGYYTGFIKVVGLKLRRPVTV  96
           KIDSY   EK CLGMKLSEDG YTGFIK V LKLRRPVTV
Sbjct  61  KIDSYNSREKHCLGMKLSEDGTYTGFIK-VHLKLRRPVTV  99
Thanks!
You can use the blast_formatter program included in the blast+ package, if you save the output with -outfmt 11 first. Take a look at the Biopython BLAST parser as well.
Thank you for the quick reply. I need a solution for the input I've described, as I can only use the JSON data I'm getting. I'm looking for actual source code that I can adapt for my application.
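For what it's worth, here is a minimal sketch in Python of the kind of function asked for above. It is not taken from BLAST itself; the function name format_alignment, the width parameter and the label arguments are my own choices. It assumes a protein-vs-protein (blastp) hit, where every non-gap character advances the coordinate by one and positions only increase, and it assumes qseq, hseq and midline all have the same length, with the spaces in the midline preserved. Minus-strand or translated searches would need different position arithmetic.

```python
def format_alignment(hsp, width=60, query_label="Query", hit_label="Sbjct"):
    """Render one HSP dict as a BLAST-style pairwise alignment string."""
    qseq, hseq, midline = hsp["qseq"], hsp["hseq"], hsp["midline"]
    if not (len(qseq) == len(hseq) == len(midline)):
        raise ValueError("qseq, hseq and midline must have the same length")

    # Running start positions for the current block.
    q_pos, h_pos = hsp["query_from"], hsp["hit_from"]

    # Column widths: labels are left-aligned, start positions are padded
    # to the width of the largest coordinate so the sequences line up.
    label_w = max(len(query_label), len(hit_label))
    num_w = len(str(max(hsp["query_from"], hsp["query_to"],
                        hsp["hit_from"], hsp["hit_to"])))
    indent = " " * (label_w + 2 + num_w + 2)  # offset of the midline row

    blocks = []
    for i in range(0, len(qseq), width):
        q_chunk = qseq[i:i + width]
        m_chunk = midline[i:i + width]
        h_chunk = hseq[i:i + width]

        # Gap characters do not consume a residue, so they are excluded
        # when computing the end position of each row.
        q_end = q_pos + len(q_chunk) - q_chunk.count("-") - 1
        h_end = h_pos + len(h_chunk) - h_chunk.count("-") - 1

        blocks.append("\n".join([
            f"{query_label:<{label_w}}  {q_pos:<{num_w}}  {q_chunk}  {q_end}",
            (indent + m_chunk).rstrip(),
            f"{hit_label:<{label_w}}  {h_pos:<{num_w}}  {h_chunk}  {h_end}",
        ]))

        # The next block starts right after the last residue of this one.
        q_pos, h_pos = q_end + 1, h_end + 1

    return "\n\n".join(blocks)


# The sample from the question (midline with its spaces intact).
sample = {
    "query_from": 4, "query_to": 96, "hit_from": 1, "hit_to": 99,
    "align_len": 100,
    "qseq": "MSDYSTMSSGYCSLEVELEDCFFTAK----RNLQSKQPTKNLCKAVEETWHPPTIQEIKQ"
            "KIDSY---EKFCLGMKLSEDGYYTGFIKVVGLKLRRPVTV",
    "hseq": "MTVDSSMSSGYCSLDEELEDCFFTAKTTFFRNLQSKQPSKNVCKAVEETQHPPTIQEIKQ"
            "KIDSYNSREKHCLGMKLSEDGTYTGFIK-VHLKLRRPVTV",
    "midline": "M+  S+MSSGYCSL+ ELEDCFFTAK    RNLQSKQP+KN+CKAVEET HPPTIQEIKQ"
               "KIDSY   EK CLGMKLSEDG YTGFIK V LKLRRPVTV",
}

print(format_alignment(sample))
```

On the sample this gives two blocks, the first running 4 to 59 on the query and 1 to 60 on the subject, the second 60 to 96 and 61 to 99. The columns only line up in a monospaced font, so on a web page the string would need to go into something like a pre element. Porting it to another language should be straightforward, since the only real logic is counting non-gap characters per block.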