16 months ago
dec986
I'm trying to visualize multiple sequence alignments (MSAs) of tblastn results. My query was a protein sequence, searched against a genome for which no proteome is available.
The output JSON from tblastn shows:
"hits": [
  {
    "num": 1,
    "description": [
      {
        "id": "gnl|BL_ORD_ID|29",
        "accession": "29",
        "title": "LSRD01000030.1 Fusarium sambucinum strain F-4 contig00030, whole genome shotgun sequence"
      }
    ],
    "len": 274085,
    "hsps": [
      {
        "num": 1,
        "bit_score": 45.0542,
        "score": 105,
        "evalue": 0.00043176,
        "identity": 29,
        "positive": 45,
        "query_from": 85,
        "query_to": 169,
        "hit_from": 235535,
        "hit_to": 235762,
        "hit_frame": 2,
        "align_len": 85,
        "gaps": 9,
        "qseq": "VGITEDSLWTLLTGYTKKESTIGNSAFELLLEVAKSGEKGINTMDLAQVTGQDPRSVTGRIKKINHLLTSSQLIYKGHVVKQLKL",
        "hseq": "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR---------TDSLLKKGYIVKRTTL",
        "midline": "V  +ED++W  LTG+      +  S + LLL +A +  +GI   DL ++T QD RSV  R         +  L+ KG++VK+  L"
      }
    ]
  },
The problem is that I don't know the start and end of the hit sequence (hseq above), although I do know the query's start and end.
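A minimal sketch of how the hit's start and end can be read off a single HSP, assuming the field names in the excerpt above (hit_from, hit_to, hit_frame, hseq) and assuming that the sign of hit_frame gives the strand. hit_interval and codon_span are hypothetical helper names. The checks rely on each ungapped hit residue in a standard tblastn HSP being translated from exactly one codon.

```python
def hit_interval(hsp):
    """Return (start, end, strand) of an HSP on the subject, 1-based and inclusive.

    Assumption: a positive hit_frame means the plus strand, a negative one the minus strand.
    """
    start = min(hsp["hit_from"], hsp["hit_to"])
    end = max(hsp["hit_from"], hsp["hit_to"])
    strand = "+" if hsp["hit_frame"] > 0 else "-"
    return start, end, strand


def codon_span(start, k):
    """Nucleotide interval of the k-th (0-based) ungapped hit residue; plus strand only."""
    return start + 3 * k, start + 3 * k + 2


# Values copied from the HSP in the excerpt.
hsp = {
    "hit_from": 235535,
    "hit_to": 235762,
    "hit_frame": 2,
    "hseq": "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR---------TDSLLKKGYIVKRTTL",
}

start, end, strand = hit_interval(hsp)
span_nt = end - start + 1
hit_residues = len(hsp["hseq"]) - hsp["hseq"].count("-")
print(start, end, strand, span_nt, hit_residues)  # 235535 235762 + 228 76

# Each ungapped hit residue was translated from one codon.
assert span_nt == 3 * hit_residues
if strand == "+":
    # Frame +1/+2/+3 begins translation at subject nucleotide 1/2/3.
    assert (start - 1) % 3 + 1 == hsp["hit_frame"]
    # First and last codons of the aligned hit segment.
    print(codon_span(start, 0), codon_span(start, hit_residues - 1))  # (235535, 235537) (235760, 235762)
```

For a negative hit_frame the span check still applies, but codon_span would have to count down from the end of the interval; that case is not handled here.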
How can I produce output like Clustal Omega's (clustalo) multiple sequence alignment from the tblastn search I already ran? Should I simply search for start codons?
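One way to get genuine Clustal Omega output is to collect the aligned hit peptides (hseq with the gap characters removed) into a FASTA file and realign them. A minimal sketch, assuming the "hits" list has the shape shown in the excerpt; in a full -outfmt 15 report it should sit deeper, under something like BlastOutput2[0]["report"]["results"]["search"]["hits"] (check this against your own file). hits_to_fasta is a hypothetical helper name.

```python
import json


def hits_to_fasta(hits, width=60):
    """Return FASTA text with one record per HSP: the aligned hit peptide, gaps removed.

    Headers carry the subject accession (first word of the title), the
    nucleotide interval on the subject and the reading frame, so the
    origin of each sequence survives realignment.
    """
    records = []
    for hit in hits:
        subject = hit["description"][0]["title"].split()[0]
        for hsp in hit["hsps"]:
            start = min(hsp["hit_from"], hsp["hit_to"])
            end = max(hsp["hit_from"], hsp["hit_to"])
            peptide = hsp["hseq"].replace("-", "")
            records.append(f">{subject}:{start}-{end} frame={hsp['hit_frame']}")
            # Wrap the sequence at `width` residues per line.
            records.extend(peptide[i:i + width] for i in range(0, len(peptide), width))
    return "\n".join(records) + "\n"


# Trimmed copy of the excerpt; only the fields used above are kept.
report = json.loads("""
{"hits": [{"description": [{"title": "LSRD01000030.1 Fusarium sambucinum strain F-4 contig00030, whole genome shotgun sequence"}],
           "hsps": [{"hit_from": 235535, "hit_to": 235762, "hit_frame": 2,
                     "hseq": "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR---------TDSLLKKGYIVKRTTL"}]}]}
""")
fasta = hits_to_fasta(report["hits"])
print(fasta)
```

Written to a file (say hits.fasta), the records could then be aligned with, e.g., `clustalo -i hits.fasta -o hits.aln --outfmt=clustal`. Each record is only the segment covered by one HSP, not a full-length protein. Locating these segments does not require a start-codon search, since the coordinates come from the HSP; start codons would only matter when trying to extend a hit toward a complete gene.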
You already know this: since this is a tblastn result, hit_from and hit_to are nucleotide positions on the hit.
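Because hit_from and hit_to count nucleotides, a coordinate-annotated view of the HSP advances the query position by one per residue but the hit position by three. A minimal sketch that prints the HSP in wrapped, Clustal-like blocks (Query / midline / Sbjct); clustal_like is a hypothetical helper, it assumes a plus-strand hit (positive hit_frame), and the three strings are copied from the excerpt.

```python
def clustal_like(qseq, midline, hseq, query_from, hit_from, hit_step=3, width=60):
    """Render a pairwise HSP in wrapped blocks with running start/end coordinates.

    Query coordinates advance one per residue; hit coordinates advance
    hit_step nucleotides per residue (plus-strand hit assumed).
    """
    q_pos, h_pos = query_from, hit_from
    pad = len(str(max(query_from, hit_from) + hit_step * len(hseq)))
    blocks = []
    for i in range(0, len(qseq), width):
        q_chunk = qseq[i:i + width]
        m_chunk = midline[i:i + width]
        h_chunk = hseq[i:i + width]
        # Gap characters occupy a column but consume no sequence.
        q_end = q_pos + len(q_chunk.replace("-", "")) - 1
        h_end = h_pos + hit_step * len(h_chunk.replace("-", "")) - 1
        blocks.append("\n".join([
            f"Query  {q_pos:<{pad}}  {q_chunk}  {q_end}",
            f"       {'':<{pad}}  {m_chunk}",
            f"Sbjct  {h_pos:<{pad}}  {h_chunk}  {h_end}",
        ]))
        q_pos, h_pos = q_end + 1, h_end + 1
    return "\n\n".join(blocks)


QSEQ = "VGITEDSLWTLLTGYTKKESTIGNSAFELLLEVAKSGEKGINTMDLAQVTGQDPRSVTGRIKKINHLLTSSQLIYKGHVVKQLKL"
MIDLINE = "V  +ED++W  LTG+      +  S + LLL +A +  +GI   DL ++T QD RSV  R         +  L+ KG++VK+  L"
HSEQ = "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR---------TDSLLKKGYIVKRTTL"

out = clustal_like(QSEQ, MIDLINE, HSEQ, query_from=85, hit_from=235535)
print(out)
```

With the default width of 60 this prints two blocks, ending at query position 169 and hit nucleotide 235762, which matches query_to and hit_to in the JSON.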
You refer to an MSA: do you mean an MSA of all hits for this query, or just a rough visual representation like the coverage graphic NCBI shows on web BLAST?
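For the coarse, NCBI-style overview, a text-only approximation is to draw each HSP as a bar along the query. A minimal sketch; coverage_bar is a hypothetical helper, the interval is the one HSP from the excerpt, and QUERY_LEN is a placeholder because the query length does not appear in the excerpt.

```python
def coverage_bar(query_len, intervals, width=60):
    """Return one text row per HSP; '=' marks the part of the query it covers.

    intervals: (label, query_from, query_to) tuples, 1-based and inclusive.
    """
    scale = width / query_len
    rows = [f"{'query':<12}|" + "-" * width + f"| 1-{query_len}"]
    for label, q_from, q_to in intervals:
        # Map query coordinates onto character cells [left, right).
        left = int((q_from - 1) * scale)
        right = max(left + 1, round(q_to * scale))
        bar = " " * left + "=" * (right - left) + " " * (width - right)
        rows.append(f"{label:<12}|{bar}| {q_from}-{q_to}")
    return "\n".join(rows)


QUERY_LEN = 300  # hypothetical placeholder; substitute the real query length
out = coverage_bar(QUERY_LEN, [("hit1 hsp1", 85, 169)])
print(out)
```

Each additional HSP (from any hit) becomes one more row, so overlapping or complementary hits are easy to spot.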
Sorry, I should have looked more carefully at hit_from and hit_to. I'd like to make a visual representation like