Question

Plotting multiple sequence alignments from tblastn

2.0 years ago

dec986 ▴ 380

I'm trying to visualize multiple sequence alignments (MSAs) from tblastn output. My query was a protein sequence, searched against a genome for which no proteome is available.

The output JSON from tblastn shows:

          "hits": [
            {
              "num": 1,
              "description": [
                {
                  "id": "gnl|BL_ORD_ID|29",
                  "accession": "29",
                  "title": "LSRD01000030.1 Fusarium sambucinum strain F-4 contig00030, whole genome shotgun sequence"
                }
              ],
              "len": 274085,
              "hsps": [
                {
                  "num": 1,
                  "bit_score": 45.0542,
                  "score": 105,
                  "evalue": 0.00043176,
                  "identity": 29,
                  "positive": 45,
                  "query_from": 85,
                  "query_to": 169,
                  "hit_from": 235535,
                  "hit_to": 235762,
                  "hit_frame": 2,
                  "align_len": 85,
                  "gaps": 9,
                  "qseq": "VGITEDSLWTLLTGYTKKESTIGNSAFELLLEVAKSGEKGINTMDLAQVTGQDPRSVTGRIKKINHLLTSSQLIYKGHVVKQLKL",
                  "hseq": "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR---------TDSLLKKGYIVKRTTL",
                  "midline": "V  +ED++W  LTG+      +  S + LLL +A +  +GI   DL ++T QD RSV  R         +  L+ KG++VK+  L"
                }
              ]
            },

but the problem is that I don't know the start and end of the hit sequence (hseq above). I know the query's start and end.

How can I make output like clustalo's multiple sequence alignment output from the tblastn search that I already ran? Should I simply search for start codons?

tblastn visualization blast msa • 883 views

ADD COMMENT • link 2.0 years ago by dec986 ▴ 380

You already know this:

"hit_from": 235535,
"hit_to": 235762,
"hit_frame": 2,

Since this is a tblastn result, those are nucleotide positions on the hit.
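As a quick sanity check on those coordinates, here is a minimal Python sketch using only the values from the HSP posted above. The helper name `hit_span` is made up for illustration, and treating the sign of `hit_frame` as the strand is my reading of the tblastn output rather than something shown in the post.

```python
# HSP fields copied from the JSON in the question.
hsp = {
    "hit_from": 235535,
    "hit_to": 235762,
    "hit_frame": 2,
    "align_len": 85,
    "gaps": 9,
    "hseq": (
        "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR"
        "---------TDSLLKKGYIVKRTTL"
    ),
}


def hit_span(hsp):
    """Summarise where an HSP sits on the subject (hypothetical helper)."""
    start = min(hsp["hit_from"], hsp["hit_to"])
    end = max(hsp["hit_from"], hsp["hit_to"])
    return {
        "start": start,
        "end": end,
        # Assumption: a positive frame means the plus strand of the contig.
        "strand": "+" if hsp["hit_frame"] > 0 else "-",
        "nt_len": end - start + 1,
        # Gap characters in hseq do not consume any nucleotides.
        "residues": len(hsp["hseq"].replace("-", "")),
    }


span = hit_span(hsp)
print(span)
# Each translated residue should account for three nucleotides.
print(span["nt_len"] == 3 * span["residues"])
```

For this HSP the span is 228 nt, which is 3 x 76, and 76 is the 85 alignment columns minus the 9 gap columns in hseq, so the nucleotide coordinates and the translated hit sequence agree.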

You mention an MSA: do you mean an MSA of all hits for this query, or just a rough visual overview like the coverage graphic NCBI shows on web BLAST?
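If it is the former, one rough way to get there is to project each HSP onto the query's coordinates and pad the rest with gap characters, so that every hit becomes a row of the same length as the query. The sketch below is only an illustration under some assumptions: `anchor_to_query` is a made-up helper, `query_len = 200` is a placeholder because the real query length is not shown in the post, and columns where the query itself has a gap are simply dropped.

```python
# One HSP, with fields copied from the JSON in the question.
hsp = {
    "query_from": 85,
    "query_to": 169,
    "qseq": (
        "VGITEDSLWTLLTGYTKKESTIGNSAFELLLEVAKSGEKGINTMDLAQVTGQDPRSVTGR"
        "IKKINHLLTSSQLIYKGHVVKQLKL"
    ),
    "hseq": (
        "VRASEDTMWESLTGHAVDYKRVPKSEWMLLLGIASTTTQGILQGDLGRLTDQDKRSVPKR"
        "---------TDSLLKKGYIVKRTTL"
    ),
}

# Placeholder: the true query length is not given in the post.
query_len = 200


def anchor_to_query(hsp, query_len):
    """Return the hit sequence laid out on query coordinates (hypothetical helper)."""
    # Keep only the columns where the query has a residue, so the row
    # covers exactly query_from..query_to.
    aligned = "".join(h for q, h in zip(hsp["qseq"], hsp["hseq"]) if q != "-")
    left = "-" * (hsp["query_from"] - 1)
    right = "-" * (query_len - hsp["query_to"])
    return left + aligned + right


row = anchor_to_query(hsp, query_len)
# FASTA-style record; the header reuses the accession from the hit title.
print(">LSRD01000030.1:235535-235762")
print(row)
```

Stacking one such row per HSP under the query sequence gives a FASTA-like file that an alignment viewer can display. It is not a true MSA, since the hits are only aligned to the query and not to each other, and the padding does not distinguish unaligned regions from real gaps.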

ADD REPLY • link 2.0 years ago by GenoMax 153k

Sorry, I should have looked more carefully at hit_from and hit_to. I'd like to make a visual representation like this: [image]

ADD REPLY • link 2.0 years ago by dec986 ▴ 380