Question

How to remove transcripts that have poor alignment scores in exonerate analysis

9.1 years ago

Ginsea Chen ▴ 140

Dear all

I am a new user of exonerate. I tried to map protein evidence to a whole-genome assembly using exonerate with the protein2genome model. After mapping, I wanted to filter out the resulting transcripts (from the exonerate output file) that have poor alignment scores. Liang et al. (Liang C, Mao L, Ware D, et al. Evidence-based gene predictions in plant genomes[J]. Genome research, 2009, 19(10): 1912-1923.) generally use a sequence identity threshold of 90% for same-species alignments and of 30% (protein sequence similarity) for cross-species alignments, but I could only find a raw alignment score (such as 805) in the exonerate output file.

So I don't know how to filter my transcripts based on the exonerate results: I can't find any sequence identity value (e.g. 90%) in them. Is there a way to convert a raw alignment score (e.g. 805) into a sequence identity value (e.g. 90%)?

Thanks all

genome alignment • 2.8k views

updated 9.1 years ago by Michael 55k • written 9.1 years ago by Ginsea Chen ▴ 140

Ram · Accepted Answer · 2015-11-09

From the manpage:

--ryo <format>
              Roll-your-own  output  format.  This allows specification of a printf-esque format line which is used
              to specify which information to include in the output, and how it is to be shown.  The  format  field
              may contain the following fields:

              %[qt][idlsSt]
                     For  either  {query,target},  report the {id,definition,length,sequence,Strand,type} Sequences
                     are reported in a fasta-format like block (no headers).
              %[qt]a[bels]
                     For   either   {query,target}   region   which   occurs   in   the   alignment,   report   the
                     {begin,end,length,sequence}
              %[qt]c[bels]
                     For  either {query,target} region which occurs in the coding sequence in the alignment, report
                     the {begin,end,length,sequence}
              %s     The raw score
              %r     The rank (in results from a bestn search)
              %m     Model name
     --->     %e[tism]
                     Equivalenced {total,id,similarity,mismatches} (ie. %em == (%et - %ei))
     --->     %p[is] Percent {id,similarity} over the equivalenced portions of the alignment.  (ie. %pi == 100*(%ei
                     / %et))
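
As a rough sketch of how the two highlighted fields could be used: if exonerate is run with something like `--ryo "RYO\t%qi\t%ti\t%s\t%pi\n"` (optionally with `--showalignment no --showvulgar no` to keep the output small), each alignment produces one tab-separated line with the query id, target id, raw score and percent identity. The `RYO` prefix, the column order and the 30% cutoff here are my own choices for illustration, not anything exonerate prescribes. Those lines could then be filtered with a few lines of Python:

```python
def filter_ryo(lines, min_identity=30.0, tag="RYO"):
    """Yield (query_id, target_id, raw_score, pct_identity) for hits
    whose percent identity is at least min_identity.

    Assumes each alignment line looks like:
        RYO<TAB>query_id<TAB>target_id<TAB>raw_score<TAB>percent_identity
    (a hypothetical layout produced by the --ryo string above).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip anything that is not one of our tagged --ryo lines,
        # e.g. the "Command line:" header or the completion footer.
        if len(fields) != 5 or fields[0] != tag:
            continue
        _, query_id, target_id, raw_score, pct_identity = fields
        try:
            raw_score = int(raw_score)
            pct_identity = float(pct_identity)
        except ValueError:
            continue  # malformed line, ignore it
        if pct_identity >= min_identity:
            yield query_id, target_id, raw_score, pct_identity


# Made-up example lines, only to show the expected shape of the input.
SAMPLE = [
    "Command line: [exonerate ...]\n",
    "RYO\tprotA\tchr1\t805\t92.50\n",
    "RYO\tprotB\tchr2\t120\t21.30\n",
    "-- completed exonerate analysis\n",
]

for hit in filter_ryo(SAMPLE, min_identity=30.0):
    print(*hit, sep="\t")
```

For real data you would pass an open file handle instead of `SAMPLE`, and use `min_identity=90.0` for same-species alignments if you want to follow the thresholds from the paper. Swapping `%pi` for `%ps` in the `--ryo` string gives percent similarity instead of percent identity.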