Find the best hit of heterologous protein in blastx results

0


11.1 years ago

arronar ▴ 290

Hello!

Let's start from the beginning.

I am running blastx on the query (contig) below, restricted to the Oryza taxa, with an E-value threshold of 3.

Contig:

>Contig375
CGGGGATCTGAATGGACTTCTCTCATTTCTACCAGCATGCTGGTGGGAATCTTGTATATATAGAGATTTG
ACAATCAAGTAAGAAGTTTAAATAATTTGTAGCTTTCTTTTGTAATGCATACTTTTATCGATACCTAGAA
AAAATTACGTTTAGATCACTTATTAGAGTGACATTGTTGTCATACATTGGATGTTTATAAACCTGATGAT
CTGTTTGCATATTCCTGAACCAATGCCCCAAAGAGTGAGGGCTTCTCAATCAAACGTGAAGGCTTGTCAA
ATTCTTTTGCATACCCTGCATCAATGACTAAAACCCGATCACAGTCCATGACAGTAGGTATCCTATGAGC
TATGCTAACGATGGTA

As you can see (if you run the same job), it returns numerous hits, and the first one is the hit with the smallest E-value.

What I want instead is for the first result to be the sequence with the following characteristics:

Its length should be as great as possible, e.g. in our example the first hit has a length of 251 while the second one has a length of 1278 amino acids.
It should be as close as possible to the 5' end, i.e. the alignment should start close to the first amino acid (methionine) of the protein, e.g. in our example some hits start at the 20th amino acid while others start at the 1200th.

In a nutshell, I want to filter the blastx results so that they return a protein that is as long as possible, while the aligned region is at, or close to, the beginning of that protein.

Is there any way to filter the results like this? Or is there perhaps another database, other than NCBI's, in which to search for more complete protein sequences?

Thank you.

hit blastx heterologous-proteins • 2.5k views

updated 3.7 years ago by Ram 45k • written 11.1 years ago by arronar ▴ 290

2


11.1 years ago

Pierre Lindenbaum 166k

The following XSLT sorts the XML output of blastx by Hit/Hit_len (descending) and then by Hsp/Hsp_query-from (ascending):

	<?xml version="1.0" encoding="UTF-8"?>
	<xsl:stylesheet
	version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	>

	<xsl:output method='xml' indent="yes"/>

	<xsl:template match="/">
	<xsl:apply-templates select="*"/>
	</xsl:template>



	<!-- identity template: copy every element, text node and attribute unchanged -->
	<xsl:template match="*|text()|@*">
	<xsl:copy>
	<xsl:apply-templates select="*|text()|@*"/>
	</xsl:copy>
	</xsl:template>


	<xsl:template match="Iteration_hits">
	<Iteration_hits>
	<xsl:for-each select="Hit">
	<xsl:sort select="number(Hit_len)" data-type="number" order="descending"/>
	<xsl:sort select="number(Hit_hsps/Hsp[1]/Hsp_query-from)" data-type="number" order="ascending"/>
	<xsl:apply-templates select="."/>
	</xsl:for-each>
	</Iteration_hits>
	</xsl:template>


	<xsl:template match="Hit_hsps">
	<Hit_hsps>
	<xsl:for-each select="Hsp">
	<xsl:sort select="number(Hsp_query-from)" data-type="number" order="ascending"/>
	<xsl:apply-templates select="."/>
	</xsl:for-each>
	</Hit_hsps>
	</xsl:template>


	</xsl:stylesheet>


Usage:

xsltproc --novalid blastsort.xsl blastx.xml
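
If XSLT is not convenient, the same idea can be sketched in Python with only the standard library. Note that this sketch ranks on Hsp_hit-from (where the alignment starts on the subject protein) rather than Hsp_query-from, on the assumption that this is what "close to the first amino acid" means in the question. The function name `rank_hits` and the three sample hits are made up for illustration; the element names are the ones from the blastx XML output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed blastx XML; a real file has many more fields.
SAMPLE = """
<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
  <Hit><Hit_def>short</Hit_def><Hit_len>251</Hit_len><Hit_hsps>
    <Hsp><Hsp_hit-from>20</Hsp_hit-from></Hsp>
  </Hit_hsps></Hit>
  <Hit><Hit_def>long_late</Hit_def><Hit_len>1278</Hit_len><Hit_hsps>
    <Hsp><Hsp_hit-from>1200</Hsp_hit-from></Hsp>
  </Hit_hsps></Hit>
  <Hit><Hit_def>long_early</Hit_def><Hit_len>1278</Hit_len><Hit_hsps>
    <Hsp><Hsp_hit-from>300</Hsp_hit-from></Hsp>
    <Hsp><Hsp_hit-from>15</Hsp_hit-from></Hsp>
  </Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>
"""


def rank_hits(xml_text):
    """Return (Hit_def, Hit_len, earliest Hsp_hit-from) tuples,
    longest subject first, ties broken by the earliest alignment start."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        hit_len = int(hit.findtext("Hit_len"))
        starts = [int(hsp.findtext("Hsp_hit-from")) for hsp in hit.iter("Hsp")]
        if not starts:
            continue  # skip a hit without HSPs (not expected in BLAST output)
        rows.append((hit.findtext("Hit_def"), hit_len, min(starts)))
    # Sort by length descending, then by alignment start ascending.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


if __name__ == "__main__":
    for hit_def, hit_len, start in rank_hits(SAMPLE):
        print(hit_def, hit_len, start)
```

For a real result file, something like `rank_hits(open("blastx.xml").read())` should work. Because the sort key puts length first, a slightly longer protein always outranks a shorter one that aligns nearer its start; if the start position matters more, swap the two elements of the key.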

updated 3.7 years ago by Ram 45k • written 11.1 years ago by Pierre Lindenbaum 166k

0


Are you sure that this works? When I run it, it returns the results in the same order.

Here is my initial XML file.

updated 3.7 years ago by Ram 45k • written 11.1 years ago by arronar ▴ 290

0


Ah yes, sorry, I forgot the attribute data-type="number". I have updated the code.

xsltproc --novalid stylesheet.xsl blastx.xml | grep -E '(Hit_len|Hsp_query\-from)'

  <Hit_len>1489</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>17</Hsp_query-from>
  <Hit_len>1468</Hit_len>
      <Hsp_query-from>2</Hsp_query-from>
      <Hsp_query-from>23</Hsp_query-from>
  <Hit_len>1451</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1444</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1441</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1441</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1356</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>17</Hsp_query-from>
  <Hit_len>1199</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>293</Hsp_query-from>
  <Hit_len>763</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>517</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>

updated 3.7 years ago by Ram 45k • written 11.1 years ago by Pierre Lindenbaum 166k