Blastp filter by Identity
6.2 years ago

I want to filter BLAST results by identity. blastn has a -perc_identity option, but it is not available in blastp. I can run blastp and filter the results as a post-processing step, but can't I do it by specifying a parameter like -perc_identity in blastp? Why is that option missing?

BLAST 2.7.1+

6.2 years ago

I think it's mainly because identity does not mean as much at the protein level as it does at the nucleotide level (blastn); for protein comparison, % positives is more informative. That said, % positives is not available as a blastp filter option either.
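
To make the difference between the two measures concrete, here is a minimal Python sketch that derives both from the midline string BLAST prints between the aligned sequences. It assumes the usual midline convention (a letter for an identical residue, `+` for a conservative substitution, a space otherwise); the function name is made up for illustration.

```python
def identity_and_positives(midline):
    """Return (% identity, % positives) for a BLAST protein midline string.

    Assumed convention: letter = identical residue, '+' = positive
    (conservative) substitution, space = mismatch or gap.
    """
    length = len(midline)
    if length == 0:
        raise ValueError("empty midline")
    identities = sum(ch.isalpha() for ch in midline)
    # Positives include the identical positions plus the '+' positions.
    positives = identities + midline.count("+")
    return 100.0 * identities / length, 100.0 * positives / length


pct_id, pct_pos = identity_and_positives("+QMASSI")
print(f"identity={pct_id:.1f}% positives={pct_pos:.1f}%")
```

For a midline such as `+QMASSI`, six of seven positions are identical while all seven count as positives, so the two percentages can differ noticeably on short alignments.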

In general, letting BLAST "filter" your results is not the best approach (search Biostars or the internet for the reasons why), so doing it in post-processing is the preferred way, and it is easily done with a bit of awk, for instance.
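
The same post-processing idea can be sketched in Python instead of awk. This assumes the default `-outfmt 6` column order, in which percent identity (`pident`) is the third tab-separated column; the function name, the hit names and the two example rows are made up for illustration.

```python
import csv
import io


def filter_by_identity(lines, min_pident):
    """Yield rows of BLAST tabular output whose pident is >= min_pident.

    Assumes default -outfmt 6 columns, where pident is column 3 (index 2).
    """
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the '#' comment lines produced by -outfmt 7.
        if not row or row[0].startswith("#"):
            continue
        if float(row[2]) >= min_pident:
            yield row


# Two illustrative rows (hypothetical hits), not real BLAST output.
example = io.StringIO(
    "q1\thitA\t93.75\t16\t1\t0\t1\t16\t7\t22\t3.2e-06\t49.8\n"
    "q1\thitB\t85.71\t7\t1\t0\t2\t8\t9\t15\t27356\t22.7\n"
)
kept = list(filter_by_identity(example, 90.0))
for row in kept:
    print("\t".join(row))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` example.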


Thanks. I will check what % positives means.

6.2 years ago

If you're using XML output, you can filter the Hsp elements on identity with an XSLT stylesheet:

xsltproc --novalid  biostar362495.xsl input.blastp.xml

Stylesheet (biostar362495.xsl):

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
>
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*|text()|*"/>
</xsl:copy>
</xsl:template>
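<!-- Hsp_identity holds the number of identical residues, not a percentage.
     This example keeps HSPs with fewer than 13 identities and replaces the
     others with a comment; adjust the test to your needs. -->
<xsl:template match="Hsp">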
<xsl:choose>
<xsl:when test="number(Hsp_identity/text()) &lt; 13">
<xsl:copy-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:comment>Ignoring <xsl:value-of select="Hsp_identity"/></xsl:comment>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

Input (input.blastp.xml):

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.8.1+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nr</BlastOutput_db>
<BlastOutput_query-ID>Query_316121</BlastOutput_query-ID>
<BlastOutput_query-def>unnamed protein product</BlastOutput_query-def>
<BlastOutput_query-len>16</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>PAM30</Parameters_matrix>
<Parameters_expect>200000</Parameters_expect>
<Parameters_gap-open>9</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>F</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_316121</Iteration_query-ID>
<Iteration_query-def>unnamed protein product</Iteration_query-def>
<Iteration_query-len>16</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|1432231202|gb|AXF43121.1|</Hit_id>
<Hit_def>truncated NSP3 [Bovine rotavirus]</Hit_def>
<Hit_accession>AXF43121</Hit_accession>
<Hit_len>38</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>49.8412</Hsp_bit-score>
<Hsp_score>110</Hsp_score>
<Hsp_evalue>3.16775e-06</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>16</Hsp_query-to>
<Hsp_hit-from>7</Hsp_hit-from>
<Hsp_hit-to>22</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>15</Hsp_identity>
<Hsp_positive>15</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>16</Hsp_align-len>
<Hsp_qseq>TQQMASSIINTSFEAA</Hsp_qseq>
<Hsp_hseq>TQQMVSSIINTSFEAA</Hsp_hseq>
<Hsp_midline>TQQM SSIINTSFEAA</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gi|23198438|gb|AAN15755.1|</Hit_id>
<Hit_def>protein 7, partial [Human rotavirus]</Hit_def>
<Hit_accession>AAN15755</Hit_accession>
<Hit_len>58</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>49.8412</Hsp_bit-score>
<Hsp_score>110</Hsp_score>
<Hsp_evalue>6.5624e-06</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>16</Hsp_query-to>
<Hsp_hit-from>4</Hsp_hit-from>
<Hsp_hit-to>19</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>15</Hsp_identity>
<Hsp_positive>15</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>16</Hsp_align-len>
<Hsp_qseq>TQQMASSIINTSFEAA</Hsp_qseq>
<Hsp_hseq>TQQMVSSIINTSFEAA</Hsp_hseq>
<Hsp_midline>TQQM SSIINTSFEAA</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>3</Hit_num>
<Hit_id>gi|30027639|gb|AAP13880.1|</Hit_id>
<Hit_def>gene 7 [Human rotavirus]</Hit_def>
<Hit_accession>AAP13880</Hit_accession>
<Hit_len>22</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>40.0857</Hsp_bit-score>
<Hsp_score>87</Hsp_score>
<Hsp_evalue>0.00577974</Hsp_evalue>
<Hsp_query-from>4</Hsp_query-from>
<Hsp_query-to>16</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>13</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>12</Hsp_identity>
<Hsp_positive>12</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>13</Hsp_align-len>
<Hsp_qseq>MASSIINTSFEAA</Hsp_qseq>
<Hsp_hseq>MVSSIINTSFEAA</Hsp_hseq>
<Hsp_midline>M SSIINTSFEAA</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>4</Hit_num>
<Hit_id>gi|1539474422|emb|VEB38454.1|</Hit_id>
<Hit_def>Uncharacterised protein [Legionella sainthelensi]</Hit_def>
<Hit_accession>VEB38454</Hit_accession>
<Hit_len>29</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>22.6954</Hsp_bit-score>
<Hsp_score>46</Hsp_score>
<Hsp_evalue>27356.3</Hsp_evalue>
<Hsp_query-from>2</Hsp_query-from>
<Hsp_query-to>8</Hsp_query-to>
<Hsp_hit-from>9</Hsp_hit-from>
<Hsp_hit-to>15</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>6</Hsp_identity>
<Hsp_positive>7</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>7</Hsp_align-len>
<Hsp_qseq>QQMASSI</Hsp_qseq>
<Hsp_hseq>EQMASSI</Hsp_hseq>
<Hsp_midline>+QMASSI</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>187417574</Statistics_db-num>
<Statistics_db-len>68351197366</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.11</Statistics_kappa>
<Statistics_lambda>0.294</Statistics_lambda>
<Statistics_entropy>0.61</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.8.1+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nr</BlastOutput_db>
<BlastOutput_query-ID>Query_316121</BlastOutput_query-ID>
<BlastOutput_query-def>unnamed protein product</BlastOutput_query-def>
<BlastOutput_query-len>16</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>PAM30</Parameters_matrix>
<Parameters_expect>200000</Parameters_expect>
<Parameters_gap-open>9</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>F</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_316121</Iteration_query-ID>
<Iteration_query-def>unnamed protein product</Iteration_query-def>
<Iteration_query-len>16</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|1432231202|gb|AXF43121.1|</Hit_id>
<Hit_def>truncated NSP3 [Bovine rotavirus]</Hit_def>
<Hit_accession>AXF43121</Hit_accession>
<Hit_len>38</Hit_len>
<Hit_hsps>
<!--Ignoring 15-->
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gi|23198438|gb|AAN15755.1|</Hit_id>
<Hit_def>protein 7, partial [Human rotavirus]</Hit_def>
<Hit_accession>AAN15755</Hit_accession>
<Hit_len>58</Hit_len>
<Hit_hsps>
<!--Ignoring 15-->
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>3</Hit_num>
<Hit_id>gi|30027639|gb|AAP13880.1|</Hit_id>
<Hit_def>gene 7 [Human rotavirus]</Hit_def>
<Hit_accession>AAP13880</Hit_accession>
<Hit_len>22</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>40.0857</Hsp_bit-score>
<Hsp_score>87</Hsp_score>
<Hsp_evalue>0.00577974</Hsp_evalue>
<Hsp_query-from>4</Hsp_query-from>
<Hsp_query-to>16</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>13</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>12</Hsp_identity>
<Hsp_positive>12</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>13</Hsp_align-len>
<Hsp_qseq>MASSIINTSFEAA</Hsp_qseq>
<Hsp_hseq>MVSSIINTSFEAA</Hsp_hseq>
<Hsp_midline>M SSIINTSFEAA</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>4</Hit_num>
<Hit_id>gi|1539474422|emb|VEB38454.1|</Hit_id>
<Hit_def>Uncharacterised protein [Legionella sainthelensi]</Hit_def>
<Hit_accession>VEB38454</Hit_accession>
<Hit_len>29</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>22.6954</Hsp_bit-score>
<Hsp_score>46</Hsp_score>
<Hsp_evalue>27356.3</Hsp_evalue>
<Hsp_query-from>2</Hsp_query-from>
<Hsp_query-to>8</Hsp_query-to>
<Hsp_hit-from>9</Hsp_hit-from>
<Hsp_hit-to>15</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>6</Hsp_identity>
<Hsp_positive>7</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>7</Hsp_align-len>
<Hsp_qseq>QQMASSI</Hsp_qseq>
<Hsp_hseq>EQMASSI</Hsp_hseq>
<Hsp_midline>+QMASSI</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>187417574</Statistics_db-num>
<Statistics_db-len>68351197366</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.11</Statistics_kappa>
<Statistics_lambda>0.294</Statistics_lambda>
<Statistics_entropy>0.61</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
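
Since `Hsp_identity` is an absolute count, a percentage threshold needs the alignment length as well. Here is a minimal Python sketch of that variant using the standard library's ElementTree; the function name is made up, and the embedded sample is a heavily trimmed stand-in for a real BLAST XML file, keeping only the elements the code reads.

```python
import xml.etree.ElementTree as ET


def hsps_by_percent_identity(xml_text, min_pct):
    """Return (accession, percent identity) for HSPs at or above min_pct."""
    root = ET.fromstring(xml_text)
    kept = []
    for hit in root.iter("Hit"):
        accession = hit.findtext("Hit_accession")
        for hsp in hit.iter("Hsp"):
            identities = int(hsp.findtext("Hsp_identity"))
            align_len = int(hsp.findtext("Hsp_align-len"))
            # Percent identity = identical residues / alignment length.
            pct = 100.0 * identities / align_len
            if pct >= min_pct:
                kept.append((accession, round(pct, 2)))
    return kept


# Trimmed sample: only the elements used above are present.
sample = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_accession>AXF43121</Hit_accession><Hit_hsps><Hsp>
<Hsp_identity>15</Hsp_identity><Hsp_align-len>16</Hsp_align-len>
</Hsp></Hit_hsps></Hit>
<Hit><Hit_accession>VEB38454</Hit_accession><Hit_hsps><Hsp>
<Hsp_identity>6</Hsp_identity><Hsp_align-len>7</Hsp_align-len>
</Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""

for accession, pct in hsps_by_percent_identity(sample, 90.0):
    print(accession, pct)
```

Unlike the stylesheet, this sketch only reports the HSPs that pass; it does not write a filtered XML document.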


Thanks for the reply. BTW, I am using the tabular (TAB) output format.
