blastp word_size parameter seems to be ignored - BLASTP 2.4.0+
8.1 years ago

Hi Everyone.

I am trying to BLAST many short peptide sequences against a protein database. I am looking for nearly exact matches, so I set word_size to 4.

However, some HSPs are reported in which the longest stretch of consecutive amino acids that are identical between query and subject is only 3. Could someone please clarify why this happens? I thought every HSP should contain at least one stretch of identical consecutive amino acids of length equal to or greater than the word_size.

Here is my command that demonstrates this:

blastp -query testPeptide.fasta -matrix PAM30 -outfmt 5 -word_size 4 -subject test.fasta

Query:

>peptide
FTDFQGGV

Subject:

>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546
WVVVDRGVDRGARRAAGSGMQLRPPSGVLHAGAGTAQPVGSAPLAVLITGHDLEPIAAQV
TGLAELDRLAKHPGAARPPIGHVPDCPHRAGSPDLAGGDDTGGVVQQGAQRTGRCRRGAQ
RRRNDAKTQHARSRRREFEHITPRDRHMPQGTTKTTTVTLVSVVTDASHWQNTCMRPYRH
RCGLGQAASPCDHYYGVIAYAPNGAMGKIVAPPHSRPGGYRRIRTLRRLSCKVLSNFTNY
HGGVRRSRPLAEPGRATS


<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.4.0+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db></BlastOutput_db>
  <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
  <BlastOutput_query-def>peptide &lt;unknown description&gt;</BlastOutput_query-def>
  <BlastOutput_query-len>8</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>PAM30</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_gap-open>9</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>peptide &lt;unknown description&gt;</Iteration_query-def>
  <Iteration_query-len>8</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
  <Hit_id>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546</Hit_id>
  <Hit_def>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546 Six_Frame_ORF</Hit_def>
  <Hit_accession>Subject_1</Hit_accession>
  <Hit_len>258</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>20.5747</Hsp_bit-score>
      <Hsp_score>41</Hsp_score>
      <Hsp_evalue>0.000156356</Hsp_evalue>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>8</Hsp_query-to>
      <Hsp_hit-from>237</Hsp_hit-from>
      <Hsp_hit-to>244</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>5</Hsp_identity>
      <Hsp_positive>8</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>8</Hsp_align-len>
      <Hsp_qseq>FTDFQGGV</Hsp_qseq>
      <Hsp_hseq>FTNYHGGV</Hsp_hseq>
      <Hsp_midline>FT+++GGV</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>0</Statistics_db-num>
      <Statistics_db-len>0</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>2064</Statistics_eff-space>
      <Statistics_kappa>0.11</Statistics_kappa>
      <Statistics_lambda>0.294</Statistics_lambda>
      <Statistics_entropy>0.61</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
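
For anyone who wants to inspect many such HSPs programmatically, here is a minimal sketch that pulls the aligned sequences out of `-outfmt 5` XML using Python's standard library. The XML is a trimmed copy of the HSP above (only the fields used are kept), and the function name `iter_hsps` is a hypothetical helper, not part of BLAST or any BLAST library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the HSP from the output above; only the fields used here are kept.
BLAST_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_hsps>
            <Hsp>
              <Hsp_identity>5</Hsp_identity>
              <Hsp_align-len>8</Hsp_align-len>
              <Hsp_qseq>FTDFQGGV</Hsp_qseq>
              <Hsp_hseq>FTNYHGGV</Hsp_hseq>
              <Hsp_midline>FT+++GGV</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iter_hsps(xml_text):
    """Yield one dict per <Hsp> element (hypothetical helper name)."""
    root = ET.fromstring(xml_text)
    for hsp in root.iter("Hsp"):
        yield {
            "qseq": hsp.findtext("Hsp_qseq"),
            "hseq": hsp.findtext("Hsp_hseq"),
            "midline": hsp.findtext("Hsp_midline"),
            "identity": int(hsp.findtext("Hsp_identity")),
            "align_len": int(hsp.findtext("Hsp_align-len")),
        }


hsps = list(iter_hsps(BLAST_XML))
for h in hsps:
    print(h["qseq"], h["midline"], h["hseq"], h["identity"], h["align_len"])
```

For a real output file, `ET.parse(path).getroot()` could replace `ET.fromstring`; the rest of the loop should work unchanged as long as the file follows the same layout as the output shown here.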

More specifically:

FTDFQGGV

FT+++GGV

FTNYHGGV
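
The observation about this alignment can be checked with a short sketch that measures the longest run of identical aligned residues. The function name `longest_identical_run` is made up for illustration; it only compares the two aligned strings position by position and knows nothing about how BLAST seeds its alignments.

```python
def longest_identical_run(qseq, hseq):
    """Return the length of the longest stretch of consecutive positions
    where the aligned query and subject residues are identical."""
    best = current = 0
    for q, h in zip(qseq, hseq):
        if q == h and q != "-":  # gap characters never count as identities
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


run = longest_identical_run("FTDFQGGV", "FTNYHGGV")
print(run)  # 3 (the "GGV" stretch), which is shorter than the word_size of 4
```

Running this over every HSP in a result set would flag all alignments whose longest identical stretch is below the chosen word_size.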

Any advice would be much appreciated!

Kind regards

Thys


Not a solution to this particular problem, but you could try adding -task blastp-short to the blastp command, as described on the NCBI BLAST help page.
