Question

incomplete (all-vs-all) blastp results

8.2 years ago

a.abnousi ▴ 30

I have a FASTA file with 254 sequences. I created a BLAST database with masking and then ran blastp against that database, using the same FASTA file as the query (again with masking). The commands are shown below.

But in the results only 203 sequences appear as queries, while the subjects correctly cover all 254. My tabular blastp output looks like this, whereas I would expect the last line to be "254 ... ... ... ...":

qry_id  sbj_id  %identity  length  mismatches  gap_opens  q_start  q_end  s_start  s_end  evalue  bit_score
1       30      29.42      673     409         19         4        643    7        646    2e-66   242
1       30      38.26      115     71          0          781      895    645      759    1e-22   106
1       185     27.99      661     350         20         289      889    322      916    2e-59   223
2       253     28.86      648     366         20         267      895    209      780    9e-58   216
.
.
.
203     16      41.30      293     148         3          607      895    529      801    2e-57   216
203     16      29.75      511     305         13         44       542    64       532    5e-40   162

Note that query sequence #2 is matched against sequence #253, but sequence #253 never appears as a query; the last query is #203.

I'm not sure whether I'm expecting the right thing. Shouldn't the last lines show sequence #254 queried against some matching subjects? (The sequences are mostly similar, so it is very unlikely that #204-#254 don't align with anything.) Or is this the correct result? If so, can you explain what happens to #204-#254? Thanks!

Here is how I ran BLAST:

./segmasker -in my_fasta.fasta -infmt fasta -outfmt maskinfo_asn1_bin -out my_seg_output.asnb

./makeblastdb -in my_fasta.fasta -input_type fasta -dbtype prot -mask_data my_seg_output.asnb -out my_db -title my_db

./blastp -query my_fasta.fasta -out my_fasta_blasted -evalue 1.0 -dbsize $db_size -max_hsps $hsps -seg "yes" -db_hard_mask 21 -db my_db -outfmt 6
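To see exactly which sequences never show up as a query, the IDs in the FASTA file can be compared with the first column of the tabular output. The following is a minimal sketch, not part of the original run: it assumes the query ID in the output is the first whitespace-delimited word of each FASTA header, and the function names are made up for illustration. Keep in mind that a query with no hits writes no rows at all in -outfmt 6, so an absent ID means "no reported hits", not necessarily "not searched".

```python
import os


def fasta_ids(fasta_path):
    """Return sequence IDs (first word after '>') in file order."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                ids.append(parts[0] if parts else "")
    return ids


def query_ids(blast_tab_path):
    """Return the set of query IDs (column 1) found in a tabular BLAST file."""
    seen = set()
    with open(blast_tab_path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comment lines (present in -outfmt 7).
            if not line or line.startswith("#"):
                continue
            seen.add(line.split()[0])
    return seen


def missing_queries(fasta_path, blast_tab_path):
    """List FASTA IDs that never appear as a query, in FASTA order."""
    seen = query_ids(blast_tab_path)
    return [seq_id for seq_id in fasta_ids(fasta_path) if seq_id not in seen]


if __name__ == "__main__":
    # File names taken from the commands above.
    fasta, hits = "my_fasta.fasta", "my_fasta_blasted"
    if os.path.exists(fasta) and os.path.exists(hits):
        for seq_id in missing_queries(fasta, hits):
            print(seq_id)
```

If the printed IDs are not simply a contiguous block at the end of the file, the run probably did not stop early, and the missing queries are more likely ones that produced no hits under the chosen settings.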

blastp • 2.7k views

ADD COMMENT • link 8.2 years ago by a.abnousi ▴ 30

It turned out that some time ago I asked a similar question.

A: each protein with each protein

The answers may be helpful.

ADD REPLY • link 8.2 years ago by natasha.sernova ★ 4.0k

Thanks for your reply! I looked into that question, but the answers there explain how to run an all-vs-all BLAST, which I have already done. (I additionally masked with segmasker, which might have caused the problem.)

ADD REPLY • link 8.2 years ago by a.abnousi ▴ 30

Look at this post:

A: How To Mask Low-Complexity Regions In Proteins?

I suspect you may lose some proteins when you mask your data.

What happens when you omit masking?
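One way to probe the masking hypothesis is to measure how much of each sequence is masked. The sketch below assumes a soft-masked FASTA file in which masked residues are written in lower case (for example, what segmasker is expected to produce with a FASTA output format; that option is an assumption here and should be checked against your BLAST+ version). The file name `my_fasta_masked.fasta` and the function name are hypothetical.

```python
import os


def masked_fractions(soft_masked_fasta):
    """Map each sequence ID to the fraction of its residues in lower case."""
    counts = {}  # seq_id -> [masked residues, total residues]
    current = None
    with open(soft_masked_fasta) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                counts[current] = [0, 0]
            elif current is not None:
                # Lower-case letters are treated as masked (assumption).
                counts[current][0] += sum(1 for ch in line if ch.islower())
                counts[current][1] += sum(1 for ch in line if ch.isalpha())
    return {
        seq_id: (masked / total if total else 0.0)
        for seq_id, (masked, total) in counts.items()
    }


if __name__ == "__main__":
    path = "my_fasta_masked.fasta"  # hypothetical file name
    if os.path.exists(path):
        fractions = masked_fractions(path)
        # Print the most heavily masked sequences first.
        for seq_id, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
            print(f"{seq_id}\t{frac:.2f}")
```

Sequences that are masked over most of their length may have too little unmasked sequence left to produce hits, so it is worth checking whether they coincide with the queries missing from the results.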

ADD REPLY • link 8.2 years ago by natasha.sernova ★ 4.0k