Question

Help to filter blastp results

8.2 years ago

guillaume.rbt ★ 1.0k

Hi all,

I'm trying to blast two sets of proteins against each other to find similarities.

I'm using this command to do so: blastall -d set1.fasta -i set2.fa -p blastp -m 9 -e 0.01 -o results.blast

As the two sets are from the same species, I would like to filter the results to keep only matches with > 99% identity where the query and subject have the same length. After filtering on % identity, I sometimes get results like this one:

Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score

protein_1 protein_2 100.00 76 0 0 1 76 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 77 152 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 153 228 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 229 304 1 76 3e-46 154

Here, four parts of protein_1 match the same sequence of protein_2. As I only want hits between proteins of the same length, I would like to filter out this kind of result, but I don't know how. Does anyone know a parameter that could do that, or a way to filter the result file?

Thanks,

blast filter fasta • 2.9k views

ADD COMMENT • link 8.2 years ago by guillaume.rbt ★ 1.0k

That table doesn't include the query and subject sequence lengths, so it's not possible to filter on them. With blast+ you can include qlen and slen in your output rows. I don't know if you can do that with legacy blast.
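As a rough illustration of the blast+ route, here is a minimal Python sketch that filters tabular output once qlen and slen are present. It assumes the search was run with something like blastp -query set2.fa -db set1.fasta -evalue 0.01 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen" -out results.blast, so each row has 14 tab-separated columns in that order. The function name filter_hits, the full_length option, and the sample rows are made up for the example; requiring the alignment to span both sequences end to end is an extra assumption, not something stated in the question.

```python
import csv
import io

# Column order assumed to match the -outfmt string given in the lead-in.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore",
           "qlen", "slen"]


def filter_hits(handle, min_identity=99.0, full_length=True):
    """Yield hits (as dicts of strings) that pass the identity and length filters.

    A hit is kept when % identity is above min_identity and the query and
    subject have the same length. With full_length=True (an assumption of
    this sketch) the alignment must also cover both sequences end to end.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        qlen, slen = int(hit["qlen"]), int(hit["slen"])
        if float(hit["pident"]) <= min_identity or qlen != slen:
            continue
        if full_length:
            spans_query = int(hit["qstart"]) == 1 and int(hit["qend"]) == qlen
            spans_subject = int(hit["sstart"]) == 1 and int(hit["send"]) == slen
            if not (spans_query and spans_subject):
                continue
        yield hit


if __name__ == "__main__":
    # Invented rows: the first mimics the repeated-domain case from the
    # question (qlen 304 vs slen 76), the second is a same-length match.
    sample = (
        "protein_1\tprotein_2\t100.00\t76\t0\t0\t77\t152\t1\t76\t3e-46\t154\t304\t76\n"
        "protein_3\tprotein_4\t100.00\t120\t0\t0\t1\t120\t1\t120\t1e-80\t240\t120\t120\n"
    )
    for hit in filter_hits(io.StringIO(sample)):
        print(hit["qseqid"], hit["sseqid"], hit["pident"])
```

To run it on a real file, open results.blast and pass the file handle to filter_hits instead of the in-memory sample.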

ADD REPLY • link 8.2 years ago by 5heikki 11k

Thanks, it works well with blast+.

ADD REPLY • link 8.2 years ago by guillaume.rbt ★ 1.0k

How large are your two sets? It may be easier to make simple pairwise alignments of the proteins that have the same length. In Biopython you can use the pairwise2 module for this task (e.g. alignment = pairwise2.align.globalxx(seq1, seq2, score_only=True). In this example the alignment score should equal the length of the protein if the two proteins are 100% identical).
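The idea of comparing only same-length proteins can be sketched without any alignment library. Note that the snippet below does not use pairwise2: it swaps in a plain position-by-position (ungapped) comparison, which is only meaningful because both sequences have the same length. It assumes the two sets have already been loaded into dicts mapping sequence name to sequence string; the function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def percent_identity(seq_a, seq_b):
    """Ungapped % identity between two sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def same_length_matches(set1, set2, min_identity=99.0):
    """Return (query, subject, identity) for same-length pairs above the cutoff.

    set1 and set2 are assumed to be dicts of name -> sequence string.
    """
    # Index set1 by sequence length so each query is only compared
    # against subjects of exactly the same length.
    by_length = defaultdict(list)
    for name, seq in set1.items():
        by_length[len(seq)].append((name, seq))

    results = []
    for qname, qseq in set2.items():
        for sname, sseq in by_length.get(len(qseq), []):
            identity = percent_identity(qseq, sseq)
            if identity > min_identity:
                results.append((qname, sname, identity))
    return results


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    set1 = {"a": "MKTAYIAK", "b": "MKT"}
    set2 = {"x": "MKTAYIAK", "y": "MKTAYIAKQR"}
    for query, subject, identity in same_length_matches(set1, set2):
        print(query, subject, identity)
```

Because it never introduces gaps, this approach would miss a pair of equal-length proteins that differ by a compensating insertion and deletion; a real global alignment would be needed to catch those.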

ADD REPLY • link 8.2 years ago by Markus ▴ 320