Question

Blastp against CAZy database

Asked 4.3 years ago by huiyus97

Hi,

I want to BLAST several proteomes against the CAZy database locally on my terminal. I downloaded the CAZy database from dbCAN2:

http://bcb.unl.edu/dbCAN2/download/

Here is my BLAST command, which I ran after building my own database:

$ blastp -query input.fasta -db cazydatabase.fa -evalue 1e-5 -outfmt 6 -out output -max_target_seqs 1

My output contains over 900 sequences at an E-value cutoff of 1e-5. However, other people's work reports 300-400 significant hits as the expected number. Could someone please help me figure out why?

Thanks in advance!
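A quick way to check what the "900 sequences" figure actually counts is to compare the number of lines in the tabular file with the number of distinct query IDs (column 1, qseqid, of -outfmt 6). The following is a minimal bash sketch, assuming a tab-separated file written with -outfmt 6; the function name count_tabular_hits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a BLAST -outfmt 6 (tab-separated) file.
# Prints the raw line count and the number of distinct query IDs (column 1).
count_tabular_hits() {
    awk -F'\t' '
        { n++; q[$1] = 1 }                 # count every line, remember each query ID
        END {
            m = 0
            for (k in q) m++               # number of distinct query IDs
            printf "lines=%d distinct_queries=%d\n", n, m
        }' "$1"
}
```

For example, `count_tabular_hits output` on the file from the command above. If the line count is much larger than the number of distinct queries, the extra lines are repeated query-subject rows rather than additional sequences.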

Tags: blastp, cazy, database

Last updated 4.3 years ago by Mensur Dlakic

Your data does not need to give results identical to those of others. Results are a characteristic of the data going into the analysis. If your data were identical to what others used (which I assume is not the case), then this would be a problem.

Reply by GenoMax, 4.3 years ago

Hi, thank you for your response! I tested the same proteome (I retrieved it from NCBI) that other people used in their paper, and the results differ a lot.

Reply by huiyus97, 4.3 years ago

Answer (2020-08-16)

Most likely the reason for this problem is the same as in your other post: you are using -outfmt 6 instead of the default pairwise alignment output. Because BLAST is a local aligner, it often finds multiple high-scoring segment pairs (HSPs) between two proteins rather than a single global alignment. If there are 3 HSPs between a query and its match, that counts as a single hit and is shown as a single line in the hit summary of the pairwise output (though it appears as 3 alignments in the alignment section). Since -outfmt 6 doesn't show alignments and instead writes one line per HSP, that single hit is shown as 3 lines. Even though you ask only for the top hit with -max_target_seqs 1, you will often get multiple lines per query because of multiple HSPs. As I suggested to you before, try removing -outfmt 6 from your command line just to see what that output looks like.
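If you prefer to stay with tabular output, one way to undo the HSP inflation is to keep a single line per query-subject pair (columns 1 and 2 of -outfmt 6). The bash sketch below does this with awk; the function names are hypothetical, and keeping the first line seen assumes BLAST lists the best-scoring HSP of each pair first, which is the usual ordering but worth checking on your own file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the first line for each query-subject pair
# (columns 1 and 2 of a tab-separated -outfmt 6 file), dropping extra HSPs.
first_hsp_per_pair() {
    awk -F'\t' '!seen[$1 FS $2]++' "$1"
}

# Hypothetical helper: count distinct query-subject pairs in the same file.
count_distinct_pairs() {
    awk -F'\t' '!seen[$1 FS $2]++ { n++ } END { print n + 0 }' "$1"
}
```

For example, `first_hsp_per_pair output > output.dedup` writes the reduced table, and `count_distinct_pairs output` gives the number that is comparable to the hit count in the pairwise summary.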