Blastp against CAZy database
4.3 years ago
huiyus97 • 0

Hi,

I want to BLAST several proteomes against the CAZy database from my terminal, and I downloaded the CAZy database from dbCAN2:

http://bcb.unl.edu/dbCAN2/download/

Here is my blastp command, run after building my own BLAST database:

$ blastp -query input.fasta -db cazydatabase.fa -evalue 1e-5 -outfmt 6 -out output -max_target_seqs 1

I got over 900 sequences in my result with an e-value cutoff of 1e-5. However, other people's work reports 300-400 significant sequences as the expected number. Could someone please help me solve this problem?

Thanks in advance!

blastp cazy database • 1.7k views

I got over 900 sequences in my result with 1e-5. However, other people's work showed 300-400 is the appropriate significant sequences number. Could someone please help me solve this problem?

Your data does not need to give results identical to those of others. Results are a characteristic of the data going into the analysis. If your data were identical to what others have used (which I assume is not the case), then this would be a problem.


Hi, thank you for your response! I tested with the same proteome that other people used in their paper (I retrieved it from NCBI), and the result differs a lot.

4.3 years ago
Mensur Dlakic ★ 28k

Most likely the reason for this problem is the same as in your other post: you are using -outfmt 6 instead of the default pairwise alignment output. Since BLAST is a local aligner, it will often find multiple high-scoring segment pairs (HSPs) between two proteins, rather than a single global alignment. If there are 3 HSPs between a query and its match, that counts as a single hit and is shown as a single line in the summary part of the pairwise output (though it appears as 3 alignments in the alignment part). Since -outfmt 6 has no separate alignment section, it prints one line per HSP, so that single hit is shown as 3 lines. Even though you are asking only for the top hit with -max_target_seqs 1, the output will often contain multiple lines per query because of HSPs. As I suggested to you before, try removing -outfmt 6 from your command line just to see what that output looks like.
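
If you would rather keep the tabular output, one way to check this is to count unique queries instead of lines. Below is a minimal Python sketch, assuming the default -outfmt 6 column order (query ID in column 1, bit score in column 12). The function name and the example rows are hypothetical and made up for illustration; they are not real CAZy hits.

```python
def best_hit_per_query(lines):
    """Keep only the highest-bitscore row for each query ID.

    Assumes default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        bitscore = float(fields[11])
        if query_id not in best or bitscore > best[query_id][1]:
            best[query_id] = (fields, bitscore)
    return {query: fields for query, (fields, _) in best.items()}


# Hypothetical rows: protA matches one subject through 3 HSPs, protB through 1.
example_rows = [
    "protA\tGH5_hyp\t45.0\t120\t60\t2\t10\t129\t5\t124\t1e-30\t110.0",
    "protA\tGH5_hyp\t38.0\t80\t48\t1\t200\t279\t150\t229\t1e-12\t62.5",
    "protA\tGH5_hyp\t35.0\t60\t38\t1\t300\t359\t260\t319\t1e-06\t44.0",
    "protB\tGT2_hyp\t50.0\t200\t95\t3\t1\t200\t1\t200\t1e-60\t201.0",
]

hits = best_hit_per_query(example_rows)
print(len(example_rows), "lines,", len(hits), "unique queries")
```

To run it on your own result, pass the open file handle instead of the example list, e.g. `with open("output") as handle: hits = best_hit_per_query(handle)`. If the number of unique queries is much closer to 300-400 than the line count is, the extra lines are HSPs. BLAST+ also has a -max_hsps option that limits the number of HSPs reported per query-subject pair, which may give a similar result directly.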
