I ran command-line BLAST for around 40,000 sequences. I downloaded the protein databases for Arabidopsis thaliana, Viridiplantae and Swiss-Prot, then ran blastp against each of them. I found that 46 sequences have no BLAST hit against the Arabidopsis database, 96 sequences have none against Viridiplantae, and 159 have none against Swiss-Prot.
I wonder how the number of sequences without a hit can increase from 46 to 96 in Viridiplantae, even though it contains all plant proteins, including the Arabidopsis proteins. Similarly, the number increases from 96 to 159 in Swiss-Prot, even though Swiss-Prot contains all proteins, including those of Viridiplantae.
So the question is: how can a sequence have a BLAST hit in the Arabidopsis database while the same sequence has no BLAST hit in the Viridiplantae and Swiss-Prot databases?
Is there something wrong with BLAST?
@tcf.hcdg: Note that BLAST always has an E-value cutoff set! It defaults to 10, which lets through very poor hits. I mention this only to prevent you from writing a statement like "I do not have a cutoff". ;-)
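To illustrate why the cutoff matters here, below is a rough sketch of how the E-value of one and the same alignment grows with database size, using the textbook approximation E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. The database sizes, query length and bit score are made-up numbers chosen only for illustration, and real BLAST uses effective (edge-corrected) lengths, so actual values will differ somewhat.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplification: real BLAST uses effective query and database lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical total database sizes in residues (illustrative, not real figures).
small_db = 1.0e7   # stands in for a single-species database
large_db = 2.0e8   # stands in for a much larger multi-species database

# Hypothetical weak alignment: same query, same bit score in both searches.
bit_score = 30.0
query_len = 300
cutoff = 10.0      # BLAST's default -evalue

e_small = expect_value(bit_score, query_len, small_db)
e_large = expect_value(bit_score, query_len, large_db)

print(f"small db: E = {e_small:.3g}, reported = {e_small <= cutoff}")
print(f"large db: E = {e_large:.3g}, reported = {e_large <= cutoff}")
```

With these assumed numbers the identical alignment is reported against the small database but dropped against the large one, which is the kind of effect that would make the "no hit" count go up as the database grows.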
@michael: I think you missed "cutoff" at the end of "... and thereby fewer significant hits at the same e-value"
thanx, fixed
Thank you! I did not mean to be picky, but people frequently get confused when talking or reading about p-values and E-values, so I like to be precise ;-)
Yes, in all three cases I used the same cutoff value and the same BLAST parameters.
Did you have a look at the "unmatched" sequences? How well do they match when blasted against the Ath database?
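One way to check this could look like the sketch below. It assumes the searches were saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID, column 11 the E-value and column 12 the bit score. The function names and the sample lines are hypothetical and only stand in for the real result files.

```python
def best_hits(lines):
    """Map each query ID to (evalue, bitscore) of its highest-scoring hit.

    Expects lines in BLAST tabular format (-outfmt 6, default 12 columns).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[0]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (evalue, bitscore)
    return best


def lost_in_larger_db(small_hits, large_hits):
    """Queries that hit the small database but have no hit in the large one."""
    return {q: small_hits[q] for q in small_hits if q not in large_hits}


# Made-up example rows; in practice, pass open file handles instead.
ath_lines = [
    "q1\tsubjA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t520",
    "q2\tsubjB\t31.0\t60\t40\t1\t5\t64\t10\t69\t4.2\t29.5",
]
vir_lines = [
    "q1\tsubjC\t98.0\t300\t6\t0\t1\t300\t1\t300\t2e-149\t520",
]

lost = lost_in_larger_db(best_hits(ath_lines), best_hits(vir_lines))
for query, (evalue, bitscore) in sorted(lost.items()):
    print(query, evalue, bitscore)
```

If the queries that "disappear" in the larger databases all have E-values close to the cutoff in the Ath search, that would point to the database-size effect rather than a problem with BLAST itself.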