Hi,
I have a protein sequence that I first run through blastp against the NR database without any restrictions. It returns a possible protein family and an organism, with a very low E-value (something like 3E-143). This all works fine, but I am told there is a very high chance that the sequence comes from a nematode. So I redo blastp, this time limiting the taxa to nematodes. I get just one decent hit, with the following header:
ENA|CL652565|CL652565.1 PRI0115a_H01 - PRI0115a.B21 (762) Mixed stage fosmid library of P. pacificus var. California Pristionchus pacificus genomic, genomic survey sequence.
I don't fully understand this header (what does PRI0115a_H01 - PRI0115a.B21 (762) mean?), but from what I can tell it shows only the name of the organism, not the protein family. My question is: the protein family shouldn't differ from the first search (I tried all reading frames), right? So why is it not shown this time? Can I deduce the organism from my second query and the protein family from the first? I am kinda new to bio and bioinformatics and have been thrown directly into all this, so please bear with me if I have gotten something completely wrong.
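As a rough illustration of how the header above breaks down, here is a minimal Python sketch. It assumes the header follows the pipe-delimited layout `DB|ACCESSION|ACCESSION.VERSION description`, which is inferred from this single example rather than from a format specification; the function name `parse_ena_header` is hypothetical. It only separates the fields and does not try to interpret the clone names or the "(762)".

```python
def parse_ena_header(header):
    """Split a header shaped like 'DB|ACC|ACC.VER description' into fields.

    Assumption: exactly this pipe-delimited layout, inferred from one
    example header; other databases or header styles may differ.
    """
    # Drop a leading '>' if the header was copied from a FASTA file.
    db, accession, rest = header.lstrip(">").split("|", 2)
    # The versioned accession is the first whitespace-separated token;
    # everything after it is free-text description.
    versioned, _, description = rest.partition(" ")
    return {
        "database": db,
        "accession": accession,
        "versioned_accession": versioned,
        "description": description,
    }


header = ("ENA|CL652565|CL652565.1 PRI0115a_H01 - PRI0115a.B21 (762) "
          "Mixed stage fosmid library of P. pacificus var. California "
          "Pristionchus pacificus genomic, genomic survey sequence.")
fields = parse_ena_header(header)
for key, value in fields.items():
    print(f"{key}: {value}")
```

Splitting this way shows that the hit is a genomic survey sequence (a nucleotide record) rather than an annotated protein, which is why no protein family name appears in its description.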
Adding one thing: if you're searching for protein families, you should take a look at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
No, blastp and tblastn do not necessarily give the same result. When you search a protein database (blastp), you are searching sequences that the database provider believes to be genuine proteins. When you use tblastn, you are searching a 6-frame translation of a nucleotide database, which is different to a database of known (or putative) proteins.
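To make "6-frame translation" concrete, here is a minimal Python sketch of what it means for a single nucleotide sequence: three reading frames on the forward strand and three on the reverse complement. It assumes the standard genetic code (NCBI table 1) and plain A/C/G/T input, and the function names are hypothetical. It illustrates the idea only and is not how BLAST implements its search internally.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; a trailing partial codon is dropped,
    and any codon not in the table (e.g. containing N) becomes 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return the translations of all six reading frames, keyed +1..+3, -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame("ATGGCC").items():
    print(frame, protein)
```

Most of these six translations are not real proteins, which is the point: tblastn can find a match in a frame of a genomic fragment even when no corresponding protein record exists in the protein database that blastp searches.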
Thank you for the reply. Yes, you are right: I was using tblastn and thinking it was blastp. But even so, tblastn tells me that there is a nematode nucleotide sequence similar to mine in the database (with a good E-value). When I run blastx with the same settings (limited to nematodes), however, I get no hits. What should I infer from this? Does the nematode sequence that tblastn found not encode anything?
@Michael Thanks, yes, I did that. In fact, the blastp results include a conserved domain analysis as well.
@neilfws Thanks. Yes, I was using tblastn thinking it was blastp. Let me rephrase what's happening. I have a sequence, and first I run blastp, which gives me a possible protein and the organism. I then run tblastn against the nematode database and get a hit for one organism. Then I switch to blastp against the nematode database and get no hits. What does this mean? Shouldn't the nematode that gave a hit in tblastn also show up here? Or does it mean that, according to the database, no nematode encodes this protein?
I also wanted to ask again: suppose blastp tells me that this sequence might encode protein X (regardless of which organism blastp reports; I don't think that organism name matters much, because the same protein might be encoded by multiple organisms), and blastn tells me that this sequence might come from organism Y. Can I not then say that my sequence might come from organism Y and encode protein X?