Hi all,
this must be very basic, but still. I have a protein sequence for which I want to find homologs. I go to BLAST and do, for simplicity here, a regular BLASTp.
I know that blasting against refseq_protein or swissprot is common practice, but how about nr (non-redundant protein sequences)? This includes "All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects", and as far as I've seen, it contains not only hypothetical proteins but also several instances of the same protein (e.g. different combinations of PDB chains).
Would you guys consider a BLAST search against nr a proper "finding-homologs" exercise?
Thanks!
Miquel
Thanks John. I want to see if there are "any" homologs (and, ideally, I'd like to find as many as possible). The problem I've found with "nr" is that I sometimes retrieve several instances of the same protein, perhaps with different lengths for whatever reason, which makes me doubt whether it is a valid way to build a proper collection of homologs.
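For what it's worth, one way to deal with the repeated instances after the search is to collapse hits whose sequence is identical to, or fully contained in, a longer hit. The snippet below is only a minimal sketch of that idea: the function name `collapse_contained`, the hit IDs, and the toy sequences are all made up for illustration, and it only catches exact containment (near-identical variants would need a real clustering tool such as CD-HIT).

```python
def collapse_contained(records):
    """Keep one representative per group of identical or contained sequences.

    records: dict mapping a hit ID to its protein sequence (hypothetical input;
    in practice these would be the subject sequences retrieved for the hits).
    """
    kept = {}
    # Visit the longest sequences first so that fragments are compared
    # against the full-length entries that might contain them.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for seq_id, seq in ordered:
        # Skip anything that is an exact substring of a sequence already kept.
        if any(seq in longer for longer in kept.values()):
            continue
        kept[seq_id] = seq
    return kept


# Toy data: one protein present three times (full length, exact copy, fragment)
# plus one unrelated protein.
hits = {
    "hitA": "MKTAYIAKQRQISFVKSHFSRQ",
    "hitA_copy": "MKTAYIAKQRQISFVKSHFSRQ",
    "hitA_fragment": "AYIAKQRQISF",
    "hitB": "MSDNELLKWQ",
}

representatives = collapse_contained(hits)
print(sorted(representatives))
```

The redundancy in nr does not make the hits any less homologous; it only means the hit list needs this kind of tidying before counting distinct proteins.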
Any ideas about command-line options for BLASTing against the protein database of a specific organism (e.g. Homo sapiens)?
Thanks
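On the organism question, a possible starting point with standalone BLAST+ is the `-taxids` option of `blastp`, which restricts the search to given NCBI taxonomy IDs (9606 is Homo sapiens). The sketch below only builds and prints the command. The file names `query.fasta` and `human_hits.tsv` and the helper `build_blastp_command` are placeholders, and it assumes a reasonably recent BLAST+ with a local version 5 database that carries taxonomy information. For remote searches, `-remote` together with `-entrez_query "Homo sapiens[Organism]"` is the usual alternative.

```python
import shlex


def build_blastp_command(query_fasta, database, taxid, out_path):
    """Assemble a blastp call limited to one NCBI taxonomy ID.

    All arguments are placeholders supplied by the caller; nothing is run here.
    """
    return [
        "blastp",
        "-query", query_fasta,   # FASTA file with the query protein
        "-db", database,         # e.g. a local copy of nr or refseq_protein
        "-taxids", str(taxid),   # restrict subjects to this taxonomy ID
        "-outfmt", "6",          # tabular output, one hit per line
        "-out", out_path,
    ]


cmd = build_blastp_command("query.fasta", "nr", 9606, "human_hits.tsv")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and the database are installed.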