Hi,
Which sequence should one use when several protein sequences of different lengths but with the same function are available for a particular locus tag in NCBI? For example, if one searches NCBI Protein for the locus tag Rv3399, two protein sequences of different lengths are returned. Which one should I use for further studies: the longest sequence or the most recently annotated one? Also, what is the best way to obtain all recently updated non-redundant proteins of a particular species, either through an NCBI search or from the NCBI FTP site? Is the data on the NCBI FTP site updated daily?
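One possible programmatic route for the second part of the question is NCBI's E-utilities. The sketch below only builds an `esearch` URL and makes no network request. The organism name is just an example, and restricting the search with `refseq[filter]` to reduce redundancy is an assumption for illustration, not a confirmed definition of "non-redundant". It also does not answer how often the FTP data is refreshed.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_species_protein_search(organism, refseq_only=True, retmax=500):
    """Return an esearch URL for protein records of a single organism.

    organism: organism name as used by NCBI Taxonomy (example only).
    refseq_only: if True, AND the query with refseq[filter]; using this
    filter to cut redundancy is an assumption, not an NCBI guarantee.
    retmax: maximum number of IDs returned in one response.
    """
    term = f'"{organism}"[Organism]'
    if refseq_only:
        term += " AND refseq[filter]"
    params = {
        "db": "protein",
        "term": term,
        "retmax": retmax,
        "usehistory": "y",  # keep results on the server for a later efetch
    }
    # urlencode quotes spaces, quotes and brackets for use in a URL.
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_species_protein_search("Mycobacterium tuberculosis H37Rv")
print(url)
```

The returned URL could then be opened with `urllib.request.urlopen`, and the matching records downloaded with `efetch` (`db=protein`, `rettype=fasta`, `retmode=text`). Whether this is better than downloading a species proteome file from the FTP site depends on how current and how redundant the data needs to be.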
Thanks for replying, and sorry for the late response; I was away for a while. I actually have to identify different classes of enzymes across the whole genome, so each time I would have to check the literature to see how that enzyme was identified. That means there would be too many keywords to search for. Do you know of a way to cover all the search terms programmatically?
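One way to avoid running each keyword search by hand is to combine all terms into a single Entrez boolean query. The following is a minimal sketch. The function name `build_keyword_query`, the example keywords and the default `[Title]` field tag are all hypothetical choices. The field actually worth searching should be checked against the target database's field list, for example in the NCBI Advanced search builder.

```python
def build_keyword_query(keywords, organism=None, field="Title"):
    """Combine many search terms into one Entrez boolean query string.

    Each keyword is quoted and tagged with `field` (assumed default
    "Title"), the terms are OR-ed together, and an optional organism
    restriction is AND-ed onto the whole group.
    """
    # Drop blank entries and case-insensitive duplicates, keeping order.
    seen = set()
    cleaned = []
    for kw in keywords:
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            cleaned.append(kw)
    if not cleaned:
        raise ValueError("no keywords given")

    group = " OR ".join(f'"{kw}"[{field}]' for kw in cleaned)
    query = f"({group})"
    if organism:
        query += f' AND "{organism}"[Organism]'
    return query


# Hypothetical keyword list; in practice it could be read from a file.
terms = ["lipase", "esterase", " Lipase ", ""]
q1 = build_keyword_query(terms)
q2 = build_keyword_query(terms, organism="Mycobacterium tuberculosis")
print(q1)
print(q2)
```

Here `q1` is `("lipase"[Title] OR "esterase"[Title])`. The resulting string can be pasted into the NCBI Protein search box or passed as the `term` parameter of an E-utilities `esearch` call. Note that this only matches annotation text and says nothing about whether a function was experimentally validated.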
I am sure there are databases in which you can search only for proteins whose function has been experimentally validated, but I am not aware of any. You might want to open a new question.
OK, thanks. I will look for such databases, then.