Hi
[I'm a newbie in bioinformatics, so my apologies for any misuse of terms.]
I need to decide which resource to use to download complete protein FASTA files for many species, in order to run blastp queries for all human proteins against each of the species. I would like to download the files for most of the eukaryotic species that are available. I checked some species in the latest releases of both Ensembl and NCBI, and saw that there are big differences between them.
For example, when I downloaded the protein FASTA file of "Otolemur garnettii", the Ensembl file had 19986 proteins, whereas the NCBI file had 26925. When I run a sample blastp of a human protein sequence against each of these protein files (after running makeblastdb, of course), the highest bitscore is very different between Ensembl and NCBI.
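For anyone who wants to reproduce the protein counts, here is a minimal Python sketch. It assumes an uncompressed FASTA file in which every record starts with a ">" header line; the function name and the file name in the comment are only illustrative.

```python
import io

def count_fasta_records(handle):
    """Count sequences in a FASTA stream: one '>' header line per record."""
    return sum(1 for line in handle if line.startswith(">"))

# Tiny in-memory example with two made-up records.
# For a real file, pass open("proteins.fa") instead (hypothetical file name).
example = io.StringIO(">p1 some description\nMKV\nLLA\n>p2\nMST\n")
print(count_fasta_records(example))  # prints 2
```

Running this on the Ensembl and NCBI files separately gives the two counts that are being compared.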
Also, when I run blastp for the same species, Ensembl vs NCBI and vice versa, I get more than 1000 proteins with %identity < 30, which I understand to be proteins that exist in one resource but not in the other (?)
I know they use different gene annotation methods, so it makes sense that there are differences, but my question is: do you have experience working with both resources, and do you have any recommendations on which resource to choose?
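A rough sketch of how such a list can be pulled out of the blastp results, assuming the search was run with tabular output (-outfmt 6), where the first three columns are query ID, subject ID and percent identity. The function names and the IDs in the example are made up. Note two assumptions: it takes the highest identity over all hits of a query, which is not necessarily the hit with the best bitscore, and queries with no hit at all do not appear in the BLAST output, so they are reported separately.

```python
import csv
import io

def best_identity_per_query(handle):
    """Return {query_id: highest % identity} from BLAST tabular (outfmt 6) rows."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident = row[0], float(row[2])
        if pident > best.get(query, -1.0):
            best[query] = pident
    return best

def low_identity_queries(best, all_query_ids, threshold=30.0):
    """Split queries into those below the identity threshold and those with no hit."""
    low = sorted(q for q, p in best.items() if p < threshold)
    no_hit = sorted(set(all_query_ids) - set(best))
    return low, no_hit

# Made-up outfmt 6 rows (12 tab-separated columns each).
rows = (
    "q1\ts1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "q1\ts2\t25.0\t80\t60\t2\t1\t80\t5\t84\t0.5\t30\n"
    "q2\ts3\t28.5\t90\t64\t3\t1\t90\t2\t91\t0.01\t40\n"
)
best = best_identity_per_query(io.StringIO(rows))
low, no_hit = low_identity_queries(best, ["q1", "q2", "q3"])
print(low, no_hit)
```

A low-identity best hit only suggests that there is no close counterpart in the other set; it does not prove that the protein is missing there.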
Thanks a lot,
Idit
UniProt has built a database of reference proteomes for most organisms sequenced to date: http://www.uniprot.org/proteomes/
Thanks, I downloaded all the eukaryotes I needed from the UniProt FTP site.