Dear all, I have about 4,000 NCBI nucleotide IDs, and I want to download their protein sequences from NCBI. I know FASTA files can be downloaded in bulk using "NCBI batch", but the problem is that I cannot download protein sequences using nucleotide IDs unless I do it one by one, which is impractical for 4,000 sequences.
So I would like to know whether there is a way to convert NCBI nucleotide IDs to protein IDs, or whether there is any other solution. A nucleotide ID looks like this: XM_017496492.1, and its corresponding protein ID is: XP_017351981.1
Any advice will be greatly appreciated!
Interesting question.
Would simply blasting the nucleotide sequences you are interested in against nrprot (or a subset of it, if the required IDs are from a single species or some other taxonomic subset) be an option?
Hi, thanks for the suggestion! I didn't explain it clearly. This is actually RNA sequencing data with a reference genome. I have found some genes of interest and want to get their protein IDs, then download the protein sequences. After that, I will use those protein sequences as the input for Orthofinder. So in this case, maybe converting the IDs is much easier?
OK, different issue indeed. I will leave it up to others to chip in here.
One remark, though: running tools like Orthofinder on a subset of proteins will technically work but might (will?) bias the results. It's advisable to run those tools with as complete a set of proteins as you can.
EDIT: et voilà, genomax has already provided a solution for this.
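For anyone landing here later, below is a minimal sketch of one possible way to go from nucleotide IDs to protein sequences in bulk. It is not necessarily the approach genomax posted. It assumes the NCBI E-utilities `efetch` endpoint with `db=nuccore` and `rettype=fasta_cds_aa`, which returns the translated CDS for each nucleotide record. The function names, the batch size of 100, and the 0.4 s pause are my own choices for illustration, and the assumption that each FASTA header carries a `[protein_id=...]` tag should be checked against real output before relying on it.

```python
import re
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(ids, size=100):
    """Yield successive batches of at most `size` IDs (batch size is an arbitrary choice)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_efetch_url(nucleotide_ids):
    """Build an efetch URL requesting CDS protein translations for nucleotide IDs."""
    params = {
        "db": "nuccore",
        "id": ",".join(nucleotide_ids),
        "rettype": "fasta_cds_aa",  # protein FASTA of the annotated CDS
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def extract_protein_id(fasta_header):
    """Return the protein accession from a header, or None if no tag is present.

    Assumes the header contains a tag of the form [protein_id=XP_...].
    """
    match = re.search(r"\[protein_id=([^\]]+)\]", fasta_header)
    return match.group(1) if match else None


def fetch_proteins(nucleotide_ids, batch_size=100, pause=0.4):
    """Download protein FASTA text for all IDs, one HTTP request per batch."""
    parts = []
    for batch in chunked(nucleotide_ids, batch_size):
        with urlopen(build_efetch_url(batch)) as response:
            parts.append(response.read().decode("utf-8"))
        time.sleep(pause)  # stay under NCBI's request rate limit without an API key
    return "".join(parts)


# Example usage (needs network access, so it is left commented out):
# fasta = fetch_proteins(["XM_017496492.1"])
# for line in fasta.splitlines():
#     if line.startswith(">"):
#         print(extract_protein_id(line))
```

With 4,000 IDs this would issue about 40 requests. The resulting FASTA text can be written to a file and the headers trimmed down to the protein accession if the downstream tool needs short IDs.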