7.7 years ago
adeel.maliks20
I have a tab-delimited file generated by DIAMOND blastp.
I want to add a description for each protein next to its accession number.
The file looks like this:
BalsamFir1001 gi|672108306|ref|XP_008784482.1| 53.3 1188 487 16 150 1299 32 1189 0.0e+00 1236.9
BalsamFir10022 gi|586769306|ref|XP_006856185.1| 43.9 471 227 12 73 522 23 477 2.1e-76 290.8
BalsamFir10042 gi|586694060|ref|XP_006843464.1| 84.6 468 58 1 16 483 9 462 4.4e-230 801.2
I extracted the accession numbers from the file above and submitted them to Batch Entrez to get the descriptions of these proteins. The output (.txt) has the descriptions in an irregular layout, and they are not sorted in the same order as the input file. There are more than 16k protein descriptions to add as a new column.
The file generated from GenBank looks like this:
1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276
2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510
What is the best way to solve this problem?
Shouldn't you be looking at the protein database instead of GenBank? https://www.ncbi.nlm.nih.gov/protein
The examples posted are XP_* records, which are part of the nr protein database.
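Whichever database the summary comes from, one possible way to attach the descriptions is to treat the summary file as a lookup table keyed by accession, so the order of the records no longer matters. The following is only a minimal sketch in Python. It assumes the summary file consists of three-line records exactly as in the question (a numbered description, a length line, then a line with the accession and `GI:`), and that the subject column of the DIAMOND file has the form `gi|...|ref|ACCESSION|`. The function names and the `NA` placeholder are made up for illustration.

```python
import re

# Assumed record layout (taken from the example in the question):
#   1. description [organism]
#   168 aa protein
#   NP_188286.1 GI:15228276
DESCRIPTION_LINE = re.compile(r"^\d+\.\s+(.*)$")
ACCESSION_LINE = re.compile(r"^(\S+)\s+GI:\d+$")


def parse_entrez_summary(text):
    """Return a dict mapping accession -> description."""
    descriptions = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = DESCRIPTION_LINE.match(line)
        if match:
            current = match.group(1)
            continue
        match = ACCESSION_LINE.match(line)
        if match and current is not None:
            descriptions[match.group(1)] = current
            current = None
    return descriptions


def accession_from_subject(subject_id):
    """Extract 'XP_008784482.1' from 'gi|672108306|ref|XP_008784482.1|'."""
    parts = subject_id.split("|")
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    # Assumption: anything else is already a bare accession.
    return subject_id


def annotate(diamond_lines, descriptions, missing="NA"):
    """Yield each tab-delimited hit with the description appended as a last column."""
    for line in diamond_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        accession = accession_from_subject(fields[1])
        fields.append(descriptions.get(accession, missing))
        yield "\t".join(fields)
```

To use it, read the whole summary file into a string and pass it to `parse_entrez_summary`, then loop over `annotate(open_diamond_file, descriptions)` and write each returned line to a new file. Hits whose accession is absent from the summary get `NA`, which makes it easy to see which records Batch Entrez did not return.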