Extracting assembly accession number and protein ID using e-utilities
5.1 years ago

Hi Folks,

I am trying to use my list of protein IDs (e.g., VUX63899.1, QDO61010.1, QDO50771.1) to retrieve the corresponding assembly accession numbers.

I am using the following command:

epost -input teste.txt -db protein | elink -target nuccore -db protein | elink -target assembly | esummary | xtract -pattern AssemblyAccession -element AssemblyAccession > assemblyAccession.txt

However, I need two columns: each assembly accession next to its corresponding protein ID. I have not yet figured out what I need to add to the command.

Any tips?

Thanks in advance.


Hi, did you ever solve this?

Hi Morgan,

Yes, I did. Thanks. Best,

2.3 years ago
GenoMax 147k

Using EntrezDirect:

$ more id
VUX63899.1
QDO61010.1
QDO50771.1

Since the epost method does not keep track of which original ID each result came from, query each ID separately in a loop and print the ID before its result:

$ for i in $(cat id); do printf '%s\t' "$i"; esearch -db protein -query "$i" | elink -target nuccore | elink -target assembly | esummary | xtract -pattern AssemblyAccession -element AssemblyAccession; done
VUX63899.1      GCF_902167535.1
QDO61010.1      GCF_007113405.2
QDO50771.1      GCF_007113445.1
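If a protein links to more than one assembly, the loop above prints the extra accessions on new lines without an ID; if it links to none, no newline is printed and the next ID lands on the same line. Below is a minimal sketch that always prints exactly one two-column row per ID, joining multiple accessions with commas. It assumes EDirect (esearch, elink, esummary, xtract) is installed and on PATH, and reuses the same pipeline as the answer; the function names `fetch_assembly_accessions` and `ids_to_assemblies` are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# Sketch: print one "protein_ID<TAB>assembly_accession(s)" row per input ID.
# Assumes EDirect (esearch, elink, esummary, xtract) is installed and on PATH.
# Function names are hypothetical, chosen for this example.

# Print the assembly accession(s) linked to one protein ID, one per line.
fetch_assembly_accessions() {
  # </dev/null keeps esearch from reading whatever stdin the caller has.
  esearch -db protein -query "$1" < /dev/null |
    elink -target nuccore |
    elink -target assembly |
    esummary |
    xtract -pattern AssemblyAccession -element AssemblyAccession
}

# For each protein ID given as an argument, print exactly one two-column row.
# Multiple assemblies are joined with commas; no link leaves column 2 empty.
ids_to_assemblies() {
  local id accs
  for id in "$@"; do
    # sort -u drops duplicates and gives a stable order;
    # paste -sd, - joins the remaining lines with commas.
    # $(...) strips the trailing newline, so accs is empty when nothing matched.
    accs=$(fetch_assembly_accessions "$id" | sort -u | paste -sd, -)
    printf '%s\t%s\n' "$id" "$accs"
  done
}
```

Usage, with the `id` file from above: `ids_to_assemblies $(cat id) > assemblyAccession.txt`. Splitting the network lookup into its own function keeps the row formatting separate, so the output stays a clean two-column table even when NCBI returns zero or several assemblies.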