Extracting multiple features using NCBI's e-utilities?
6.6 years ago
ThePresident ▴ 180

I have a list of protein accession identifiers such as "CBE06962.1". I would like to automatically extract several features for each one, such as the locus_tag, the start and stop positions of the corresponding gene, UniProt tags, etc. Is it possible to do this by combining esearch and efetch from e-utilities, something like:

esearch -db protein -query "CBE06962.1" | efetch ???

If not, I am thinking of downloading all the GenBank files and then parsing them for the info I want.

Any suggestions? Thanks in advance,

TP

e-utilities perl • 1.7k views
6.6 years ago
GenoMax 147k

What about this report:

efetch -db protein -id "CBE06962.1" -format ipg

Id        Source  Nucleotide Accession  Start    Stop     Strand  Protein     Protein Name    Organism                         Strain  Assembly
18688109  INSDC   FN545816.1            3730243  3731418  -       CBE06962.1  sensor protein  Clostridioides difficile R20291  R20291  GCA_000027105.1
18688109  INSDC   FN538970.1            3649468  3650643  -       CBA66243.1  sensor protein  Clostridioides difficile CD196   CD196   GCA_000085225.1
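
The ipg report lists every identical protein, so for a list of accessions you will probably want to keep only the row that matches each query. Here is a rough sketch of one way to do that with awk. The function name `ipg_row` is made up for illustration, and it assumes the report is tab-delimited with the protein accession in column 7, as in the output above.

```shell
# Hypothetical helper (not part of EDirect): read an ipg report on stdin
# and print protein, nucleotide accession, start, stop and strand for the
# row whose Protein column equals the accession given as $1.
# Assumes tab-delimited input with the column order shown above.
ipg_row() {
  awk -F '\t' -v OFS='\t' -v acc="$1" '
    $7 == acc { print $7, $3, $4, $5, $6 }
  '
}

# Possible usage (needs EDirect installed and network access):
# efetch -db protein -id "CBE06962.1" -format ipg | ipg_row "CBE06962.1"
```

For a whole list you could wrap the usage line in a `while read` loop over a file of accessions, one per line.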

It's a good start: I can at least get the start/stop positions, the strand, and the nucleotide accession. One thing that would be really useful is the locus_tag.

Thanks, this is still pretty good though.

TP


A single clean solution may be possible, but at least this will get you started:

efetch -db protein -id "CBE06962.1" -format gp | grep locus
                         /locus_tag="CDR20291_3124"

Since this returns a GenPept-format record, you can grep for several other pieces of information:

efetch -db protein -id "CBE06962.1" -format gp | grep -e "locus" -e "coded"
                     /locus_tag="CDR20291_3124"
                     /coded_by="complement(FN545816.1:3730243..3731418)"
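
If you want the bare values rather than the raw qualifier lines, the grep can be swapped for a little awk. This is only a sketch: `gp_fields` is a made-up name, it handles one record at a time, and it assumes each qualifier fits on a single line as /name="value" (a long coded_by that wraps onto a second line would be missed).

```shell
# Hypothetical helper (not part of EDirect): read one GenPept record on
# stdin and print the locus_tag and coded_by values, tab-separated.
# Assumes each qualifier sits on one line in the form /name="value".
gp_fields() {
  awk '
    # 12 = length of /locus_tag=" ; the extra 1 drops the closing quote
    match($0, /\/locus_tag="[^"]*"/) { tag = substr($0, RSTART + 12, RLENGTH - 13) }
    # 11 = length of /coded_by=" ; the extra 1 drops the closing quote
    match($0, /\/coded_by="[^"]*"/)  { cds = substr($0, RSTART + 11, RLENGTH - 12) }
    END { printf "%s\t%s\n", tag, cds }
  '
}

# Possible usage (needs EDirect installed and network access):
# efetch -db protein -id "CBE06962.1" -format gp | gp_fields
```

If a record carries more than one locus_tag qualifier, the last one seen is the one printed.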

Many thanks, this looks pretty good.
