Question

Is there an existing script to extract organism name, SRA information, and assembly from an accession ID?

6.1 years ago

saadleeshehreen ▴ 140

Hi, I am working on 200 different organisms from NCBI. I have the accession IDs (e.g. NZ_KQ956078.1) of those organisms, but I also need the corresponding information (SRA, assembly number, organism name). Getting that information from NCBI is easy, but looking it up one accession at a time is very time-consuming, so it would be helpful to have a script that retrieves it for all accessions at once. Is there any script available?

I have very little background in shell scripting, so I can't write my own script to do this job. Expert help is needed!

assembly NCBI extractinformation • 2.2k views

ADD COMMENT • link 6.1 years ago by saadleeshehreen ▴ 140

Have you checked whether eutils does this? You can use web-based eutils queries, the eutils command-line tools, or even the R package reutils.

ADD REPLY • link 6.1 years ago by Ram 44k

Hi, I tried the following command to get the assembly and organism name, but it retrieved nothing. How can I map the NZ ID back to its assembly? And any idea how I can download the "*assembly_report.txt" file?

elink -db nuccore -id "NZ_KQ956078.1" -target assembly | esummary | \
    xtract -pattern DocumentSummary -element AssemblyAccession

ADD REPLY • link updated 6.1 years ago by Ram 44k • written 6.1 years ago by saadleeshehreen ▴ 140

This works fine for me. What are you seeing?

ADD REPLY • link 6.1 years ago by Ram 44k

Just nothing. There is no error message, but it produces no results. I expected it to retrieve the corresponding GCF ID, but I didn't get that.

ADD REPLY • link 6.1 years ago by saadleeshehreen ▴ 140

Can you run the elink alone and check if you get results? Maybe then incrementally add each command in the pipeline.

ADD REPLY • link 6.1 years ago by Ram 44k
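A minimal sketch of the incremental check suggested above, assuming Entrez Direct (elink, esummary, xtract) is installed and on the PATH. The function name check_stages is hypothetical; the commands themselves are the ones from the pipeline posted earlier in the thread.

```shell
# Hypothetical helper: run the pipeline one stage at a time for a single
# accession, so the stage that stops producing output is easy to spot.
check_stages() {
    acc="$1"

    echo "== step 1: elink =="
    elink -db nuccore -id "$acc" -target assembly

    echo "== step 2: elink | esummary =="
    elink -db nuccore -id "$acc" -target assembly | esummary

    echo "== step 3: elink | esummary | xtract =="
    elink -db nuccore -id "$acc" -target assembly | esummary |
        xtract -pattern DocumentSummary -element AssemblyAccession
}
```

Calling `check_stages "NZ_KQ956078.1"` prints the output of each stage under its own header. If step 1 reports a non-zero Count but step 2 or 3 prints nothing, the problem is in the later stage rather than in the link itself.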

Here is the output:

-bash-4.2$ elink -db nuccore -id "NZ_KQ956078.1" -target Assembly

<ENTREZ_DIRECT>
    <Db>Assembly</Db>
    <WebEnv>NCID_1_111176213_130.14.18.34_9002_1540939702_1874716631_0MetA0_S_MegaStore</WebEnv>
    <QueryKey>2</QueryKey>
    <Count>1</Count>
    <Step>1</Step>
</ENTREZ_DIRECT>

ADD REPLY • link updated 6.1 years ago by Ram 44k • written 6.1 years ago by saadleeshehreen ▴ 140

"incrementally add each command in the pipeline"

Like I said :-)

ADD REPLY • link 6.1 years ago by Ram 44k
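Once the single-accession pipeline returns a result, it can be wrapped in a loop to cover a whole list. The following is only a sketch: the function name lookup_assemblies and the input file layout (one accession per line) are assumptions, and the element names Organism, BioSampleAccn and FtpPath_RefSeq are assumed to be present in the assembly document summary, so check them against the raw esummary output first.

```shell
# Hypothetical helper: look up assembly details for every accession in a file
# (one accession per line). Requires Entrez Direct (elink, esummary, xtract).
lookup_assemblies() {
    acc_file="$1"
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines
        [ -z "$acc" ] && continue

        # < /dev/null keeps elink from swallowing the rest of the accession list
        result=$(elink -db nuccore -id "$acc" -target assembly < /dev/null |
            esummary |
            xtract -pattern DocumentSummary \
                -element AssemblyAccession Organism BioSampleAccn FtpPath_RefSeq)

        # One tab-separated line per accession; flag accessions with no hit
        printf '%s\t%s\n' "$acc" "${result:-NOT_FOUND}"

        sleep 1   # pause between requests to stay well under NCBI rate limits
    done < "$acc_file"
}
```

For example, `lookup_assemblies accessions.txt > assembly_info.tsv` would write one row per accession. The BioSample accession is a reasonable starting point for finding the matching SRA records, and the assembly report normally sits in the directory given by the FTP path, but both of those follow-up steps are left out of this sketch.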

I think most, if not all, of that information would be available in the assembly_summary files. I’m not aware of any preexisting tools, but parsing that file would be fairly trivial, as it is just a tabular file.

You could probably even read it into Excel if you’re really that averse to scripting anything.^

^Disclaimer: I do not endorse the use of Excel for bioinformatics.

ADD REPLY • link 6.1 years ago by Joe 21k
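As a rough sketch of the assembly_summary approach: the function name summarize_assemblies is hypothetical, and the column positions (1 = assembly accession, 3 = BioSample, 8 = organism name, 20 = FTP path) are assumed from the layout of assembly_summary_refseq.txt on the NCBI genomes FTP site, so verify them against the header of your copy. Note that the file is keyed by assembly accession (GCF_...), so NZ_ sequence accessions still have to be mapped to an assembly first.

```shell
# Hypothetical helper: print accession, organism name, BioSample and FTP path
# for each assembly accession listed in ids_file (one GCF_ accession per line).
summarize_assemblies() {
    ids_file="$1"       # accessions to look up
    summary_file="$2"   # local copy of assembly_summary_refseq.txt
    awk -F '\t' '
        # First file: remember the accessions we want
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # Second file: skip the commented header lines
        /^#/ { next }
        # Print the selected columns for matching assemblies
        $1 in wanted { print $1 "\t" $8 "\t" $3 "\t" $20 }
    ' "$ids_file" "$summary_file"
}
```

Usage would look like `summarize_assemblies gcf_ids.txt assembly_summary_refseq.txt > assembly_info.tsv`. Doing the lookup against a local file avoids one network request per organism, which is the main reason to prefer it for a list of 200.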

This should get you what you want: A: To get the name of the strains by searching assembly genome number GCF_

ADD REPLY • link 6.1 years ago by GenoMax 147k