Question

How To Retrieve Data From Jgi Automatically Given A Set Of Ids?

2

11.0 years ago

Manu Prestat 4.1k

Hi, I need to retrieve genomes and metagenomes (assemblies or raw sequences) from the JGI databases. This is doable (though not easy) using the HTML forms and following links. However, I need to repeat the process hundreds of times, and I would rather not do it by hand.

JGI does provide users with (very brief) API documentation and an API XML schema (XSD) (usually understood only by Pierre alias @yokofakun ;-) ), but I cannot even make the curl "signing on" command work. Do you know a way to automate this task (e.g. using R, Python, or any GNU tool...) given some IDs (like project or sample IDs)?

Thanks, Manu

r python api xml • 6.8k views

ADD COMMENT • link updated 18 months ago by GenoMax 147k • written 11.0 years ago by Manu Prestat 4.1k

0

Hi glarue,

I have read the script "jgi-query.py" from https://github.com/glarue/jgi-query, but I don't understand it yet.

I want to download metagenomes from JGI using the API.

Does your script work for downloading metagenomes from JGI?

Best, Bing

ADD REPLY • link 6.0 years ago by bison100 • 0

0

Geez, sorry to have missed this for so long; my notification settings must not be set up correctly.

The answer to your question depends on what you mean by "metagenome", and the way in which JGI structures its databases, although I fear the answer may be "no". Basically, you have to provide a category to jgi-query, and all of the files organized under that category will be listed. If you are interested in multiple fungal genomes, for example, you can use the query fungi to retrieve a (huge) list of all available files, and then download individual files from within that set (probably using the regex option r at the prompt). If the species you are interested in are not in fungi, you will have to experiment to identify a sufficiently broad query that includes everything you're interested in.
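As a rough illustration of the "list everything, then select by regex" workflow described above, here is a minimal Python sketch. It is not jgi-query's own code. The XML layout (`file` elements carrying `filename` and `url` attributes), the sample file names, and the helper `select_files` are all assumptions made for illustration; the real index returned by JGI may be structured differently.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample index. The element and attribute names are assumptions,
# and the file names are placeholders, not real JGI files.
SAMPLE_INDEX = """\
<organismDownloads name="fungi">
  <folder name="Files">
    <file filename="speciesA_assembly.fasta.gz" url="/download/speciesA_assembly.fasta.gz"/>
    <file filename="speciesA_genes.gff.gz" url="/download/speciesA_genes.gff.gz"/>
    <file filename="speciesB_assembly.fasta.gz" url="/download/speciesB_assembly.fasta.gz"/>
  </folder>
</organismDownloads>
"""


def select_files(index_xml, pattern):
    """Return (filename, url) pairs whose filename matches the regex."""
    regex = re.compile(pattern)
    root = ET.fromstring(index_xml)
    matches = []
    # iter() walks every nested folder, so deeply nested files are found too.
    for node in root.iter("file"):
        name = node.get("filename", "")
        if regex.search(name):
            matches.append((name, node.get("url", "")))
    return matches


if __name__ == "__main__":
    # Keep only the assembly FASTA files out of the full listing.
    for name, url in select_files(SAMPLE_INDEX, r"assembly\.fasta\.gz$"):
        print(name, url)
```

The point of the sketch is only that a broad query plus a filename pattern can stand in for hundreds of manual clicks; which pattern is appropriate depends on how the files you need are actually named.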

jgi-query was originally designed for grabbing files on a per-species basis. It can download large file sets as well, but how well that will work depends on your specific needs. Hope that helps clarify things.

ADD REPLY • link 5.5 years ago by glarue ▴ 70

Ram · Answer 1 · 2015-07-25

3

9.3 years ago

glarue ▴ 70

I know this is a late response, and it may not do exactly what you need, but feel free to check out a script I wrote to do something similar here: https://github.com/glarue/jgi-query

It's written in Python and runs from the command line. I haven't tested it on Mac or Windows, but it should (theoretically) work there as well as long as cURL and Python are installed.

Hope it helps, if you still need it!

EDIT: while jgi-query was designed primarily to download various files for a single organism, you can download very large datasets with it as well by using higher-level phylum names and range-formatted file selection syntax. For example, you can retrieve the entire fungal database with the command "jgi-query fungi", although selecting specific subsets of files can become onerous with large databases.
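For a batch of IDs, one option is to generate the index request once per ID. The sketch below is a hedged illustration, not part of jgi-query: the endpoint and the `-L -b cookies` options are taken from the curl command that jgi-query prints when it runs (quoted in a reply in this thread), while the helper `index_command`, the placeholder ID, and the use of `-o` for the output file are my own choices. It assumes a `cookies` file already exists from a successful sign-on, which is not shown here.

```python
import shlex
from urllib.parse import urlencode

# Endpoint as it appears in the curl command printed by jgi-query.
GET_DIRECTORY = "https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory"


def index_command(query, cookie_file="cookies"):
    """Build the curl argument list that fetches the file index for one query.

    cookie_file is assumed to come from an earlier sign-on step.
    """
    url = f"{GET_DIRECTORY}?{urlencode({'organism': query})}"
    return ["curl", url, "-L", "-b", cookie_file, "-o", f"{query}_jgi_index.xml"]


if __name__ == "__main__":
    # "PlaceholderID1" is a made-up ID used only to show the loop.
    for query in ["fungi", "PlaceholderID1"]:
        # Print rather than execute, so the commands can be reviewed first.
        print(shlex.join(index_command(query)))
```

To actually run the requests, each argument list could be passed to `subprocess.run(cmd, check=True)`; printing first makes it easier to confirm the URLs look right before sending hundreds of them.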

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by glarue ▴ 70

1

Hello @glarue

I am launching the script to retrieve all the fungal assembly sequences in fasta format, but it is showing me this error:

python3 jgi-query.py fungi

Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
 100    92    0    92    0     0      0      0 --:--:--  0:10:00 --:--:--    28

Traceback (most recent call last):
   File /JGI-db/jgi-query-main/jgi-query.py", line 1151, in <module>
   if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'

Do you have a solution?
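For readers hitting the same traceback: it shows that `file_list` was `None` when `.values()` was called, and the curl progress output shows only 92 bytes received after ten minutes. One plausible reading (an assumption, since the contents of the downloaded file are not shown) is that the server did not return a usable XML index. The sketch below is not a patch to jgi-query; `has_results` is a hypothetical helper that only illustrates guarding the failing expression so the problem surfaces as a clear message.

```python
def has_results(file_list):
    """Return True only when a parsed index exists and lists at least one file."""
    if file_list is None:
        # Nothing was parsed, e.g. because the download was not a valid index.
        return False
    return any(v["results"] for v in file_list.values())


if __name__ == "__main__":
    parsed = None  # stands in for a failed parse of the downloaded index
    if not has_results(parsed):
        print("No files found; inspect the downloaded index file for an error message.")
```

Opening `fungi_jgi_index.xml` in a text editor would show what those 92 bytes actually contain.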

ADD REPLY • link updated 18 months ago by Ram 44k • written 18 months ago by AbdelAbdel ▴ 30

0

Do not add answers unless you're answering the top level question. Use Add Comment or Add Reply instead. I've moved your post to the appropriate spot this time.

Also, I've removed my name as I did not write the answer you are asking clarification on. Plus, the answer is 8 years old so I would not expect any follow up on it.

ADD REPLY • link 18 months ago by Ram 44k

0

The script was last updated on GitHub in Oct 2022, so it appears to be reasonably maintained and current.

AbdelAbdel: Create an issue on GitHub in case you don't get an answer here.

ADD REPLY • link 18 months ago by GenoMax 147k