Hi, I need to retrieve genomes and metagenomes (assemblies or raw sequences) from the JGI databases. It is doable (though not easy) using HTML forms and following links. However, I need to repeat this process hundreds of times, and I would rather not waste any more time on it.
JGI does provide users with (very brief) API documentation and an API XML schema (XSD) (usually understood only by Pierre, alias @yokofakun ;-) ), but I cannot even make the curl "signing on" command work.
Do you know a way to process this task automatically (e.g. using R, python, or any GNU tool...) given some IDs (like project or sample ID)?
I know this is a late response, and it may not do exactly what you need, but feel free to check out a script I wrote to do something similar here: https://github.com/glarue/jgi-query
It's written in Python and runs from the command line. I haven't tested it on Mac or Windows, but it should (theoretically) work there as well, as long as cURL and Python are installed.
Hope it helps, if you still need it!
EDIT: while jgi-query was designed primarily to download various files for a single organism, you can download very large datasets with it as well by using higher-level phylum names and range-formatted file selection syntax. For example, you can retrieve the entire fungal database with the command "jgi-query fungi", although selecting specific subsets of files can become onerous with large databases.
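For anyone who needs to repeat the listing step for many queries, here is a minimal sketch of building the curl command for each one. It is not taken from jgi-query itself. The endpoint is copied from a curl command quoted elsewhere in this thread and may no longer be valid; the function name, the query names, and the use of curl's -o option in place of shell redirection are my own assumptions. The commands are only printed, not run, and a cookie file from a successful sign-on is assumed to exist already.

```python
import shlex
from urllib.parse import urlencode

# Endpoint as quoted in a curl command elsewhere in this thread.
# Assumption: it may have changed since then.
DIRECTORY_URL = "https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory"


def directory_command(organism, cookie_file="cookies"):
    """Build (but do not run) a curl command that lists files for one query.

    Hypothetical helper, not part of jgi-query.
    """
    url = f"{DIRECTORY_URL}?{urlencode({'organism': organism})}"
    # -L follows redirects, -b sends the saved cookies, -o writes the response to a file.
    return ["curl", url, "-L", "-b", cookie_file, "-o", f"{organism}_jgi_index.xml"]


# Hypothetical query names; replace with the categories or portal IDs you need.
queries = ["fungi", "example_portal_id"]
for query in queries:
    print(shlex.join(directory_command(query)))
```

Each printed line can be pasted into a shell, or the list returned by directory_command can be passed to subprocess.run once you have checked that the endpoint and cookie file work for you.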
I am launching the script to retrieve all the fungal assembly sequences in fasta format, but it is showing me this error:
python3 jgi-query.py fungi
Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 92 0 92 0 0 0 0 --:--:-- 0:10:00 --:--:-- 28
Traceback (most recent call last):
File /JGI-db/jgi-query-main/jgi-query.py", line 1151, in <module>
if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'
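The traceback says that file_list was None when .values() was called on it. One plausible reading, given that curl received only 92 bytes, is that the downloaded index was not a usable listing, but that is a guess. The sketch below shows one way to guard against this kind of failure: parse the index and return an empty dict instead of None when the text is not valid XML. It is not the code from jgi-query.py, and the element and attribute names (folder, file, filename) are assumptions about the index layout.

```python
import xml.etree.ElementTree as ET


def files_by_folder(xml_text):
    """Return {folder name: [filenames]}, or an empty dict if the text is unusable.

    Hypothetical helper; the XML layout is assumed, not confirmed.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not XML at all (for example, a short plain-text error message).
        return {}
    listing = {}
    for folder in root.iter("folder"):
        names = [f.get("filename") for f in folder.findall("file") if f.get("filename")]
        if names:
            listing[folder.get("name", "")] = names
    return listing


# Made-up sample standing in for a downloaded index file.
sample = """<organismDownloads name="example">
  <folder name="Assembly">
    <file filename="example_assembly.fasta.gz"/>
  </folder>
</organismDownloads>"""

print(files_by_folder(sample))
print(files_by_folder("this is not an XML listing"))
```

Because the function always returns a dict, a later check such as any(listing.values()) cannot raise the AttributeError shown above; it simply evaluates to False when nothing was found.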
Do not add answers unless you're answering the top level question. Use Add Comment or Add Reply instead. I've moved your post to the appropriate spot this time.
Also, I've removed my name as I did not write the answer you are asking clarification on. Plus, the answer is 8 years old so I would not expect any follow up on it.
Hi glarue,
I have read the script "jgi-query.py" from https://github.com/glarue/jgi-query, but I don't understand it yet.
I want to download metagenomes from JGI using the API.
Does your script work for downloading metagenomes from JGI?
Best, Bing
Geez, sorry to have missed this for so long; my notification settings must not be set up correctly.
The answer to your question depends on what you mean by "metagenome" and on the way JGI structures its databases, although I fear the answer may be "no". Basically, you have to provide a category to jgi-query, and all of the files organized under that category will be listed. If you are interested in multiple fungal genomes, for example, you can use the query fungi to retrieve a (huge) list of all available files, and then download individual files from within that set (probably using the regex option r at the prompt). If the species you are interested in are not in fungi, you will have to experiment to identify a sufficiently broad query that includes everything you're interested in.
jgi-query was originally designed for grabbing files on a per-species basis. It can also download large file sets, but how well that works will depend on your specific needs. Hope that helps clarify things.
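To give an idea of what regex-based selection from a large listing looks like, here is a minimal sketch in plain Python. It does not reproduce jgi-query's prompt or its internals; the function name, the file names, and the pattern are all made up for illustration.

```python
import re


def select_files(filenames, pattern):
    """Return the filenames that match a regular expression (hypothetical helper)."""
    rx = re.compile(pattern)
    return [name for name in filenames if rx.search(name)]


# Made-up listing standing in for the (huge) list returned by a broad query.
listing = [
    "Spec1_AssemblyScaffolds.fasta.gz",
    "Spec1_GeneCatalog.gff3.gz",
    "Spec2_AssemblyScaffolds.fasta.gz",
    "Spec2_readme.txt",
]

# Keep only the gzipped FASTA assemblies.
print(select_files(listing, r"AssemblyScaffolds.*\.fasta\.gz$"))
```

A pattern anchored on the file suffix, as here, tends to be easier to get right on a very large listing than one that tries to match species names.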