Question

Downloading Data From MG-RAST

2

10.7 years ago

bioinfo ▴ 840

Is anyone familiar with downloading data from MG-RAST? I have more than 100 metagenome IDs that I need to download efficiently. I found this link at MG-RAST (http://api.metagenomics.anl.gov/1/api.html#download) but couldn't manage to download those 100 metagenomes using their IDs (e.g. 4441908.3). I don't want to download them one by one with individual IDs, as that would take ages.

parsing • 17k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 10.7 years ago by bioinfo ▴ 840

Ram · Answer 1 · 2014-03-08

4

10.7 years ago

5heikki 11k

I did something like this a while back:

cat keywordTableSortedUniqueIds.txt
mgm4440036.3
mgm4440037.3
mgm4440038.3
mgm4440039.3
mgm4440040.3
mgm4440041.3
mgm4440055.3
mgm4440056.3
..

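# Read one metagenome ID per line from the list shown above
while read -r line
do
curl "http://api.metagenomics.anl.gov/1/download/${line}?file=425.1" > "$line.gz"
done < keywordTableSortedUniqueIds.txt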

The "file=XXX" part specifies what exactly you want to download from the given metagenome, e.g. 425.1 here specifies predicted rRNA.
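
The IDs in the question (e.g. 4441908.3) lack the "mgm" prefix that the list above uses. Here is a minimal sketch that adds the prefix when it is missing and builds the download URL. It assumes the API expects the prefixed form, as in the loop above; normalize_id and download_url are hypothetical helper names, and the base URL is simply copied from the loop.

```shell
#!/usr/bin/env bash
# Sketch: build MG-RAST download URLs from IDs with or without the "mgm" prefix.
# API_BASE copies the URL pattern used in the loop above; it is not verified here.
API_BASE="http://api.metagenomics.anl.gov/1/download"

# normalize_id: hypothetical helper; prepends "mgm" when it is missing.
normalize_id() {
    case "$1" in
        mgm*) printf '%s\n' "$1" ;;
        *)    printf 'mgm%s\n' "$1" ;;
    esac
}

# download_url: hypothetical helper; $1 = metagenome ID, $2 = file ID (e.g. 425.1).
download_url() {
    printf '%s/%s?file=%s\n' "$API_BASE" "$(normalize_id "$1")" "$2"
}

# Print the URLs only; nothing is downloaded here.
download_url 4441908.3 425.1
download_url mgm4440036.3 425.1
```

The printed URL can then be passed to curl in place of the hard-coded one.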

ADD COMMENT • link 10.7 years ago by 5heikki 11k

0

That was very helpful. I have just gone through the MG-RAST manual but didn't find much information about the download stages or the "file=xxx"/"stage=xxx" parameters. You mentioned file=425.1 for predicted rRNA; do you know which file number I should use for the raw, originally submitted metagenome FASTA sequences? I tried "file=100.2" but I'm not sure if it is right.

ADD REPLY • link 10.7 years ago by bioinfo ▴ 840

0

Hey, I'm not sure you can access the raw data through the API. However, I think file=100.2 contains the reads/contigs that passed quality filtering. There's probably also a file that contains the reads/contigs that didn't pass QC, so you could combine the two if you really wanted the raw data. You could always ask on the MG-RAST mailing list.

ADD REPLY • link 10.7 years ago by 5heikki 11k

0

Thanks. I have now decided to go with the reads that passed the QC filtering and dereplication stages.

ADD REPLY • link 10.7 years ago by bioinfo ▴ 840

0

Hi, Thanks for these details about how to download data from the MG-RAST api. Did you add your webkey to access data that is not public yet? Or were these public metagenomes? I tried adding my webkey:

curl -H "auth: XXX" http://api.metagenomics.anl.gov/1/download/"$line"?file=100.2

but I just get a summary of the file info (bp_count etc), and I'm unable to download the fasta file.

Thank you!
Katrine

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by Katrine ▴ 20

0

To download raw data (the data uploaded by the MG-RAST user as input), use file=050.1. Alternatively, you can check for yourself how the download address is constructed by inspecting the "Download" button element and the URL it leads to.

Example: In the download page "http://www.mg-rast.org/mgmain.html?mgpage=download&metagenome=mgm4549958.3/MG_RAST_sub/BLANES_2010_cDNA_SURFACE_0.8.3__ILLUMINA.fna" go to the "Processing step" -> "0. Upload" and inspect the download button element on the right. Here it is "http://api-ui.mg-rast.org/download/mgm4549958.3?file=050.1"

ADD REPLY • link 4.8 years ago by al-ash ▴ 210

0

File 050.2 - This is the unfiltered metagenome that was originally uploaded to MG-RAST
File 100.1 - preprocess.passed.fna
File 100.2 - preprocess.removed (low quality)
File 350.2 & 350.3 - These are the protein coding genes (amino acids and nucleotides)
File 440.1 - These are predicted rRNA sequences (I do not recommend using MG-RAST for sensitive rRNA annotation. It does not use the internal structure of the gene, which other programs appropriately use for classification)
File 550.1 - This file shows clustered sequences which are 90% identical, to reduce the number of sequences that need to be annotated. Many folks don’t even know that this happens within MG-RAST.
File 650.1 & 650.2 - These files are essentially the blat tabular output from comparing your sequence to the database.

see example: http://metagenomics.anl.gov/metagenomics.cgi?page=DownloadMetagenome&metagenome=4447943.3

http://api.metagenomics.anl.gov/1/download/mgm4447943.3

ref: http://adina-howe.readthedocs.io/en/latest/mgrast/index.html
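
For scripting, the file IDs listed above could be kept in a small lookup, so a download loop can label what it fetches. This is only a sketch: describe_file is a hypothetical helper, and the mapping is copied from the list in this comment without checking it against MG-RAST.

```shell
#!/usr/bin/env bash
# Sketch: look up what a file ID is said to contain, per the list above.
# describe_file is a hypothetical helper; the mapping is copied from this
# comment and has not been checked against MG-RAST itself.
describe_file() {
    case "$1" in
        050.2)        echo "unfiltered metagenome as originally uploaded" ;;
        100.1)        echo "preprocess.passed.fna" ;;
        100.2)        echo "preprocess.removed (low quality)" ;;
        350.2|350.3)  echo "protein coding genes (amino acids and nucleotides)" ;;
        440.1)        echo "predicted rRNA sequences" ;;
        550.1)        echo "clustered sequences (90% identity)" ;;
        650.1|650.2)  echo "BLAT tabular output against the database" ;;
        *)            echo "unknown file ID: $1" >&2; return 1 ;;
    esac
}

# Print a tab-separated ID/description pair for a few file IDs.
for f in 050.2 100.1 440.1; do
    printf '%s\t%s\n' "$f" "$(describe_file "$f")"
done
```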

ADD REPLY • link 6.4 years ago by Zhilong Jia ★ 2.2k

0

Hello, what language is this, apart from curl? I have a Windows computer and am using curl through the command prompt, and I would like to do something similar to what you described.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by kbrannen • 0

1

It's Bash. I presume you could get Bash working on Windows with Cygwin or something but I haven't used Windows since XP so I can't really say. Alternatively, it probably doesn't take much effort to make a simple while read line loop with something that works on Windows by default like Python (?) or Java (?). If you plan to do lots of bioinformatics in the future, I suggest you ditch Windows for Linux or OS X.

ADD REPLY • link 9.7 years ago by 5heikki 11k

0

Update: I downloaded Git Bash for Windows, and I think I am having success using Bash and curl with the command you listed. I am not very experienced with Bash yet, so I haven't added any echo tests to see whether the script is working the way I think it is. I do agree with you that Windows is a hassle, but there are often workarounds. I will continue down this line for the people who use Windows and cannot afford to, or do not want to, switch operating systems.
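
One way to add such echo checks is a dry-run switch that prints each ID and URL before anything is downloaded. The sketch below assumes the same URL pattern as the answer above; DRY_RUN and fetch_all are illustrative names, not part of MG-RAST or curl. It also strips a trailing carriage return from each line, which ID lists saved on Windows often contain.

```shell
#!/usr/bin/env bash
# Sketch: a download loop with echo checks and a dry-run switch.
# DRY_RUN and fetch_all are illustrative names, not part of MG-RAST or curl.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would run; 0 = actually call curl

fetch_all() {
    # $1 = text file with one metagenome ID per line, $2 = file ID (e.g. 425.1)
    local ids_file="$1" file_id="$2" line url
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while read -r line || [ -n "$line" ]; do
        line="${line%$'\r'}"                # drop a Windows carriage return
        [ -z "$line" ] && continue          # skip blank lines
        url="http://api.metagenomics.anl.gov/1/download/${line}?file=${file_id}"
        echo "ID: $line -> $url"
        if [ "$DRY_RUN" = "0" ]; then
            # -f makes curl return a non-zero status on HTTP errors
            curl -f -sS "$url" -o "$line.gz" || echo "FAILED: $line" >&2
        fi
    done < "$ids_file"
}
```

Running fetch_all keywordTableSortedUniqueIds.txt 425.1 with the default DRY_RUN=1 only prints the lines; setting DRY_RUN=0 performs the downloads.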

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.7 years ago by kbrannen • 0

score 0 · Answer 2 · 2018-07-05

0

6.4 years ago

Dattatray Mongad ▴ 380

I have a question. I downloaded the project files using the MG-RAST tools: mg-download.py --project projectid

Now I want the metadata for a particular project (which file corresponds to which sample type?). How can I download the metadata?