Question

Parsing a Megan (RMA) File

8.4 years ago

irazoqui.matias ▴ 10

Hi,

I'm having a problem with MEGAN 5. I'm working with some quite large RMA files (approx. 40 GB), built using the trimmed reads and a BLAST run. The problem is that my (poor little) PC always hangs when I try to open them. So, I was wondering if there's a way to parse a file via the MEGAN command line, dividing it into "Bacteria", "Archaea" & "Viruses" or whatever, so the files become a little bit smaller. Thanks!

Megan Metagenomics • 4.7k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 8.4 years ago by irazoqui.matias ▴ 10

score 0 · Answer 1 · 2016-07-13

8.4 years ago

Charles Warden 8.3k

I'm not very familiar with MEGAN, but I would imagine a BLAST search could get quite time-consuming if searching a database of all known metagenomics sequences, especially if you have over 1 million reads.

Did you amplify ribosomal RNA sequences? If so, these are some programs that should provide less computationally intensive options:

1) RDPclassifier (web-based or local .jar file) - https://rdp.cme.msu.edu/classifier/classifier.jsp

2) mothur - http://www.mothur.org/

3) QIIME - http://qiime.org/

They don't work with RMA files, but you must have had some sort of sequence file to produce the RMA file. If you have .fastq files, mothur and QIIME can take those as input (and you can convert them to .fasta files, if you want to try the RDPclassifier as a standalone tool).
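As a rough illustration of the .fastq to .fasta conversion mentioned above, here is a minimal Python sketch. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name `fastq_to_fasta` is just a placeholder, not part of mothur, QIIME, or the RDP tools:

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy records from a 4-line FASTQ stream into FASTA format.

    Assumes each record is exactly four lines (header, sequence, '+', quality).
    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"Expected FASTQ header starting with '@', got: {header!r}")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, not needed for FASTA
        fastq_handle.readline()  # quality line; may itself start with '@', so it is skipped by position
        fasta_handle.write(f">{header[1:]}\n{sequence}\n")
        count += 1
    return count


# Small in-memory demonstration with two made-up reads
demo_in = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
demo_out = io.StringIO()
n_records = fastq_to_fasta(demo_in, demo_out)
print(n_records)
print(demo_out.getvalue(), end="")
```

For real files, the same function could be called with two handles from `open(...)`; because it reads one record at a time, it does not need to hold the whole file in memory.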

ADD COMMENT • link 8.4 years ago by Charles Warden 8.3k

Thanks Charles for replying. Yes, I have the 16S sequences and I've already used QIIME. But now I'm working on the WGS reads and I wanted to do another taxonomic analysis (because by using 16S, you leave behind viruses and eukaryotes). That's why I tried MEGAN. For the BLAST part, I used DIAMOND, which is waaay faster than regular BLAST (although each search takes almost a day). I've got that one covered, but the results I get are killing my PC (still waiting for budget approval to buy a new one...)

ADD REPLY • link 8.4 years ago by irazoqui.matias ▴ 10

Ok - I haven't tested any of the following programs, but it is possible that it might help you to use a different method to quantify species abundances that doesn't depend upon your BLAST file:

http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x

http://www.ccb.jhu.edu/software/centrifuge/

This one is really for transcriptomes, but it might still work:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0969-1

http://taxonomer.iobio.io/

ADD REPLY • link 8.4 years ago by Charles Warden 8.3k

Hi, did you find a solution to this problem? I am facing the same problem. My RMA files are only about 2 GB to 3 GB in size, but my PC still hangs. If you found a solution, please explain it, because that would be really helpful.

Thanks

ADD REPLY • link 3.6 years ago by serene.s • 0

I am not sure what to tell you about RMA files, since I don't typically work with Megan.

In general, there was some public eDNA re-analysis where I tried out various options:

https://github.com/cwarden45/PRJNA513845-eDNA_reanalysis/blob/master/metagenomics/README.md

Running MEGABLAST does take a while, even when prioritizing the more highly expressed sequences (unless you go even further). In that situation, I was looking for artifacts, so I was specifically trying to look at less common things.

However, in other situations, maybe looking only at something like the sequences present at >1% (or even >1/10,000, for identical sequences) might help?
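To illustrate the kind of abundance filter suggested above, a minimal Python sketch might look like this. It assumes the reads are already in memory as strings and counts only exactly identical sequences; the function name `abundant_sequences` and the `min_fraction` parameter are hypothetical, not taken from any of the tools mentioned in this thread:

```python
from collections import Counter


def abundant_sequences(sequences, min_fraction=0.01):
    """Return {sequence: fraction} for identical sequences above a frequency cutoff.

    min_fraction=0.01 corresponds to ">1%"; use 1 / 10_000 for ">1/10,000".
    """
    seqs = list(sequences)
    total = len(seqs)
    if total == 0:
        return {}
    counts = Counter(seqs)  # number of reads per exactly identical sequence
    return {
        seq: n / total
        for seq, n in counts.items()
        if n / total > min_fraction
    }


# Made-up example: 1,000 reads drawn from three distinct sequences
reads = ["AAAA"] * 980 + ["CCCC"] * 15 + ["GGGG"] * 5
common = abundant_sequences(reads, min_fraction=0.01)
print(sorted(common))
```

Only the sequences that pass the cutoff would then need to go through the slower search step, which is the point of the suggestion; the right cutoff depends on how rare the things you care about are.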

Also, assuming that you don't have human reads without consent for public deposit (or you filter out the human reads), the SRA provides some taxonomy assignments. For that eDNA project, you can see some selected notes here:

https://github.com/cwarden45/PRJNA513845-eDNA_reanalysis/blob/master/extended_summary.xlsx (if you download the file to view locally).

In other words, if you haven't already deposited your data in the SRA, that is generally a good practice and might be helpful for analysis in some situations?

ADD REPLY • link 3.6 years ago by Charles Warden 8.3k