Forum: Unexplored Space In Metagenomics
12.1 years ago
vijay ★ 1.6k

Dear all,

I would like to open this topic for discussion. We already have a huge number of tools for analysing next-generation sequencing data. However, I would like to hear this forum's opinion on what more can be done, from a bioinformatics point of view, to address unmet needs.

Put simply: what would be novel in a new tool, and which unexplored areas of NGS/metagenomics analysis could it address?

Please contribute your opinions.

-vijay

metagenomics next-gen • 2.7k views
12.1 years ago
Josh Herr 5.8k

I agree with Michael that database development and database bias are very important issues in metagenomics at the moment. They are also being addressed by large research groups (and by multiple large groups working together in research consortia, here's just one example) with lots of resources ($$$).

Right now a lot of groups are focusing on developing packages for metagenomic data analysis. Examples include packages for general data analysis (MG-RAST, QIIME, MOTHUR, MEGAN) and newer platforms that incorporate post-clustering statistical analysis (Huttenhower Lab Tools, METAGENassist, Metastats). Another area is the development of assemblers intended to recover many genomes from metagenomic data (you can search for that stuff).

I think the tools are there, and there are a lot of them, but they are not perfect, so there is room for improvement. The learning curve is steep, as there is a lot to take into consideration when developing a metagenomics package. A lot of people are also working in this area. If you are thinking of jumping into the arena, there is a real risk in spending time developing a project when so many other projects are already in development. Many packages are developed by research groups, such as Rob Knight's group working on QIIME, that have multiple people working on multiple scripts and programs.

If you are not working with a large group, I think you would be most successful if you chose a specific problem that you could tackle on your own and become an expert in. Just one of many examples would be a chimera-checking program: there are already many out there, some of them quite good, but there is always room to improve the speed, the quality, the false-positive rate, etc. of current algorithms and scripts.
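To give a feel for the chimera-checking problem, here is a toy sketch in Python. It is not how any existing tool works: the function names, the k-mer scoring, and the midpoint split are all assumptions made for illustration. A read is flagged when its left and right halves are best explained by different reference sequences and the two halves together score higher than any single reference does for the whole read.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_parent(fragment, references, k):
    """Return (name, score) of the reference sharing the most k-mers with fragment."""
    frag_kmers = kmers(fragment, k)
    best_name, best_score = None, -1
    for name, ref in references.items():
        score = len(frag_kmers & kmers(ref, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def looks_chimeric(query, references, k=4):
    """Toy heuristic (illustrative only): split the query at its midpoint and
    compare the best-matching reference of each half with the best match
    for the whole query."""
    mid = len(query) // 2
    left_name, left_score = best_parent(query[:mid], references, k)
    right_name, right_score = best_parent(query[mid:], references, k)
    _, whole_score = best_parent(query, references, k)
    return left_name != right_name and (left_score + right_score) > whole_score


# Made-up reference sequences and an artificial chimera built from them.
references = {
    "refA": "AAAACCCCGGGGTTTTACGT",
    "refB": "TGCATGCAGTCAGTCCATGA",
}
chimera = references["refA"][:10] + references["refB"][10:]

print(looks_chimeric(chimera, references))             # True
print(looks_chimeric(references["refA"], references))  # False
```

A real checker has to deal with unknown breakpoints, sequencing errors, closely related parents and large reference sets, which is exactly where the speed and false-positive trade-offs come in.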

Once you become familiar with the field of metagenomics, you will recognize areas you think you can contribute to. If you are really interested in this area, you need to dive into the literature to figure out where.

12.1 years ago
Michael 55k

The first contribution to this discussion could be to identify the 'unexplored areas'. What exactly did you have in mind? Could you give examples?

However, when talking about metagenomics, we can take the term 'unexplored space' literally. What comes to mind is how poorly current databases cover natural sequence variability. In metagenome analysis based on sequence similarity, BLAST can only return hits to what is actually in the database. As a crude estimate of mine, at least 99.9% of naturally occurring sequence is still either not covered (e.g. from unculturable organisms) or has no taxon assigned (e.g. env_nr). Coverage also differs between taxa, e.g. for viruses there is not even a single common evolutionary clock (such as 16S), or even a common gene at all. Therefore, the largest hurdles for metagenomics are, imo:

  • Lack of coverage in current sequence databases
  • Database bias (due to an uneven ratio of coverage to natural variation between genomes: e.g. viruses (low coverage/high variation) vs. bacteria (somewhat better coverage/somewhat lower variation))

An important first step would be to estimate how large these two quantities (lack of coverage and database bias) are, and how much they influence the analysis. As this is a forum entry I am not doing a complete literature search right now, but will leave it for later, or for others, to add some references.
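As a rough illustration of how such an estimate could be started, the Python sketch below computes, per group, the fraction of reads with no acceptable BLAST hit. It assumes the default 12-column tabular BLAST output (`-outfmt 6`, with the e-value in the 11th column) and a mock community where the true group of each read is known. The function names, the example reads and the e-value cutoff are hypothetical.

```python
from collections import defaultdict


def reads_with_hits(blast_lines, max_evalue=1e-5):
    """Return the query IDs that have at least one hit with e-value <= max_evalue.

    Assumes default tabular BLAST output: qseqid is column 1, evalue is column 11.
    """
    hit = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hit.add(fields[0])
    return hit


def unassigned_fraction_by_group(read_groups, blast_lines, max_evalue=1e-5):
    """Return {group: fraction of its reads without an acceptable hit}.

    read_groups maps read ID -> group label; this is only known for simulated
    or mock-community data (an assumption of this sketch).
    """
    hit = reads_with_hits(blast_lines, max_evalue)
    total = defaultdict(int)
    missed = defaultdict(int)
    for read_id, group in read_groups.items():
        total[group] += 1
        if read_id not in hit:
            missed[group] += 1
    return {group: missed[group] / total[group] for group in total}


# Made-up example: two bacterial and two viral reads.
read_groups = {"r1": "bacteria", "r2": "bacteria", "r3": "virus", "r4": "virus"}
blast_lines = [
    "r1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180",
    "r2\tsubjB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-20\t120",
    "r3\tsubjC\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-3\t40",  # too weak to count
]

print(unassigned_fraction_by_group(read_groups, blast_lines))
# {'bacteria': 0.0, 'virus': 1.0}
```

The overall unassigned fraction speaks to lack of coverage, and the difference between groups gives a first impression of database bias. With real environmental samples the true group of an unassigned read is unknown, so the per-group breakdown only works on data with known composition.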
