Question

What Is Your Experience With Gmod Tools Or Alternatives?

10

12.8 years ago

Michael 55k

I am going to implement a (non-)model-organism (does that matter?) genome database very soon. It will present a newly sequenced animal genome with its annotation to the community. We are looking for software that will support the development of such a tool without 're-inventing the wheel'.

We will need a genome browser for sure and a database backend for the annotation data. We will also need to integrate or support the following:

ontologies (anatomy, developmental stage, etc.)
phenotype-genotype data, including images of phenotypes and stages
authentication/authorization
some users should be able to contribute by doing remote annotation
RNA-seq and other experimental data should be visualized as tracks
there should be private and public tracks for the genome
Blast interface to the genome

So I thought the GMOD tools would do the trick, as what we are trying to build is quite similar to FlyBase, WormBase, etc. Now my questions; excuse me if they are a bit vague:

How good is GMOD for this purpose, and what are your experiences with it, mainly as a developer? Are there good and viable alternatives to it (e.g. the UCSC Genome Browser plus its data model)? I was planning to use GBrowse/JBrowse and the Chado database model; is that a good idea? Can such a system be administered by 1-1.5 persons?

We are still in the phase of summing up project requirements, but please contribute your valuable experiences, every input counts!

non database • 10k views

ADD COMMENT • link 12.8 years ago by Michael 55k

0

Good question, I'm very interested in the answers. I've heard that JBrowse is preferred over GBrowse, even by the GBrowse developer. It may be a good idea to focus on JBrowse.

ADD REPLY • link 12.8 years ago by Qdjm 1.9k

0

@qdjm: I will post my progress on the topic as I go along, so just have a look from time to time. I will also put a link to the page here as soon as I can (and am allowed to do so).

ADD REPLY • link 12.8 years ago by Michael 55k

0

I don't think I'd put it that way exactly, qdjm. JBrowse is definitely the future of GMOD genome browsers, but it is nowhere near as feature-complete as GBrowse yet. I generally suggest that people install both: you will need GBrowse for the functionality it provides, and once you have that installed, installing JBrowse is quite easy. You can then offer both to your users, as different people like different things.

ADD REPLY • link 12.8 years ago by Scott Cain ▴ 770

0

I was thinking about setting up a BioDAS server too, which would allow any genome browser that supports DAS to connect. Would that be an option?

ADD REPLY • link 12.8 years ago by Michael 55k

0

If you want to go the DAS route, I really like the Dalliance genome browser: http://www.biodalliance.org/. I come from a web developer background, so I really enjoy the interactive JavaScript stuff.

ADD REPLY • link 12.8 years ago by Damian Kao 16k

7

12.8 years ago

Taner Sen ▴ 80

Hi Michael,

I am the lead author of the "Choosing a genome browser for a Model Organism Database: surveying the Maize community" paper. Not everybody on the paper worked on implementing the GMOD tools directly. Some are curators, for example. Also, because we connected elements in the MaizeGDB Genome Browser to the appropriate MaizeGDB pages, we acknowledged the programmers and database administrators working on the main MaizeGDB site as well. I was the main person who did the GBrowse implementation, but note that our web developer set up the BLAST tools separately, and we already had a separate curation server, which is not connected to the GBrowse MySQL backend. We decided to keep the Oracle database (the main MaizeGDB backend database) and the GBrowse MySQL database separate, and we transferred data from one database to the other whenever we needed to.

So, as to how many people you'll need: I think 1-2 programmers are sufficient for a small-scale project; you'll be able to do what you want to do with GBrowse2. You can easily set up a BLAST server separately, like we did. With more people, of course, you can add more functionality to your site.

It is also possible to run the GBrowse system separately and direct users to pages that use a different backend database, like we do, so if you want to opt out of Chado, you can. Alternatively, you can run a separate server that uses the same database. If you follow that route, setting up GBrowse and connecting it to MySQL is usually straightforward, and scripts are already available to load GFF files into MySQL. If you have any problems at that stage, the GMOD community will be more than happy to help you. After that, you can switch into web developer mode and start working on better-looking pages for your genetic elements, setting up a BLAST server, etc.

ADD COMMENT • link 12.8 years ago by Taner Sen ▴ 80

0

I'm impressed, that's community spirit. I'll take this as a serious affirmation that we are on the right track. We probably cannot afford an Oracle database for now, but the concept of keeping two instances of the annotation, one for curation and one for presentation, is very promising too.

ADD REPLY • link 12.8 years ago by Michael 55k

5

12.8 years ago

Gjain 5.8k

Hi Michael,

You may want to look at Manatee from IGS:

Manatee is a web-based tool used to perform manual functional annotation. It has been specifically designed to optimize the ability of curators to evaluate all available sequence-based and experimental data to assign the best possible annotation to a given gene product.

It is an open source tool from IGS:

At IGS we have developed a version of Manatee that uses the chado relational database schema; the schema developed by the Generic Model Organism Database GMOD group and which is the standard used by many bioinformatics tools (such as Apollo and Artemis). This version of Manatee includes several tools and features not found in the original software. These include: the ability to automatically create Gene Ontology association and GenBank files, the availability of downloadable annotation and sequence files, and the ability to Blast sequences against the predicted proteins, predicted coding sequences, or whole genome sequence of your organism.

Using the Chado database is a good idea. Its schema is pretty comprehensive, and there is support available from GMOD.

Please let me know if you need more details.

ADD COMMENT • link updated 5.2 years ago by Ram 44k • written 12.8 years ago by Gjain 5.8k

0

This might solve the 'remote annotation' part very well, especially in combination with Chado. Do you think a genome browser like GBrowse/JBrowse would be a good addition for the purpose of 'read-only' browsing, or will Manatee do fine as a browser for genome tracks? Thank you very much for your feedback!

ADD REPLY • link 12.8 years ago by Michael 55k

0

Michael,

Manatee is essentially for annotating genomes. While it does provide a genome viewer, it's really only used for gene context. There is no ability to load/display user tracks. As for the future, I wouldn't count on us providing that functionality. Manatee is an old man and I (i.e. IGS) will not be dedicating a tremendous amount of resources adding new features :) Look for us to start work on a new tool that integrates many of the new datatypes out there with Chado as a back end.

Thanks!

todd

ADD REPLY • link 12.8 years ago by Todd Creasy • 0

0

Thank you, Todd, for the insights and future directions.

ADD REPLY • link 12.8 years ago by Gjain 5.8k

5

12.8 years ago

Yannick Wurm ★ 2.5k

Beyond the tools that have already been mentioned, the Rover genome browser seems promising:

http://chmille4.github.com/Rover/

And an idea to get up and running quickly is http://webgbrowse.cgb.indiana.edu/ (automagic set up of a hosted gbrowse)

Also, keep in mind that many users don't need the complexity of a full-fledged genome browser. For many, it may be sufficient to just "run BLAST". I've been contributing to the development of a simple way of setting up a bare bones blast server. This can also represent a quick-and-dirty sufficient solution while something bigger-scale is being set up...

Cheers, y

ADD COMMENT • link 12.8 years ago by Yannick Wurm ★ 2.5k

3

12.8 years ago

Ahdf-Lell-Kocks ★ 1.6k

A basic GMOD installation used to be relatively easy, and one could have it running in an afternoon for simple FASTA files of chromosomes with their annotations in BED/GFF files. I do remember that, back in the day, other more complicated data structures would take a long time to hook in, but that may have changed for the better lately.

ADD COMMENT • link 12.8 years ago by Ahdf-Lell-Kocks ★ 1.6k

3

12.8 years ago

Michael 55k

As I said in the beginning, I'll try to give feedback on my progress and experiences as I go along. I group this by components.

GBrowse2

I started with a simple test installation of GBrowse2 on my MacBook. The test installation took about 3 hours, most of it spent installing dependencies using CPAN and MacPorts. I followed the instructions in the HowTo. An important point to note: the instructions are quite good, and you should follow them as closely as possible! Whenever I messed up, it turned out that I hadn't read carefully and had tried to jump ahead.
GBrowse is written in Perl and requires BioPerl. Other requirements include Apache2 (installed via MacPorts) and MySQL (don't install it via MacPorts, as compiling takes too long; download it instead). I needed to add 2 lines to my apache.conf, copy some files, and restart the web server.

After that, I had a local install up, and I spent another day playing with the configuration and working through the configuration tutorial, which I also recommend. GBrowse is highly flexible and configurable with respect to appearance and the tracks displayed. Everything is done using configuration files; no programming has been necessary so far.

The easy install is based solely on files and 'in-memory' data; while that works fine for small (test) data sets, it didn't scale to a real genome of >100 Mbp for me. The next step is to set up an install on a dedicated server with a database backend.

As the configuration files for GBrowse and your project data are the artifacts that contain most of the modifications and adaptations, it is advisable to put them under revision control. The easiest way is to use RCS in place, but once the install gets larger and many people work on it, putting the whole configuration directory tree into revision control (git, SVN) might be preferable.

Installing CHADO

Chado is a database model for genomic data. It is meant to be used with PostgreSQL (PostgreSQL 8.4 is recommended, though I am testing with PostgreSQL 9.1). I had prior experience with MySQL, so it took me a little time to figure out what is different. Chado is bundled with an installer like a normal Perl module:

> perl Makefile.PL
> make
> make install
> make prepdb... etc.

Installation took about 3 days, but only because of a single missing dependency. I sent a support mail and got an answer within a few hours. The problem should now be fixed in the documentation, and installation should proceed without problems.

Loading Data in CHADO

This is possibly the most complicated part. There is a bulk loader script, but after one week I haven't succeeded in loading a single GFF3 file into the database. This is mainly because of format problems in the GFF3 files. First I tried Daphnia pulex GFF3 files (from JGI), then D. melanogaster (from NCBI genomes). Both files need repairs before they can be loaded into Chado. That is probably due mainly to the loose format definition of GFF3.
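Since format problems were the main obstacle, a quick structural check before running the loader can save some trial and error. The following is only a minimal Python sketch, not part of GMOD or Chado: the function name `check_gff3` is made up for illustration, and it tests just a few basics of a GFF3 feature line (nine tab-separated columns, integer coordinates with start <= end, a valid strand symbol). It does not check what the Chado loader actually requires.

```python
def check_gff3(lines):
    """Return (line_number, problem) tuples for basic structural errors.

    Hypothetical helper for illustration; it only covers a few generic
    GFF3 rules, not the specific requirements of any loader.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(cols)}")
            )
            continue
        start, end = cols[3], cols[4]
        if not (start.isdecimal() and end.isdecimal()):
            problems.append((number, "start/end are not integers"))
        elif int(start) > int(end):
            problems.append((number, "start is greater than end"))
        if cols[6] not in {"+", "-", ".", "?"}:
            problems.append((number, f"invalid strand {cols[6]!r}"))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_gff3.py annotation.gff3
    with open(sys.argv[1]) as handle:
        for number, problem in check_gff3(handle):
            print(f"line {number}: {problem}")
```

Anything this reports is worth fixing first; a clean run does not guarantee that the file will load.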

Done: I finally managed to import the GFF3 annotation file of Daphnia pulex (FrozenGeneCatalogue). It needed some edits, though, which I found by trial and error:

Needed to edit sequence type terms to comply with the Sequence Ontology (e.g. three_prime_utr vs. three_prime_UTR)
All source sequences (chromosomes, contigs, scaffolds) need to be present in the GFF file. I wrote a Perl script that generates GFF3 source entries from the genome FASTA file and concatenates them with the original file.
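The second fix can be sketched roughly as follows. This is not the Perl script mentioned above, just a minimal Python illustration of the same idea: read the genome FASTA, compute each sequence length, and emit one GFF3 feature line per sequence spanning 1 to its length. The function names, the `genome` source label and the default `contig` type are assumptions for the example; use whichever Sequence Ontology term fits your assembly.

```python
import io


def fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA stream."""
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # The ID is the first whitespace-delimited token after '>'.
            seq_id, length = line[1:].split()[0], 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def landmark_lines(handle, source="genome", so_type="contig"):
    """Build one GFF3 feature line per FASTA record, spanning 1..length.

    'source' and 'so_type' are illustrative defaults, not values taken
    from the original annotation.
    """
    lines = []
    for seq_id, length in fasta_lengths(handle):
        attributes = f"ID={seq_id};Name={seq_id}"
        lines.append("\t".join(
            [seq_id, source, so_type, "1", str(length), ".", ".", ".", attributes]
        ))
    return lines


# Small in-memory demo instead of a real genome file.
demo = io.StringIO(">scaffold_1 demo\nACGTAC\nGT\n>scaffold_2\nAC\n")
for row in landmark_lines(demo):
    print(row)
```

The generated lines would then be written near the top of the annotation file, after the `##gff-version 3` header and before the original features.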

Lessons learned: most annotation files will likely need sanitizing, so some scripting ability (Perl, Python, awk) and a good understanding of the formats are required to work with Chado.
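As an illustration of that kind of sanitizing, here is a minimal Python sketch that rewrites the type column (column 3) of a GFF3 file using a lookup table of known misspellings. Only the `three_prime_utr` entry comes from my experience above; the `five_prime_utr` entry and the names `TERM_FIXES` and `fix_type_column` are assumptions for the example, and the table should be extended with whatever terms the loader rejects.

```python
# Lookup of misspelled type terms -> Sequence Ontology spelling.
# Only the first entry is from the Daphnia pulex case; the second is
# an assumed analogous fix.
TERM_FIXES = {
    "three_prime_utr": "three_prime_UTR",
    "five_prime_utr": "five_prime_UTR",
}


def fix_type_column(line, fixes=TERM_FIXES):
    """Return the GFF3 line with column 3 corrected if it is a known misspelling."""
    if line.startswith("#") or not line.strip():
        return line  # directives, comments and blank lines pass through
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # leave malformed lines untouched for a separate check
    cols[2] = fixes.get(cols[2], cols[2])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline


if __name__ == "__main__":
    import sys

    # Usage: python fix_types.py < original.gff3 > fixed.gff3
    for raw in sys.stdin:
        sys.stdout.write(fix_type_column(raw))
```

Keeping the corrections in a plain dictionary makes it easy to put the script under revision control next to the data and to see exactly which terms were changed.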

ADD COMMENT • link 12.1 years ago by Michael 55k

1

Out of curiosity, why did you install Apache via MacPorts instead of using the Apache that is already installed on your MacBook? If the directions say to do so, there may be a good reason; I'm just interested to know.

ADD REPLY • link 12.7 years ago by SES 8.6k

0

Of course, now the question arises: what is a reasonable specification for such a dedicated system?

ADD REPLY • link 12.8 years ago by Michael 55k

0

@SES, for the same reason I don't use the system Perl: I don't want to mess with the system-installed software (I have less control over updates, and they might eventually break stuff needed elsewhere), and I don't like how Apple messed up the BSD paths (/Library/WebServer/...).

ADD REPLY • link 12.7 years ago by Michael 55k

1

@Michael, this is really great advice, and I would also like to point out that a major advantage of using your own Perl (or Apache) is that system updates do not clobber your own configuration. After working with GBrowse on a Mac for a little over 4 years, I think it is probably inevitable that an update will cause problems (trying to keep up with OS upgrades, for example, will definitely wipe out your old Perl configuration). So, I rescind my previous statement! Having gone through this a few times, I highly recommend that everyone try to build a framework that is as isolated as possible from system updates, whether you are on a Mac or Linux. I guess this is why perlbrew was invented.

ADD REPLY • link 12.6 years ago by SES 8.6k

0

@Michael, that is definitely a good reason, but it is probably overkill/unnecessary for testing a GBrowse installation. See the HOWTO for instructions: http://gmod.org/wiki/GBrowse_MacOSX_HOWTO

ADD REPLY • link 12.7 years ago by SES 8.6k

score 8 · Accepted Answer · 2012-02-06

Hi Michael,

While I obviously have a biased opinion, I thought I'd at least share some information on what can be done with GMOD tools. I'll take your bullet points and give some explanation or guidance on each:

ontologies (anatomy, developmental stage, etc.) -- Chado is based on the use of ontologies and controlled vocabularies; that's where it gets its considerable flexibility. Tripal (which I will refer to a lot here) is a Drupal-based web front end for Chado that is also ontology-aware.
phenotype-genotype data, including images of phenotypes and stages -- Chado has a module for genotypes and phenotypes. While I don't think the Tripal modules that work with these Chado modules are publicly available yet, the Tripal group is actively working on them. I don't know much about what they are supporting in terms of images, but I'm reasonably sure that they are supported.
authentication/authorization -- The nice thing about basing Tripal on Drupal is that it has all of the user stuff already built in.
some users should be able to contribute by doing remote annotation -- Apollo and Artemis are good for this for sequence annotation, and when WebApollo is available (based on JBrowse), it will be even better. Tripal allows authorized users to make changes to text-based annotations.
RNA-seq and other experimental data should be visualized as tracks -- GBrowse and JBrowse both support displaying nextgen sequence data via SAMtools (though I'm not positive the JBrowse support is public yet).
there should be private and public tracks for the genome -- GBrowse supports user authentication as well, and tracks can be configured to either be public or shown only to authorized users. The data upload functionality of GBrowse works the same way--it can be public, private, or shared with select users.
Blast interface to the genome -- This is a main missing item for the GMOD community. A tool called Mimosa was recently developed, though it hasn't been released and its future is somewhat uncertain at the moment (due to developers' comings and goings).

What is important to realize about GMOD is that it is a collaborative community. Each of the tools I mentioned above is developed by a different group, though we all communicate about what we're doing and how best to interoperate. In fact, two good places to learn about the community are the community meetings (the next one is in DC in April) and the GMOD Summer Schools (the next one will likely be in August in North Carolina, but the details aren't firm yet).

score 7 · Accepted Answer · 2012-02-06

7

12.8 years ago

Mary 11k

*cough*

Have you asked your main local end users what they prefer? I mean, of course they want everything, but they might have some preference for one or the other.

But I will also point you to an interesting survey that was done by the MaizeGDB folks when they were considering this question too: Choosing a genome browser for your organism…

And they put it all into a paper too. Choosing a genome browser for a Model Organism Database: surveying the Maize community

ADD COMMENT • link 12.8 years ago by Mary 11k

0

Thank you for the paper recommendation, I will study it. No, I haven't met the users yet, but I will soon. I guess it is good to ask them what functionality they want, for a proper specification. Some of the points in my requirements list are quite clear: some users will know genome browsers, so they will want one. Otherwise, as you mention, if I showed them all the possible features, they would surely want all of them. I guess it's a matter of prioritizing. But I wouldn't want to ask my users which toolkit or genome browser they prefer if they don't come up with one themselves. *sneeze*

ADD REPLY • link 12.8 years ago by Michael 55k

0

Btw, it looks like MaizeGDB was set up by a large group of people. Did you take part in this paper? If so, I would be very interested to know how many core programmers worked on it. If it was set up by 7+ people (n authors), then my mileage may vary, because I'm more or less alone on this. But sometimes it might of course be that one good programmer does most of the work, while others 'donate' data, 'write the Introduction' of the paper, or are simply the boss of the lab. Thanks again for this interesting paper.

ADD REPLY • link 12.8 years ago by Michael 55k

0

Indeed, I would like to do such a survey; it now seems more intuitive to ask them which browser they prefer.

ADD REPLY • link 12.8 years ago by Michael 55k