Automating GOrilla and/or REViGO Analysis
12.0 years ago

I have a ranked list of genes for my organism and experiment of interest. I have many such experiments, however.

Is there an easy way to automate the GOrilla and/or REViGO tools to generate analysis results? In particular, if I could run the tools locally, I would avoid putting unnecessary load on their servers.

Is this commonly done, or do people populate "faked-out", reverse-engineered web forms with wget or curl and retrieve results that way?

12.0 years ago
Stephen 2.8k

A local version of their tools would be optimal, but Aaron Mackey has written a script that automates the web form:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Getopt::Long;

my $GOrillaURL = "http://cbl-gorilla.cs.technion.ac.il/";

# Allowed values for each option, stored as lookup hashes for validation.
my @organisms = qw(ARABIDOPSIS_THALIANA
                   SACCHAROMYCES_CEREVISIAE
                   CAENORHABDITIS_ELEGANS
                   DROSOPHILA_MELANOGASTER
                   DANIO_RERIO
                   HOMO_SAPIENS
                   MUS_MUSCULUS
                   RATTUS_NORVEGICUS
                  );
my %organisms; @organisms{@organisms} = (1) x @organisms;
my $organism = "HOMO_SAPIENS";
my @runmodes = qw(mhg hg);
my %runmodes; @runmodes{@runmodes} = (1) x @runmodes;
my $runmode = "mhg";
my @ontologies = qw(proc func comp all);
my %ontologies; @ontologies{@ontologies} = (1) x @ontologies;
my $ontology = "all";
my $pvalue = "0.001";
my $name = "";
my $email = "";
my $includedups = 0;
my $revigo = 1;
my $fast = 1;
my ($targets, $background);

# GetOptions needs references to the variables it should fill in.
my $result = GetOptions("organism=s"   => \$organism,
                        "runmode=s"    => \$runmode,
                        "targets=s"    => \$targets,
                        "background=s" => \$background,
                        "ontology=s"   => \$ontology,
                        "pvalue=f"     => \$pvalue,
                        "name=s"       => \$name,
                        "email=s"      => \$email,
                        "includedups!" => \$includedups,
                        "fast!"        => \$fast,
                       );

die "No such organism $organism\n" unless $organisms{$organism};
die "No such runmode $runmode\n" unless $runmodes{$runmode};
die "No such ontology $ontology\n" unless $ontologies{$ontology};
die "Must supply both target and background files with runmode hg\n"
    unless ($runmode eq "mhg" || ($targets && $background));
die "Must supply target file with runmode mhg\n"
    unless ($runmode eq "hg" || $targets);

# Fill in and submit the GOrilla web form.
my $mech = WWW::Mechanize->new();
$mech->get($GOrillaURL);
$mech->form_name("gorilla");
$mech->select("species" => $organism);
$mech->set_fields("run_mode" => $runmode);
$mech->set_fields("target_file_name" => $targets);
if ($runmode eq "hg") {
    # NOTE: a reply below reports that set_fields() is needed here instead of set_file().
    $mech->set_file("background_file_name" => $background);
}
$mech->set_fields("db" => $ontology);
$mech->select("pvalue_thresh" => $pvalue);
$mech->set_fields("analysis_name" => $name);
$mech->set_fields("user_email" => $email);
$mech->set_fields("output_excel" => 1);
$mech->set_fields("output_unresolved" => $includedups);
$mech->set_fields("output_revigo" => $revigo);
$mech->set_fields("fast_mode" => $fast);
$mech->click("run_gogo_button");

# The job ID is carried in the URL of the page returned after submission.
my $res = $mech->response();
my $base = $res->base();
my ($id) = $base =~ m/id=(.*)/;
warn "Results can be found at:\nhttp://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";
print "# Results can be found at:\n# http://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";

# Re-request the status page until the server redirects to the finished results.
do {
    sleep 2;    # pause added here so the loop does not hammer the server
    $mech->get($base);
} until $mech->response->base() ne $base;

# Print the Excel output for each requested ontology to STDOUT.
my %pages = (proc => "PROCESS",
             func => "FUNCTION",
             comp => "COMPONENT");
my @pages = $ontology eq "all" ? values(%pages) : $pages{$ontology};
for my $page (@pages) {
    my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
    $mech->get($excel);
    my $content = $mech->content();
    print $content;
}
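
For readers who prefer Python, the waiting step of the script above (re-requesting the status URL until the server sends the client somewhere else) might be sketched as follows. This is only an illustration: `wait_for_redirect`, `fetch_final_url`, the delay and the retry limit are hypothetical names and values, and it assumes the server signals completion with an ordinary HTTP redirect, as the Perl loop implies.

```python
import time
import urllib.request
from typing import Callable


def fetch_final_url(url: str) -> str:
    """Request url and return the URL we ended up at after any redirects."""
    with urllib.request.urlopen(url) as response:
        return response.geturl()


def wait_for_redirect(fetch: Callable[[str], str], start_url: str,
                      delay: float = 5.0, max_tries: int = 120,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll start_url until the final URL differs from it, then return that URL.

    fetch is injected so the loop can be exercised without network access.
    """
    for _ in range(max_tries):
        final_url = fetch(start_url)
        if final_url != start_url:
            return final_url
        sleep(delay)  # be polite: wait between polls
    raise TimeoutError(f"no redirect after {max_tries} tries")
```

A call such as `wait_for_redirect(fetch_final_url, status_url)` would then block until the job finishes or the retry limit is reached; the explicit delay is there to keep the load on the shared server low.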


Thanks, this script worked very well.

After retrieving the ${id} value, curl or wget can be used to retrieve these eight files into their own sub-folder:

GOCOMPONENT.png
GOFUNCTION.png
GOPROCESS.png
GOResults.html
GOResultsCOMPONENT.html
GOResultsFUNCTION.html
GOResultsPROCESS.html
top.html

The results can then be viewed in a local web browser by opening GOResults.html, and they can be kept indefinitely.
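
As an alternative to curl or wget, a minimal Python sketch of that retrieval step is shown here. The base URL and the eight file names come from this thread; the function names and the example job ID are hypothetical, and the sketch has not been checked against the live server.

```python
import os
import urllib.request

GORILLA_BASE = "http://cbl-gorilla.cs.technion.ac.il/GOrilla"

# The eight result files listed above.
RESULT_FILES = [
    "GOCOMPONENT.png",
    "GOFUNCTION.png",
    "GOPROCESS.png",
    "GOResults.html",
    "GOResultsCOMPONENT.html",
    "GOResultsFUNCTION.html",
    "GOResultsPROCESS.html",
    "top.html",
]


def result_urls(job_id, base=GORILLA_BASE):
    """Build the URL of every result file for one GOrilla job ID."""
    return [f"{base}/{job_id}/{name}" for name in RESULT_FILES]


def download_results(job_id, out_dir, base=GORILLA_BASE):
    """Save each result file into out_dir and return the local paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for name, url in zip(RESULT_FILES, result_urls(job_id, base)):
        path = os.path.join(out_dir, name)
        urllib.request.urlretrieve(url, path)  # raises HTTPError if a file is missing
        saved.append(path)
    return saved
```

Calling `download_results("abc123", "results/exp1")` (with a real job ID in place of the made-up `abc123`) would give one sub-folder per analysis.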

Here are my modifications to the Perl script, which put the GO analysis results in the folder specified with --outputDir:

#!/usr/bin/env perl
#
# via: http://www.biostars.org/p/70064/#70085
#
use strict;
use warnings;
use File::Path;
use WWW::Mechanize;
use Getopt::Long;

my $GOrillaURL = "http://cbl-gorilla.cs.technion.ac.il/";

# Allowed values for each option, stored as lookup hashes for validation.
my @organisms = qw(ARABIDOPSIS_THALIANA
                   SACCHAROMYCES_CEREVISIAE
                   CAENORHABDITIS_ELEGANS
                   DROSOPHILA_MELANOGASTER
                   DANIO_RERIO
                   HOMO_SAPIENS
                   MUS_MUSCULUS
                   RATTUS_NORVEGICUS
                  );
my %organisms; @organisms{@organisms} = (1) x @organisms;
my $organism = "MUS_MUSCULUS";
my @runmodes = qw(mhg hg);
my %runmodes; @runmodes{@runmodes} = (1) x @runmodes;
my $runmode = "mhg";
my @ontologies = qw(proc func comp all);
my %ontologies; @ontologies{@ontologies} = (1) x @ontologies;
my $ontology = "all";
my $pvalue = "0.001";
my $name = "";
my $email = "";
my $includedups = 0;
my $revigo = 1;
my $fast = 1;
my ($targets, $background, $outputDir);

# GetOptions needs references to the variables it should fill in.
my $result = GetOptions("organism=s"   => \$organism,
                        "runmode=s"    => \$runmode,
                        "targets=s"    => \$targets,
                        "background=s" => \$background,
                        "ontology=s"   => \$ontology,
                        "pvalue=f"     => \$pvalue,
                        "name=s"       => \$name,
                        "email=s"      => \$email,
                        "includedups!" => \$includedups,
                        "fast!"        => \$fast,
                        "outputdir=s"  => \$outputDir,
                       );

die "No such organism $organism\n" unless $organisms{$organism};
die "No such runmode $runmode\n" unless $runmodes{$runmode};
die "No such ontology $ontology\n" unless $ontologies{$ontology};
die "Must supply both target and background files with runmode hg\n"
    unless ($runmode eq "mhg" || ($targets && $background));
die "Must supply target file with runmode mhg\n"
    unless ($runmode eq "hg" || $targets);
die "No output directory specified\n" unless $outputDir;
if (! -d $outputDir) { mkpath $outputDir; }

# Fill in and submit the GOrilla web form.
my $mech = WWW::Mechanize->new();
$mech->get($GOrillaURL);
$mech->form_name("gorilla");
$mech->select("species" => $organism);
$mech->set_fields("run_mode" => $runmode);
$mech->set_fields("target_file_name" => $targets);
if ($runmode eq "hg") {
    # NOTE: a reply below reports that set_fields() is needed here instead of set_file().
    $mech->set_file("background_file_name" => $background);
}
$mech->set_fields("db" => $ontology);
$mech->select("pvalue_thresh" => $pvalue);
$mech->set_fields("analysis_name" => $name);
$mech->set_fields("user_email" => $email);
$mech->set_fields("output_excel" => 1);
$mech->set_fields("output_unresolved" => $includedups);
$mech->set_fields("output_revigo" => $revigo);
$mech->set_fields("fast_mode" => $fast);
$mech->click("run_gogo_button");

# The job ID is carried in the URL of the page returned after submission.
my $res = $mech->response();
my $base = $res->base();
my ($id) = $base =~ m/id=(.*)/;
print STDERR "Results can be found at: http://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";

# Re-request the status page until the server redirects to the finished results.
do {
    sleep 2;    # pause added here so the loop does not hammer the server
    $mech->get($base);
} until $mech->response->base() ne $base;

my %pages = (proc => "PROCESS",
             func => "FUNCTION",
             comp => "COMPONENT");
my @pages = $ontology eq "all" ? values(%pages) : $pages{$ontology};
for my $page (@pages) {
    print STDERR "trying to retrieve ${page} records...\n";
    my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
    # get() dies on HTTP errors by default, so trap a missing file with eval.
    my $connected = eval {
        $mech->get($excel);
        1
    };
    if ($mech->success()) {
        # Save the Excel output as text, then fetch the matching PNG.
        my $content = $mech->content();
        my $outputFn = "$outputDir/GO${page}.txt";
        open my $outputFh, ">", $outputFn or die "could not open handle to GO output: $outputFn\n";
        print $outputFh $content;
        close $outputFh;
        my $pngUri = "${GOrillaURL}/GOrilla/${id}/GO${page}.png";
        my $pngFn = "$outputDir/GO${page}.png";
        $mech->get($pngUri, ':content_file' => $pngFn);
    }
    my $resUri = "${GOrillaURL}/GOrilla/${id}/GOResults${page}.html";
    my $resFn = "$outputDir/GOResults${page}.html";
    $mech->get($resUri, ':content_file' => $resFn);
}

print STDERR "trying to retrieve root results record...\n";
my $rootResUri = "${GOrillaURL}/GOrilla/${id}/GOResults.html";
my $rootResFn = "$outputDir/GOResults.html";
$mech->get($rootResUri, ':content_file' => $rootResFn);

print STDERR "trying to retrieve top bar record...\n";
my $topUri = "${GOrillaURL}/GOrilla/${id}/top.html";
my $topFn = "$outputDir/top.html";
$mech->get($topUri, ':content_file' => $topFn);
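
Since the original question involves many experiments, a small driver around the script above may be useful. The following Python sketch assumes the Perl script is saved as `GOrilla.pl` (a hypothetical file name) and uses only the `--organism`, `--runmode`, `--targets` and `--outputdir` options it defines; the `build_command` and `run_batch` helpers and the pause length are illustrative choices, not part of the script.

```python
import subprocess
import time
from pathlib import Path


def build_command(targets, output_dir, organism="MUS_MUSCULUS",
                  script="GOrilla.pl"):
    """Return the argument list for one run of the Perl script (mhg mode)."""
    return ["perl", script,
            "--organism", organism,
            "--runmode", "mhg",
            "--targets", str(targets),
            "--outputdir", str(output_dir)]


def run_batch(ranked_lists, results_root, pause=30.0, **kwargs):
    """Run the script once per ranked gene list, one output folder each."""
    for targets in ranked_lists:
        targets = Path(targets)
        out_dir = Path(results_root) / targets.stem
        subprocess.run(build_command(targets, out_dir, **kwargs), check=True)
        time.sleep(pause)  # space out submissions to limit load on the server
```

For example, `run_batch(sorted(Path("ranked").glob("*.txt")), "gorilla_results")` would submit every list in a `ranked/` folder in turn. Running the jobs sequentially with a pause, rather than in parallel, is a deliberate choice here given that the server is a shared resource.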


Thanks for the script! It saved me a lot of manual work. Here are modifications to a couple of lines that I believe are not correct. I am not a Perl guy, but the script seems to work better after these changes.

Line 80 should be

$mech->set_fields("background_file_name" => $background);

instead of

$mech->set_file("background_file_name" => $background);

Lines 110-113 should be

my $excelUri = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
my $excelFn = "$outputDir/GO${page}.xls";
my $connected = eval {
    $mech->get($excelUri, ':content_file' => $excelFn);
    1
};

instead of

my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
my $connected = eval {
    $mech->get($excel);
    1
};

This does not capture the Excel output. How can it be modified so that the three Excel files are saved?


Perhaps I can help modify this to add a few more GETs, assuming I understand your question correctly. What are their filenames? I just haven't looked at this in some years, so filenames would be useful.


The link to the Gist is broken!

12.0 years ago
Ryan Dale 5.0k

For REVIGO, there's an API provided -- see http://revigo.irb.hr/invokeRevigoAndFillFields.html. But this is really just a mechanism for uploading the data. On the results pages they provide links to R scripts, but I think you'd have to screen-scrape to get these.

EDIT:

Using mechanize, it's actually pretty straightforward to get the R scripts -- here's a complete example that submits example data, downloads the R scripts for the molecular function treemap and scatter plot, and generates the PDFs.

#!/usr/bin/python
"""
- Submit example data to REVIGO server (http://revigo.irb.hr/)
- Download and run R script for creating the treemap
- Download and run R script for creating the scatterplot

Creates files:
    treemap.R, treemap.Rout, revigo_treemap.pdf
    scatter.R, scatter.Rout, revigo_scatter.pdf
"""
# Written for Python 2 (uses urllib.urlencode).
import os
import urllib
import mechanize

url = "http://revigo.irb.hr/"

# RobustFactory because REVIGO forms not well-formatted
br = mechanize.Browser(factory=mechanize.RobustFactory())

# For actual data, use open('mydata.txt').read()
br.open(os.path.join(url, 'examples', 'example1.txt'))
txt = br.response().read()

# Encode and request
data = {'inputGoList': txt}
br.open(url, data=urllib.urlencode(data))

# Submit form
br.select_form(name="submitToRevigo")
response = br.submit()

# Exact string match on the url for getting the R treemap script
br.follow_link(url="toR_treemap.jsp?table=3")
with open('treemap.R', 'w') as f:
    f.write(br.response().read())

# go back and get R script for scatter
br.back()
br.follow_link(url="toR.jsp?table=3")
with open('scatter.R', 'w') as f:
    f.write(br.response().read())
    # Downloaded scatter script doesn't save PDF, so add this line
    # (leading newline in case the script lacks a trailing one)
    f.write('\nggsave("revigo_scatter.pdf")\n')

# Create PDFs
os.system('R CMD BATCH treemap.R')
os.system('R CMD BATCH scatter.R')


Thanks for that link. Unfortunately, it doesn't look like there's a way to retrieve the R script that makes the treemap (or even the data that go into making the treemap).


Thanks! The follow_link URLs in your script did not work and returned mechanize._mechanize.LinkNotFoundError errors, but after changing them from *?table=3 to *?table=1 I was able to get treemap and scatterplot PDF files.
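
Because the table number in these links evidently differs between inputs (3 in the answer, 1 here), matching the links by pattern rather than by exact string may be more robust. The sketch below shows the idea in plain Python 3 on a list of link URLs; `find_r_script_links` and the sample hrefs are hypothetical, and the patterns assume the link format quoted in this thread. With mechanize itself, passing a `url_regex` argument to `follow_link` should achieve a similar effect, though that has not been tested against REVIGO here.

```python
import re

# Patterns for the two kinds of R-script links mentioned in this thread.
R_SCRIPT_PATTERNS = {
    "treemap": re.compile(r"^toR_treemap\.jsp\?table=(\d+)$"),
    "scatter": re.compile(r"^toR\.jsp\?table=(\d+)$"),
}


def find_r_script_links(hrefs):
    """Group R-script links by kind, keyed by their table number."""
    found = {kind: {} for kind in R_SCRIPT_PATTERNS}
    for href in hrefs:
        for kind, pattern in R_SCRIPT_PATTERNS.items():
            match = pattern.match(href)
            if match:
                found[kind][int(match.group(1))] = href
    return found


# Example with made-up links, as they might appear on a results page.
links = find_r_script_links(
    ["index.jsp", "toR.jsp?table=1", "toR_treemap.jsp?table=1"])
```

Whatever table numbers are present can then be followed in turn, instead of hard-coding a single value.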


I was almost imagining just automating the R script itself instead of scraping the R script that they provide, but then I remembered that they do GO term reduction and other things. Good post!

12.0 years ago
Woa ★ 2.9k

I wrote a script to get output from Revigo. Here's the link. I think I parsed the raw output using Perl's HTML::Table, but I can't remember now.


Thanks for that link. As mentioned in the comment by Daler, unfortunately, it doesn't look like there's a way to retrieve the R script or data that make the treemap from the response HTML.

7.7 years ago

Here is an example that works with Python 3.

How do you embed a Gist in your post?

7.7 years ago
EagleEye 7.6k

GeneSCF supports analysis in batch mode (multiple lists in one go).
