Automating GOrilla and/or REViGO Analysis

12.2 years ago

Alex Reynolds 36k

I have a ranked list of genes for my organism and experiment of interest. I have many such experiments, however.

Is there an easy way to automate the GOrilla and/or REViGO tools to generate analysis results? In particular, running the tools locally would help me avoid putting unnecessary load on their servers.

Is this commonly done, or do people populate "faked-out", reverse-engineered web forms with wget or curl and retrieve results that way?

go • 12k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 12.2 years ago by Alex Reynolds 36k

12.2 years ago

Stephen 2.8k

A local version of their tools would be optimal, but Aaron Mackey here has written a script that automates the GOrilla web form:

	#!/usr/bin/perl

	use strict;
	use warnings;

	use WWW::Mechanize;
	use Getopt::Long;

	my $GOrillaURL = "http://cbl-gorilla.cs.technion.ac.il/";

	my @organisms = qw(ARABIDOPSIS_THALIANA
	SACCHAROMYCES_CEREVISIAE
	CAENORHABDITIS_ELEGANS
	DROSOPHILA_MELANOGASTER
	DANIO_RERIO
	HOMO_SAPIENS
	MUS_MUSCULUS
	RATTUS_NORVEGICUS
	);
	my %organisms; @organisms{@organisms} = (1) x @organisms;
	my $organism = "HOMO_SAPIENS";

	my @runmodes = qw(mhg hg);
	my %runmodes; @runmodes{@runmodes} = (1) x @runmodes;
	my $runmode = "mhg";

	my @ontologies = qw(proc func comp all);
	my %ontologies; @ontologies{@ontologies} = (1) x @ontologies;
	my $ontology = "all";

	my $pvalue = "0.001";
	my $name = "";
	my $email = "";
	my $includedups = 0;
	my $revigo = 1;
	my $fast = 1;
	my ($targets, $background);

	# GetOptions needs references to the variables it should fill in
	my $result = GetOptions("organism=s"   => \$organism,
	                        "runmode=s"    => \$runmode,
	                        "targets=s"    => \$targets,
	                        "background=s" => \$background,
	                        "ontology=s"   => \$ontology,
	                        "pvalue=f"     => \$pvalue,
	                        "name=s"       => \$name,
	                        "email=s"      => \$email,
	                        "includedups!" => \$includedups,
	                        "fast!"        => \$fast,
	);

	die "No such organism $organism\n" unless $organisms{$organism};
	die "No such runmode $runmode\n" unless $runmodes{$runmode};
	die "No such ontology $ontology\n" unless $ontologies{$ontology};

	die "Must supply both target and background files with runmode hg\n"
	    unless ($runmode eq "mhg" || ($targets && $background));

	die "Must supply target file with runmode mhg\n"
	    unless ($runmode eq "hg" || $targets);

	my $mech = WWW::Mechanize->new();

	$mech->get($GOrillaURL);

	$mech->form_name("gorilla");

	$mech->select("species" => $organism);
	$mech->set_fields("run_mode" => $runmode);
	$mech->set_fields("target_file_name" => $targets);
	if ($runmode eq "hg") {
	    $mech->set_file("background_file_name" => $background);
	}
	$mech->set_fields("db" => $ontology);
	$mech->select("pvalue_thresh" => $pvalue);
	$mech->set_fields("analysis_name" => $name);
	$mech->set_fields("user_email" => $email);
	$mech->set_fields("output_excel" => 1);
	$mech->set_fields("output_unresolved" => $includedups);
	$mech->set_fields("output_revigo" => $revigo);
	$mech->set_fields("fast_mode" => $fast);

	$mech->click("run_gogo_button");

	my $res = $mech->response();
	my $base = $res->base();
	my ($id) = $base =~ m/id=(.*)/;

	warn "Results can be found at:
	http://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";

	print "# Results can be found at:
	# http://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";

	# Poll the submission page until the server redirects to the results
	do { $mech->get($base) }
	    until $mech->response->base() ne $base;

	my %pages = (proc => "PROCESS",
	func => "FUNCTION",
	comp => "COMPONENT");

	my @pages = $ontology eq "all" ? values(%pages) : $pages{$ontology};

	for my $page (@pages) {
	    my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
	    $mech->get($excel);
	    my $content = $mech->content();
	    print $content;
	}
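
The script re-requests the page in a tight loop until the server redirects to the results. If load on the GOrilla server is a concern, a pause between requests and an upper time limit may be kinder. Here is a minimal Python sketch of that idea. The names `wait_for_redirect` and `fetch_with_urllib`, the 5-second interval and the 10-minute timeout are my own assumptions, not anything GOrilla documents.

```python
import time
import urllib.request
from typing import Callable


def fetch_with_urllib(url: str) -> str:
    """Request url and return the URL we ended up at after any redirects."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl()


def wait_for_redirect(fetch_final_url: Callable[[str], str],
                      start_url: str,
                      interval: float = 5.0,   # assumed pause between polls
                      timeout: float = 600.0,  # assumed upper limit
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll start_url until the request lands on a different URL.

    Returns the new URL, or raises TimeoutError once `timeout` seconds
    of sleeping have passed without a redirect.
    """
    waited = 0.0
    while True:
        final_url = fetch_final_url(start_url)
        if final_url != start_url:
            return final_url
        if waited >= timeout:
            raise TimeoutError(f"no redirect from {start_url} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Calling `wait_for_redirect(fetch_with_urllib, status_url)` would play the role of the `do ... until` loop, where `status_url` stands for whatever URL the form submission returned.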

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 12.2 years ago by Stephen 2.8k

Thanks, this script worked very well.

After retrieving the ${id} value, curl or wget can be used to retrieve these eight files into their own sub-folder:

GOCOMPONENT.png
GOFUNCTION.png
GOPROCESS.png
GOResults.html
GOResultsCOMPONENT.html
GOResultsFUNCTION.html
GOResultsPROCESS.html
top.html

These results can be viewed locally by opening GOResults.html in a web browser, and they can be kept indefinitely.
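
For anyone who prefers Python to curl or wget for this step, here is a rough sketch that builds the eight URLs from the run id and saves each file into a folder. The URL layout is the one the script prints. The function names are mine, and skipping files that fail to download is an assumption on my part, on the guess that a run may not produce all eight.

```python
import os
import urllib.request

GORILLA_URL = "http://cbl-gorilla.cs.technion.ac.il"

# The eight result files listed above
RESULT_FILES = [
    "GOCOMPONENT.png",
    "GOFUNCTION.png",
    "GOPROCESS.png",
    "GOResults.html",
    "GOResultsCOMPONENT.html",
    "GOResultsFUNCTION.html",
    "GOResultsPROCESS.html",
    "top.html",
]


def result_urls(run_id):
    """Map each result filename to its URL for the given run id."""
    return {name: f"{GORILLA_URL}/GOrilla/{run_id}/{name}" for name in RESULT_FILES}


def download_results(run_id, out_dir):
    """Save every result file that can be fetched; return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for name, url in result_urls(run_id).items():
        dest = os.path.join(out_dir, name)
        try:
            urllib.request.urlretrieve(url, dest)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"skipped {name}: {err}")
            continue
        saved.append(dest)
    return saved
```

Something like `download_results(run_id, "experiment1")` would then leave a browsable copy in `experiment1/`, with `run_id` being the value the script extracts as `${id}`.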

Here are my modifications to this script, which put the GO analysis results in the folder specified with --outputDir:

	#!/usr/bin/env perl

	#
	# via: http://www.biostars.org/p/70064/#70085
	#

	use strict;
	use warnings;

	use File::Path;
	use WWW::Mechanize;
	use Getopt::Long;

	my $GOrillaURL = "http://cbl-gorilla.cs.technion.ac.il/";

	my @organisms = qw(ARABIDOPSIS_THALIANA
	SACCHAROMYCES_CEREVISIAE
	CAENORHABDITIS_ELEGANS
	DROSOPHILA_MELANOGASTER
	DANIO_RERIO
	HOMO_SAPIENS
	MUS_MUSCULUS
	RATTUS_NORVEGICUS
	);
	my %organisms; @organisms{@organisms} = (1) x @organisms;
	my $organism = "MUS_MUSCULUS";

	my @runmodes = qw(mhg hg);
	my %runmodes; @runmodes{@runmodes} = (1) x @runmodes;
	my $runmode = "mhg";

	my @ontologies = qw(proc func comp all);
	my %ontologies; @ontologies{@ontologies} = (1) x @ontologies;
	my $ontology = "all";

	my $pvalue = "0.001";
	my $name = "";
	my $email = "";
	my $includedups = 0;
	my $revigo = 1;
	my $fast = 1;
	my ($targets, $background, $outputDir);

	# GetOptions needs references to the variables it should fill in
	my $result = GetOptions("organism=s"   => \$organism,
	                        "runmode=s"    => \$runmode,
	                        "targets=s"    => \$targets,
	                        "background=s" => \$background,
	                        "ontology=s"   => \$ontology,
	                        "pvalue=f"     => \$pvalue,
	                        "name=s"       => \$name,
	                        "email=s"      => \$email,
	                        "includedups!" => \$includedups,
	                        "fast!"        => \$fast,
	                        "outputdir=s"  => \$outputDir,
	);

	die "No such organism $organism\n" unless $organisms{$organism};
	die "No such runmode $runmode\n" unless $runmodes{$runmode};
	die "No such ontology $ontology\n" unless $ontologies{$ontology};

	die "Must supply both target and background files with runmode hg\n"
	    unless ($runmode eq "mhg" || ($targets && $background));

	die "Must supply target file with runmode mhg\n"
	    unless ($runmode eq "hg" || $targets);

	die "No output directory specified\n" unless $outputDir;
	if (! -d $outputDir) { mkpath $outputDir; }

	my $mech = WWW::Mechanize->new();

	$mech->get($GOrillaURL);

	$mech->form_name("gorilla");

	$mech->select("species" => $organism);
	$mech->set_fields("run_mode" => $runmode);
	$mech->set_fields("target_file_name" => $targets);
	if ($runmode eq "hg") {
	    # NOTE: a reply below reports that set_fields works better here than set_file
	    $mech->set_file("background_file_name" => $background);
	}
	$mech->set_fields("db" => $ontology);
	$mech->select("pvalue_thresh" => $pvalue);
	$mech->set_fields("analysis_name" => $name);
	$mech->set_fields("user_email" => $email);
	$mech->set_fields("output_excel" => 1);
	$mech->set_fields("output_unresolved" => $includedups);
	$mech->set_fields("output_revigo" => $revigo);
	$mech->set_fields("fast_mode" => $fast);

	$mech->click("run_gogo_button");

	my $res = $mech->response();
	my $base = $res->base();
	my ($id) = $base =~ m/id=(.*)/;

	print STDERR "Results can be found at: http://cbl-gorilla.cs.technion.ac.il/GOrilla/${id}/GOResults.html\n";

	# Poll the submission page until the server redirects to the results
	do { $mech->get($base) }
	    until $mech->response->base() ne $base;

	my %pages = (proc => "PROCESS",
	func => "FUNCTION",
	comp => "COMPONENT");

	my @pages = $ontology eq "all" ? values(%pages) : $pages{$ontology};

	for my $page (@pages) {
	    print STDERR "trying to retrieve ${page} records...\n";
	    my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
	    # get() dies on HTTP errors when autocheck is on, so trap failures with eval
	    my $connected = eval {
	        $mech->get($excel);
	        1
	    };
	    if ($connected && $mech->success()) {
	        # Save the tab-delimited table, then the matching graph image
	        my $content = $mech->content();
	        my $outputFn = "$outputDir/GO${page}.txt";
	        open my $outputFh, ">", $outputFn or die "could not open handle to GO output: $outputFn\n";
	        print $outputFh $content;
	        close $outputFh;
	        my $pngUri = "${GOrillaURL}/GOrilla/${id}/GO${page}.png";
	        my $pngFn = "$outputDir/GO${page}.png";
	        $mech->get($pngUri, ':content_file' => $pngFn);
	    }

	    my $resUri = "${GOrillaURL}/GOrilla/${id}/GOResults${page}.html";
	    my $resFn = "$outputDir/GOResults${page}.html";
	    $mech->get($resUri, ':content_file' => $resFn);
	}

	print STDERR "trying to retrieve root results record...\n";
	my $rootResUri = "${GOrillaURL}/GOrilla/${id}/GOResults.html";
	my $rootResFn = "$outputDir/GOResults.html";
	$mech->get($rootResUri, ':content_file' => $rootResFn);

	print STDERR "trying to retrieve top bar record...\n";
	my $topUri = "${GOrillaURL}/GOrilla/${id}/top.html";
	my $topFn = "$outputDir/top.html";
	$mech->get($topUri, ':content_file' => $topFn);
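
With many experiments, the script above can be driven from a small wrapper that submits one ranked list at a time and pauses between submissions. This is only a sketch: `gorilla_fetch.pl` is a hypothetical filename for wherever you saved the Perl script, the 30-second pause is an arbitrary guess at a polite gap, and each experiment gets an output folder named after its input file.

```python
import subprocess
import time
from pathlib import Path

SCRIPT = "gorilla_fetch.pl"  # hypothetical filename for the Perl script above


def build_command(targets, output_root, organism="MUS_MUSCULUS", runmode="mhg"):
    """Build the perl command line for one ranked gene list."""
    targets = Path(targets)
    out_dir = Path(output_root) / targets.stem  # one folder per experiment
    return ["perl", SCRIPT,
            "--organism", organism,
            "--runmode", runmode,
            "--targets", str(targets),
            "--outputdir", str(out_dir)]


def run_batch(target_files, output_root, pause=30.0, **kwargs):
    """Run the Perl script once per file; return the files that failed."""
    failed = []
    for i, targets in enumerate(target_files):
        if i:
            time.sleep(pause)  # assumed gap between submissions
        result = subprocess.run(build_command(targets, output_root, **kwargs))
        if result.returncode != 0:
            failed.append(str(targets))
    return failed
```

For example, `run_batch(sorted(Path("lists").glob("*.txt")), "results")` would process every `.txt` file in a `lists/` folder and write to `results/<experiment>/`.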

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 12.2 years ago by Alex Reynolds 36k

Thanks for the script! It saved me a lot of manual work. Here are modifications to a couple of lines that I believe are not correct. I am not a Perl guy, but it seems to work better after the changes.

Line 80 should be

$mech->set_fields("background_file_name" => $background);

instead of

$mech->set_file("background_file_name" => $background);

Lines 110-113 should be

my $excelUri = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
my $excelFn = "$outputDir/GO${page}.xls";
my $connected = eval {
    $mech->get($excelUri, ':content_file' => $excelFn);
    1
};

instead of

my $excel = "${GOrillaURL}/GOrilla/${id}/GO${page}.xls";
my $connected = eval {
    $mech->get($excel);
    1
};

ADD REPLY • link 7.1 years ago by opplatek ▴ 300

This does not capture the Excel output. How can it be modified so that the three Excel files are saved?

ADD REPLY • link 7.2 years ago by dec986 ▴ 380

Perhaps I can help modify this to add a few more GETs, assuming I understand your question correctly. What are their filenames? I just haven't looked at this in some years, so filenames would be useful.

ADD REPLY • link 7.2 years ago by Alex Reynolds 36k

Link broken for Gist! link

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by cmdcolin ★ 4.3k

12.2 years ago

Ryan Dale 5.0k

For REVIGO, there's an API provided -- see http://revigo.irb.hr/invokeRevigoAndFillFields.html. But this is really just a mechanism for uploading the data. On the results pages they provide links to R scripts, but I think you'd have to screen-scrape to get these.

EDIT:

Using mechanize, it's actually pretty straightforward to get the R scripts -- here's a complete example that submits example data, downloads the R scripts for the molecular function treemap and scatter plot, and generates the PDFs.

	#!/usr/bin/python

	"""
	- Submit example data to REVIGO server (http://revigo.irb.hr/)
	- Download and run R script for creating the treemap
	- Download and run R script for creating the scatterplot

	Creates files:
	treemap.R, treemap.Rout, revigo_treemap.pdf
	scatter.R, scatter.Rout, revigo_scatter.pdf
	"""

	import os
	import urllib
	import mechanize

	url = "http://revigo.irb.hr/"

	# RobustFactory because REVIGO forms not well-formatted
	br = mechanize.Browser(factory=mechanize.RobustFactory())

	# For actual data, use open('mydata.txt').read()
	br.open(os.path.join(url, 'examples', 'example1.txt'))
	txt = br.response().read()

	# Encode and request
	data = {'inputGoList': txt}
	br.open(url, data=urllib.urlencode(data))

	# Submit form
	br.select_form(name="submitToRevigo")
	response = br.submit()

	# Exact string match on the url for getting the R treemap script
	br.follow_link(url="toR_treemap.jsp?table=3")
	with open('treemap.R', 'w') as f:
	    f.write(br.response().read())

	# go back and get R script for scatter
	br.back()
	br.follow_link(url="toR.jsp?table=3")
	with open('scatter.R', 'w') as f:
	    f.write(br.response().read())
	    # Downloaded scatter script doesn't save PDF, so add this line
	    # (on its own line, in case the script lacks a trailing newline)
	    f.write('\nggsave("revigo_scatter.pdf")\n')

	# Create PDFs
	os.system('R CMD BATCH treemap.R')
	os.system('R CMD BATCH scatter.R')

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 12.2 years ago by Ryan Dale 5.0k

Thanks for that link. Unfortunately, it doesn't look like there's a way to retrieve the R script that makes the treemap (or even the data that go into making the treemap).

ADD REPLY • link 12.2 years ago by Alex Reynolds 36k

Thanks! The follow_link URLs in your script did not work and raised mechanize._mechanize.LinkNotFoundError, but after changing them from *?table=3 to *?table=1 I was able to get treemap and scatterplot PDF files.
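
Since the table number in those links seems to vary, one option is to look the links up by pattern instead of hard-coding them. Below is a small standard-library sketch that scans a results page for `toR_treemap.jsp?table=N` and `toR.jsp?table=N` links. The function name and the idea of matching any table number are my own assumptions, and I have only tried it on a made-up HTML snippet, not on live REVIGO output.

```python
import re
from html.parser import HTMLParser


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def find_r_script_links(html):
    """Return R-script links grouped as {'treemap': [...], 'scatter': [...]}."""
    collector = _HrefCollector()
    collector.feed(html)
    pattern = re.compile(r"(toR_treemap|toR)\.jsp\?table=(\d+)$")
    found = {"treemap": [], "scatter": []}
    for href in collector.hrefs:
        match = pattern.search(href)
        if match:
            kind = "treemap" if match.group(1) == "toR_treemap" else "scatter"
            found[kind].append(href)
    return found
```

Each returned href could then be passed to `br.follow_link(url=...)` in place of the fixed `table=3` strings.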

ADD REPLY • link 12.2 years ago by Alex Reynolds 36k

I was almost imagining just automating the R script itself instead of scraping the R script that they provide, but then I remembered that they do GO term reduction and other things. Good post!

ADD REPLY • link 9.9 years ago by cmdcolin ★ 4.3k

12.2 years ago

Woa ★ 2.9k

I wrote a script to get output from Revigo. Here's the link. I possibly parsed the raw output using Perl's HTML::Table, but can't remember now.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 12.2 years ago by Woa ★ 2.9k

Thanks for that link. As mentioned in the comment by Daler, unfortunately, it doesn't look like there's a way to retrieve the R script or data that make the treemap from the response HTML.

ADD REPLY • link 12.2 years ago by Alex Reynolds 36k

8.0 years ago

sam.demeyer93 • 0

Here is an example that works with Python 3.

How do you embed a Gist in your post?

ADD COMMENT • link 8.0 years ago by sam.demeyer93 • 0

8.0 years ago

EagleEye 7.6k

GeneSCF supports batch-mode analysis (multiple lists in one go).

ADD COMMENT • link 8.0 years ago by EagleEye 7.6k