Question

goseq for non-native species infinite recursion

6.8 years ago

nsl24 • 0

I'm trying to use the results of a differential expression analysis to look for enriched GO categories using goseq, but I'm having a beast of a time even getting a trial run working for my non-native species.

I have:

Gene lengths downloaded from biomart as a numeric vector (Length)
A gene.vector covering all of the surveyed genes, with 1 or 0 depending on DE (from my output file named DE)
A data frame containing gene IDs and the associated GO terms, taken from biomart (named GOT)

My test code:

assayed.genes=DE$assayed.genes
de.genes=DE$de.genes
gene.vector=as.integer(assayed.genes%in%de.genes)
names(gene.vector)=assayed.genes
Length = LEN$genelength
head(gene.vector)

and I see output like

Cre09.g414550.t1.2.v5.5 0

When I try to make the pwf and run goseq

pwf = nullp(gene.vector, bias.data=Length)
go = goseq(pwf, gene2cat = GOT)

The pwf works and produces a plot but when I run goseq I get hit with an infinite recursion error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Followed by

"Error during wrapup:" repeated

Tweaks and googling haven't turned anything up, so I was hoping someone might be able to spot a glaring error in my approach or offer advice.

RNA-Seq goseq software error • 1.8k views

updated 6.2 years ago by Ruben • written 6.8 years ago by nsl24

score 0 · Answer 1 · 2018-09-04

Hi nsl24,

I know you have probably moved on, but I had the identical problem and your question was the only one that popped up in my search. So for people struggling with this in the future, here is how I solved this issue in my code.

The solution for me was very simple. I was working with a tibble (from the tibble package) and had forgotten about that. I could either convert it to a named list or coerce the tibble into a plain data frame with as.data.frame(). In either case, some genes probably map to many GO terms and others do not map to anything. So you should end up with either a named list in which each gene name points at a vector of its GO terms, or a data frame in which a gene ID is repeated on one row for each of its GO terms. This is also pointed out in the package documentation, just something to keep in mind. Your data frame GOT that you used in your example could be altered as follows (GOterms and GeneIDs stand in for whatever your two columns are actually called):

# split() groups the GO terms by gene, giving one named list element per gene
NamedList = split(GOT$GOterms, GOT$GeneIDs)
go = goseq(pwf, gene2cat = NamedList)
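
For anyone who wants to check the shape of their mapping before handing it to goseq, here is a minimal sketch on a toy table. The gene IDs, the GO IDs and the column names GeneIDs and GOterms are placeholders I made up, so swap in your own. It only uses base R, and the goseq call is left commented out because it needs the pwf from nullp().

```r
# Toy gene-to-GO table, one row per gene/GO pair (all values are placeholders).
GOT = data.frame(
  GeneIDs = c("geneA", "geneA", "geneB", "geneC"),
  GOterms = c("GO:0000001", "GO:0000002", "GO:0000001", NA),
  stringsAsFactors = FALSE
)

# If GOT came in as a tibble, this drops the tibble classes and leaves a
# plain data frame; on a plain data frame it changes nothing.
GOT = as.data.frame(GOT)
class(GOT)

# Drop genes that have no GO annotation, then group the GO terms by gene.
annotated = GOT[!is.na(GOT$GOterms), ]
NamedList = split(annotated$GOterms, annotated$GeneIDs)

# Each element is named after a gene and holds a character vector of GO terms.
str(NamedList)

# With a real pwf from nullp(), the call would then be:
# go = goseq(pwf, gene2cat = NamedList)
```

If class(GOT) still lists "tbl_df" right before the goseq call, that is a hint you are in the same situation I was in.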

Cheers, Ruben