Question

Dataset normalization before gene ontology analysis

9.1 years ago

tiago211287 ★ 1.5k

I performed a GO analysis on a list of genes using the goseq package from Bioconductor. After plotting the results, I could see that the bigger the gene list was, the more counts it had in each category. Is there some way to normalize this by the size of each gene list?

Gene ontology Normalization RNA-Seq bioconductor • 2.0k views

updated 9.1 years ago by svlachavas ▴ 790 • written 9.1 years ago by tiago211287 ★ 1.5k

Ram · Answer 1 · 2015-10-12

9.1 years ago

svlachavas ▴ 790

Dear Tiago211287,

I believe you get this result when plotting because, in RNA-seq, the length of a gene strongly affects its measured expression level (which in turn is associated with power). Thus, one way to possibly adjust for this when performing a GO analysis with RNA-seq data is to first use the function nullp (see ?nullp):

pwf <- nullp(DEgenes, genome, id, bias.data = NULL, plot.fit = TRUE)

This fits a probability weighting function (PWF): a weight for each gene that reflects how its length (how "big" each of your input genes is) affects its chance of being called differentially expressed.

Then you can pass the result directly to goseq().
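To make the two steps above concrete, here is a minimal sketch on simulated data. It assumes the goseq package is installed. The gene IDs, gene lengths, and category mapping are all made up for illustration, and the lengths are passed through bias.data so that no genome lookup is needed. It is one possible way to wire the calls together, not necessarily how the original analysis was run.

```r
# Minimal sketch with simulated data; every gene name, length, and category
# below is hypothetical and only serves to show the nullp() -> goseq() flow.
library(goseq)

set.seed(42)
n_genes <- 2000
gene_ids <- paste0("gene", seq_len(n_genes))

# Hypothetical gene lengths (bp), supplied by hand instead of a genome lookup.
gene_lengths <- round(runif(n_genes, min = 500, max = 10000))

# Simulate length bias: longer genes are more likely to be called DE.
de_flags <- rbinom(n_genes, size = 1, prob = plogis(-3 + gene_lengths / 4000))
names(de_flags) <- gene_ids

# Hypothetical gene-to-category mapping: each gene gets one of 20 toy categories.
gene2cat <- data.frame(
  gene = gene_ids,
  category = paste0("CAT", sample(1:20, n_genes, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Step 1: fit the probability weighting function on gene length.
pwf <- nullp(de_flags, bias.data = gene_lengths, plot.fit = FALSE)

# Step 2: length-corrected enrichment test (default Wallenius approximation).
res <- goseq(pwf, gene2cat = gene2cat)
head(res[, c("category", "over_represented_pvalue", "numDEInCat", "numInCat")])
```

With real data you would instead pass a genome and gene ID type (the genome and id arguments) and let goseq look up the lengths and GO categories itself.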

Hope that helps,
Efstathios

updated 5.0 years ago by Ram 44k • written 9.1 years ago by svlachavas ▴ 790

I did that in goseq, generating a PWF (probability weighting function).

9.1 years ago by tiago211287 ★ 1.5k

Well then, excuse me, but I misunderstood your question. So, did you mean that you used more than one gene list? If so (without being an expert on RNA-seq analysis), why do you want to normalize for the size of each list?

updated 5.0 years ago by Ram 44k • written 9.1 years ago by svlachavas ▴ 790