Question

Hypergeometric Test R/Bioconductor

1

14.3 years ago

Thaman ★ 3.3k

Hi,

I am getting interested in R/Bioconductor packages and am trying to learn about them. I want to perform a hypergeometric test for over-representation against GO and KEGG. I have got two text files: Back.txt and genes.txt. To run the hypergeometric test I wrote the following code in R. The result should be a data.frame that I can visualize as a GO graph or KEGG pathway.

library(topGO)
library(GOstats)

universe=read.table("Back.txt", sep=",")  # Background file listing only Entrez IDs, with no header row
tbl <- read.table ("genes.txt", sep=",")  # Selected genes, with columns Probes_id, entrez_gene_id, symbols, P.Value and F.C
selected <- tbl$V2  # Take only the second column of tbl, which holds entrez_gene_id

param <- new ("GOHyperGParams", geneIds = selected,
              universeGeneIds=universe, annotation="org.Hs.eg.db",
              ontology="BP",pvalueCutoff=0.1, conditional=FALSE,testDirection="over")

However, it does not work; I get the following error:

Error in makeValidParams(.Object) : 
  geneIds and universeGeneIds must have the same mode
geneIds: NULL 
universeGeneIds: integerFALSE
In addition: Warning message:
In makeValidParams(.Object) :
  converting univ from list to atomic vector via unlist

hyp <- hyperGTest (param)
Error in is(object, Cl) : 
  error in evaluating the argument 'p' in selecting a method for function 'hyperGTest'

Am I missing something here? Do I need to go through more resources to understand this properly? If so, where can I find a worked example of a hypergeometric test in R/Bioconductor, with all the required packages?

I have also loaded all the libraries and packages shown at this link ( http://pastebin.com/i735EUWp )

Thank you

r bioconductor • 8.9k views

ADD COMMENT • link updated 11.6 years ago by Biostar 20 • written 14.3 years ago by Thaman ★ 3.3k

1

Just a comment: reading a file in R using readLines, when there are functions like read.delim and read.table (more robust and flexible) or scan (more efficient), is almost always a bad idea. Also, please provide example input files or put the files online.

ADD REPLY • link 14.3 years ago by Michael 55k

0

I think also that an example file would be necessary.

ADD REPLY • link 14.3 years ago by D. Puthier ▴ 350

Ram · Answer 1 · 2010-11-02

3

14.3 years ago

Brad Chapman 9.7k

The error message indicates that there are no Entrez IDs in your selected set which are also in the universe set.

What do 'genes.txt' and 'selected' look like? From your comment, it appears that genes.txt is a CSV file where the Entrez gene IDs are in the second column. If so, extract only the Entrez IDs:

tbl <- read.table("genes.txt", sep=",")
selected <- tbl$V2

Additionally, are all of the entrez IDs in selected also in universe? Double check that:

lapply(selected, function(x) x %in% universe)

gives a list of TRUE values.
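
The same check can also be written in vectorized form. This is a minimal sketch using made-up Entrez IDs; `universe_ids`, `selected_ids` and `missing_ids` are illustrative names, not objects from the question. It assumes both sets are plain vectors: if `universe` came straight from read.table it is a data frame, so its column (for example `universe$V1`) would be needed, since comparing against the data frame itself is unlikely to match individual IDs.

```r
# Toy Entrez IDs; these values are made up for illustration.
universe_ids <- as.character(c(1, 2, 9, 10, 5243))
selected_ids <- as.character(c(2, 10, 7157))

# One TRUE/FALSE per selected ID.
in_universe <- selected_ids %in% universe_ids

# TRUE only if every selected ID is present in the universe.
all_covered <- all(in_universe)

# The selected IDs that are absent from the universe.
missing_ids <- setdiff(selected_ids, universe_ids)
```

Listing the missing IDs with `setdiff` tends to be more useful than a long list of TRUE/FALSE values when only a few IDs are absent.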

ADD COMMENT • link updated 5.5 years ago by Ram 44k • written 14.3 years ago by Brad Chapman 9.7k

0

@Brad, sorry, my description of the file contents was confusing. Actually, my universe (Back.txt) contains only Entrez IDs, with no header. The selected genes file (genes.txt) contains the columns Probes_id, entrez_gene_id, symbols, P.Value and F.C, with a header line, separated by tabs. Yes, entrez_gene_id is in the second column, as you said. I tried the check again, but it generates an empty list().

ADD REPLY • link 14.3 years ago by Thaman ★ 3.3k

0

If they are separated by tabs, then use 'sep="\t"' like in D. Puthier's answer instead of 'sep=","'. Otherwise the columns will not get split correctly and you'll have only one column; that's why tbl$V2 (the second column) is NULL.
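
The effect of the separator can be reproduced without any files. The sketch below uses two invented tab-separated rows (no header line) as a stand-in for genes.txt; the names `lines`, `wrong` and `right` are only for illustration.

```r
# Hypothetical tab-separated rows standing in for genes.txt (values are made up).
lines <- c("p1\t2\tA2M\t0.01\t1.8",
           "p2\t10\tNAT2\t0.03\t-2.1")

# With sep = ",", the tabs are not treated as separators: each row is one field.
wrong <- read.table(text = lines, sep = ",")

# With sep = "\t", each row is split into five columns, V1 to V5.
right <- read.table(text = lines, sep = "\t")

ncol(wrong)  # a single column
wrong$V2     # NULL, because there is no second column
ncol(right)  # five columns
right$V2     # the Entrez ID column
```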

ADD REPLY • link 14.3 years ago by Brad Chapman 9.7k

0

I have done the enrichment analysis, but I am not sure whether the result is as it should be; I will check it against DAVID for reference. I want to reshape the summary result produced by hyperGTest into my own data.frame with the columns GO_term_id/KEGG, GO_term_name/KEGG, Pvalue and the number of associated genes from my genes.txt file.
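
One possible way to build such a table is sketched below in base R. It assumes that summary() on the hyperGTest result returns a data frame with columns such as GOBPID, Pvalue, Count and Term for a BP test; these column names are an assumption worth confirming with names(summary(hyp)), and the ID column is named differently for other ontologies or for KEGG. The `res` object here is a mock with invented values so that the example runs without GOstats; `out` and its column names are hypothetical.

```r
# Mock of the data frame that summary(hyp) is assumed to return for a BP test.
# Column names are an assumption; all numeric values are invented.
res <- data.frame(
  GOBPID    = c("GO:0008283", "GO:0006915"),
  Pvalue    = c(0.04, 0.002),
  OddsRatio = c(1.9, 3.1),
  ExpCount  = c(7.5, 4.2),
  Count     = c(13L, 11L),
  Size      = c(310L, 120L),
  Term      = c("cell population proliferation", "apoptotic process"),
  stringsAsFactors = FALSE
)

# Keep only the requested columns, under custom names.
out <- data.frame(
  term_id   = res$GOBPID,
  term_name = res$Term,
  Pvalue    = res$Pvalue,
  n_genes   = res$Count,  # number of selected genes annotated to the term
  stringsAsFactors = FALSE
)

# Sort by p-value, most significant first.
out <- out[order(out$Pvalue), ]
```

With a real analysis, `res` would be replaced by the output of summary() on the test result, and the same reshaping would apply once the actual column names are known.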

ADD REPLY • link 14.3 years ago by Thaman ★ 3.3k

Ram · Answer 2 · 2010-11-02

2

14.3 years ago

D. Puthier ▴ 350

The format of your files is not very clear. If these are tab-delimited files, you should use the read.table function, because your selected object is currently an instance of class vector, which it should not be if the file has multiple columns (expressed genes with Probes_id, entrez_gene_id, symbols, P.Value and F.C). So as a first attempt you should try something like:

selected <- read.table("genes.txt", header=TRUE, sep="\t", quote="") # header=FALSE if there is no header
# set sep to the right separator character
# Your Probes_ids should be in the first column of "selected"
selected[,1]
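
Going one step further, the error in the question complains that geneIds and universeGeneIds do not share a mode, and the warning shows that the universe was passed as a list (a data frame). A minimal sketch of preparing both as plain character vectors follows. It assumes the file layouts described in this thread and uses inline made-up data in place of genes.txt and Back.txt; `genes_txt`, `back_txt` and `back` are illustrative names.

```r
# Made-up stand-ins for genes.txt (tab-separated, with header)
# and Back.txt (one Entrez ID per line, no header).
genes_txt <- c("Probes_id\tentrez_gene_id\tsymbols\tP.Value\tF.C",
               "p1\t2\tA2M\t0.01\t1.8",
               "p2\t10\tNAT2\t0.03\t-2.1")
back_txt <- c("1", "2", "9", "10")

tbl  <- read.table(text = genes_txt, header = TRUE, sep = "\t", quote = "")
back <- read.table(text = back_txt, header = FALSE)

# Extract the ID columns as plain character vectors, so both share one mode.
selected <- unique(as.character(tbl$entrez_gene_id))
universe <- unique(as.character(back$V1))

same_mode <- mode(selected) == mode(universe)
```

Vectors prepared this way could then be supplied as geneIds and universeGeneIds; with real files, `text = ...` would be replaced by the file names.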

ADD COMMENT • link updated 5.5 years ago by Ram 44k • written 14.3 years ago by D. Puthier ▴ 350

0

I have done the enrichment analysis, but I am not sure whether the result is as it should be; I will check it against DAVID for reference. I want to reshape the summary result produced by hyperGTest into my own data.frame with the columns GO_term_id/KEGG, GO_term_name/KEGG, Pvalue and the number of associated genes from my genes.txt file.

ADD REPLY • link 14.3 years ago by Thaman ★ 3.3k