6.1 years ago
jomagrax
Hi, I'm Jose! This is my first time using topGO, and I'm having problems generating the GOdata (topGOdata) object in R. Thank you all in advance. This is the code I'm using:
```
# 1. Data preparation: List of gene identifiers, gene scores, list of differentially expressed genes, gene-to-GO annotations are all collected and stored in a single R object.
> annot_GO <- read_delim("E:/VESCA/gen_GO.txt", "\t", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character()
)
> annot_GO
# A tibble: 32,832 x 2
  X1                    X2
  <chr>                 <chr>
1 locusName             GO
2 gene00090-v1.0-hybrid NA
3 gene00091-v1.0-hybrid GO:0003677,GO:0046983
# ... with 32,822 more rows
> # create a list of GO terms
> geneID2GO <- as.list(as.character(annot_GO$X1)) # generates list; element names are transcript IDs
> geneID2GO <- as.list(setNames(as.character(annot_GO$X2), as.character(annot_GO$X1))) # adds Gene Ontology data to list
> geneID2GO <- lapply(geneID2GO, function(x) unlist(strsplit(x, split="[,]"))) # split single GO terms string into a character vector, one element per term
> str(head(geneID2GO))
List of 6
 $ locusName            : chr "GO"
 $ gene00090-v1.0-hybrid: chr NA
 $ gene00091-v1.0-hybrid: chr [1:2] "GO:0003677" "GO:0046983"
> # make full list of transcript names, geneNames
> geneNames <- names(geneID2GO)
> head(geneNames)
[1] "locusName"             "gene00090-v1.0-hybrid" "gene00091-v1.0-hybrid" "gene00092-v1.0-hybrid" "gene00093-v1.0-hybrid"
[6] "gene00094-v1.0-hybrid"
> head(MyInterestingGenes1)
[1] "77981546__"           "CL11544Contig1__"     "CL3CG7R__"            "CL8558Contig1__"      "contig00421__258___5"
[6] "contig01716__233___6"
> #List of all genes
> geneList_1 <- factor(as.integer(geneNames %in% MyInterestingGenes1))
> str(geneList_1)
 Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
> head(geneList_1)
            locusName gene00090-v1.0-hybrid gene00091-v1.0-hybrid gene00092-v1.0-hybrid gene00093-v1.0-hybrid
                    0                     0                     0                     0                     0
gene00094-v1.0-hybrid
                    0
Levels: 0
> #Creation of "GOdata object"
> GOdata_1 <- new("topGOdata", ontology = "MF", allGenes = geneList_1, annot = annFUN.gene2GO, nodeSize=5, gene2GO = geneID2GO)
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
```
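The transcript also shows a separate problem: with col_names = FALSE, read_delim() treats the header line (locusName, GO) as data, so "locusName" becomes a gene ID mapped to the fake term "GO", and unannotated genes are mapped to NA. (The first as.list() line is also overwritten by the second one and can be dropped.) Below is a minimal base-R sketch of the data-preparation step that reads the header as a header and drops genes without GO terms. The inline table annot_text is hypothetical toy data that only mirrors the layout shown above; with the real file you would read "E:/VESCA/gen_GO.txt" instead.

```r
# Minimal sketch (base R). annot_text is a hypothetical toy copy of the
# gen_GO.txt layout: a header line, then gene ID <TAB> comma-separated GO IDs.
annot_text <- paste(
  "locusName\tGO",
  "gene00090-v1.0-hybrid\tNA",
  "gene00091-v1.0-hybrid\tGO:0003677,GO:0046983",
  "gene00092-v1.0-hybrid\tGO:0005524",
  sep = "\n"
)

# header = TRUE keeps "locusName"/"GO" as column names instead of a data row.
# With the real file: read.delim("E:/VESCA/gen_GO.txt", header = TRUE, ...)
annot_GO <- read.delim(text = annot_text, header = TRUE,
                       colClasses = "character", na.strings = "NA")

# Drop genes without any GO annotation (NA or empty string).
annot_GO <- annot_GO[!is.na(annot_GO$GO) & nzchar(annot_GO$GO), ]

# Named list: gene ID -> character vector of GO IDs (one element per term),
# the gene2GO structure passed to topGO together with annFUN.gene2GO.
geneID2GO <- lapply(strsplit(annot_GO$GO, split = ",", fixed = TRUE), trimws)
names(geneID2GO) <- annot_GO$locusName

str(geneID2GO)
```

With this version, names(geneID2GO) contains only real gene IDs, so "locusName" no longer leaks into geneNames.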
"MyInterestingGenes1" comes from a DESeq2 analysis run after mapping with kallisto.
As far as I can tell, the problem is that none of the genes in "MyInterestingGenes1" match the ones in "geneNames". That's why the factor "geneList_1" has only a single level ("0") instead of the two levels topGO requires.
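This diagnosis can be checked directly before calling new("topGOdata", ...). A minimal sketch, assuming geneNames is the annotated gene universe and MyInterestingGenes1 holds the DE gene IDs: make_gene_list() is a hypothetical helper (not part of topGO) that reports how many interesting IDs occur in the universe, stops with an explicit message when none (or all) of them match, and otherwise returns the named two-level factor that the allGenes argument expects. The toy IDs are copied from the transcript for illustration only.

```r
# Hypothetical helper (not part of topGO): build the allGenes factor and
# fail early when the interesting IDs do not overlap the gene universe.
make_gene_list <- function(universe, interesting) {
  hits <- universe %in% interesting
  message(sum(unique(interesting) %in% universe), " of ",
          length(unique(interesting)), " interesting IDs found in the universe")
  if (!any(hits)) {
    stop("no interesting ID matches the universe; check that both use the same ID type")
  }
  if (all(hits)) {
    stop("every gene in the universe is 'interesting'; nothing to compare against")
  }
  # levels = c(0, 1) fixes the level order: "0" = background, "1" = interesting
  gene_list <- factor(as.integer(hits), levels = c(0, 1))
  names(gene_list) <- universe  # topGO identifies genes by these names
  gene_list
}

universe <- c("gene00090-v1.0-hybrid", "gene00091-v1.0-hybrid",
              "gene00092-v1.0-hybrid")

# IDs in the same format as the universe: a valid two-level factor.
ok <- make_gene_list(universe, "gene00091-v1.0-hybrid")
table(ok)

# IDs from a different assembly (as in MyInterestingGenes1): an explicit error
# instead of "allGenes must be a factor with 2 levels".
res <- tryCatch(make_gene_list(universe, c("CL11544Contig1__", "CL3CG7R__")),
                error = function(e) conditionMessage(e))
```

With the real objects this would be geneList_1 <- make_gene_list(geneNames, MyInterestingGenes1); given the IDs shown above, it stops at the "no interesting ID" check.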
Perhaps you can help me figure this out.
Note that topGO expects what you called geneNames to be the gene universe: a large set of genes that includes the genes of interest (MyInterestingGenes1) used to build geneList_1. In your case the two sets of identifiers look completely different (the annotation uses IDs such as gene00090-v1.0-hybrid, while your DESeq2 results use IDs such as CL11544Contig1__), so no gene is flagged as interesting and topGO fails. Maybe you can look at this thread (which was related to another issue) and try to reproduce it to get acquainted with how topGO works.
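If the DE genes come from a different assembly than the GO annotation, their IDs have to be translated into the annotation's ID space before building geneList_1. The sketch below assumes a two-column table linking each contig ID to a locus ID in gen_GO.txt; id_map, its placeholder contents and translate_ids() are hypothetical, and how such a table is obtained depends on how the kallisto index was built.

```r
# Hypothetical mapping table: contig IDs used in the kallisto/DESeq2 results
# -> locus IDs used in the GO annotation. Placeholder values only.
id_map <- data.frame(
  contig = c("contig_A__", "contig_B__", "contig_C__"),
  locus  = c("gene00091-v1.0-hybrid", "gene00092-v1.0-hybrid", NA),
  stringsAsFactors = FALSE
)

# Translate contig IDs to locus IDs; unmapped contigs are dropped and
# several contigs mapping to the same locus collapse to one entry.
translate_ids <- function(ids, id_map) {
  locus <- id_map$locus[match(ids, id_map$contig)]
  unique(locus[!is.na(locus)])
}

de_contigs <- c("contig_A__", "contig_C__", "contig_D__")
de_loci <- translate_ids(de_contigs, id_map)
de_loci
```

Once the interesting genes are expressed as locus IDs, geneNames %in% de_loci can produce both "0" and "1", which is what the allGenes argument of new("topGOdata", ...) requires.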