2.8 years ago
najibveto
Hello, I am trying to build a Gene Ontology and KEGG annotation database for my non-model species. For that purpose I am using the AnnotationForge package with the following code:
library(tidyverse)
library(clusterProfiler)
library(AnnotationHub)
library(AnnotationForge)
egg <- rio::import('fatheadminnow-annotation.tsv')
egg[egg==""] <- NA
colnames(egg)
gene_info <- egg %>% dplyr::select(GID = query_name, GENENAME = seed_ortholog) %>% na.omit()
gterms <- egg %>%
  dplyr::select(query_name, GOs) %>%
  na.omit()
gterms <- gterms[!grepl("-", gterms$GOs), ]
library(stringr)
all_go_list <- str_split(gterms$GOs, ",")
gene2go <- data.frame(GID = rep(gterms$query_name,
                                times = sapply(all_go_list, length)),
                      GO = unlist(all_go_list),
                      EVIDENCE = "IEA")
gene2go <- gene2go[!grepl("-", gene2go$GO), ]
gene2ko <- egg %>%
  dplyr::select(GID = query_name, KO = KEGG_ko) %>%
  na.omit()
load("kegg_info.RData")
colnames(ko2pathway) <- c("KO", "Pathway")
library(stringr)
gene2ko$KO <- str_replace(gene2ko$KO, "ko:", "")
gene2ko <- gene2ko[!grepl("-", gene2ko$KO), ]
gene2pathway <- gene2ko %>%
  left_join(ko2pathway, by = "KO") %>%
  dplyr::select(GID, Pathway) %>%
  na.omit()
library(dplyr)
gene2go <- dplyr::distinct(gene2go)
gene2ko <- dplyr::distinct(gene2ko)
makeOrgPackage(gene_info = gene_info,
               go = gene2go,
               ko = gene2ko,
               maintainer = 'gmail.com>',
               author = 'gmail.com>',
               pathway = gene2pathway,
               version = "0.0.1",
               outputDir = "C:/Users/Documents",
               tax_id = 90988,
               genus = "pimephales",
               species = "promelas",
               goTable = "go")
The table gene2go is as follows:
The table gene2ko is as follows:
When I run the code, I get this error:
Error in FUN(X[[i]], ...) :
data.frames in '...' cannot contain duplicated rows
I already used AnnotationForge to build a database for another species, Aspergillus niger, and it worked fine. What could the problem be, and how can I solve it? Thank you for your help.
As you already mentioned, the error says "data.frames in '...' cannot contain duplicated rows".
Your gene2go data frame contains duplicated entries. Make a file in which each line holds one gene together with all of its GO terms:
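Since the error message does not say which table is affected, it may help to count duplicated rows in every data frame before calling makeOrgPackage. The sketch below uses only base R and small made-up tables (the gene IDs and GO terms are hypothetical stand-ins, not values from the thread). It assumes that the check concerns fully identical rows, which is what `duplicated()` tests on a data frame.

```r
# Hypothetical stand-ins for the tables passed to makeOrgPackage()
gene_info <- data.frame(
  GID = c("g1", "g2"),
  GENENAME = c("a", "b")
)
gene2go <- data.frame(
  GID = c("g1", "g1", "g2", "g2"),
  GO = c("GO:0008150", "GO:0003674", "GO:0005575", "GO:0005575"),
  EVIDENCE = "IEA"
)

# Collect every table in a named list so they can be checked in one pass
tables <- list(gene_info = gene_info, gene2go = gene2go)

# duplicated() on a data frame flags rows identical to an earlier row
dup_counts <- sapply(tables, function(df) sum(duplicated(df)))
print(dup_counts)

# unique() keeps the first copy of each row and drops the repeats
tables_clean <- lapply(tables, unique)
```

Any table with a non-zero count is a candidate for the error. In the real script the list would hold gene_info, gene2go, gene2ko and gene2pathway.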
Thank you for your answer. I already used the same package with a different species:
As you can see, there are duplicates for the same transcript here, and when I ran the same code I could build the database:
and the database was created:
That is why it is puzzling that it works for one species and not for the other.
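One possible explanation, offered as an assumption rather than a confirmed diagnosis: a repeated gene ID is not the same thing as a duplicated row. A gene listed several times with different GO terms or KOs has distinct rows, whereas two rows that match in every column are duplicates. In the code posted above, only gene2go and gene2ko pass through distinct(); gene_info and gene2pathway do not. The sketch below uses made-up KO and pathway IDs and base R `merge()` in place of `left_join()` to show how dropping the KO column can create identical rows in gene2pathway.

```r
# Hypothetical data: gene "g1" reaches pathway "ko00010" through two different KOs
gene2ko <- data.frame(
  GID = c("g1", "g1", "g2"),
  KO = c("K00001", "K00002", "K00003")
)
ko2pathway <- data.frame(
  KO = c("K00001", "K00002", "K00003"),
  Pathway = c("ko00010", "ko00010", "ko00020")
)

# Base R stand-in for left_join() followed by select(GID, Pathway)
joined <- merge(gene2ko, ko2pathway, by = "KO", all.x = TRUE)
gene2pathway <- joined[, c("GID", "Pathway")]

# Repeated GIDs with different KOs are not duplicated rows
n_dup_ko <- sum(duplicated(gene2ko))

# Without the KO column, the two "g1" rows become identical
n_dup_pathway <- sum(duplicated(gene2pathway))

# Dropping the repeats before building the package
gene2pathway_clean <- unique(gene2pathway)
```

If this is what happens with the fathead minnow data, applying distinct() (or unique()) to gene_info and gene2pathway as well may be enough. Whether the Aspergillus niger tables happened to avoid such collisions cannot be told from the thread.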