Question about species, supported by the clusterProfiler R package
4.8 years ago
tanya_fiskur ▴ 70

Hello!

I am working with Nothobranchius furzeri transcriptome data and want to run a GO/pathway enrichment analysis using the clusterProfiler R package. I started with the function 'search_kegg_organism'. The documentation (https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4/topics/search_kegg_organism) says that this function searches directly in the KEGG catalogue (https://www.genome.jp/kegg/catalog/org_list.html), where Nothobranchius furzeri is listed with the code 'nfu'. However,

search_kegg_organism('nfu', by='kegg_code')

didn't work. I tried it with other species and found that it finds some organisms (e.g. 'mmu', 'dre') but not others (e.g. 'malb', 'els').

What does this depend on? And does it mean that the package will not work correctly with my species in general?

I would really appreciate it if you could help me.

rna-seq next-gen R • 1.7k views

I'm not sure, as it definitely seems like pathway info is available. The documentation for that function is pretty useless, so you may want to ask on the Bioconductor support forum, as I know the author of the package hangs around there.

Is an error returned?


Thank you, I'll ask there.


Yes, is it against the rules?


I am merely alerting the Biostars and Bioconductor communities about the duplicated post so that nobody ends up duplicating responses.

4.8 years ago
igor 13k

If you have a question about a specific function, it can be helpful to check the source code. In the case of search_kegg_organism, it's only a few lines:

search_kegg_organism <- function(str, by="scientific_name", ignore.case=FALSE) {
    by <- match.arg(by, c("kegg_code", "scientific_name", "common_name"))
    kegg_species <- kegg_species_data()
    idx <- grep(str, kegg_species[, by], ignore.case = ignore.case)
    kegg_species[idx,]
}

To summarize, it's looking up species in the kegg_species data frame, which is included with the package. You can get that data frame manually with:

kegg_species <- clusterProfiler:::kegg_species_data()

Indeed, "nfu" is not present in that data frame. You can also try to search for any species that have "fish" in the common name:

dplyr::filter(kegg_species, grepl("fish", common_name))

Not many come up.
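To make the behaviour concrete, here is a minimal sketch of the same lookup run against a toy table. The rows of toy_species and the helper name search_toy are made up for illustration and are not the package's actual data. It shows that a code missing from the table simply yields zero rows rather than an error:

```r
# Toy stand-in for the species table bundled with the package
# (hypothetical rows, not the real kegg_species data).
toy_species <- data.frame(
  kegg_code = c("mmu", "dre", "hsa"),
  scientific_name = c("Mus musculus", "Danio rerio", "Homo sapiens"),
  common_name = c("mouse", "zebrafish", "human"),
  stringsAsFactors = FALSE
)

# Same logic as the function quoted above, with the table passed in explicitly.
search_toy <- function(str, tbl, by = "scientific_name", ignore.case = FALSE) {
  by <- match.arg(by, c("kegg_code", "scientific_name", "common_name"))
  idx <- grep(str, tbl[, by], ignore.case = ignore.case)
  tbl[idx, ]
}

hit <- search_toy("dre", toy_species, by = "kegg_code")   # code present in the table
miss <- search_toy("nfu", toy_species, by = "kegg_code")  # code absent from the table
nrow(hit)
nrow(miss)
```

So an empty result only says that the bundled table has no row for that code. It says nothing about whether KEGG itself has data for the species.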

However, you don't have to use the species or pathways that are included with clusterProfiler. You can use any. This topic is covered in the clusterProfiler book (Chapter 3).


Thank you for the answer! I am a bit confused: Chapter 3 ("Universal enrichment analysis") of the clusterProfiler book recommends using WikiPathways analysis, which doesn't support N. furzeri (http://data.wikipathways.org/current/gmt/), or MSigDb analysis, which also supports only a few species.

I could do pathway analysis using STRING with the Medaka or Zebrafish data as reference, but since the KEGG database contains information about pathways in N. furzeri, it would be better to use that.


Chapter 3 provides a few different examples. None of them are perfect, but you can adapt them to your situation. My main point was that you can use clusterProfiler with a variety of independent tools.

You can try to import KEGG pathways into R with a package like KEGGgraph. Then you have to convert the object into a format that clusterProfiler can work with.
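As a rough sketch of the conversion step, and not the KEGGgraph route itself, the code below reshapes a pathway-to-gene link vector into the two-column term/gene table that clusterProfiler's generic enricher() accepts as TERM2GENE. The input toy_link and the helper link_to_term2gene are hypothetical. The assumed shape (gene IDs as names, pathway IDs as values, each with a database prefix) is what I would expect from something like KEGGREST::keggLink("pathway", "nfu"), but check that against the real output:

```r
# Toy input shaped like a pathway-to-gene link list: names are gene IDs,
# values are pathway IDs. All IDs here are made up for illustration.
toy_link <- c(
  "nfu:gene1" = "path:nfu00010",
  "nfu:gene2" = "path:nfu00010",
  "nfu:gene2" = "path:nfu00020"
)

# Hypothetical helper: strip the database prefixes and return a data frame
# with the term in the first column and the gene in the second.
link_to_term2gene <- function(link) {
  data.frame(
    term = sub("^path:", "", unname(link)),
    gene = sub("^[a-z]+:", "", names(link)),
    stringsAsFactors = FALSE
  )
}

term2gene <- link_to_term2gene(toy_link)
term2gene
```

A table like this could then be passed as enricher(gene = my_genes, TERM2GENE = term2gene), where my_genes is a placeholder for your own gene list. Its identifiers must be of the same type as those in the gene column.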
