WGCNA: outputting multiple hub genes from a module
4.5 years ago
RNAseqer ▴ 270

I know that WGCNA has the function chooseTopHubInEachModule() to pick the single gene with the highest connectivity within a module, but does it have a function to output several hub genes from a module? I'm looking through the manual, and it certainly seems like it should be a straightforward task, but I'm having some trouble executing it, so I thought I'd ask whether anyone has done this before.

WGCNA HubGenes • 2.0k views
4.5 years ago

Hi,

I would honestly just hack the function code and create my own function, if I were you. The code is simple:

WGCNA::chooseTopHubInEachModule

function (datExpr, colorh, omitColors = "grey", power = 2, type = "signed", 
    ...) 
{
    isIndex = FALSE
    modules = names(table(colorh))
    if (!is.na(omitColors)[1]) 
        modules = modules[!is.element(modules, omitColors)]
    if (is.null(colnames(datExpr))) {
        colnames(datExpr) = 1:dim(datExpr)[2]
        isIndex = TRUE
    }
    hubs = rep(NA, length(modules))
    names(hubs) = modules
    for (m in modules) {
        adj = adjacency(datExpr[, colorh == m], power = power, 
            type = type, ...)
        hub = which.max(rowSums(adj))
        hubs[m] = colnames(adj)[hub]
    }
    if (isIndex) {
        hubs = as.numeric(hubs)
        names(hubs) = modules
    }
    return(hubs)
}

The line that you'll probably want to change is:

hub = which.max(rowSums(adj))

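Note that summed adjacency, which is what this function uses, is only one of many hub-selection metrics; module membership (kME) is another common choice.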
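
As a minimal sketch of that change: sort the summed adjacencies instead of taking which.max(), then keep the first n names. The helper name topNHubs and the toy matrix are made up for illustration and are not part of WGCNA; in the real function, adj would come from adjacency().

```r
# Hypothetical helper (not part of WGCNA): names of the n most connected genes
# in an adjacency matrix, ranked by summed adjacency as in the function above.
topNHubs <- function(adj, n = 3) {
    connectivity <- rowSums(adj)
    ranked <- sort(connectivity, decreasing = TRUE)
    names(ranked)[seq_len(min(n, length(ranked)))]
}

# Toy symmetric matrix standing in for adjacency(datExpr[, colorh == m], ...)
genes <- c("geneA", "geneB", "geneC", "geneD")
adj <- matrix(c(1.0, 0.8, 0.7, 0.1,
                0.8, 1.0, 0.5, 0.2,
                0.7, 0.5, 1.0, 0.2,
                0.1, 0.2, 0.2, 1.0),
              nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

hubs <- topNHubs(adj, n = 2)
print(hubs)
```

Inside the loop, hubs would then have to be a list (hubs[[m]] = ...) rather than a character vector, because each module now yields several names.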

Kevin

2.3 years ago
tiancaigg ▴ 30
library(WGCNA)  # adjacency()
library(dplyr)  # %>% and mutate()

# Modified from chooseTopHubInEachModule(): instead of the single top hub per
# module, return a table with the connectivity of every gene.
# The grey module is omitted by default.
topHubs <- function (datExpr, colorh, omitColors = "grey", power = 2, type = "signed", 
    ...) 
{
    modules = names(table(colorh))
    if (!is.na(omitColors)[1]) 
        modules = modules[!is.element(modules, omitColors)]
    # fall back to column indices when the genes are unnamed
    if (is.null(colnames(datExpr))) 
        colnames(datExpr) = 1:dim(datExpr)[2]

    # one all-NA placeholder row, removed again by na.omit() at the end
    connectivity_table <- data.frame(matrix(ncol = 3)) %>% setNames(c('gene', 'connectivity_rowSums_adj', 'module'))
    for (m in modules) {
        adj = adjacency(datExpr[, colorh == m], power = power, 
            type = type, ...)
        # rank the genes of this module by summed adjacency, highest first
        sorted_genes <- rowSums(adj) %>% sort(decreasing = TRUE) %>% as.data.frame() %>%
            tibble::rownames_to_column() %>% setNames(c('gene', 'connectivity_rowSums_adj')) %>% mutate(module = m)
        connectivity_table <- connectivity_table %>% rbind(sorted_genes)
    }
    return(connectivity_table %>% na.omit)
}

Hope this helps. It needs dplyr (plus tibble for rownames_to_column()). You can then run:

connectivity_table= topHubs(dataExpr, colorh=  mergedColors, power=power, type=type)
connectivity_table %>% group_by(module) %>% top_n( 3, wt = connectivity_rowSums_adj)

which returns the top 3 hubs of each module (more if there are ties, since top_n() keeps tied rows).
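
If you would rather not depend on dplyr for this last step, the same per-module selection can be sketched in base R. The helper name topPerModule and the toy values are invented for illustration; only the column names are taken from the table that topHubs() returns.

```r
# Toy table with the same columns that topHubs() returns (values are invented)
connectivity_table <- data.frame(
    gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
    connectivity_rowSums_adj = c(5.2, 9.1, 7.4, 3.3, 8.8, 1.0),
    module = c("blue", "blue", "blue", "brown", "brown", "brown"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n highest-connectivity rows of each module
topPerModule <- function(tab, n = 3) {
    # sort by module, then by decreasing connectivity within each module
    tab <- tab[order(tab$module, -tab$connectivity_rowSums_adj), ]
    # position of each row within its module after sorting
    position <- ave(seq_len(nrow(tab)), tab$module, FUN = seq_along)
    out <- tab[position <= n, ]
    rownames(out) <- NULL
    out
}

top2 <- topPerModule(connectivity_table, n = 2)
print(top2)
```

Unlike top_n(), this version returns exactly n rows per module (or fewer for small modules) and breaks ties by sort order instead of keeping every tied gene.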
