Kruskal Wallis test, gene expression, RNA-seq
1
0
Entering edit mode
2.7 years ago
Rob ▴ 170

Hi friends,

Does anyone have an R code for the Kruskal-Wallis test for gene expression data?

60 columns and 230 rows

Thanks

RNA-seq kruskal-wallis • 2.2k views
ADD COMMENT
3
Entering edit mode
2.7 years ago

Ah, the Kruskal-Wallis --the non-parametric ANOVA-- my favourite.

Here is some code for you:

In short, the function is kruskal.test()

As you will see, you will require some metadata, the rows of which will have to be aligned to your expression data columns, which you will have to transpose (the expression data needs to be transposed), such that genes are columns and samples are rows. This metadata should contain one or more categorical variables that each contain two or more groups (like disease|nomal, group1|group2|group3, et cetera), the expression of a given gene across which you will compute the Kruskal-Wallis test.

Kevin

ADD COMMENT
0
Entering edit mode

Thank you Kevin The code you shared is for one column (Ozone). My problem is I don't know how to customize it for multiple columns. Is there any way?

ADD REPLY
4
Entering edit mode

There are a few different ways to do that. Here is a reproducible example that eventually outputs a named vector of p-values, which I trust that you can adapt to your own data:

mydata <- data.frame(
  group = c(rep(c('A','B','C'), 6)),
  gene1 = runif(n = 18, min = -10, max = 10),
  gene2 = runif(n = 18, min = -10, max = 10),
  gene3 = runif(n = 18, min = -10, max = 10))
mydata$group <- factor(mydata$group, levels = c('A','B','C'))
mydata
   group      gene1      gene2       gene3
1      A -6.5828614  7.2061653 -4.78849342
2      B  8.0654531  6.8851541 -6.66095607
3      C  0.9716438  9.7684197  5.91039503
4      A  0.2177755 -9.5862999 -9.21830310
5      B  7.1282572 -6.5170962  0.72348540
6      C  4.6401339  5.1515160 -9.60944901
7      A  2.3342942 -3.3071833 -2.87845721
8      B  1.8221881  4.3000686  1.13610765
9      C  3.6113445 -2.5834670 -9.50316515
10     A  8.0464613 -1.1339911  3.40090883
11     B  8.6698397 -5.4779950 -6.58398749
12     C  7.8709801 -5.7729320 -1.81991194
13     A  6.7190791 -4.4822967 -8.43157336
14     B  5.0844176 -5.3323176 -0.08187998
15     C -7.9576522  1.5789440 -0.64270702
16     A  3.6994011 -3.8620588  9.41000514
17     B  4.7855473 -7.2345133 -8.80236299
18     C  7.5792729  0.5457233  3.74570574


# run a test
  kruskal.test(gene1 ~ group, data = mydata)
  kruskal.test(gene1 ~ group, data = mydata)$p.value
  [1] 0.2778423


genes <- colnames(mydata)[-1]
genes
[1] "gene1" "gene2" "gene3"


res <- sapply(genes, function(x) {
  f <- as.formula(paste0(x, ' ~ group'))
  model <- kruskal.test(f, data = mydata)
  p <- model$p.value
  p
})
names(res) <- genes
res
    gene1     gene2     gene3 
0.2778423 0.3469830 0.9941691

We're unlucky this time, as, in this random data generated on this glorious Sunday of May 1st 2022, nothing was statistically significant.

ADD REPLY
0
Entering edit mode

Thanks Kevin, I didnot understand, do we have one data and one metadata files? or just one file?

So, I tried to run with my data file and I got this error :

code chunk:

res <- sapply(genes, function(x) {
+   f <- as.formula(paste0(x, ' ~ group'))
+   model <- kruskal.test(f, data = data)
+   p <- model$p.value
+   p
+ })

Error in eval(predvars, data, env) : object 'ERVV' not found
ADD REPLY
0
Entering edit mode

You need to figure this out on your own - sorry. Somebody else may additionally help.

ADD REPLY

Login before adding your answer.

Traffic: 1008 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6