Multiple comparisons in proteomic data
1
0
Entering edit mode
4.6 years ago
Peter ▴ 20

Hello. I have a question about the application of the dunn.test function. I'm new to R and sorry if this question is basic. I have cancer proteomics data, I have four groups: CTRL (n = 8), GBM1 (n = 6), GBM2 (n = 6), GBM3 (n = 6).

I need to know which proteins are regulated differently between groups. After reading it, I found out that I have to apply a non-parametric test and do a post hoc analysis to find out the comparisons in pairs.

Well, I started working with my data as follows:

GROUP     P04899     P04406    Q93077
CTRL
CTRL
GBM1
GBM2
GBM3

I have over 2000 proteins

I started from the basics and saw that I have to apply this command:

A1 <- dunnTest (GBMTable$P04899 ~GBMTable$ Group, method = "bh")

Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Benjamini-Hochberg method.

  Comparison          Z     P.unadj      P.adj
1      CTRL - GBM1 -1.1322770 0.257517982 0.38627697
2      CTRL - GBM2 -2.1513264 0.031450449 0.09435135
3      GBM1 - GBM2 -1.0190493 0.308179547 0.36981546
4      CTRL -GBM3  0.7925939 0.428014450 0.42801445
5      GBM1 - GBM3  1.9248710 0.054245504 0.10849101
6      GBM2 - GBM3  2.9439203 0.003240835 0.01944501

But I wanted to do this for all proteins automatically, possibly using a "for" command, I believe. And I would like to save all the results in a file to open in excel.

Does anyone have a tip?

Thanks!

R • 813 views
ADD COMMENT
2
Entering edit mode
4.6 years ago

Hi, nice to see somebody else who uses Dunn's test. I happened to have some data open in my terminal.

There are different ways to do this, via a standard for() loop, apply(), lapply(), or parallelised functions like foreach(), mclapply(), or parLapply().

dunnTest() does not have a large footprint or computational cost, so, we can just use lapply():

1, the data:

modeling[1:5,1:5]
  DiseaseStatus     gene1      gene2      gene3
  Disease_3         -8.484839  -2.2394204 -0.6758937
  Disease_2         -8.304133  -1.9972881 -0.6571702
  Disease_2         -7.654713  -1.1625870 -0.3728583
  Disease_2         -6.077792   0.5142025  0.2103687
  Disease_3         -8.232873  -2.3506886 -0.6251163
  gene4
  -1.7088455
  -1.6124870
  -1.7861746
  -0.1267115
  -1.3069726

dim(modeling)
[1] 81 94

2, define variables to test

genes <- colnames(modeling)[2:ncol(modeling)]
head(genes)
[1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"

3, perform Dunn's test and store results in a list

require(FSA)
Carregando pacotes exigidos: FSA
## FSA v0.8.30. See citation('FSA') if used in publication.
## Run fishR() for related website and fishR('IFAR') for related book.
Attaching package: ‘FSA’

result <- lapply(genes,
  function(x) {
    dunnTest(as.formula(paste0(x, '~ DiseaseStatus')),
      data = modeling, method = 'bh')$res
  })

# name the list items by the variables tested
names(result) <- genes
result[[1]]

4, bind the list objects together into a 'master' results table, and tidy rownames

# old way via do.call()
result_final <- do.call(rbind, result)
result_final <- data.frame(gene = gsub('\\.[0-9]*$', '', rownames(result_final)), result_final)

# 'modern' way would be via *data.table::rbindlist*

head(result_final)
 gene  Comparison             Z          P.unadj   P.adj
 gene1 Disease_1 - Disease_2 -0.02897449 0.9768849 0.9768849
 gene1 Disease_1 - Disease_3  0.49975886 0.6172449 0.7122056
 gene1 Disease_2 - Disease_3  0.63644611 0.5244857 0.7867285
 gene1 Disease_1 - Healthy_1  0.91243067 0.3615421 0.6025701
 gene1 Disease_2 - Healthy_1  1.09102134 0.2752635 0.6881587
 gene1 Disease_3 - Healthy_1  0.49975886 0.6172449 0.7715561

tail(result_final)
 gene   Comparison             Z          P.unadj   P.adj
 gene93 Healthy_1 - Healthy_2 -0.05961193 0.9524647 0.9524647
 gene93 Disease_1 - Healthy_3 -1.34851137 0.1774940 0.8874698
 gene93 Disease_2 - Healthy_3  0.61426075 0.5390430 0.6738038
 gene93 Disease_3 - Healthy_3  1.14281522 0.2531153 0.4745912
 gene93 Healthy_1 - Healthy_3 -0.15156180 0.8795326 1.0000000
 gene93 Healthy_2 - Healthy_3 -0.10310063 0.9178831 0.9834462

5, write out to TSV

write.table(result_final,
  'DunnsTestresults.tsv',
  row.names = FALSE, sep = '\t', quote = FALSE)

If there are NA values in your data, it will cause an error somewhere. So, sort those out before running dunnTest().

Kevin

ADD COMMENT
1
Entering edit mode

Thank you very much Kevin. It worked perfectly and in a few minutes I received my answer. I am relieved!

ADD REPLY

Login before adding your answer.

Traffic: 1733 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6