Question

Negative Zero Workaround in R for signed p-value sorting


8.9 years ago

mbio.kyle ▴ 380

I am working with the results of a pathway analysis experiment. I have a dataframe in which rows are pathways and columns are samples. For each sample I did RNAseq and performed GSEA on the results. I then pulled out each pathway (hallmark gene sets) from the positively and negatively correlated GSEA results, along with its associated p-value. I'd like to make a heatmap of this with the significant positive and significant negative pathways at either end, and the pathways that are not all that significant in the middle.

So here is what the data looks like:

NAME    signed-p-val
IL2_STAT5_SIGNALING -0.0000
INTERFERON_ALPHA_RESPONSE   -0.0055
ALLOGRAFT_REJECTION -0.0070
ESTROGEN_RESPONSE_EARLY -0.0103
MYOGENESIS  -0.0109
ANGIOGENESIS    -0.0203
APOPTOSIS   -0.0422
# I removed some but each list has the same length
# all 50 pathways from hallmark gene set
APICAL_JUNCTION -0.0428
WNT_BETA_CATENIN_SIGNALING  0.28242677
PROTEIN_SECRETION   0.28635347
HYPOXIA 0.61358315
UV_RESPONSE_UP  0.9225513
CHOLESTEROL_HOMEOSTASIS 0.92826086
TGF_BETA_SIGNALING  0.92060083
DNA_REPAIR  1

That is just a subset of the table, and I have three such tables, one for each condition. I made a signed p-value by setting the p-value for the negatively enriched pathways to negative. My issue now is that if I sort the dataframe before heatmapping, the p-values closest to -1 (the least significant negative ones) end up at the top and the p-values closest to 1 at the bottom, so the significant pathways from both directions meet in the middle. I tried using negative zero ( -0.000 ) but it didn't work in R (as it does in Python).

So I'd like to sort this thing like: -0 -> -1:1 -> 0

Here is the code I have so far. I am really an R novice, but I am guessing I am looking for a way to specify a sort function similar to how you can specify in python by defining the __cmp__ for a class etc etc.

library(pheatmap)
library(RColorBrewer)

sample1 = read.table("sample1.tsv", header=T, row.names=1, sep="\t")
sample2 = read.table("sample2.tsv", header=T, row.names=1, sep="\t")
sample3 = read.table("sample3.tsv", header=T, row.names=1, sep="\t")

merged <- merge(sample1, sample2, all=T, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
merged <- merge(merged, sample3, all=T, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
merged[is.na(merged)] <- 1
colnames(merged) <- c("sample1", "sample2", "sample3")


merged <- merged[order(rowSums(merged)),]
color <-  colorRampPalette(rev(brewer.pal(9, "RdBu")))(100)
pheatmap(merged, cluster_rows=F, cluster_cols=F, color = color)

R statistics • 2.7k views

updated 8.9 years ago by fanli.gcb ▴ 730 • written 8.9 years ago by mbio.kyle ▴ 380

score 0 · Answer 1 · 2016-04-06

You can do it in R like this:

Sample data:

df <- data.frame(NAME=c("A","B","C","D"), pval=c(-0.005, 0.002, -0.9, 0.8))

Sort by absolute value of the p-value:

out <- df[order(abs(df$pval)),]

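Reverse the order of the non-negative p-value entries (using >= so rows with a p-value of exactly 0 are not dropped, and seq_len so an empty subset does not produce an NA row):

tmp <- subset(out, pval>=0); tmp <- tmp[rev(seq_len(nrow(tmp))),]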

Put it all together:

out <- rbind(subset(out, pval<0), tmp)
out
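
The same ordering can probably be obtained in a single order() call, which is about as close as R gets to a custom comparison function. The sketch below is a variation on the steps above, not the original answer's code. The extra row E, the names is_neg and out2, and the negative-zero check are assumptions added for illustration. The check relies on 1 / -0 being -Inf in R. Whether the sign of a zero survives read.table is worth verifying on your own files.

```r
# Hypothetical sample data (same shape as the example above), plus a
# negative zero standing in for a p-value that rounded to -0.0000.
df <- data.frame(NAME = c("A", "B", "C", "D", "E"),
                 pval = c(-0.005, 0.002, -0.9, 0.8, -0))

# TRUE for negative values and for IEEE negative zero (1 / -0 is -Inf).
is_neg <- df$pval < 0 | (df$pval == 0 & 1 / df$pval < 0)

# First key: negative rows before non-negative rows (FALSE sorts before TRUE).
# Second key: ascending -pval, which runs -0 -> -1 within the negatives
# and 1 -> 0 within the non-negatives.
out2 <- df[order(!is_neg, -df$pval), ]
out2
```

With this sample data the rows come out as E, A, C, D, B, matching the -0 -> -1:1 -> 0 layout asked for. For the merged three-sample table, the same two keys could be built from whatever per-row summary you sort on (for example the rowSums used in the question), though whether a sum of signed p-values is a sensible summary is a separate question.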