Negative Zero Workaround in R for signed p-value sorting
8.7 years ago
mbio.kyle ▴ 380

I am working with the results of a pathway analysis experiment. I have a dataframe where rows are pathways and columns are samples. For each sample I did RNA-seq and ran GSEA on the results. I then pulled each pathway (hallmark gene sets) out of the positive- and negative-correlation GSEA results, along with its associated p-value. I'd like to make a heatmap with the significant positive pathways at one end, the significant negative pathways at the other end, and the non-significant pathways in the middle.

So here is what the data looks like:

NAME    signed-p-val
IL2_STAT5_SIGNALING -0.0000
INTERFERON_ALPHA_RESPONSE   -0.0055
ALLOGRAFT_REJECTION -0.0070
ESTROGEN_RESPONSE_EARLY -0.0103
MYOGENESIS  -0.0109
ANGIOGENESIS    -0.0203
APOPTOSIS   -0.0422
# I removed some but each list has the same length
# all 50 pathways from hallmark gene set
APICAL_JUNCTION -0.0428
WNT_BETA_CATENIN_SIGNALING  0.28242677
PROTEIN_SECRETION   0.28635347
HYPOXIA 0.61358315
UV_RESPONSE_UP  0.9225513
CHOLESTEROL_HOMEOSTASIS 0.92826086
TGF_BETA_SIGNALING  0.92060083
DNA_REPAIR  1

That is just a subset of the table, and I have three of them, one for each condition. I made a signed p-value by negating the p-value for the negatively enriched pathways. My issue now is that if I sort the dataframe before heatmapping, I get the large negative values at the top and the large positive values at the bottom, so the significant pathways (values near zero) end up in the middle. I tried using negative zero (-0.000), but it didn't work in R (as it does in Python).

So I'd like the sort order to be: -0 -> -1, then 1 -> 0 (significant negative first, significant positive last).

Here is the code I have so far. I am really an R novice, but I am guessing I need a way to specify a custom sort function, similar to how you can in Python by defining __cmp__ for a class.

library(pheatmap)
library(RColorBrewer)

# One table per condition: pathway names as row names, one signed p-value column
sample1 <- read.table("sample1.tsv", header=TRUE, row.names=1, sep="\t")
sample2 <- read.table("sample2.tsv", header=TRUE, row.names=1, sep="\t")
sample3 <- read.table("sample3.tsv", header=TRUE, row.names=1, sep="\t")

# Outer-join the three tables on pathway name
merged <- merge(sample1, sample2, all=TRUE, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
merged <- merge(merged, sample3, all=TRUE, by="row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
# Pathways missing from a sample are set to p = 1 (not significant)
merged[is.na(merged)] <- 1
colnames(merged) <- c("sample1", "sample2", "sample3")

# Current ordering: plain numeric sort on the row sums
merged <- merged[order(rowSums(merged)),]
color <- colorRampPalette(rev(brewer.pal(9, "RdBu")))(100)
pheatmap(merged, cluster_rows=FALSE, cluster_cols=FALSE, color=color)
R statistics • 2.6k views
8.7 years ago
fanli.gcb ▴ 730

You can do it in R like this:

Sample data:

df <- data.frame(NAME=c("A","B","C","D"), pval=c(-0.005, 0.002, -0.9, 0.8))

Sort by absolute value of the p-value:

out <- df[order(abs(df$pval)),]

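Reverse the order of the non-negative entries (using >= so that a p-value of exactly 0 is not dropped, and seq_len so that an empty subset is handled):

tmp <- subset(out, pval>=0); tmp <- tmp[rev(seq_len(nrow(tmp))),]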

Put it all together:

out <- rbind(subset(out, pval<0), tmp)
out
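
The same ordering can also be written as a single order() call with two sort keys, which is roughly what a custom comparison function would do in Python. This is only a sketch: signed_p_order is a hypothetical helper name, not part of base R or any package, and the extra row E with a negative zero is added here for illustration. R does store negative zero, but `-0 < 0` is FALSE, so the sign has to be recovered another way; the sketch uses the fact that 1/-0 is -Inf.

```r
# Hypothetical helper: returns row indices ordered as
# significant negative -> non-significant negative ->
# non-significant positive -> significant positive.
signed_p_order <- function(p) {
  # `p < 0` misses negative zero, so also check the sign of 1/p (1/-0 is -Inf)
  is_neg <- p < 0 | (p == 0 & 1 / p < 0)
  # Key 1: negatives first (FALSE sorts before TRUE).
  # Key 2: ascending |p| within negatives, descending |p| within positives.
  order(!is_neg, ifelse(is_neg, abs(p), -abs(p)))
}

df <- data.frame(NAME = c("A", "B", "C", "D", "E"),
                 pval = c(-0.005, 0.002, -0.9, 0.8, -0))
out <- df[signed_p_order(df$pval), ]
out
```

With this sample data the rows come out as E, A, C, D, B. For a merged table with several sample columns you would still have to choose which column (or which summary of the columns) to pass to the helper; that choice is not covered here.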