Question

Splitting data frame based on median value


4.7 years ago

imrankhanbioinfo ▴ 70

Hi everyone, I am working on a TCGA cancer cohort where I have RNA counts and clinical information merged into one big file. I want to split this file into higher- and lower-expression groups based on the median value of one gene. I tried two R scripts, but neither works as I expected. The first script splits the data frame but keeps only the gene count matrix with no matched clinical information, which is not what I expected. The second one is so memory intensive that it takes ages to run and then fails with an error.

First:

med<-median(df2$gene)
upper_median<-df[which(df2$gene >= med)]
lower_median<-df[which(df2$gene < med)]

Second:

med<-median(df2$gene)
upper<-split(df, which(df$gene >= med), drop = TRUE)
lower<-split(df, which(df$gene < med), drop = TRUE)

Any idea what I am missing or doing wrong??

Thank you very much! Imran

R RNA-Seq • 698 views


GenoMax · Accepted Answer · 2020-11-17


4.7 years ago

imrankhanbioinfo ▴ 70

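I managed to solve the issue, so I am sharing the script in case anyone has the same problem. The fix is the comma after the row condition: df[i, ] selects rows and keeps all columns, whereas df[i] selects columns.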

med<-median(df2$gene)
upper_median<-df[which(df2$gene >= med),]
lower_median<-df[which(df2$gene < med),]
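
For anyone who wants to try this on a small example first, here is a minimal sketch. It assumes the expression values and the clinical columns sit in the same data frame; the toy data and the column names sample, gene and stage are made up for illustration. It also shows one way split() could be used: it expects a grouping vector with one entry per row. Passing it the output of which() gives it row indices instead, which likely creates one group per index and would explain the memory use.

```r
# Toy stand-in for the merged table: one row per sample, with one
# expression column (gene) and one clinical column (stage). Hypothetical data.
df <- data.frame(
  sample = paste0("S", 1:6),
  gene   = c(5, 12, 7, 30, 18, 2),
  stage  = c("I", "II", "I", "III", "II", "I")
)

med <- median(df$gene, na.rm = TRUE)

# The comma after the row condition selects rows and keeps every column,
# so the clinical information stays attached to each sample.
upper_median <- df[which(df$gene >= med), ]
lower_median <- df[which(df$gene <  med), ]

# Alternative with split(): the second argument is a label per row,
# not a vector of row indices.
groups <- split(df, ifelse(df$gene >= med, "upper", "lower"))
```

groups$upper and groups$lower should then hold the same rows as upper_median and lower_median. If the gene column contains missing values, ifelse() returns NA for those rows and split() leaves them out of both groups.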
