How to filter out lowly expressed genes based on TPM values?
5.7 years ago
newbie ▴ 130

I have a dataset in a data frame with 50k genes as rows and 500 samples as columns, containing TPM expression values. I want to classify these tumor samples into two groups, i.e. Gene_High and Gene_low, based on TPM expression values.
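A minimal sketch of the grouping step, assuming the split is made on a single gene's TPM and uses that gene's median across samples as the cut-off. The median cut-off, the toy matrix and the gene name `gene1` are all hypothetical; the post does not say which gene or threshold is used.

```r
# Toy TPM matrix: genes as rows, samples as columns (random values, illustration only)
set.seed(1)
tpm <- matrix(rexp(5 * 6, rate = 0.1), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# Assign each sample to Gene_High or Gene_low by one gene's TPM.
# Assumption: samples strictly above the median are "high", the rest "low".
split_by_gene <- function(tpm, gene) {
  expr <- tpm[gene, ]
  group <- ifelse(expr > median(expr), "Gene_High", "Gene_low")
  factor(group, levels = c("Gene_low", "Gene_High"))
}

groups <- split_by_gene(tpm, "gene1")   # "gene1" is a hypothetical gene of interest
table(groups)
```

With an even number of samples and no ties, the median split gives two equal-sized groups; quartile or fixed-TPM cut-offs are common alternatives.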

Before that, I want to filter out lowly expressed genes, which are not informative. For example, some genes have non-zero TPM values in only 50 samples and are 0 in the remaining 450.

So, if I have a data frame df with 50k genes as rows and 500 samples as columns, how do I filter out lowly expressed genes in R?
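A minimal sketch of one common way to do this, assuming a gene is kept only if its TPM is at least `min_tpm` in at least a fraction `min_frac` of samples. The function name `filter_low_tpm` and the defaults (1 TPM, 20% of samples) are illustrative choices, not a fixed standard; tune them to your data.

```r
# Keep genes (rows) with TPM >= min_tpm in at least min_frac of the samples (columns).
# min_tpm = 1 and min_frac = 0.2 are assumed, illustrative thresholds.
filter_low_tpm <- function(df, min_tpm = 1, min_frac = 0.2) {
  m <- as.matrix(df)
  keep <- rowSums(m >= min_tpm) >= ceiling(min_frac * ncol(m))
  df[keep, , drop = FALSE]
}

# Toy data mirroring the question: 500 samples
m <- matrix(0, nrow = 3, ncol = 500,
            dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:500)))
m["geneA", ] <- 5          # expressed in all 500 samples
m["geneB", 1:50] <- 5      # non-zero in only 50 samples, 0 in the other 450
m["geneC", 1:200] <- 0.5   # non-zero in 200 samples, but always below 1 TPM
df <- as.data.frame(m)

filtered <- filter_low_tpm(df)
rownames(filtered)   # "geneA"
```

Requiring expression in a minimum number of samples (rather than a mean TPM) avoids keeping genes driven by a handful of outlier samples.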

RNA-Seq r tpm gene filtering
5.7 years ago
Prakash ★ 2.2k

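Somebody posted this code on Biostars, but I don't remember the post. See if it helps. It was written for a count table, but the same zero-based filter works on a TPM data frame such as your df: it removes genes that are exactly 0 in more than half of the samples, so genes that are non-zero in only 50 of 500 samples would be dropped.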

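count <- read.csv("count.txt", sep = "\t", header = TRUE, row.names = 1)
head(count)
# Return the row indices of genes that are exactly 0 in more than 50% of samples
rem <- function(x) {
  x <- as.matrix(x)
  storage.mode(x) <- "double"          # make sure every value is numeric
  r <- rowSums(x == 0)                 # number of zero samples per gene
  remove <- which(r > ncol(x) * 0.5)
  return(remove)
}
remove <- rem(count)
# count[-remove, ] would drop every row if nothing matched, so guard the empty case
countdata <- if (length(remove) > 0) count[-remove, ] else count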
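A vectorised sketch of the same rule, assuming you want the filtered data frame directly rather than a vector of indices. `drop_mostly_zero` and `max_zero_frac` are hypothetical names; using a logical `keep` vector instead of negative indices avoids the pitfall where `x[-integer(0), ]` returns zero rows when no gene matches.

```r
# Drop genes (rows) that are exactly 0 in more than max_zero_frac of the samples.
drop_mostly_zero <- function(x, max_zero_frac = 0.5) {
  m <- as.matrix(x)
  keep <- rowSums(m == 0) <= ncol(m) * max_zero_frac
  x[keep, , drop = FALSE]
}

x <- data.frame(s1 = c(3, 0, 0), s2 = c(2, 0, 1), s3 = c(4, 5, 0), s4 = c(1, 0, 2),
                row.names = c("g1", "g2", "g3"))

rownames(drop_mostly_zero(x))   # "g1" "g3": g2 is 0 in 3 of 4 samples
nrow(x[-integer(0), ])          # 0: negative indexing with an empty vector drops every row
```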