Gene filtering in R
3.5 years ago
soniabedi.07 ▴ 30

Hi All,

I am trying to filter genes in my gene-fusion Excel file. I want to remove any gene that appears more than 10 times in a column. I am doing this in R.

Any suggestions?

Thanks in advance.

R • 766 views

Using plain R, this should work:

# Save your data as a CSV and read it into R
df <- read.csv('data.csv') # this may have only one column, named "gene"

# Count the occurrences of each gene with table()
countTable <- data.frame(table(df$gene)) # two columns: Var1 (gene name) and Freq (its frequency)

# Genes that occur more than 10 times
repGene <- as.character(countTable$Var1[countTable$Freq > 10])

# Keep genes that occur 10 times or fewer.
# Logical indexing is used rather than -which(...), which returns an empty result when repGene is empty.
res <- df$gene[!(df$gene %in% repGene)]
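
As a quick check of the counting-and-filtering idea above, here is a minimal sketch on made-up data. The gene names, the `small` data frame, and the choice to keep whole rows rather than just the gene column are assumptions for illustration, not part of the original answer.

```r
# Toy data (made-up names): geneA occurs 12 times, geneB 3 times, geneC once
df <- data.frame(
  gene = c(rep("geneA", 12), rep("geneB", 3), "geneC"),
  stringsAsFactors = FALSE
)

# Count how often each gene occurs
counts <- table(df$gene)

# Names of genes that occur more than 10 times
repGene <- names(counts)[counts > 10]

# Keep the rows whose gene is NOT in repGene
res <- df[!(df$gene %in% repGene), , drop = FALSE]

# Same steps on data where no gene exceeds the threshold:
# the logical index keeps every row in that case
small <- data.frame(gene = c("geneB", "geneC"), stringsAsFactors = FALSE)
smallCounts <- table(small$gene)
smallRep <- names(smallCounts)[smallCounts > 10]
smallRes <- small[!(small$gene %in% smallRep), , drop = FALSE]
```

Keeping whole rows with `drop = FALSE` preserves any other columns of the fusion table, which is probably what you want if the file has more than the gene column.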

Thank you Hamid. Let me try it out.

3.5 years ago
Dunois ★ 2.8k

Assuming you have your data stored in a data.frame named df, you could use dplyr::group_by and dplyr::n to count the number of instances of each value in the target column (e.g., a), and add these as a new column (b). Then you can filter the data.frame using this new column.

library(magrittr)
library(dplyr)

df %>% 
  group_by(a) %>% 
  mutate(b = n()) %>% 
  ungroup() %>% 
  filter(b <= 10) %>% 
  select(-b)
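
To see what the pipeline does, here is a small sketch on made-up data; the values in `a` and the extra `score` column are assumptions for illustration only. It also shows `dplyr::add_count()`, which should give the same result without explicit grouping.

```r
library(dplyr)

# Toy data (made-up values): "x" occurs 11 times, "y" twice
df <- data.frame(a = c(rep("x", 11), "y", "y"), score = 1:13)

# Count per value of a, keep rows whose value occurs at most 10 times,
# then drop the helper column
kept <- df %>%
  group_by(a) %>%
  mutate(b = n()) %>%
  ungroup() %>%
  filter(b <= 10) %>%
  select(-b)

# Shorthand: add_count() appends the per-value count as column b
kept2 <- df %>%
  add_count(a, name = "b") %>%
  filter(b <= 10) %>%
  select(-b)
```

Only the two "y" rows remain in both results, and the other columns (here `score`) are carried through unchanged.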

Thank you. Let me try it out
