Question

Gene filtering in R

3.5 years ago

soniabedi.07 ▴ 30

Hi All,

I am trying to filter genes out of my gene-fusion Excel file. I want to remove any gene that appears more than 10 times in the gene column. I am doing this in R.

Any suggestions?

Thanks in advance.

R • 766 views

Using plain R, this should work:

# Save your data as a CSV and read it into R
df <- read.csv("data.csv")  # this may have only one column, named "gene"

# Count the occurrences of each gene with table()
countTable <- data.frame(table(df$gene))  # two columns: Var1 (gene name) and Freq (its count)

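# Genes that occur more than 10 times
repGene <- as.character(countTable$Var1[countTable$Freq > 10])

# Keep genes that occur 10 times or fewer.
# Logical negation is used instead of -which(): if no gene exceeds the threshold,
# which() returns integer(0) and the negative index would drop every element.
res <- df$gene[!(df$gene %in% repGene)]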
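If you need the whole rows of the fusion table rather than just the gene names, the same counting idea can be sketched on toy data as below. The data frame, the gene names, and the "partner" column are all made up for illustration; only the "gene" column name follows the assumption above.

```r
# Toy data (hypothetical): GENE_A appears 12 times, GENE_B 3 times, GENE_C once
df <- data.frame(
  gene    = c(rep("GENE_A", 12), rep("GENE_B", 3), "GENE_C"),
  partner = paste0("P", 1:16),
  stringsAsFactors = FALSE
)

# Count each gene, then look up the count for the gene on every row
geneCount <- table(df$gene)
nPerRow   <- as.integer(geneCount[as.character(df$gene)])

# Keep complete rows whose gene occurs 10 times or fewer
filtered <- df[nPerRow <= 10, , drop = FALSE]

unique(filtered$gene)  # "GENE_B" "GENE_C"
nrow(filtered)         # 4
```

Indexing with a logical vector keeps every row when no gene is over the threshold, so there is no special case to handle.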

3.5 years ago by Hamid Ghaedi 3.3k

Thank you Hamid. Let me try it out.

3.5 years ago by soniabedi.07 ▴ 30

score 0 · Answer 1 · 2021-06-02

3.5 years ago

Dunois ★ 2.8k

Assuming you have your data stored in a data.frame named df, you could use dplyr::group_by and dplyr::n to count the number of instances of each value in the target column (e.g., a), and add these as a new column (b). Then you can filter the data.frame using this new column.

library(magrittr)
library(dplyr)

df %>% 
  group_by(a) %>% 
  mutate(b = n()) %>% 
  ungroup() %>% 
  filter(b <= 10) %>% 
  select(-b)
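For a quick check that the threshold behaves as intended, here is a small sketch on made-up data. It swaps the group_by/mutate/ungroup steps for dplyr::add_count, which adds the per-value count in one call. The column names a and id and the gene labels are illustrative only, and dplyr is assumed to be installed.

```r
library(dplyr)

# Toy data (hypothetical): GENE_A 11 times, GENE_B exactly 10 times, GENE_C once
df <- data.frame(
  a  = c(rep("GENE_A", 11), rep("GENE_B", 10), "GENE_C"),
  id = 1:22,
  stringsAsFactors = FALSE
)

filtered <- df %>%
  add_count(a, name = "b") %>%  # b = number of rows sharing this value of a
  filter(b <= 10) %>%           # drop values seen more than 10 times
  select(-b)                    # remove the helper column again

unique(filtered$a)  # "GENE_B" "GENE_C"
nrow(filtered)      # 11
```

GENE_B sits exactly on the boundary and is kept, since the question asks to remove genes repeated more than 10 times.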