Hi I have little bit R problem when I am trying to filtering my DGE data taken from edgeR. My trial data is like below:
SRR13826419_counts SRR13826420_counts SRR13826421_counts SRR13826422_counts SRR13826423_counts
ENSG00000000003.15 55.4289230485435 48.6042625938295 50.0892015765418 39.9978650657232 45.745056983312
ENSG00000000005.6 0 0 0 0 0
ENSG00000000419.14 36.8048049042329 36.6032101015259 35.981799866693 32.9242876425245 39.1630343957851
ENSG00000000457.14 6.799281227288 4.50039468461384 5.54785460499672 6.17330393297335 6.033520705233
ENSG00000000460.17 19.6587913745501 25.8022628584527 20.6063171042735 18.7771327961273 21.0624722800861
ENSG00000000938.13 0 0 0 0 0
SRR13826424_counts
ENSG00000000003.15 38.0779604481818
ENSG00000000005.6 0
ENSG00000000419.14 42.1830193812572
ENSG00000000457.14 5.5205964962048
ENSG00000000460.17 14.7215906565461
ENSG00000000938.13 0
I would like to filter out genes with expression greater than 1 (in CPM) in more than 50% of samples in at least 1 sample group.
My samples are in 2 conditions and they are triplicate.
CPM_matrix <- read.delim("~/Desktop/CPM_matrix.txt", row.names = "gene_id")
labels <- names(CPM_matrix)
CPM_matrix[labels] <- lapply(CPM_matrix[labels] , factor)
CPM_try <- head(CPM_matrix)
group <- factor(c(1,1,1,2,2,2))
countMatrix_groups <- data.frame(label=colnames(CPM_matrix),group)
aa <- noquote(countMatrix_groups[countMatrix_groups$group == '1',]$label[1])
bb <- noquote(countMatrix_groups[countMatrix_groups$group == '1',]$label[2])
cc <- noquote(countMatrix_groups[countMatrix_groups$group == '1',]$label[3])
dd <- noquote(countMatrix_groups[countMatrix_groups$group == '2',]$label[1])
ee <- noquote(countMatrix_groups[countMatrix_groups$group == '2',]$label[2])
ff <- noquote(countMatrix_groups[countMatrix_groups$group == '2',]$label[3])
for (i in 1:nrow(CPM_try)) {
aaa <- filter(CPM_try, aa < 1 & bb < 1 | aa < 1 & cc < 1 | bb < 1 & cc < 1 | dd < 1 & ee < 1 | dd < 1 & ff < 1 | ee < 1 & ff < 1 )
}
When I apply this code it gives me nothing, there is sth wrong but I couldnt figure out thanks in advance.
Another very common and automated filter prior to the DE analysis would be
filterByExpr
from edgeR.