Hello, I need to filter the genes in my data, keeping those with >10 reads in at least 2 replicates.
I tried with these lines of code:
Expressedgenes=counts>10
NumExpressedgenes=apply(Expressedgenes,1,sum)
FilteredCounts=counts[NumExpressedgenes>0,]
The first line gives a logical matrix that marks, for each gene (rows), whether it has more than 10 reads in each replicate (columns). But with these lines I only select genes with >10 reads in at least 1 replicate. So how can I write code that selects the rows that are TRUE in at least 2 columns (replicates)?
I am sorry, but I am trying to paste the output of Expressedgenes=counts>10 and I don't know how to do it properly.
counts>=10
GTEX-Y5V6-0526-SM-4VBRV GTEX-1KXAM-1726-SM-D3LAE GTEX-18A67-0826-SM-7KFTI GTEX-14BMU-0226-SM-5S2QA
ENSG00000243485 FALSE FALSE FALSE FALSE
ENSG00000237613 FALSE FALSE FALSE FALSE
ENSG00000186092 FALSE FALSE FALSE FALSE
ENSG00000238009 FALSE FALSE FALSE TRUE
ENSG00000222623 FALSE FALSE FALSE FALSE
ENSG00000241599 FALSE FALSE FALSE FALSE
ENSG00000236601 FALSE FALSE FALSE FALSE
ENSG00000235146 FALSE FALSE FALSE FALSE
ENSG00000223181 FALSE FALSE FALSE FALSE
ENSG00000237491 TRUE TRUE TRUE TRUE
ENSG00000177757 TRUE TRUE TRUE TRUE
ENSG00000225880 TRUE TRUE TRUE TRUE
ENSG00000230368 FALSE FALSE FALSE FALSE
ENSG00000272438 FALSE FALSE TRUE FALSE
ENSG00000230699 TRUE TRUE TRUE TRUE
ENSG00000241180 FALSE FALSE FALSE FALSE
GTEX-13PVR-0626-SM-5S2RC GTEX-1211K-0726-SM-5FQUW GTEX-1KXAM-0926-SM-CXZKA GTEX-18A67-2626-SM-718AD
ENSG00000243485 FALSE FALSE FALSE FALSE
ENSG00000237613 FALSE FALSE FALSE FALSE
ENSG00000186092 FALSE FALSE FALSE FALSE
ENSG00000238009 FALSE FALSE FALSE FALSE
ENSG00000222623 FALSE FALSE FALSE FALSE
ENSG00000241599 FALSE FALSE FALSE FALSE
ENSG00000236601 FALSE FALSE FALSE FALSE
ENSG00000235146 FALSE FALSE FALSE FALSE
ENSG00000223181 FALSE FALSE FALSE FALSE
ENSG00000237491 TRUE TRUE TRUE TRUE
ENSG00000177757 TRUE TRUE TRUE TRUE
ENSG00000225880 TRUE TRUE TRUE TRUE
ENSG00000230368 FALSE TRUE FALSE FALSE
ENSG00000272438 FALSE FALSE FALSE FALSE
ENSG00000230699 TRUE TRUE TRUE TRUE
ENSG00000241180 FALSE FALSE FALSE FALSE
GTEX-14BMU-1126-SM-5RQJ8 GTEX-1211K-1426-SM-5FQTF GTEX-11TT1-0726-SM-5GU5A GTEX-1HCUA-1626-SM-A9SMG
ENSG00000243485 FALSE FALSE FALSE FALSE
ENSG00000237613 FALSE FALSE FALSE FALSE
ENSG00000186092 FALSE FALSE FALSE FALSE
ENSG00000238009 FALSE FALSE FALSE TRUE
ENSG00000222623 FALSE FALSE FALSE FALSE
ENSG00000241599 FALSE FALSE FALSE FALSE
ENSG00000236601 FALSE FALSE FALSE FALSE
ENSG00000235146 FALSE FALSE FALSE FALSE
ENSG00000223181 FALSE FALSE FALSE FALSE
ENSG00000237491 TRUE TRUE TRUE TRUE
ENSG00000177757 FALSE TRUE TRUE TRUE
ENSG00000225880 TRUE TRUE TRUE TRUE
ENSG00000230368 FALSE FALSE FALSE FALSE
ENSG00000272438 FALSE FALSE FALSE FALSE
ENSG00000230699 FALSE TRUE TRUE TRUE
ENSG00000241180 FALSE FALSE FALSE FALSE
Your post history suggests that you are doing some sort of edgeR-like analysis on RNA-seq data. Don't do custom things unless you know what you are doing. Follow the edgeR manual: it has a dedicated function for filtering RNA-seq data prior to expression analysis, called edgeR::filterByExpr().
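For reference, the manual filter asked about in the question can be sketched in base R by counting the TRUE values per row and requiring at least 2. This is only a minimal sketch: the toy matrix, the gene/replicate names and the A/B grouping are made up for illustration, and the edgeR part runs only if the package is installed. Note that filterByExpr() applies its own CPM-based criteria, so it will not necessarily keep the same genes as the manual rule.

```r
# Toy count matrix (hypothetical values): genes in rows, replicates in columns.
counts <- matrix(
  c( 0,  3,  1,  0,   # no replicate above 10 reads
    25,  2,  0,  4,   # > 10 reads in one replicate only
    12, 30,  5,  0,   # > 10 reads in two replicates
    50, 41, 60, 38),  # > 10 reads in all four replicates
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("rep", 1:4))
)

# counts > 10 is a logical matrix; rowSums() counts the TRUE values per gene.
n_expressed <- rowSums(counts > 10)

# Keep genes passing the threshold in at least 2 replicates.
# drop = FALSE keeps the result a matrix even if a single gene survives.
filtered_counts <- counts[n_expressed >= 2, , drop = FALSE]
rownames(filtered_counts)  # "gene3" "gene4"

# The approach recommended in the answer, guarded so the script still runs
# without edgeR. The two-group layout below is an assumption for illustration.
if (requireNamespace("edgeR", quietly = TRUE)) {
  group <- factor(c("A", "A", "B", "B"))
  y <- edgeR::DGEList(counts = counts, group = group)
  keep <- edgeR::filterByExpr(y)          # logical vector, one entry per gene
  y <- y[keep, , keep.lib.sizes = FALSE]  # subset and recompute library sizes
}
```

In the original attempt, the only change needed for the manual rule is replacing NumExpressedgenes>0 with NumExpressedgenes>=2.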