How can I filter genes with >10 reads in at least 2 replicates?
2.4 years ago
Luca ▴ 10

Hello, I need to filter the genes in my count data, keeping only those with >10 reads in at least 2 replicates.

I tried with these lines of code:

Expressedgenes=counts>10
NumExpressedgenes=apply(Expressedgenes,1,sum)
FilteredCounts=counts[NumExpressedgenes>0,]

The first line produces a logical matrix that marks, for each gene (rows) and each replicate (columns), whether the gene has more than 10 reads. With these lines, however, I keep every gene that has >10 reads in at least 1 replicate. How can I write code that selects only the rows that are TRUE in at least 2 columns (replicates)?

I am sorry, I am trying to paste the output of the comparison (shown here for counts>=10) and I don't know how to do it properly.

counts>=10
                GTEX-Y5V6-0526-SM-4VBRV GTEX-1KXAM-1726-SM-D3LAE GTEX-18A67-0826-SM-7KFTI GTEX-14BMU-0226-SM-5S2QA
ENSG00000243485                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237613                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000186092                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000238009                   FALSE                    FALSE                    FALSE                     TRUE
ENSG00000222623                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000241599                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000236601                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000235146                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000223181                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237491                    TRUE                     TRUE                     TRUE                     TRUE
ENSG00000177757                    TRUE                     TRUE                     TRUE                     TRUE
ENSG00000225880                    TRUE                     TRUE                     TRUE                     TRUE
ENSG00000230368                   FALSE                    FALSE                    FALSE                    FALSE
ENSG00000272438                   FALSE                    FALSE                     TRUE                    FALSE
ENSG00000230699                    TRUE                     TRUE                     TRUE                     TRUE
ENSG00000241180                   FALSE                    FALSE                    FALSE                    FALSE
                GTEX-13PVR-0626-SM-5S2RC GTEX-1211K-0726-SM-5FQUW GTEX-1KXAM-0926-SM-CXZKA GTEX-18A67-2626-SM-718AD
ENSG00000243485                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237613                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000186092                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000238009                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000222623                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000241599                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000236601                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000235146                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000223181                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237491                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000177757                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000225880                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000230368                    FALSE                     TRUE                    FALSE                    FALSE
ENSG00000272438                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000230699                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000241180                    FALSE                    FALSE                    FALSE                    FALSE
                GTEX-14BMU-1126-SM-5RQJ8 GTEX-1211K-1426-SM-5FQTF GTEX-11TT1-0726-SM-5GU5A GTEX-1HCUA-1626-SM-A9SMG
ENSG00000243485                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237613                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000186092                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000238009                    FALSE                    FALSE                    FALSE                     TRUE
ENSG00000222623                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000241599                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000236601                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000235146                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000223181                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000237491                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000177757                    FALSE                     TRUE                     TRUE                     TRUE
ENSG00000225880                     TRUE                     TRUE                     TRUE                     TRUE
ENSG00000230368                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000272438                    FALSE                    FALSE                    FALSE                    FALSE
ENSG00000230699                    FALSE                     TRUE                     TRUE                     TRUE
ENSG00000241180                    FALSE                    FALSE                    FALSE                    FALSE
PCA r • 1.0k views

Your post history suggests that you are doing some sort of edgeR-like analysis on RNA-seq data. Don't do custom things unless you know what you are doing. Follow the edgeR manual: it has a dedicated function for filtering RNA-seq data prior to expression analysis, edgeR::filterByExpr().
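For illustration, a minimal sketch of how filterByExpr() is typically called. The simulated counts, the sample names and the two-group layout are assumptions made up for this example, not the poster's data, and the block only runs if the Bioconductor package edgeR is installed. Note that filterByExpr() applies its own rule (a CPM cutoff derived from min.count and the library sizes, required in a number of samples derived from the group sizes), so it is not identical to ">10 reads in at least 2 replicates".

```r
# Sketch only: requires the Bioconductor package edgeR.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)

  # Simulated counts: 100 genes x 6 samples (hypothetical data).
  counts <- matrix(
    rnbinom(600, mu = 20, size = 2), nrow = 100,
    dimnames = list(paste0("gene", 1:100), paste0("s", 1:6))
  )

  # Hypothetical design: two groups of three replicates each.
  group <- factor(rep(c("A", "B"), each = 3))

  y <- edgeR::DGEList(counts = counts, group = group)

  # Logical vector with one entry per gene; the grouping is taken from y.
  keep <- edgeR::filterByExpr(y)

  # Drop the filtered genes and recompute the library sizes.
  y <- y[keep, , keep.lib.sizes = FALSE]
}
```

The min.count argument of filterByExpr() defaults to 10, which is close in spirit to the threshold asked about here.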

2.4 years ago
Marco Pannone ▴ 810

Assuming you have a count matrix (which we call "count_matrix") with genes as rows and replicates as columns, and you don't care which replicates have >10 reads as long as at least 2 of them do, you can do this:

# count, per gene, how many replicates have more than 10 reads
filt <- rowSums(count_matrix > 10) >= 2
subset_count_matrix <- count_matrix[filt, ]

Hope I understood your aim correctly, it was a bit tricky to figure out exactly what you meant.


Yes, it worked. I am sorry for being unclear. Thank you and have a good evening.

2.4 years ago
counts[rowSums(counts > 10) > 1, ]
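To see why this one-liner works, here is a small worked sketch on a toy matrix. The gene names, replicate names and count values are invented for illustration. Comparing the matrix with 10 gives a logical matrix, and rowSums() counts the TRUE values per gene, so the result can be compared against the required number of replicates.

```r
# Toy count matrix: 4 genes (rows) x 3 replicates (columns); values are made up.
counts <- matrix(
  c( 0, 12,  3,
    15, 20, 11,
     5,  2,  9,
    11,  0, 30),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("rep", 1:3))
)

# counts > 10 is a logical matrix; rowSums() counts the TRUEs in each row.
n_above <- rowSums(counts > 10)

# Keep genes with more than 10 reads in at least 2 replicates.
# drop = FALSE keeps the result a matrix even if only one gene passes.
filtered <- counts[n_above >= 2, , drop = FALSE]

print(n_above)
print(filtered)
```

In this toy data only gene2 and gene4 pass the filter. For integer counts, `> 1` and `>= 2` on the row sums are equivalent; the second form just states the "at least 2" requirement more directly.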