Question

How to filter out a list of specific genes from the DESeq object in R - bulk RNA-seq differential expression

1

3.8 years ago

msimmer92 ▴ 310

I have the following object in a basic DESeq2 bulk RNA-seq differential expression pipeline (human data). The pipeline filters out genes with low counts, but on top of that I would like to remove a few genes that I know have issues in my dataset, to see how my analysis looks without them. I have the list of such genes in a vector named "genesToRemove", encoded as gene symbols (I could convert them to Ensembl IDs if needed).

genesToRemove<- c("gene1","gene2","gene3","gene4","gene5","gene6")

dds <- DESeqDataSetFromHTSeqCount(sampleTable = mytable,directory = directory,design= ~ condition)

dds

class: DESeqDataSet 
dim: 60725 326 
metadata(1): version
assays(1): counts
rownames(60725): ENSG00000278625.1 ... ENSG00000277374.1
rowData names(0):
colnames(326): 9275 9351 ... 10146 10199
colData names(5): Condition Age ...

genes_to_keep <- rowSums(counts(dds)) >= 50
dds2 <- dds[genes_to_keep,]

I would like to do this right after the code above, so that the rest of the analysis runs without those genes. The problem is that I am not sure how to access the part of the dds2 object that holds the gene names in order to filter them out. Any thoughts? Thank you.

R bulk object DESeq2 filter • 7.3k views

1

The answer below lists several possible strategies. A DESeqDataSet extends SummarizedExperiment, so all the SummarizedExperiment subsetting options apply. You can subset the rows by name (rownames), or with a numeric or logical vector, as you would for most other R data objects. Does that make sense?

https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#subsetting
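
As a minimal sketch of those three options, the snippet below uses a small base R count matrix as a stand-in for the DESeqDataSet. That is an assumption made so the example runs without Bioconductor, and the toy gene and sample names are hypothetical. The same `[rows, ]` indexing should carry over to `dds`.

```r
# Toy count matrix standing in for counts(dds); all names are hypothetical
counts_mat <- matrix(
  c(10, 0, 5, 20,
    3, 1, 0, 7,
    8, 2, 9, 30),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)
genesToRemove <- c("gene2", "gene4")

# 1) By name: keep the rownames that are not in the removal list
by_name <- counts_mat[setdiff(rownames(counts_mat), genesToRemove), ]

# 2) By logical vector: TRUE marks the rows to keep
keep_lgl <- !(rownames(counts_mat) %in% genesToRemove)
by_logical <- counts_mat[keep_lgl, ]

# 3) By numeric index: positions of the rows to keep
keep_idx <- which(keep_lgl)
by_index <- counts_mat[keep_idx, ]

# All three approaches select the same rows
stopifnot(identical(by_name, by_logical), identical(by_logical, by_index))
rownames(by_name)
```

Which form to use is mostly a matter of taste; the logical vector is convenient when it has to be combined with another filter using `&`.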

3.8 years ago by ATpoint 87k

score 4 · Accepted Answer · 2021-04-25

4

3.8 years ago

rodolfo.peacewalker ▴ 390

Hi!

In this case you could try this one:

#Obtain the row indices of the genes to keep (everything not in genesToRemove)
genesToKeep <- which(!rownames(dds) %in% genesToRemove)

#Subset the DESeq object to the genes to keep
dds <- dds[genesToKeep, ]

#Verify that the undesired genes are removed from the DESeq object
genesToRemove %in% rownames(dds)

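The result should be FALSE for every undesired gene. Note that genesToRemove must use the same identifiers as rownames(dds), which in your object are versioned Ensembl IDs rather than gene symbols.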
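
Since the rownames shown in the question are versioned Ensembl IDs while the genes to remove are gene symbols, the `%in%` match only finds anything once both sides use the same identifier. Here is a minimal sketch of that step, assuming a hypothetical lookup table `symbol_map` (in practice it would come from an annotation resource) and made-up IDs:

```r
# Hypothetical rownames in the same format as in the question (versioned Ensembl IDs)
row_ids <- c("ENSG00000000001.1", "ENSG00000000002.4", "ENSG00000000003.12")

# Hypothetical lookup table: gene symbol -> unversioned Ensembl ID
symbol_map <- c(geneA = "ENSG00000000001", geneB = "ENSG00000000003")

genesToRemove <- c("geneB")

# Translate symbols to Ensembl IDs; symbols missing from the table become NA
ids_to_remove <- unname(symbol_map[genesToRemove])

# Strip the version suffix (".1", ".12", ...) before comparing
row_ids_noversion <- sub("\\.[0-9]+$", "", row_ids)

# Logical vector that could be used as dds[keep, ]
keep <- !(row_ids_noversion %in% ids_to_remove)
row_ids[keep]
```

It is worth checking `ids_to_remove` for NA values first, because a symbol that fails to map would otherwise be skipped silently.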

Best regards!

2

Or alternatively use setdiff:

dds[setdiff(rownames(dds), genesToRemove),]
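
To connect this with the question, the low-count filter and the gene removal can be applied one after the other. A sketch on a toy matrix standing in for `counts(dds)` (the values and names are hypothetical; the threshold of 50 is the one used in the question):

```r
# Toy counts standing in for counts(dds); values are made up
counts_mat <- matrix(
  c(100, 2, 60, 40,
    80, 1, 10, 45),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)
genesToRemove <- c("gene1")

# Step 1: low-count filter, as in the question
genes_to_keep <- rowSums(counts_mat) >= 50
filtered <- counts_mat[genes_to_keep, , drop = FALSE]

# Step 2: drop the problematic genes by name with setdiff
filtered2 <- filtered[setdiff(rownames(filtered), genesToRemove), , drop = FALSE]
rownames(filtered2)
```

The `drop = FALSE` argument only matters for the plain matrix used here, where a single remaining row would otherwise collapse to a vector.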

3.8 years ago by ATpoint 87k

1

Yes, this is what I had in mind but couldn't get right. Thanks to both! I also didn't know about setdiff, which is good to know. I like that it is more compact.

3.8 years ago by msimmer92 ▴ 310