I have the following object in a basic DESeq2 bulk RNA-seq differential expression pipeline (human data). The pipeline already filters out genes with low counts, but I would also like to remove a few genes that I know are problematic in my dataset and see how the analysis looks without them. I have the list of these genes in a vector named genesToRemove, encoded as gene symbols (I could convert them to Ensembl IDs if needed).
genesToRemove <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")
dds <- DESeqDataSetFromHTSeqCount(sampleTable = mytable, directory = directory, design = ~ condition)
dds
class: DESeqDataSet
dim: 60725 326
metadata(1): version
assays(1): counts
rownames(60725): ENSG00000278625.1 ... ENSG00000277374.1
rowData names(0):
colnames(326): 9275 9351 ... 10146 10199
colData names(5): Condition Age ...
genes_to_keep <- rowSums(counts(dds)) >= 50
dds2 <- dds[genes_to_keep,]
I would like to do it at this point, after this code, so that then I keep going without them. The problem is that I am not sure how to access the part of the dds2 object where you have the genes in order to filter them out. Any thoughts? Thank you.
The answer below lists several possible strategies. In general, a DESeqDataSet extends SummarizedExperiment, so all the SummarizedExperiment subsetting options apply. You can subset rows by name (rownames), or with a numeric or logical vector, as you would for most other R data objects. Does that make sense? https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#subsetting
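As a rough sketch of the logical-vector option: the row names in the object above are versioned Ensembl IDs (e.g. ENSG00000278625.1), so gene symbols would first need converting to Ensembl IDs, and the version suffix stripped before matching. The example below uses a small plain matrix as a stand-in for the counts so that it runs without DESeq2. The IDs and the names toy, ensemblToRemove and dds3 are made up for illustration.

```r
# Toy stand-in for the count matrix; row names mimic versioned Ensembl IDs.
# All IDs below are placeholders for illustration, not real problem genes.
toy <- matrix(
  1:12, nrow = 4,
  dimnames = list(
    c("ENSG00000000001.1", "ENSG00000000002.5",
      "ENSG00000000003.2", "ENSG00000000004.1"),
    c("s1", "s2", "s3")
  )
)

# Hypothetical unversioned Ensembl IDs of the genes to drop.
ensemblToRemove <- c("ENSG00000000002", "ENSG00000000004")

# Strip the trailing ".N" version suffix so the IDs are comparable.
unversioned <- sub("\\.[0-9]+$", "", rownames(toy))

# Logical vector: TRUE for rows to keep.
keep <- !(unversioned %in% ensemblToRemove)

# drop = FALSE keeps the result a matrix even if one row remains.
toyFiltered <- toy[keep, , drop = FALSE]

# On the real object the equivalent would be (assuming dds2 from the question):
#   keep <- !(sub("\\.[0-9]+$", "", rownames(dds2)) %in% ensemblToRemove)
#   dds3 <- dds2[keep, ]
```

If the symbols need converting, one option is an annotation package such as org.Hs.eg.db, queried with AnnotationDbi::mapIds using keytype = "SYMBOL" and column = "ENSEMBL". Some symbols map to more than one Ensembl ID or to none, so it is worth checking the result before filtering.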