Hello, I need some help here as I'm kind of stuck with the edgeR DGEList format.
I have a DGEList named x with the following dimensions:
> dim(x)
[1] 11301 52
It contains an RNA-seq count matrix, with gene IDs as rows and sample names as columns (as in a usual count matrix). I did the filtering and normalization steps and ran a differential expression analysis.
Now, following my observations from the DE analysis, I want to filter the DGEList by the following rule:
I want to keep only the genes that are expressed (so > 0) at the same time, in each sample, as a particular gene with a particular gene ID (by the way, there are no replicates in my data).
Maybe there is already an edgeR function that does it (I don't know if filterByExpr can be used for this). Maybe with grep?
Any ideas?
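For the rule as first stated (keep the genes that are > 0 in every sample where the gene of interest is > 0), here is a minimal sketch on toy data. The gene names, sample names and counts are made up for illustration and are not the poster's data. As far as I know, filterByExpr filters genes by a minimum-count rule across samples and does not take a reference gene, so plain logical indexing is used instead.

```r
library(edgeR)

# Toy data: 4 genes x 4 samples (hypothetical values).
counts <- matrix(
  c(5, 0, 3, 2,   # geneA: the gene of interest
    1, 4, 2, 6,   # geneB: expressed in every sample
    0, 7, 1, 3,   # geneC: zero in s1, where geneA is expressed
    2, 0, 9, 4),  # geneD: zero only in s2, where geneA is zero too
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3", "s4"))
)
x <- DGEList(counts = counts)

target <- "geneA"                      # hypothetical gene of interest
in_target <- x$counts[target, ] > 0    # samples where the target is expressed

# A gene is kept if it is > 0 in all of those samples.
keep_genes <- rowSums(x$counts[, in_target, drop = FALSE] > 0) == sum(in_target)

# Row subsetting keeps the object a DGEList; keep.lib.sizes = TRUE leaves
# the library sizes as they were before filtering.
x_genes <- x[keep_genes, , keep.lib.sizes = TRUE]
rownames(x_genes$counts)
```

Note that this filters rows (genes) and keeps all samples, which is different from the column filter discussed later in the thread.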
An example would help. What is "time" here?
"time" here is the sample. I should have added more information to my question.
I have multiple samples (52) covering several conditions: Time 0, +5 hours, +1 day and +8 days, and also the presence or absence of a marker. Here's an example:
So you want genes that only have values > 0?
Not really, as I want to filter my columns in order to keep only the ones where my gene of interest is > 0. And I don't want to lose the DGEList object information, so I don't want to convert it into a simple matrix or data.frame.
If I understood right, you are looking to filter genes with raw count values > 0 and at the same time you want to save your gene of interest, am I right? Would you like to share a snippet of your code, please?
I'm changing my question a bit now that I've thought about it more. I just want to keep the sample columns where the count values of this gene are > 0, so where the condition is true.
I have for now
> x$counts["ENSMUSG00000028369",] > 0
I only want to keep the TRUE ones. I can't use either subset or which, as it's a DGEList object and not a simple matrix.
Okay, I think I found how to do it:
columns_keeped <- x$counts["ENSMUSG00000028369",] > 0
x_filtered <- x[,columns_keeped]
It was not so difficult; sometimes you're just tired and don't think of the easy solutions. I hope it worked well with the DGEList object and didn't mess up the keys that keep the values together inside it.
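One way to check that worry is to compare the column names of the counts with the row names of the sample table after subsetting. This is only a sketch on toy data; the gene names, sample names and group labels are hypothetical.

```r
library(edgeR)

# Toy data: 2 genes x 3 samples (hypothetical values and labels).
counts <- matrix(
  c(5, 0, 3,
    1, 4, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)
x <- DGEList(counts = counts, group = c("T0", "T5h", "T1d"))

# Same pattern as above: logical vector over samples, then column subsetting.
keep_cols <- x$counts["geneA", ] > 0
x_filtered <- x[, keep_cols]

# The counts columns and the sample table rows should still line up.
in_sync <- identical(colnames(x_filtered$counts), rownames(x_filtered$samples))
in_sync
x_filtered$samples
```

If in_sync is TRUE, the counts and the per-sample annotation (group, lib.size, norm.factors) were subset together.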
Yes, that was the best way. By the way, I suggest you normalize your raw counts using the cpm function and filter your columns based on the abundance of your gene. As I understand it, this step is executed prior to the creation of the DGEList.
Best regards!
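The snippet for the cpm suggestion is not shown in the thread, so here is a rough sketch of what such a filter could look like. The toy counts and the min_cpm threshold are assumptions chosen for illustration. cpm is applied here to a DGEList, but it also accepts a plain count matrix if you prefer to filter before building the object.

```r
library(edgeR)

# Toy data: 2 genes x 4 samples with very different library sizes
# (hypothetical values).
counts <- matrix(
  c(50,    0,      1, 20,
    950, 1000, 999999, 980),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3", "s4"))
)
x <- DGEList(counts = counts)

# Counts per million, so samples are compared relative to their library size.
cpm_values <- cpm(x)

min_cpm <- 10  # hypothetical threshold, to be chosen for the real data
keep_cols <- cpm_values["geneA", ] > min_cpm
x_cpm <- x[, keep_cols]
colnames(x_cpm$counts)
```

In this toy example s3 has a raw count above 0 for geneA but a very low CPM, so a CPM threshold drops it whereas the raw count > 0 rule would keep it.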
Okay then, thank you for the suggestion! I'll keep it in mind :)