Dear All,
I am trying to filter the most variable genes for my specific analysis. I have normalized counts from DESeq2 (attached here), with genes as rows and sample IDs as columns.
I found the following code chunk on Biostars:
```r
data$variance <- apply(data, 1, var)  # row-wise variance of each gene
data2 <- data[data$variance >= quantile(data$variance, 0.5), ]  # keep the 50% most variable genes
data2$variance <- NULL  # drop the helper column
summary(data2)
```
This code chunk worked perfectly when I excluded the gene column from the data frame during filtering. However, the filtered count matrix then has no gene IDs, because the gene column was excluded. When I included the gene column during filtering, the code failed, probably because the gene column is a factor: `apply()` first converts the data frame to a matrix, and a single non-numeric column turns that whole matrix into character.
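A minimal sketch reproducing the likely cause, using a small made-up data frame (the column names `geneID`, `S1` and `S2` are hypothetical): once a factor column is present, `as.matrix()` (which `apply()` uses internally) yields a character matrix, while selecting only the numeric columns gives a numeric one.

```r
# Toy data frame: one factor column of gene IDs plus two numeric sample columns
# (geneID, S1 and S2 are hypothetical names used only for illustration)
data <- data.frame(
  geneID = factor(c("geneA", "geneB", "geneC")),
  S1 = c(10.5, 200.1, 3.2),
  S2 = c(12.0, 150.7, 2.9)
)

# apply() calls as.matrix() on a data frame; with a factor column present,
# every cell is converted to a character string
m <- as.matrix(data)
print(typeof(m))  # "character"

# Keeping only the numeric columns restores a numeric matrix
num_m <- as.matrix(data[, sapply(data, is.numeric), drop = FALSE])
print(typeof(num_m))  # "double"
```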
Does anyone know how I could modify this code chunk so that the gene ID column is retained in the new count matrix after filtering? Alternatively, if anyone could suggest a better approach and share a code chunk, I would greatly appreciate it. I understand this is a coding problem, but I could not get it to work all day! Sorry for the many posts recently; I'm still learning RNA-seq data analysis.
Looking forward to your responses, and thanks in advance for your help.
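One possible modification, sketched on toy data and assuming the gene IDs sit in a hypothetical column called `geneID`: compute the variance on the numeric count columns only, then select whole rows, so the gene ID column is carried along with the counts.

```r
# Toy normalized-count table; geneID and the sample column names are hypothetical
data <- data.frame(
  geneID = factor(c("geneA", "geneB", "geneC", "geneD")),
  S1 = c(10, 200, 3, 50),
  S2 = c(12, 150, 3, 80),
  S3 = c(11, 300, 3, 20)
)

# Logical vector marking the numeric (count) columns; geneID is excluded
count_cols <- sapply(data, is.numeric)

# Row-wise variance computed on the counts only
data$variance <- apply(data[, count_cols, drop = FALSE], 1, var)

# Keep genes whose variance is at or above the median (top ~50% most variable);
# whole rows are selected, so geneID stays in the result
data2 <- data[data$variance >= quantile(data$variance, 0.5), ]
data2$variance <- NULL
print(data2)
```

`count_cols` is computed before the `variance` column is added, so its length matches the original columns.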
Kind Regards,
synat
Are your gene names not the rownames of data?
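Following that suggestion, a minimal sketch that moves the gene IDs into the row names and works on a purely numeric matrix (column names are hypothetical). If the matrix comes straight from DESeq2's `counts(dds, normalized = TRUE)`, the gene IDs are typically already its row names, so the first step may be unnecessary.

```r
# Toy table where gene IDs arrive as an ordinary column (hypothetical names)
data <- data.frame(
  geneID = c("geneA", "geneB", "geneC", "geneD"),
  S1 = c(10, 200, 3, 50),
  S2 = c(12, 150, 3, 80),
  S3 = c(11, 300, 3, 20)
)

# Move gene IDs into the row names and keep only the counts as a numeric matrix
counts <- as.matrix(data[, -1])
rownames(counts) <- as.character(data$geneID)

# Row-wise variance; subsetting a matrix keeps its row names
row_var <- apply(counts, 1, var)
top_counts <- counts[row_var >= quantile(row_var, 0.5), , drop = FALSE]
print(rownames(top_counts))
```

Keeping IDs as row names avoids mixing character and numeric data in one object, so `apply()` never sees a non-numeric column.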