Hi all, I have RNA-Seq data from some cancer samples. I would like to cluster these samples based on the expression of a defined subset of genes. Subsequently, I want to select the samples with the highest and lowest expression of this gene subset. I am not sure how to do it. Any suggestions are welcome.
Take a look at bioconductor/rgsepd, where I run DESeq to find genes with differences, then GOSeq to find gene groups of significance, and then it moves on to exactly what you're asking for: subset the counts matrix by gene set, run PCA on that sub-space, and you'll get clustering with respect to named gene sets.
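The "subset, then PCA" idea can be sketched in a few lines. This is not rgsepd's code, just a minimal NumPy illustration under stated assumptions: counts are a genes x samples matrix, a simple log2 counts-per-million transform stands in for DESeq's own normalisation/variance stabilisation, and the function name `gene_set_pca` and the toy data are hypothetical.

```python
import numpy as np

def gene_set_pca(counts, gene_names, gene_set, n_components=2):
    """Hypothetical helper: PCA of samples restricted to a gene subset.

    counts: genes x samples array of raw counts.
    gene_names: identifiers matching the rows of counts.
    gene_set: identifiers of the genes to keep.
    Returns (scores, explained): scores is samples x n_components.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: log2(CPM + 1) stands in for DESeq's VST/rlog.
    # Library sizes are taken from ALL genes, before subsetting.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    logcpm = np.log2(cpm + 1.0)
    # Keep only the rows that belong to the gene set.
    wanted = set(gene_set)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    if len(idx) < 2:
        raise ValueError("need at least two genes from the set")
    sub = logcpm[idx, :].T  # samples x genes
    # Centre each gene, then PCA via SVD of the centred matrix.
    centred = sub - sub.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var = s ** 2
    explained = var[:n_components] / var.sum()
    return scores, explained

# Toy usage with simulated counts (50 genes, 8 samples).
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]
counts = rng.poisson(100, size=(50, 8))
scores, explained = gene_set_pca(counts, genes, ["gene1", "gene2", "gene3", "gene4"])
print(scores.shape)  # (8, 2)
```

Plotting or clustering the rows of `scores` then groups the samples by that gene set only.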
Hi Karl, thanks for the info. The issue in my case is that I want to define sample groups based on the expression of a subset of genes and then do a differential expression (DE) analysis. I do not know whether I can use the raw or normalized counts of this subset of genes to categorize samples and subsequently do a DE analysis.
Could you be more specific about what data you have? Is it quantified yet? If so, in what unit? Once I know that, I can guide you better :-)
Hi Kristoffer, I have HTSeq counts from some TCGA cancer samples.
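With HTSeq counts, one possible way to define "high" and "low" groups is to compute a per-sample gene-set score and take the extremes. The sketch below is only an illustration under assumptions: it scores each sample as the mean z-score of log2(CPM + 1) across the gene set, calls the top and bottom 25% "high" and "low", and uses a hypothetical function name and toy data. Other scoring schemes (e.g. the first principal component of the subset) would be equally reasonable choices.

```python
import numpy as np

def split_by_gene_set_score(counts, gene_names, gene_set, fraction=0.25):
    """Hypothetical helper: label samples high/low by a gene-set score.

    counts: genes x samples raw counts (e.g. merged HTSeq output).
    Returns (score, labels) with one entry per sample.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: simple log2(CPM + 1) normalisation for scoring only.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    logcpm = np.log2(cpm + 1.0)
    wanted = set(gene_set)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    if not idx:
        raise ValueError("none of the gene-set genes were found")
    sub = logcpm[idx, :]
    # z-score each gene across samples so highly expressed genes don't dominate.
    sd = sub.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    z = (sub - sub.mean(axis=1, keepdims=True)) / sd
    score = z.mean(axis=0)  # one score per sample
    n = counts.shape[1]
    k = max(1, int(round(fraction * n)))
    if 2 * k > n:
        raise ValueError("fraction too large: high and low groups would overlap")
    order = np.argsort(score)
    labels = np.full(n, "middle", dtype=object)
    labels[order[:k]] = "low"    # lowest-scoring samples
    labels[order[-k:]] = "high"  # highest-scoring samples
    return score, labels

# Toy usage: genes A and B (the set) rise steadily across 8 samples.
genes = ["A", "B", "C", "D"]
counts = np.array([
    [10, 20, 40, 80, 160, 320, 640, 1280],
    [10, 20, 40, 80, 160, 320, 640, 1280],
    [1000] * 8,
    [1000] * 8,
])
score, labels = split_by_gene_set_score(counts, genes, ["A", "B"])
print(list(labels))
# ['low', 'low', 'middle', 'middle', 'middle', 'middle', 'high', 'high']
```

The "high" and "low" labels would then become the condition column for a count-based DE tool such as DESeq2, which should be given the raw HTSeq counts rather than the CPM values. Keep in mind that the genes in the set will come out as DE by construction.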
It would be good Biostars manners to update the question instead of adding a comment; that makes it easier for people reading it in the future :-)