Hi everyone,
I am a computer science master's student doing my final-year thesis on applying AI in oncology. To get data for my ML models, my professor asked me to obtain gene signature data for cancer and immune cells. For this, I performed a differential expression (DEG) analysis on the RNA-Seq data in R and got the required results with their statistics. Now my professor wants me to extract a list of differentially expressed genes for each of the samples provided, i.e. an individual gene list per sample. I am not sure how to do this, as I am new to bioinformatics.
I also came across a post on the Bioconductor support site that suggests extracting the genes with the cutree function in R. I used cutree as follows:
## distance calculation of VST transformed DESeqDataSetFromMatrix obj
gene_dist = dist(DESeqVST_Matrix)
## Hierarchical clustering
gene_clust = hclust(gene_dist,method = "complete")
gene_clusters = cutree(gene_clust,k=12)
Here k is the number of samples (12 samples: 6 cancer and 6 immune). This assigns each gene to one of 12 clusters, but I am not sure which cluster belongs to which sample.
I would be very pleased if you could guide me.
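A note on what the snippet above computes, as a minimal sketch rather than a definitive answer: dist() measures distances between the rows of the matrix, so hclust() and cutree() group genes, and k is simply the number of gene groups requested. It is not tied to the number of samples. The example below uses a simulated matrix in place of the real VST data (the matrix contents, the gene names and k = 4 are assumptions made for illustration) and shows how to pull out the gene names in each cluster.

```r
# Simulated stand-in for the VST matrix: 60 genes (rows) x 12 samples (columns).
# The values and gene names are made up for illustration only.
set.seed(1)
DESeqVST_Matrix <- matrix(
  rnorm(60 * 12), nrow = 60,
  dimnames = list(paste0("gene", 1:60), paste("Sample", 1:12))
)

# dist() compares rows, so this tree clusters genes, not samples.
gene_clust <- hclust(dist(DESeqVST_Matrix), method = "complete")

# k is the number of gene groups; it does not have to match the sample count.
gene_clusters <- cutree(gene_clust, k = 4)

# One character vector of gene names per cluster.
genes_by_cluster <- split(names(gene_clusters), gene_clusters)

# Number of genes in each cluster.
lengths(genes_by_cluster)
```

Each element of genes_by_cluster is a set of genes with similar expression profiles across all 12 samples, so a cluster describes a pattern over samples rather than a single sample.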
What program did you use to calculate the differentially expressed genes? You might want to include your code.
I used the DESeq2 package in R to perform the DEG analysis; the code is below.
Running sort(cutree(heatmap$tree_row, k=12)) gives me the list of genes along with their cluster numbers. A partial image of the output is shown below.
Likewise, running colnames(DESeqVST_Matrix[,heatmap$tree_col[['order']]]) gives me the sample names in the order they are displayed in the heatmap.
[1] "Sample 3" "Sample 6" "Sample 1" "Sample 17" "Sample 8" "Sample 10"
[7] "Sample 15" "Sample 4" "Sample 18" "Sample 2" "Sample 11" "Sample 9"
But I am not sure which genes belong to which sample. For example, does cluster 1 belong to Sample 3 or to another sample?
I would be very pleased if you could guide me.
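One possible way to relate the gene clusters to the samples, offered as a sketch under stated assumptions: the row cluster numbers and the column order come from two separate trees, so cluster 1 does not correspond to the first sample shown. What can be computed is the average expression of each gene cluster in every sample. The example below uses simulated data and base R only; row_tree and col_tree are hypothetical names standing in for heatmap$tree_row and heatmap$tree_col, and the matrix, the group labels and k = 2 are assumptions for illustration.

```r
# Simulated stand-in data: 40 genes x 12 samples, 6 cancer and 6 immune.
set.seed(2)
groups <- rep(c("cancer", "immune"), each = 6)
vst <- matrix(
  rnorm(40 * 12), nrow = 40,
  dimnames = list(paste0("gene", 1:40), paste("Sample", 1:12))
)
# Genes 1-20 are made higher in cancer, genes 21-40 higher in immune samples.
vst[1:20, groups == "cancer"] <- vst[1:20, groups == "cancer"] + 4
vst[21:40, groups == "immune"] <- vst[21:40, groups == "immune"] + 4

# Row-wise z-scores, so each gene is compared with its own mean.
z <- t(scale(t(vst)))

# Separate trees for genes (rows) and samples (columns).
row_tree <- hclust(dist(z))
col_tree <- hclust(dist(t(z)))
gene_clusters <- cutree(row_tree, k = 2)

# Mean z-score of every gene cluster in every sample (clusters x samples).
cluster_sizes <- as.vector(table(gene_clusters))
cluster_means <- rowsum(z, group = gene_clusters) / cluster_sizes

# Put the columns in the same order as the column dendrogram.
cluster_means <- cluster_means[, col_tree$order]
round(cluster_means, 2)
```

A cluster with positive values in a set of samples is, on average, more highly expressed there. Keep in mind that a DESeq2 contrast compares groups of samples, so the resulting genes are differential between cancer and immune cells as groups rather than specific to any one sample.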