Extract differentially expressed genes list from DESeq2 results for each of the sample provided
0
2
Entering edit mode
3.2 years ago
Lawrence ▴ 20

Hi everyone,

I am a computer science master student and doing my final year thesis in applying AI in Oncology. In order to get the data for my ML models, my professor asked me to get the gene signature data for cancer and immune cells. For this, I have performed DEG analysis on the RNA-Seq data in R and got the required results with statistical properties. Now, my professor wants me to extract differentially expressed gene list for each of the samples provided i.e. individual list of genes for each sample. But, I am not sure how to extract the individual list of genes for each sample as I am new to Bioinformatics.

Also, I came across a post in Bioconductor support to extract the genes by using cutree function in R. I did use the cutree function which as follows,

## distance calculation of VST transformed DESeqDataSetFromMatrix obj

    gene_dist = dist(DESeqVST_Matrix)

## Hierarchical clustering

    gene_clust = hclust(gene_dist,method = "complete")
    gene_clusters = cutree(gene_clust,k=12)

Where K is the no. of samples i.e. 12 samples 6 in cancer and 6 in immune that gives each gene in different clusters, but I am not sure which cluster belongs to which sample.

Clusters with number of genes

I would be very pleased if you could guide me.

clustering RNA-Seq DESeq2 R • 2.0k views
ADD COMMENT
0
Entering edit mode

What program did you use to calculate the differentially expressed genes? You might want to include your code.

ADD REPLY
0
Entering edit mode

I used the DESeq2 package in R programming to perform DEG analysis and below is the code.

feature_Counts = as.matrix(read.table("raw_read_counts_12_sample.txt",header=T))

sample_info = DataFrame(row.names = colnames(feature_Counts),
                    ctype = factor(c(rep("PC3",6),rep("U937",6))),
                    condition = factor(c(rep("high",2),
                                         rep("low",2),
                                         rep("MC",2),
                                         rep("MC",2),
                                         rep("high",2),
                                         rep("low",2))))

 DESeq.ds = DESeqDataSetFromMatrix(countData = feature_Counts,
                              colData = sample_info,
                              design = ~condition + ctype)

 genes_with_reads = rowSums(counts(DESeq.ds)) > 1
 filtered_DESeq.ds = DESeq.ds[genes_with_reads,]

 dESeq_Obj = DESeq(filtered_DESeq.ds)

 DESeq_res = results(dESeq_Obj)

 DESeq_res = DESeq_res[DESeq_res$padj<0.001 & !is.na(DESeq_res$padj),]
 DESeq_res_df = as.data.frame(DESeq_res)

 ## Generating labels for Up regulated genes and down regulated genes
 dr_ur_label = factor(ifelse(DESeq_res_df$log2FoldChange<0,"DR","UR"))
 DESeq_res_df$DR_UR = dr_ur_label

 ## Adding Gene IDs as a separate column
 genes = rownames(DESeq_res_df)
 DESeq_res_df = add_column(DESeq_res_df,Genes = genes,.before = "baseMean")

 ## Transforming the count data using Variance Stabilizing Transform
 DESeqVST = vst(DESeq.ds)

 ## Converting the above transformed data to data frame
 DESeqVST <- assay(DESeqVST)
 DESeqVST <- as.data.frame(DESeqVST)
 DESeqVST$Gene <- rownames(DESeqVST)
 head(DESeqVST)

 DESeqVST = DESeqVST[DESeqVST$Gene %in% DESeq_res_df$Genes,]

 DESeqVST_transformed = melt(DESeqVST,id.vars = "Gene")

 ## converting significant genes dataframe to matrix
 DESeqVST_Matrix = dcast(DESeqVST_transformed,Gene ~ variable)
 rownames(DESeqVST_Matrix) = DESeqVST_Matrix$Gene
 DESeqVST_Matrix$Gene = NULL

 ## Calculating distance between genes
 gene_dist = dist(DESeqVST_Matrix)

 gene_clust = hclust(gene_dist,method = "complete")

 gene_clusters = cutree(gene_clust,k=12). ## Cluster numbers for each gene where K = 12 is the number of samples
 gene_clusters = sort(gene_clusters)

 dendogram_cluster = as.dendrogram(gene_clust)

 heatmap = pheatmap(DESeqVST_Matrix,legend = T,labels_row = "")

 colnames(DESeqVST_Matrix[,heatmap$tree_col[['order']]]) ## provides the list of sample names in the heatmap

Running the command sort(cutree(heatmap$tree_row,k=12)), I am getting the list of genes along with their cluster numbers. A partial image of it is shown below,

Genes with cluster numbers

Likewise, running the command colnames(DESeqVST_Matrix[,heatmap$tree_col[['order']]]), I am getting the list of sample names in the heatmap as displayed.

1 "Sample 3" "Sample 6" "Sample 1" "Sample 17" "Sample 8" "Sample 10"

[7] "Sample 15" "Sample 4" "Sample 18" "Sample 2" "Sample 11" "Sample 9"

But I am not sure which gene belongs to which sample, for example, Cluster no, 1 does belong to Sample 3 or another sample?.

I would be very pleased if you could guide me.

ADD REPLY

Login before adding your answer.

Traffic: 1294 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6