Plot heatmap of differentially expressed genes identified by DESeq2 a R coding problem
1
0
Entering edit mode
7.8 years ago

I was trying to a differential gene expression analysis by using Deseq2 [1] with samples like this [2], and would like to plot a heatmap using differentially expressed genes (DEGs) against all the samples, this is my initial try [3], that did not work out because I tried to subset the "resCellA" object, get the order from the subsetted object and use that to pull out rows of "rld" based on numerical indexing, but the index are not linked. I did that because I am not so sophisticated with R coding, and tried to solve the problem by aping the example in the vignette.

Actually I got kind suggestion that in stead of using index I need to use gene ID which is consistent across the res and mat. Could anyone advise me how to do that? I have two DataFrame [4], I guess what I need to do is to get the geneIDs from res with padj < 0.05, and retrieve those geneIDs in assay(rld),but how to do that?

Thanks very much in advance. (I understand that in the long run, I need to do more homework about R coding...)

[1]

sampleTable <- read.csv("samples.csv",row.names=1)
filenames <- paste0(sampleTable$samplename,".bam")
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000)
gtffile <- ("mm10genes.gtf")
library("GenomicFeatures")
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
library("GenomicAlignments")
library("BiocParallel")
register(MulticoreParam())
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=TRUE,
                        ignore.strand=TRUE,
                        fragments=FALSE )
library("DESeq2")
(colData(se) <- DataFrame(sampleTable))
dds <- DESeqDataSet(se, design = ~ compartment + genotype)
dds <- dds[rowSums(counts(dds)) > 1,]
dds <- DESeq(dds)
rld <- rlog(dds, blind=FALSE)
library("gene filter")

[2]

"samples.csv"

"ids","samplename","genotype","compartment"

"457CellA","457CellA","WT","CellA"

"457CellB","457CellB","WT","CellB"

"457CellC","457CellC","WT","CellC"

"457CellD","457CellD","WT","CellD"

"458CellA","458CellA","cKO","CellA"

"458CellB","458CellB","cKO","CellB"

"458CellC","458CellC","cKO","CellC"

"458CellD","458CellD","cKO","CellD"

"459CellA","459CellA","WT","CellA"

"459CellB","459CellB","WT","CellB"

"459CellC","459CellC","WT","CellC"

"459CellD","459CellD","WT","CellD"

"460CellA","460CellA","cKO","CellA"

"460CellB","460CellB","cKO","CellB"

"460CellC","460CellC","cKO","CellC"

"460CellD","460CellD","cKO","CellD"

[3]

res <- results(dds)
sigGenes <- subset(res,padj <0.05)
sigGenes <- head(order(sigGenes$padj),decreasing=TRUE,40)
mat <- assay(rld)[ sigGenes, ]
mat <- mat - rowMeans(mat)
df <- as.data.frame(colData(rld)[,c("compartment","genotype")])
pheatmap(mat, annotation_col=df,fontsize=9,fontsize_number = 0.4 * fontsize)
dev.off()

[4]

> head(assay(rld))
                       457CellA      458CellB        459CellC          460CellD
0610005C13Rik          1.808277      1.6076085        1.533645         1.695644
0610007P14Rik          8.191532      7.5024585        8.660520         8.334527
0610009B22Rik          5.807757      5.6569778        5.989643         6.123990
0610009L18Rik          3.881685      3.5666925        3.786769         3.921051
0610009O20Rik          6.800692      6.8433131        6.951461         7.218018
0610010B08Rik         -1.224636     -0.8949234       -1.230797        -1.226311


> res <- results(dds)
> res
log2 fold change (MAP): compartment CellA vs CellB
Wald test p-value: compartment CellA vs CellB 
DataFrame with 23372 rows and 6 columns
                baseMean log2FoldChange      lfcSE        stat       padj
               <numeric>      <numeric>  <numeric>   <numeric>    <numeric>
0610005C13Rik   3.842899    -0.81806436  0.6916742 -1.18273074   0.23691588
0610007P14Rik 297.131215     0.22129241  0.1115736  1.98337585   0.04732546
0610009B22Rik  60.686295     0.43130300  0.2452162  1.75886844   0.07859986
0610009L18Rik  14.512324     0.02226444  0.3606164  0.06173996   0.95076992
0610009O20Rik 125.067373     0.19888416  0.1723663  1.15384599   0.24856332
RNA-Seq R • 8.6k views
ADD COMMENT
5
Entering edit mode
7.8 years ago

You can easily pull out differentially expressed gene names

de <- rownames(res[res$padj<0.05 & !is.na(res$padj), ])
de_mat <- assay(rld)[de,]

now de_mat contains the rld values of differentially expressed genes across all samples. This could be used to plot heatmap.

ADD COMMENT
0
Entering edit mode

Hi Goutham, Thank you very much. I guess I had worked out a more tedious version but yours are more succinct. So I learnt a lot from your answer.

res0.05 <- subset(res, res$padj < 0.05)
sigGenes <- rownames(res0.05)
rows <- match(sigGenes, row.names(rld))
mat <- assay(rld)[rows,]

Just have another question, why "& !is.na(res$padj)" is needed? I can not imaging a case in which padj value is missing. TKS in advance.

ADD REPLY
0
Entering edit mode

I was looking for this thank your for the solution but i have one doubt why are you using this can you explain & !is.na(res$padj),?

ADD REPLY
0
Entering edit mode

Thank you for your post, its was simple and it works!

ADD REPLY

Login before adding your answer.

Traffic: 2574 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6