Question

Distance matrices between RNAseq samples generated with different transformations/normalization do not correlate


2.9 years ago

hans ▴ 20

There are 64 wheat samples from two tissues: 32 from leaves (L) and 32 from roots (R). Each tissue includes two treatments (LN, FN). Gene counts were obtained using the Tuxedo pipeline.

The gene counts were loaded into a DESeq2 object and transformed. A distance matrix between samples was then calculated and plotted using this code:

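library(DESeq2)
library(pheatmap)
library(NormExpression)
# Read the gene count matrix (genes in rows, samples in columns)
cnt <- read.csv('/home/me/gene_count_matrix.csv', row.names = 1)
cnt <- as.matrix(cnt)
mode(cnt) <- 'numeric'
# coldata is the sample table with a 'samp' column (its definition is not shown in this post)
des <- DESeqDataSetFromMatrix(countData = cnt,
                              colData = coldata,
                              design = ~ samp)
des <- DESeq(des)
# Variance-stabilizing transformation, blind to the design
vsd <- vst(des, blind = TRUE)
# Euclidean distances between samples (samples must be in rows for dist)
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)
pheatmap(sampleDistMatrix,
         clustering_distance_rows = sampleDists,
         clustering_distance_cols = sampleDists,
         color = rainbow(10))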
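Since the heatmap labels come from the column names of the count matrix while the tissue and treatment come from the sample table, one thing that could be checked at this point is that the two are in the same order. The following is only a minimal sketch on toy data, assuming the sample table carries the sample names as row names; `cnt_toy`, `coldata_toy` and the sample names are hypothetical and not taken from the real data set.

```r
# Hypothetical toy inputs standing in for cnt and coldata
cnt_toy <- matrix(1:12, nrow = 3,
                  dimnames = list(paste0("g", 1:3),
                                  c("L_LN_1", "L_FN_1", "R_LN_1", "R_FN_1")))
coldata_toy <- data.frame(samp = c("R_FN", "L_LN", "R_LN", "L_FN"),
                          row.names = c("R_FN_1", "L_LN_1", "R_LN_1", "L_FN_1"))

# Every column of the count matrix should have a row in the sample table
stopifnot(all(colnames(cnt_toy) %in% rownames(coldata_toy)))

# Reorder the sample table so its rows follow the columns of the count matrix
coldata_toy <- coldata_toy[colnames(cnt_toy), , drop = FALSE]

# After reordering, names must match one to one and in the same order
aligned <- identical(rownames(coldata_toy), colnames(cnt_toy))
aligned
coldata_toy$samp
```

If the names already match in order, the reordering step changes nothing, so the check is cheap to run before building the DESeq2 object.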

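Alternatively, the gene counts were normalized but not transformed, using the NormExpression package:

# Normalization factors with the TU method
tu_cnt <- getFactors(data = cnt, method = "TU", pre_ratio = 0.5, lower_trim = 0.05,
                     upper_trim = 0.65)
# Apply the factors to get the normalized (untransformed) count matrix
TU_cnt.matrix <- getNormMatrix(data = cnt, norm.factors = tu_cnt)
# Euclidean distances between samples on the normalized counts
dcnt <- dist(t(TU_cnt.matrix), method = 'euclidean')
pheatmap(as.matrix(dcnt),
         clustering_distance_rows = dcnt,
         clustering_distance_cols = dcnt,
         color = rainbow(10))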
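The two code blocks above compute Euclidean distances on different scales: one on transformed values, the other on normalized but untransformed counts. On the untransformed scale the distance is dominated by the most highly expressed genes, so the nearest neighbour of a sample can change between the two. The sketch below shows this on made-up numbers; `log2(x + 1)` is used only as a simple stand-in for a transformation and is not what `vst()` computes, and the `nearest` helper is hypothetical.

```r
# Toy data (hypothetical): 4 genes x 3 samples; gene g4 is very highly expressed
counts <- matrix(
  c(   10,    12,    40,
       20,    22,    80,
       30,    28,   120,
    10000, 14000, 10100),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), c("s1", "s2", "s3"))
)

# Euclidean distances between samples on the untransformed scale
d_raw <- as.matrix(dist(t(counts)))
# Euclidean distances after a log2(x + 1) transformation
d_log <- as.matrix(dist(t(log2(counts + 1))))

# Helper (hypothetical): name of the sample closest to sample s
nearest <- function(d, s) {
  others <- setdiff(colnames(d), s)
  others[which.min(d[s, others])]
}

nearest(d_raw, "s1")  # driven almost entirely by g4
nearest(d_log, "s1")  # driven by the three low-count genes
```

In this toy case s1 is closest to s3 on the untransformed scale, because only g4 matters there, and closest to s2 after the log transformation, where all four genes contribute comparably.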

The two plots were totally different. The DESeq image was highly ordered and showed the expected separation into two tissue groups, but the sample names were completely out of the expected order. The NormExpression image was a bit messier, but the samples were ordered as expected, with one or two outliers. See plots: DESeq, NormExpression.

The order of the samples in the DESeq plot was something like this (only tissue and treatment abbreviations): L_FN, R_FN, R_LN, L_LN, L_LN, L_LN, R_LN, R_FN, R_FN, R_LN....

In the NormExpression plot, all L samples clustered first, followed by the R samples, and within each tissue all samples from the same treatment (LN or FN) clustered together.

The questions are:

How can the two methods be reconciled?
How come the DESeq image looks very ordered while the sample labels are completely mixed up?
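To put a number on how much two distance matrices agree, instead of judging the heatmaps by eye, something like the following could be used. It is only a sketch on simulated data: `x`, `d1` and `d2` are hypothetical stand-ins for the normalized counts and the two `dist` objects, and the rank correlation of the pairwise distances is one simple choice of agreement measure, not the only one.

```r
set.seed(1)
# Hypothetical expression matrix: 50 genes x 6 samples
x <- matrix(rexp(300, rate = 0.01), nrow = 50,
            dimnames = list(NULL, paste0("s", 1:6)))

d1 <- dist(t(x))             # stand-in for distances on normalized counts
d2 <- dist(t(log2(x + 1)))   # stand-in for distances on transformed values

# Both dist objects must refer to the same samples in the same order
stopifnot(identical(labels(d1), labels(d2)))

# Rank correlation between the two sets of pairwise distances
rho <- cor(as.vector(d1), as.vector(d2), method = "spearman")

# Leaf order of a complete-linkage clustering of each distance matrix
# (complete linkage is the default of both hclust and pheatmap)
ord1 <- labels(d1)[hclust(d1)$order]
ord2 <- labels(d2)[hclust(d2)$order]

rho
rbind(ord1, ord2)
```

Printing the two leaf orders side by side makes it easy to see whether the same samples end up next to each other under both approaches.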

Thank you

DESeq2 RNASEQ NormExparssion • 658 views



The NormExpression authors suggest using the TU method for normalization. As far as I understand, the package has no methods for transformation. Nevertheless, at least in the case presented here, TU normalization gave more biologically meaningful results than DESeq normalization followed by the vst transformation. The package only evaluates normalization methods, though, and does not perform DE analysis. So I wonder whether I can trust the methods implemented in DESeq when I see such disorder in the clustering.

2.9 years ago by hans ▴ 20