Entering edit mode
5.9 years ago
Mithil Gaikwad
▴
50
Hello, I am doing RNA seq analysis to obtain Differential expression genes using DESeq2 for 13 patient and 6 Healthy donors. Before going for DESeq2 analysis, I am visualizing my samples by Distance matrix and PCA plot, using the following commands:
library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(rld$condition, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows=sampleDists,
clustering_distance_cols=sampleDists,
col=colors)
plotPCA(rld, intgroup="condition")
: PCA plot
Here I am not able to visualize both the sample group separately. Am I suppose to discard the samples, which can not be separated and how one can find this exact sample? Is there any function or solution to separate the samples without discarding it? Kindly reply. Any help in this regard will be highly appreciated
sessionInfo() R version 3.5.2 (2018-12-20) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.1 LTS
Matrix products: default BLAS: /usr/lib/x8664-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x8664-linux-gnu/lapack/liblapack.so.3.7.1
locale: [1] LCCTYPE=enIN.UTF-8 LCNUMERIC=C
LCTIME=enIN.UTF-8 LCCOLLATE=enIN.UTF-8 [5] LCMONETARY=enIN.UTF-8 LCMESSAGES=enIN.UTF-8
LCPAPER=enIN.UTF-8 LCNAME=C [9] LCADDRESS=C LCTELEPHONE=C
LCMEASUREMENT=enIN.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] hexbin1.27.2 vsn3.50.0
pheatmap1.0.12 RColorBrewer1.1-2 [5] DESeq21.22.1 SummarizedExperiment1.12.0 DelayedArray0.8.0 BiocParallel1.16.5 [9] matrixStats0.54.0 Biobase2.42.0
GenomicRanges1.34.0 GenomeInfoDb1.18.1 [13] IRanges2.16.0 S4Vectors0.20.1
BiocGenerics_0.28.0
loaded via a namespace (and not attached): [1] bit640.9-7
splines3.5.2 Formula1.2-3 assertthat0.2.0
affy1.60.0 [6] BiocManager1.30.4 latticeExtra0.6-28 blob1.1.1 GenomeInfoDbData1.2.0 pillar1.3.1
[11] RSQLite2.1.1 backports1.1.3 lattice0.20-38
limma3.38.3 glue1.3.0 [16] digest0.6.18
XVector0.22.0 checkmate1.9.0 colorspace1.4-0
preprocessCore1.44.0 [21] htmltools0.3.6 Matrix1.2-15
plyr1.8.4 XML3.98-1.16 pkgconfig2.0.2
[26] genefilter1.64.0 zlibbioc1.28.0 purrr0.2.5
xtable1.8-3 scales1.0.0 [31] affyio1.52.0
htmlTable1.13.1 tibble2.0.1 annotate1.60.0
ggplot23.1.0 [36] nnet7.3-12 lazyeval0.2.1
survival2.43-3 magrittr1.5 crayon1.3.4
[41] memoise1.1.0 foreign0.8-70 tools3.5.2
data.table1.12.0 stringr1.3.1 [46] locfit1.5-9.1
munsell0.5.0 cluster2.0.7-1 AnnotationDbi1.44.0
bindrcpp0.2.2 [51] compiler3.5.2 rlang0.3.1
grid3.5.2 RCurl1.95-4.11 rstudioapi0.9.0
[56] htmlwidgets1.3 labeling0.3 bitops1.0-6
base64enc0.1-3 gtable0.2.0 [61] DBI1.0.0
R62.3.0 gridExtra2.3 knitr1.21
dplyr0.7.8 [66] bit1.1-14 bindr0.1.1
Hmisc4.1-1 stringi1.2.4 Rcpp1.0.0
[71] geneplotter1.60.0 rpart4.1-13 acepack1.4.1
tidyselect0.2.5 xfun_0.4
Does this mean you are directly using raw counts for clustering and PCA? If so, your results are not unusual. Also from the heatmap scale, I really think there is some sort of problem with normalization. You can use DESeq2 rlog/vst normalized counts for PCA.
Actually, I have transformed my raw counts using rlog and then I am visualizing my data in PCA plot and heatmap so that I could obtain proper Differential expressing genes.
These plots that you have shown say virtually nothing about differentially expressed genes. These plots just show [mostly] unbiased / unsupervised relationships between your samples.
It looks like you may benefit from reading through the DESeq2 Vignette
By the way, in your PCA bi-plot, there is some evidence that 2 of your samples are outliers.
Thank you for your reply, Kevin, I have read DESeq2 vignette but it didn't find anything about solving such kind of error, however, Again i'll read and find it out what can be done. But as per your experience, can you give me any suggestions for solving this problem? Thank you
I see what you mean. You want to identify the samples in the PCA bi-plot so that you can remove them?
Please take a look at this reproducible code (using random data), which will perform PCA and generate a bi-plot with lables (just for QC-ing):
Your object is called rld, with the counts in this object being accessible via
assay(rld)
. So, your code needs to be:The resulting bi-plot will differ from your original. This is because DESeq2's PCA function automatically removes a large proportion of variables prior to performing PCA, whereas, my code does not do this.
How to add images to a Biostars post