Hi all,
I'm currently doing a differential expression analysis on RNA-seq data. After quantifying the reads with Salmon and normalizing with DESeq2, I plotted my samples on a PCA to see how they are spread.
I have 3 replicates per condition, and I was expecting them to cluster together, but the replicates do not really cluster: the samples are spread across the PCA.
I'm not really familiar with this kind of output, and I was wondering whether not seeing replicates cluster could be normal (due to biological variation between replicates, even under the same conditions), or whether I should be worried about something before going on to the DE analysis.
Thanks for your input,
Guillaume
Please give some details on the experimental setup: Cell line or primary samples, which organism, which treatment etc.
I have 72 samples of Vitis vinifera leaves, with 4 changing treatments, and 3 biological replicates for each set of conditions. Mainly, I wanted to know whether a DE analysis can still be relevant when transcriptomic concordance between biological replicates is low.
What have you actually input to the PCA functions, and which PCA functions have you used? Please show your exact code. Also, have you performed pre-filtering steps on the raw counts prior to normalisation?
My input is the "vst" normalized table of counts, with genes that have no reads filtered out. I used the plotPCA function from the DESeq2 package.
Try the code in this answer: "A: PCA plot from read count matrix from RNA-Seq"
You'll get a different answer. Why? Because the DESeq2 plotPCA function keeps only the most variable genes (the top 500 by default, controlled by its ntop argument) before performing the PCA transformation, so a large proportion of your genes never enter the PCA.
You could also simply increase the ntop option to Inf (plotPCA(foo, ntop=Inf) or something like that).
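To make the effect of ntop concrete, here is a minimal base R sketch of the variance filter discussed above. It assumes plotPCA ranks genes by their variance across samples, keeps the ntop most variable ones, and then runs prcomp on the transposed matrix. The matrix is simulated and only stands in for a real vst table (something like assay(vsd)), and pca_top_genes is a hypothetical helper written for illustration, not a DESeq2 function.

```r
# Simulated stand-in for a vst-transformed matrix (genes x samples).
# With real data this would be something like assay(vsd); values here are made up.
set.seed(1)
n_genes   <- 2000
condition <- rep(c("A", "B", "C", "D"), each = 3)
mat <- matrix(
  rnorm(n_genes * length(condition), mean = 8, sd = 0.5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0(condition, "_rep", 1:3))
)

# Hypothetical helper: keep the `ntop` most variable genes, then run PCA
# on samples (rows) x genes (columns).
pca_top_genes <- function(mat, ntop = 500) {
  rv   <- apply(mat, 1, var)                       # per-gene variance
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca  <- prcomp(t(mat[keep, , drop = FALSE]))     # centered, not scaled
  pct_var <- pca$sdev^2 / sum(pca$sdev^2)
  list(scores  = pca$x[, 1:2],                     # PC1 and PC2 coordinates
       pct_var = pct_var[1:2],                     # variance explained by each
       n_genes = length(keep))
}

top500    <- pca_top_genes(mat, ntop = 500)   # default-style filter
all_genes <- pca_top_genes(mat, ntop = Inf)   # no variance filter

print(top500$n_genes)     # 500
print(all_genes$n_genes)  # 2000
print(round(100 * top500$pct_var, 1))
print(round(100 * all_genes$pct_var, 1))
```

Comparing top500$scores with all_genes$scores (for example with plot()) shows how much the sample layout depends on which genes enter the PCA. If the replicates only separate, or only scatter, under one of the two settings, that is worth knowing before interpreting the plot.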