Question

Quality Check of Samples and Biological Replicates

2

3.9 years ago

doinelpierrot ▴ 50

Hello,

I am performing a differential gene expression analysis between two temperature conditions (A and B). I have the count matrix for all my genes, with 5 replicates per condition.

When I perform a PCA following the DESeq2 Bioconductor guide, my replicates do not seem well correlated:

[Figure: PCA on all genes for the two temperature conditions.]

I don't know what to make of this for the rest of the analysis.

However, when I plot the PCA keeping only the differentially expressed genes, I get better results. But I think this is biased: samples will obviously cluster by condition once we have selected the genes that differ between the 2 conditions (whatever the input data had been).
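The circularity concern above can be checked on simulated data. The following is a minimal sketch, not part of the original analysis: it assumes pure-noise "expression" values (no gene truly differs between A and B), picks the top 50 genes by a Welch t-statistic, and compares how well the two groups separate before and after that selection. All function names and parameters are hypothetical.

```python
import random
import statistics
from itertools import combinations, product

def simulate_null_expression(n_genes=2000, n_per_group=5, seed=0):
    """Pure-noise genes x samples matrix: no gene truly differs between A and B."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
            for _ in range(n_genes)]

def welch_t(a, b):
    """Welch t-statistic for two small groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def separation_ratio(matrix, n_per_group):
    """Mean between-group sample distance divided by mean within-group distance."""
    samples = list(zip(*matrix))  # one tuple of gene values per sample

    def dist(i, j):
        return sum((x - y) ** 2 for x, y in zip(samples[i], samples[j])) ** 0.5

    a = range(n_per_group)
    b = range(n_per_group, 2 * n_per_group)
    within = [dist(i, j) for g in (a, b) for i, j in combinations(g, 2)]
    between = [dist(i, j) for i, j in product(a, b)]
    return statistics.mean(between) / statistics.mean(within)

n = 5
data = simulate_null_expression(n_per_group=n)

# "Differentially expressed" genes chosen from the same data that is then plotted
top = sorted(data, key=lambda g: abs(welch_t(g[:n], g[n:])), reverse=True)[:50]

ratio_all = separation_ratio(data, n)
ratio_top = separation_ratio(top, n)
print(f"all genes: {ratio_all:.2f}, top 50 by t-statistic: {ratio_top:.2f}")
```

A ratio near 1 means the groups are no further apart than replicates are from each other. On noise, the selected genes still give a ratio well above 1, so a clean PCA on pre-selected genes says little about replicate quality.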

RNA-Seq R assembly • 1.7k views

ADD COMMENT • link 3.9 years ago by doinelpierrot ▴ 50

score 1 · Answer 1 · 2020-12-18

1

3.9 years ago

swbarnes2 14k

Based on this data alone, I think you have to accept that your data is what it is.

You might investigate exactly what is driving PC1. 70% of the variance not caused by your experimental condition is a lot.
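One way to look at what drives PC1 is to rank genes by their PC1 loading and to compare the PC1 sample scores against known covariates such as batch. The sketch below is an illustration under stated assumptions: it uses a tiny made-up matrix of already transformed (for example log-scale) values, a hypothetical `first_pc` helper based on power iteration, and hypothetical gene and batch labels. In a real DESeq2 workflow the equivalent would be done in R on the transformed counts.

```python
import math
import random

def first_pc(matrix, n_iter=500):
    """PC1 of a genes x samples matrix, treating samples as observations.

    Returns (sample_scores, gene_loadings, fraction_of_variance).
    """
    n_samples = len(matrix[0])
    # Centre each gene across samples
    centred = []
    for row in matrix:
        m = sum(row) / n_samples
        centred.append([v - m for v in row])
    # Sample x sample Gram matrix: stays small even with many genes
    gram = [[sum(row[i] * row[j] for row in centred) for j in range(n_samples)]
            for i in range(n_samples)]
    # Power iteration for the leading eigenvector of the Gram matrix
    rng = random.Random(0)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * u[j] for j in range(n_samples)) for i in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    gu = [sum(gram[i][j] * u[j] for j in range(n_samples)) for i in range(n_samples)]
    eigenvalue = sum(a * b for a, b in zip(u, gu))
    total = sum(gram[i][i] for i in range(n_samples))
    sigma = math.sqrt(eigenvalue)
    scores = [sigma * x for x in u]
    loadings = [sum(r * x for r, x in zip(row, u)) / sigma for row in centred]
    return scores, loadings, eigenvalue / total

# Toy data (made up): one gene follows batch, the others are nearly flat
genes = ["g1", "g2", "g_batch", "g4"]
batch = [0, 0, 0, 1, 1, 1]  # hypothetical batch label per sample
matrix = [
    [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
    [2.0, 2.1, 1.9, 2.2, 2.0, 2.1],
    [1.0, 1.2, 0.9, 6.0, 6.1, 5.8],
    [3.0, 3.1, 2.9, 3.0, 3.2, 3.1],
]

scores, loadings, frac = first_pc(matrix)
ranked = sorted(zip(genes, loadings), key=lambda t: abs(t[1]), reverse=True)
print("variance explained by PC1:", round(frac, 3))
print("genes ranked by |PC1 loading|:", [g for g, _ in ranked])
print("PC1 score per sample, with batch:", list(zip(batch, [round(s, 2) for s in scores])))
```

If the PC1 scores line up with a covariate (batch, extraction date, RNA quality), that covariate is a candidate for the design formula. The sign of a principal component is arbitrary, so only the grouping of the scores matters.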

ADD COMMENT • link 3.9 years ago by swbarnes2 14k

0

I have no obvious explanation for this, apart from randomness, handling mistakes, or the temperature change having little impact on the transcriptional response.

So I am now wondering whether it is worth the effort to pursue a long analysis of differentially expressed genes.

ADD REPLY • link 3.9 years ago by doinelpierrot ▴ 50

0

Just because the first 2 PCs don't split your samples doesn't mean there are no significant differences. You just might not find many.

ADD REPLY • link 3.9 years ago by swbarnes2 14k

score 1 · Answer 2 · 2020-12-18

1

3.9 years ago

jerry ▴ 130

For those replicates that are very different from each other, you may find it helpful to plot simple X-Y scatter plots. This will tell you if the variance is happening for all ranges of read counts, or only for low-read count genes, or only high-read count genes, etc...

If the replicate scatter plots are "all over the place", I would then dig into whether this is experimental error or actual variation due to biology.
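The pairwise comparison described above can also be summarised numerically. Here is a minimal sketch, assuming two vectors of raw counts for the same genes in two samples: it works on the log2(count + 1) scale and reports the overall correlation plus the mean absolute difference for low-count and high-count genes separately. The counts, the `low_cutoff` of 10, and the function names are made up for illustration.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_agreement(counts_x, counts_y, low_cutoff=10):
    """Compare two samples on the log2(count + 1) scale, split by abundance."""
    lx = [math.log2(c + 1) for c in counts_x]
    ly = [math.log2(c + 1) for c in counts_y]
    result = {"pearson_all": pearson(lx, ly)}
    diffs = {"low": [], "high": []}
    for a, b, cx, cy in zip(lx, ly, counts_x, counts_y):
        # A gene is "low" when its mean raw count across the pair is below the cutoff
        key = "low" if (cx + cy) / 2 < low_cutoff else "high"
        diffs[key].append(abs(a - b))
    for key, values in diffs.items():
        result[f"mean_abs_log2_diff_{key}"] = (
            statistics.mean(values) if values else float("nan")
        )
    return result

# Made-up counts for two replicates (same gene order in both)
x = [0, 3, 8, 2, 120, 450, 980, 2300, 15, 60]
y = [4, 0, 2, 9, 110, 470, 1010, 2250, 18, 55]

res = replicate_agreement(x, y)
for name, value in res.items():
    print(f"{name}: {value:.3f}")
```

If the disagreement is concentrated in the low-count bin, it mostly reflects sampling noise at low counts. Large differences among high-count genes point more toward experimental or biological variation.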

ADD COMMENT • link 3.9 years ago by jerry ▴ 130

0

Thanks for your answer! What would your axes be for the X-Y plots?

ADD REPLY • link 3.9 years ago by doinelpierrot ▴ 50

1

Gene read counts for any two samples, one on X and one on Y. This will give you a visual sense of the differences between any two samples, beyond your initial PCA plot.

ADD REPLY • link 3.9 years ago by jerry ▴ 130

0

OK, thanks for your answer. I have a new PCA plot labelled with all the sample names (the batch number follows the "B"). I have also plotted expression (kallisto raw counts) for B2_1 vs B1_12 and for B4_3 vs B4_1.

In both cases the variance seems to be partly explained by low-count transcripts, but in the second case a larger share also comes from other transcripts.

I have also investigated which transcripts mostly drive PC1 and PC2, but nothing interesting has popped out so far.

ADD REPLY • link 3.9 years ago by doinelpierrot ▴ 50

score 0 · Answer 3 · 2020-12-23

OK, thanks for your answers. I have a new PCA plot labelled with all the sample names (the batch number follows the "B"). I have also plotted expression (kallisto raw counts) for B2_1 vs B1_12 and for B4_3 vs B4_1.

In both cases the variance seems to be partly explained by low-count transcripts, but in the second case a larger share also comes from other transcripts.

I have also investigated which transcripts mostly drive PC1 and PC2, but nothing interesting has popped out so far.