Pearson correlation for RNAseq data - input formats
13 months ago
Lada ▴ 30

Hi guys,

I work with tiny crustaceans and did RNAseq on different species, each with 3 biological replicates (each replicate being a pool of 3 individuals). I want to check if there is a good correlation between my replicates.

I trimmed my reads, made a de novo assembly with Trinity, quantified my transcripts with Salmon (an alignment-free method) and built expression matrices. I plan to compute the Pearson correlation in R with the cor function. I have a couple of questions:

  1. What is the correct input for this test: raw counts, TPMs, or transformed counts (vst or rlog, both of which I can compute in DESeq2)? I guess that since RNAseq data are very skewed toward a small fraction of highly expressed genes, I should use some kind of transformation...

  2. Is it okay to do this at the isoform level, or should I run the analysis on Trinity "genes"?

  3. What cutoff do you consider a measure of good reliability for an experiment? I read that ENCODE suggests that the square of the Pearson correlation coefficient should be larger than 0.92 under ideal experimental conditions. With TPMs, I am getting correlations from 0.63 to 0.99.
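
To make question 1 concrete, here is a minimal sketch of the comparison being asked about: Pearson correlation between two replicates, once on the raw scale and once after a log2(x + 1) transform. It is written in plain Python rather than R so that it runs without any packages; in R the analogous call would be something like `cor(log2(tpm + 1))`, where `tpm` is a hypothetical matrix name. The TPM values and the names `rep1` and `rep2` are made up for illustration, and log2(x + 1) is only a simple stand-in for a transformation, not DESeq2's vst or rlog.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log2p1(values):
    """log2(x + 1): a simple transform that tolerates zero expression."""
    return [math.log2(v + 1.0) for v in values]


# Made-up TPM values for the same six transcripts in two hypothetical replicates.
rep1 = [0.0, 1.2, 3.5, 10.0, 48.0, 5200.0]
rep2 = [0.3, 0.8, 6.0, 7.5, 60.0, 5100.0]

r_raw = pearson(rep1, rep2)                  # dominated by the one very high value
r_log = pearson(log2p1(rep1), log2p1(rep2))  # low and mid expression count as well

print("Pearson r on raw TPM:      ", round(r_raw, 4))
print("Pearson r on log2(TPM + 1):", round(r_log, 4))
print("r squared on log2 scale:   ", round(r_log ** 2, 4))
```

On the raw scale the single highly expressed transcript largely determines the coefficient, which is the skew problem described in question 1; after the log transform the low and moderately expressed transcripts contribute too. The last line prints the squared coefficient, the quantity that the ENCODE figure in question 3 refers to.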

Thanks,

Lada

vst Pearson-correlation TPM rlog RNA-seq • 480 views