Hi guys,
I work with tiny crustaceans and did RNA-seq on several species, each with 3 biological replicates (each replicate is a pool of 3 individuals). I want to check whether my replicates correlate well.
I trimmed my reads, made a de novo assembly with Trinity, quantified my transcripts with Salmon (an alignment-free method) and built expression matrices.
I plan to do the Pearson correlation test in R with the cor() function. I have a couple of questions:
What is the correct input for this test: raw counts, TPMs, or transformed counts (vst or rlog, which I can compute in DESeq2)? Since RNA-seq data are heavily skewed toward a small fraction of highly expressed genes, I guess I should use some kind of transformation...
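To show what I mean by the skew problem, here is a small sketch with made-up values (not my data). On the raw scale a single very highly expressed gene pushes Pearson's r toward 1 even though the low-expression genes disagree between the two replicates; a log2(x + 1) transform reduces that gene's leverage. The log2(x + 1) transform is just one common choice assumed for illustration; it is not the same as DESeq2's vst or rlog, which operate on raw counts. The helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2p1(values):
    """log2(x + 1) transform; the +1 pseudocount keeps zeros finite."""
    return [math.log2(v + 1) for v in values]

# Hypothetical TPMs for two replicates: five low genes that disagree,
# plus one very highly expressed gene that both replicates agree on.
rep_a = [1.0, 5.0, 2.0, 8.0, 3.0, 10000.0]
rep_b = [6.0, 1.0, 7.0, 2.0, 5.0, 9000.0]

r_raw = pearson(rep_a, rep_b)
r_log = pearson(log2p1(rep_a), log2p1(rep_b))
print("raw r:", round(r_raw, 4))
print("log2(x+1) r:", round(r_log, 4))
```

The raw-scale r is close to 1 almost entirely because of the last gene; after the transform the disagreement among the low-expression genes shows up as a lower r.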
Is it okay to do this at the isoform level, or should I run the analysis on Trinity "genes"?
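If the answer is gene level, my understanding is that isoform TPMs can simply be summed per Trinity gene, because TPM is additive across isoforms of the same gene (isoform-level estimates on a de novo assembly also tend to be noisier, since reads are split ambiguously among similar isoforms). A minimal sketch, assuming the standard Trinity naming where the gene ID is the transcript ID without the trailing `_iN` isoform suffix; the function names and TPM values are hypothetical:

```python
import re
from collections import defaultdict

# Trinity transcript IDs look like TRINITY_DN1000_c115_g5_i1;
# the gene is everything before the trailing _iN isoform suffix.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")

def trinity_gene_id(transcript_id):
    """Drop the trailing _iN isoform suffix from a Trinity transcript ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)

def isoform_to_gene_tpm(isoform_tpm):
    """Sum isoform TPMs per Trinity gene."""
    gene_tpm = defaultdict(float)
    for tx_id, tpm in isoform_tpm.items():
        gene_tpm[trinity_gene_id(tx_id)] += tpm
    return dict(gene_tpm)

# Hypothetical TPM values for one replicate
rep1 = {
    "TRINITY_DN1000_c115_g5_i1": 12.0,
    "TRINITY_DN1000_c115_g5_i2": 3.5,
    "TRINITY_DN1001_c0_g1_i1": 40.0,
}
print(isoform_to_gene_tpm(rep1))
# {'TRINITY_DN1000_c115_g5': 15.5, 'TRINITY_DN1001_c0_g1': 40.0}
```

The same summing would be applied to every replicate column before computing the correlations.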
What cutoff do you consider a good indicator of the reliability of an experiment? I read that ENCODE suggests the square of the Pearson correlation coefficient should be larger than 0.92 under ideal experimental conditions. With TPMs, I am getting correlations from 0.63 to 0.99.
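For comparison with that threshold: if my values are Pearson r (not r²), then R² > 0.92 requires r above √0.92 ≈ 0.959, so 0.99 (r² ≈ 0.98) would pass and 0.63 (r² ≈ 0.40) would not. A quick sketch of the check; the constant and helper names are my own, hypothetical:

```python
import math

# Threshold on r^2 as I read it from ENCODE (ideal conditions)
ENCODE_R2_CUTOFF = 0.92

def passes_r2_cutoff(r, cutoff=ENCODE_R2_CUTOFF):
    """True if the squared Pearson coefficient exceeds the cutoff."""
    return r * r > cutoff

# r must exceed sqrt(0.92) for r^2 > 0.92
print(round(math.sqrt(ENCODE_R2_CUTOFF), 3))  # 0.959

# The two ends of the range I observed with TPMs (assuming these are r, not r^2)
for r in (0.63, 0.99):
    print(r, round(r * r, 4), passes_r2_cutoff(r))
```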
Thanks,
Lada