Hi all,
I have what may be replicates for three samples from two RNAseq runs. By two runs I mean that I performed one RNAseq experiment earlier and the other about 3 months later. I'm not sure whether I can call these biological replicates, but they probably are.
The RNA was prepared from the same cell line, so the source of the RNA is the same in both runs. Each sample was treated under a different condition, as shown below.
RNAseq run    | Condition 1 | Condition 2 | Condition 3
First RNAseq  | RNA-Sample1 | RNA-Sample2 | RNA-Sample3
Second RNAseq | RNA-Sample1 | RNA-Sample2 | RNA-Sample3
Using Pearson's correlation, the coefficients between the two runs are 0.92 (0.80), 0.98 (0.96), and 0.98 (0.96) for samples 1, 2, and 3, respectively, computed on expected counts (TPM values in parentheses).
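For anyone who wants to reproduce this kind of check, here is a minimal sketch of how a Pearson coefficient between two runs can be computed. The counts are made-up numbers for illustration, not the data from this experiment, and the log2(x + 1) comparison is an assumption added here, not something applied to the coefficients above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2p1(values):
    """log2(x + 1), a common transform that damps very high counts."""
    return [math.log2(v + 1) for v in values]

# Hypothetical expected counts for six genes in the two runs (illustration only).
run1 = [0, 12, 150, 980, 45, 20000]
run2 = [3, 9, 210, 700, 60, 21000]

r_raw = pearson(run1, run2)
r_log = pearson(log2p1(run1), log2p1(run2))
print(f"Pearson on raw counts:  {r_raw:.3f}")
print(f"Pearson on log2(x + 1): {r_log:.3f}")
```

On raw values a few very highly expressed genes can dominate the coefficient, which is why the two numbers can differ.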
In theory, two samples under the same condition are the same, so their expression profiles should be nearly identical, with a coefficient close to 1. But I understand that there is technical variance.
From this, my opinion is that I can use the replicate data for samples 2 and 3, but I'm not sure about the replicate for sample 1.
Given the coefficients for samples 2 and 3, I think technical variance did not have much effect between the two RNAseq runs. If my reasoning is wrong, please point it out.
Is it OK to treat sample 1 as replicated and use the data for further analysis with that coefficient, or do I have to discard it?
Also, is there any other reliable or relevant method to check replication?
I'm not entirely new to this, but I don't have much knowledge or experience in this field. I look forward to your comments and advice.
Thanks, SS
You need to find out the source of the RNA. Whether these are biological or technical replicates is a very important question, and the answer depends on the source material. Where did each of the six RNAs come from? Your collaborators will know whether they came from the same cells or different ones. Pearson correlation is irrelevant to this question.
Hi karl.stamn
Thank you for the reply and comments.
As you can see from the table above, they are biological replicates.
RNA was prepared from samples 1, 2, and 3, and each sample was treated under a different condition: sample 1 under condition 1, sample 2 under condition 2, and sample 3 under condition 3. The prepared RNA was used for the first RNAseq run.
Three months later, I did exactly the same thing. So, in theory, the data from sample 1 of the first RNAseq run should be the same as the data from sample 1 of the second run, and likewise for the other samples.
Could you let me know why Pearson correlation is irrelevant here?
Is it because some kind of variation might cause significantly different read counts between replicates?
If so, would Spearman correlation give a better picture of the agreement between replicates?
Do I have to normalize the expected counts before computing Spearman correlation? Would standardization alone work, instead of normalization?
Thanks, SS
I computed Spearman correlation and got 0.95 (0.95) for sample 1, 0.97 (0.96) for sample 2, and 0.98 (0.97) for sample 3 (expected counts, with TPM values in parentheses), similar to the Pearson results.
I expected Spearman correlation to give higher coefficients because it removes the variance caused by differences in read counts.
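To make the comparison concrete, here is a rough sketch of Spearman correlation written as Pearson correlation on ranks, with tied values given their average rank. The counts and the scaling factor are hypothetical, chosen only for illustration. Because only the rank order of genes within each sample matters, multiplying a sample by a constant or taking log2(x + 1) leaves the coefficient unchanged.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical expected counts for six genes in the two runs (illustration only).
run1 = [0, 12, 150, 980, 45, 20000]
run2 = [3, 9, 210, 700, 60, 21000]

rho = spearman(run1, run2)
# Rescaling one sample (hypothetical factor 0.5) does not change the ranks.
rho_scaled = spearman(run1, [v * 0.5 for v in run2])
# Neither does a monotonic transform such as log2(x + 1).
rho_log = spearman([math.log2(v + 1) for v in run1],
                   [math.log2(v + 1) for v in run2])
print(rho, rho_scaled, rho_log)
```

Note that this invariance covers only transforms applied to a whole sample; going from expected counts to TPM divides each gene by its own length, which can reorder genes and so change the coefficient.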
The question that remains unanswered is whether a coefficient of 0.95 is enough to establish replication (that is, the second RNAseq counts match the first: not exactly the same, but similar enough to ignore minor differences in the overall gene expression profiles).
Please comment and point out anything I have missed or misunderstood.
Thanks, SS