How to compare samples between two different platforms?
5.9 years ago
newbie

Hello Everyone,

I'm looking for help comparing the same samples measured on two different platforms. Initially I had microarray data for 7 samples. The same sample IDs were also used for RNA sequencing.

So I have the same 7 samples on both microarray and RNA-seq. However, when I clustered the 7 RNA-seq samples, they behaved differently. I worry that the sample IDs got mixed up in either the RNA-seq or the microarray data, so I want to check the correlation between the samples across the two platforms.

Here is an excerpt of the data from both platforms.

The microarray expression data (Affymetrix SNP 6.0) looks like this:

Genes   Sample1     Sample2    Sample3      Sample4     Sample5      Sample6    Sample7
Gene1   131.4311    369.4926    222.0441    687.4181    176.8892    258.1233    316.5573
Gene2   78.73022    83.97501    81.56039    86.11443    78.09758    81.88231    84.17101
Gene3   90.02816    95.07267    101.1761    93.35585    81.96468    94.3553     93.89527
Gene4   79.86837    81.63064    88.19524    79.47265    76.3437     101.6351    93.71674
Gene5   99.03493    109.1835     104.97     102.7423    108.3677    98.93459    101.4052
Gene6   79.58075    84.28915    90.53562    74.47786    75.96112    96.39649    95.8828
Gene7   121.5373    149.9351    146.5956    122.8523    110.5759    132.4268    130.4409
Gene8   616.5994    1326.735    1358.187    2315.851    1068.745    3229.759    4435.021
Gene9   70.44073    69.56772    68.25446    68.35857    70.86771    74.3843     67.93569
Gene10  78.62103    78.9498     76.96349    73.12749    76.37209    82.27274    82.54192
Gene11  69.45438    76.9461     80.96048    80.35287    78.84947    84.79259    88.00973
Gene12  84.79181    81.74586    81.13312    77.90322    79.47303    77.37466    75.77509
Gene13  86.88158    91.10217    90.85628    78.9453     76.18599    86.22007    83.63694

The RNA-seq data (raw counts) looks like this:

Genes   Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
Gene1   470     1058    263     488     2027    1047    737
Gene2   20      89      14      87      76      169     21
Gene3   16      13      1       0       6       17      6
Gene4   0       0       0       0       0       0       0
Gene5   0       0       0       5       0       0       1
Gene6   2       3       4       30      23      9       23
Gene7   20      0       48      113     92      14      88
Gene8   0       0       0       10      0       3       2
Gene9   40      10      154     18      401     74      318
Gene10  85      51      333     40      632     165     667
Gene11  1897    8879    1725    2645    8519    3823    1983
Gene12  203     593     361     380     524     436     227
Gene13  46      267     117     207     351     240     70

I'm very new to R and to this kind of sample-correlation analysis. Could anyone tell me how to do this and which functions to use? It would be very helpful if you could demonstrate it with the data above.

Any help is appreciated. Thanks a lot.

RNA-Seq R correlation microarray

The simplest approach is to plot each sample's values on one platform against the other, e.g. plot(madata$Sample1, rnadata$Sample1), assuming both tables list the same genes in the same order.
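A minimal sketch of that suggestion, using the Sample1 values from the question typed in by hand. It assumes genes are row names and appear in the same order in both tables. Plotting on log scales and using Spearman (rank) correlation instead of Pearson are my own choices, made because microarray intensities and read counts are on very different scales.

```r
# Sample1 values for the 13 genes in the question, in the same gene order.
# madata / rnadata follow the names used in the reply; here they are built by hand.
genes <- paste0("Gene", 1:13)
madata <- data.frame(
  Sample1 = c(131.4311, 78.73022, 90.02816, 79.86837, 99.03493, 79.58075,
              121.5373, 616.5994, 70.44073, 78.62103, 69.45438, 84.79181, 86.88158),
  row.names = genes)
rnadata <- data.frame(
  Sample1 = c(470, 20, 16, 0, 0, 2, 20, 0, 40, 85, 1897, 203, 46),
  row.names = genes)

# The scatter plot is only meaningful if row i is the same gene in both tables.
stopifnot(identical(rownames(madata), rownames(rnadata)))

# Log scales spread out the wide range of values; +1 avoids log2(0) for zero counts.
plot(log2(madata$Sample1), log2(rnadata$Sample1 + 1),
     xlab = "Microarray Sample1 (log2)",
     ylab = "RNA-seq Sample1 (log2(count + 1))")

# Rank-based correlation, since the two platforms use very different scales.
rho <- cor(madata$Sample1, rnadata$Sample1, method = "spearman")
print(rho)
```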


Do I need to do that for each sample separately? What if I have 50 samples on each platform?


Yes. With 50 samples you would want to calculate all pairwise correlations between them. The corrplot package may be helpful for visualizing them.
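To get all comparisons at once, cor() accepts two matrices and correlates every column of the first with every column of the second. The sketch below reads the two tables from the question with read.table(text = ...), keeps the shared genes, and builds a 7 x 7 microarray-vs-RNA-seq correlation matrix. If the IDs are correct, each microarray sample should correlate best with the RNA-seq sample of the same name. Spearman correlation is an assumption of this sketch, not something from the thread; because it works on within-sample ranks, per-sample scaling such as CPM does not change it. With only 13 genes this is a demonstration only; real data should use all shared genes. The corrplot call is guarded because the package may not be installed.

```r
# The two tables from the question, pasted in as text (genes as row names).
ma <- read.table(header = TRUE, row.names = 1, text = "
Genes  Sample1  Sample2  Sample3  Sample4  Sample5  Sample6  Sample7
Gene1  131.4311 369.4926 222.0441 687.4181 176.8892 258.1233 316.5573
Gene2  78.73022 83.97501 81.56039 86.11443 78.09758 81.88231 84.17101
Gene3  90.02816 95.07267 101.1761 93.35585 81.96468 94.3553  93.89527
Gene4  79.86837 81.63064 88.19524 79.47265 76.3437  101.6351 93.71674
Gene5  99.03493 109.1835 104.97   102.7423 108.3677 98.93459 101.4052
Gene6  79.58075 84.28915 90.53562 74.47786 75.96112 96.39649 95.8828
Gene7  121.5373 149.9351 146.5956 122.8523 110.5759 132.4268 130.4409
Gene8  616.5994 1326.735 1358.187 2315.851 1068.745 3229.759 4435.021
Gene9  70.44073 69.56772 68.25446 68.35857 70.86771 74.3843  67.93569
Gene10 78.62103 78.9498  76.96349 73.12749 76.37209 82.27274 82.54192
Gene11 69.45438 76.9461  80.96048 80.35287 78.84947 84.79259 88.00973
Gene12 84.79181 81.74586 81.13312 77.90322 79.47303 77.37466 75.77509
Gene13 86.88158 91.10217 90.85628 78.9453  76.18599 86.22007 83.63694
")
rna <- read.table(header = TRUE, row.names = 1, text = "
Genes  Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
Gene1  470     1058    263     488     2027    1047    737
Gene2  20      89      14      87      76      169     21
Gene3  16      13      1       0       6       17      6
Gene4  0       0       0       0       0       0       0
Gene5  0       0       0       5       0       0       1
Gene6  2       3       4       30      23      9       23
Gene7  20      0       48      113     92      14      88
Gene8  0       0       0       10      0       3       2
Gene9  40      10      154     18      401     74      318
Gene10 85      51      333     40      632     165     667
Gene11 1897    8879    1725    2645    8519    3823    1983
Gene12 203     593     361     380     524     436     227
Gene13 46      267     117     207     351     240     70
")

# Keep only genes present in both tables, in the same order.
common <- intersect(rownames(ma), rownames(rna))
ma  <- ma[common, ]
rna <- rna[common, ]

# Rows = microarray samples, columns = RNA-seq samples.
cc <- cor(as.matrix(ma), as.matrix(rna), method = "spearman")
print(round(cc, 2))

# For each microarray sample, the RNA-seq sample it correlates with best.
best <- colnames(cc)[apply(cc, 1, which.max)]
print(data.frame(microarray = rownames(cc), best_rnaseq = best,
                 same_id = rownames(cc) == best))

# Optional visualization, only if corrplot is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cc)
}
```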

Which RNA-seq values did you use to cluster the RNA-seq data? I would probably not use the raw counts for the correlation analysis, because raw counts also depend on each sample's sequencing depth, among other things. You probably want to use rlog or CPM values or, at the bare minimum, the output of DESeq2::counts(dds, normalized = TRUE). For more details on DESeq2, see Michael Love's tutorial.
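A base-R sketch of the normalization step: it computes log2(CPM + 1) from the raw counts in the question and clusters the samples on those values. This is a simple stand-in; edgeR's cpm() or DESeq2's counts(dds, normalized = TRUE), rlog() and vst() are the more standard routes and are not shown here. Using 1 - Pearson correlation as the clustering distance is also an assumption of this sketch.

```r
# Raw RNA-seq counts from the question (genes as row names).
counts <- as.matrix(read.table(header = TRUE, row.names = 1, text = "
Genes  Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
Gene1  470     1058    263     488     2027    1047    737
Gene2  20      89      14      87      76      169     21
Gene3  16      13      1       0       6       17      6
Gene4  0       0       0       0       0       0       0
Gene5  0       0       0       5       0       0       1
Gene6  2       3       4       30      23      9       23
Gene7  20      0       48      113     92      14      88
Gene8  0       0       0       10      0       3       2
Gene9  40      10      154     18      401     74      318
Gene10 85      51      333     40      632     165     667
Gene11 1897    8879    1725    2645    8519    3823    1983
Gene12 203     593     361     380     524     436     227
Gene13 46      267     117     207     351     240     70
"))

# Counts per million: divide each column by its library size, then scale to 1e6.
# With only 13 genes, "library size" is just their sum; on real data it would be
# the column sum over all genes.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Log-transform so a few highly expressed genes do not dominate; +1 avoids log2(0).
logcpm <- log2(cpm + 1)

# Hierarchical clustering of samples using 1 - Pearson correlation as the distance.
hc <- hclust(as.dist(1 - cor(logcpm)))
plot(hc, main = "RNA-seq samples, log2(CPM + 1)")
```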


Yes, I will convert the counts to normalized counts and use those. Do I need to use the same number of genes on both platforms for the correlation?


If a gene is missing from one assay, you can hardly compare its expression values between the two methods. Unless you have another way of calculating the correlation in mind, my answer would be: yes, you absolutely need to look at the same genes in both assays.
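A minimal sketch of restricting both assays to the same genes, in the same order, with intersect(). The two small tables are hypothetical: they reuse values from the question, but the RNA-seq table is deliberately reordered, lacks Gene4 and Gene5, and contains Gene6, which this microarray table lacks.

```r
# Hypothetical toy tables built from values in the question.
ma <- data.frame(Sample1 = c(131.4311, 78.73022, 90.02816, 79.86837, 99.03493),
                 Sample2 = c(369.4926, 83.97501, 95.07267, 81.63064, 109.1835),
                 row.names = paste0("Gene", 1:5))
rna <- data.frame(Sample1 = c(16, 470, 20, 2),
                  Sample2 = c(13, 1058, 89, 3),
                  row.names = c("Gene3", "Gene1", "Gene2", "Gene6"))

# Genes present in both assays, in the order they appear in the microarray table.
common <- intersect(rownames(ma), rownames(rna))

# Subset and reorder both tables by gene name so row i is the same gene in each.
ma_c  <- ma[common, , drop = FALSE]
rna_c <- rna[common, , drop = FALSE]
stopifnot(identical(rownames(ma_c), rownames(rna_c)))

print(common)
```

Subsetting by row name, rather than by position, is what guarantees the rows line up even when the two files list genes in different orders.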
