Hello everyone, I have two data.frames with expression data obtained from an RNA-seq experiment. One data.frame holds expression values for selected miRNAs across 24 samples, and the other holds expression values for selected genes (mRNAs) across the same 24 samples.
Each data.frame was created by selecting the miRNAs or mRNAs that showed differential expression in DESeq2, filtered at FDR < 0.05. Expression values were obtained with the cpm function from the edgeR package, applied to raw counts produced by featureCounts. These CPM values were calculated before the normalisation performed by DESeq2.
Example from mRNA cpm data:
mRNA sample6 sample8 sample87 sample139
ENSSSCG00000013396 4.9226133236 7.2400541062 3.6369306772 5.0415819189
ENSSSCG00000022687 16.0221597119 13.9341369192 2.530038732 2.9623893757
ENSSSCG00000021638 61.0593383407 82.4891410464 13.8022648681 10.7087615941
ENSSSCG00000013397 5.2776094767 5.1511204625 3.0947795203 4.8023827767
ENSSSCG00000016338 10.6498845943 7.9284526934 12.2435802921 11.5183586906
ENSSSCG00000008171 6.3425979362 6.0294221081 1.6942223651 1.9135931371
ENSSSCG00000010464 222.0855934065 256.3928668898 437.7870591546 191.7273123892
ENSSSCG00000023714 22.7197538012 42.2771684039 16.8970443884 18.7127328887
ENSSSCG00000024527 12.1645348477 15.3346719758 76.1948271686 53.2494090263
ENSSSCG00000017986 9.5848961349 11.133066806 57.1743574159 42.2462484881
Example from miRNA cpm data:
miRNA sample6 sample8 sample87 sample139
ssc-miR-1285 36.2788665777 37.6145686343 2286.6900268583 34.3905779882
ssc-miR-339 1.2596828673 4.4514282408 4.9803454225 2.5163837552
ssc-miR-421-5p 22.1704184641 6.8997137732 3.5573895875 13.211014715
ssc-miR-374a-3p 136.2976862397 115.5145628475 69.7248359154 155.3866968856
ssc-miR-129a-3p 6.8022874833 25.1505695602 40.5542412977 6.7103566806
ssc-miR-296-5p 5.542604616 13.1317133102 38.4198075452 8.8073431433
ssc-miR-7 307.3626196163 274.2079796303 152.2562743459 337.6148204938
CPM values were obtained via this R function from the edgeR package:
y2 <- cpm(x, normalized.lib.sizes=FALSE)
where x is the table of raw counts from featureCounts, with no prior normalisation applied.
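To make explicit what that call computes on a plain count matrix, here is a minimal base-R sketch, assuming the library size is simply the column total of the raw counts. The toy matrix and the names counts, lib_size and cpm_manual are hypothetical and only serve as illustration.

```r
# Toy raw-count matrix (hypothetical values): rows are features, columns are samples.
counts <- matrix(
  c(10,  20,  30,
    90, 180, 270),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# Library size = total raw counts per sample (no normalisation factors applied).
lib_size <- colSums(counts)

# CPM = count / library size * 1e6, applied column by column.
cpm_manual <- sweep(counts, 2, lib_size, "/") * 1e6
```

Each column of cpm_manual sums to one million, so the values are comparable across samples only in terms of sequencing depth, not composition.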
I would like to correlate miRNA and mRNA expression levels and select the pairs with negative correlation, since miRNAs act as inhibitors of gene expression when they are expressed, and their target genes are de-repressed when the miRNA is down-regulated.
I've used the corr.test() function in the R package psych to get Spearman and Pearson correlation matrices, with correlation coefficients and FDR-corrected p-values, but I would like to know which test (Spearman/Kendall or Pearson) would be the most appropriate approach. I tend to think that Spearman should be the one to choose, as the expression data in each sample do not appear to be normally distributed, but I've seen some papers that use simple Pearson correlation. Given my data, what would be the best approach to take?
Do you know any other method to get this done? For instance, regression (I'm not sure about the correct way to implement regression with this data...). Is there any package that solves this particular problem? Any other statistical approach?
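For concreteness, the pairwise step can be sketched in base R as follows. This is only an illustration under stated assumptions: the matrices mirna and mrna are simulated placeholders (not my real data), both are assumed to have features in rows and the same 24 samples in columns, and the 0.05 FDR cutoff is just the threshold I used above.

```r
set.seed(1)
n_samples <- 24
sample_ids <- paste0("sample", seq_len(n_samples))

# Simulated placeholder CPM matrices (hypothetical): features in rows, samples in columns.
mirna <- matrix(rexp(3 * n_samples, rate = 0.05), nrow = 3,
                dimnames = list(paste0("miR_", 1:3), sample_ids))
mrna <- matrix(rexp(5 * n_samples, rate = 0.05), nrow = 5,
               dimnames = list(paste0("gene_", 1:5), sample_ids))

# Both tables must list the samples in the same order.
stopifnot(identical(colnames(mirna), colnames(mrna)))

# cor() correlates columns, so transpose to put samples in rows.
rho <- cor(t(mirna), t(mrna), method = "spearman")  # miRNA x mRNA matrix

# One row per miRNA-mRNA pair, with its coefficient and p-value.
pairs <- expand.grid(miRNA = rownames(mirna), mRNA = rownames(mrna),
                     stringsAsFactors = FALSE)
pairs$rho <- mapply(function(m, g) rho[m, g],
                    pairs$miRNA, pairs$mRNA, USE.NAMES = FALSE)
pairs$p <- mapply(function(m, g) {
  cor.test(mirna[m, ], mrna[g, ], method = "spearman", exact = FALSE)$p.value
}, pairs$miRNA, pairs$mRNA, USE.NAMES = FALSE)

# Benjamini-Hochberg correction across all tested pairs.
pairs$fdr <- p.adjust(pairs$p, method = "BH")

# Candidate repressive pairs: negative coefficient and FDR below the cutoff.
negative_hits <- pairs[pairs$rho < 0 & pairs$fdr < 0.05, ]
```

Swapping method = "spearman" for "pearson" or "kendall" in both calls gives the other two coefficients on the same pairs, which makes it easy to see how much a single extreme sample (such as ssc-miR-1285 in sample87 above) changes the result.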
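To show what I mean by regression, here is a rough sketch for a single miRNA-mRNA pair, assuming a simple linear model on log2(CPM + 1) values. The pseudocount of 1, the six toy values, and the variable names are all assumptions made for illustration, not a recommended method.

```r
# Hypothetical CPM values for one miRNA and one gene across six samples.
mirna_cpm <- c(5, 10, 20, 40, 80, 160)
mrna_cpm  <- c(200, 150, 90, 60, 35, 20)

# Log-transform to reduce skew; the +1 pseudocount is an assumption.
x <- log2(mirna_cpm + 1)
y <- log2(mrna_cpm + 1)

# Simple linear regression of gene expression on miRNA expression.
fit <- lm(y ~ x)

# A negative slope would be consistent with repression by the miRNA.
slope   <- coef(fit)[["x"]]
slope_p <- summary(fit)$coefficients["x", "Pr(>|t|)"]
```

With one predictor this is equivalent to testing a Pearson correlation on the log scale, so the real question is whether a model with covariates (for example, the experimental group) would be more appropriate.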
Thanks.
+1 for selecting the appropriate test before you do it instead of just taking the one that worked the best.