I have RNA seq data-set. Bellow is an example. Where 1st column are the genes and 1st row are the samples. In actual dataset I have over 4000 samples. Now, I have two set of genes. Set1 (A1, A2, A3) is my list of genes of my interest and set2 (B-J) with known disease causing genes. Now, I want to calculate Pearson correlations for each of the set2 genes with respect to my gene and want to retain genes P < 0.05. How can I do this?
N S1 S2 S3 S4 S5 S6 S7 S8 S8 S10 S11 S12 S13 S14
A1 1943 3056 813 812 1769 2324 360 920 830 582 85 123 1306 1651
A2 39 39 7 10 212 360 100 188 202 178 178 206 52 51
A3 593 704 73 61 204 383 201 245 384 278 369 652 523 651
B 104 43 10 17 397 639 158 294 226 217 802 690 49 80
C 73 58 17 13 638 886 316 553 608 573 564 759 134 173
D 102 54 14 17 204 398 452 454 289 271 66 65 122 113
E 2820 3419 1531 1305 1795 2489 1057 1821 1457 1383 292 241 3672 3678
F 34 169 71 67 949 1014 337 1019 747 668 52 53 1936 812
G 85 117 54 58 37 54 99 146 115 96 48 38 920 512
H 133 202 32 48 42 37 50 105 79 80 24 21 433 277
I 142 172 40 43 39 69 114 122 105 117 169 95 344 172
J 84 132 21 36 49 77 120 159 108 144 206 447 302 98
Note that testing the null hypothesis of correlation = 0 is statistically equivalent to testing slope = 0 in the corresponding linear regression model. Since any stats package will give you this P-value as part of its output for fitting a linear model, this might be an easier way to get what you're looking for.