Hi, I have two files (A and B). A is a list of circuit pathways with the columns circuit_id, hgnc and entrez_id; B is an expression matrix with samples in columns (50 normal and 50 tumor samples) and 1094 significant pathways in rows. I converted both into data frames; the first one contains 42300 entries. With this information, and knowing which circuits show significant differential activation between tumor and normal samples, I was told I can recover the variants of the genes of each circuit in each sample, but how? I tried to do this with the merge function, but I obtained a data frame of about 23 GB. Can someone help me? Thanks
df <- merge(A, B, all = FALSE)
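One possible explanation for the 23 GB result, assuming B stores the circuit IDs as row names (as the pasted header suggests) rather than as a column: `merge()` joins on the columns the two data frames have in common, and when there are none it returns the Cartesian product, pairing every row of A with every row of B. The toy data below is hypothetical and only illustrates the effect.

```r
# Toy stand-ins for the two data frames (hypothetical values, not the real data).
A <- data.frame(
  circuit_id = c("hsa04014__42", "hsa04014__42", "hsa04014__43"),
  hgnc       = c("MAPK1", "RAF1", "SOS1"),
  entrez_id  = c(5594, 5894, 6654)
)
# B keeps the circuit IDs as row names, so it has no column named circuit_id.
B <- data.frame(
  sample1 = c(0.0084, 0.0091),
  sample2 = c(0.0097, 0.0088),
  row.names = c("hsa04014__42", "hsa04014__43")
)

# With no column names in common, merge() falls back to a Cartesian product:
# every row of A is paired with every row of B, and row names are not used as a key.
df <- merge(A, B, all = FALSE)
nrow(df)                          # 3 * 2 = 6 rows
nrow(df) == nrow(A) * nrow(B)     # TRUE
```

With 42300 rows in A and 1094 rows in B, a Cartesian product would have 42300 × 1094 = 46,276,200 rows, each carrying all the sample columns, which would plausibly explain a result of many gigabytes.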
A
circuit_id hgnc entrez_id
hsa04014__42 MAPK1 5594
hsa04014__42 MAPK1 5595
hsa04014__42 MAP2K1 5604
hsa04014__42 MAP2K1 5605
hsa04014__42 RAF1 5894
hsa04014__42 SOS1 6654
hsa04014__42 SOS1 6655
hsa04014__42 CSF1 1435
B
TCGA.E9.A1RB.01A.11R.A157.07 TCGA.E9.A1RB.11A.33R.A157.07 TCGA.E9.A1RD.01A.11R.A157.07 TCGA.E9.A1RD.11A.33R.A157.07 TCGA.E9.A1RF.01A.11R.A157.07 TCGA.E9.A1RF.11A.32R.A157.07 TCGA.E9.A1RH.01A.21R.A169.07 TCGA.E9.A1RH.11A.34R.A169.07 TCGA.E9.A1RI.01A.11R.A169.07 TCGA.E9.A1RI.11A.41R.A169.07
hsa04014__42 0.00839838888865256 0.00970133390996418 0.00737341972835342 0.00863589753817323 0.00817853278260332 0.00781469312097898 0.00814992424121156 0.0087469911010703 0.00964292262765758 0.00835381834272886 0.00893480121733551 0.00813965643801891 0.00821478135537079 0.00830167851561369 0.00636562199937457 0.00713808179967132 0.00941398752408285 0.00778778637233438 0.00711370333108985 0.00767988322689047 0.00841340012022184 0.00924541657793039 0.00771191115534063 0.00772125565766991 0.00938142603686072 0.00862550371463733 0.00904613815697069 0.00768062097231074 0.00848175263005076 0.00784553899099086 0.00773842850170287 0.00797415330573294 0.0078464068475903 0.00798592466619992 0.00836218304696701 0.00897998623363194 0.00718691055590463 0.00829712021511866 0.00874333790867059 0.00778948187438473 0.00751720959312361 0.00836368386581015 0.00708868309001744 0.00803571891697858 0.00659444164527804 0.00865615404281961 0.00844627197807296 0.00792277466852089
Can you elaborate on what exactly you want to do, outside of pathways? Which key value do you want to merge A and B on (presumably the hsa ID)?
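A minimal sketch of merging on the hsa circuit ID, assuming B has the circuit IDs as row names and that you already have a character vector of significant circuits (`sig_circuits` is a hypothetical name, and all values are toy data). Moving the row names into an explicit `circuit_id` column gives `merge()` a key, and filtering both tables to the significant circuits first keeps the result small.

```r
# Hypothetical toy data mirroring the layout in the question.
A <- data.frame(
  circuit_id = c("hsa04014__42", "hsa04014__42", "hsa04014__43", "hsa04014__99"),
  hgnc       = c("MAPK1", "RAF1", "SOS1", "CSF1"),
  entrez_id  = c(5594, 5894, 6654, 1435)
)
B <- data.frame(
  sample1 = c(0.0084, 0.0091),
  sample2 = c(0.0097, 0.0088),
  row.names = c("hsa04014__42", "hsa04014__43")
)

# Hypothetical vector of circuits found significant in the tumor vs normal test.
sig_circuits <- c("hsa04014__42")

# Turn B's row names into an explicit key column so merge() has something to join on.
B_keyed <- data.frame(circuit_id = rownames(B), B, row.names = NULL)

# Keep only significant circuits before merging to limit the size of the result.
A_sig <- A[A$circuit_id %in% sig_circuits, ]
B_sig <- B_keyed[B_keyed$circuit_id %in% sig_circuits, ]

# Inner join on the circuit ID: one row per (circuit, gene) pair.
df <- merge(A_sig, B_sig, by = "circuit_id")
df
```

The result still has one row per (circuit, gene) pair, so the sample values of a circuit are repeated for every gene in it; a join on the circuit ID cannot avoid that.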
Also, if you paste code or command output into your question or comment, highlight the code/output and click on the '101 010' button. People will be much more likely to help you if you make your question presentable.
Kevin
I see an issue with your data, especially after comparing B with A. In B you have expression values (I assume), and in A you have the corresponding genes. So far, I have only come across expression values for single entities, at the gene level. In your case, though, the values are per pathway circuit (circuit hsa04014__42 of pathway hsa04014). So, translating back to genes (based on the OP), MAPK1, MAP2K1, RAF1, SOS1 and CSF1 all get the same expression values (0.00839838888865256, 0.00970133390996418, etc.). This multimapping is probably why your final data frame ended up so big. Please go back and look at your original data.
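The multimapping can be checked directly. In this sketch (hypothetical toy data, assuming B already carries a `circuit_id` column), the keyed merge repeats each circuit's row of B once per gene, so every gene in a circuit ends up with identical sample values. A gene list per circuit built with `split()` is shown as one more compact alternative to materializing the full merged table.

```r
# Hypothetical toy data: three genes in circuit 42, one gene in circuit 43.
A <- data.frame(
  circuit_id = c("hsa04014__42", "hsa04014__42", "hsa04014__42", "hsa04014__43"),
  hgnc       = c("MAPK1", "MAP2K1", "RAF1", "SOS1"),
  entrez_id  = c(5594, 5604, 5894, 6654)
)
B <- data.frame(
  circuit_id = c("hsa04014__42", "hsa04014__43"),
  sample1 = c(0.0084, 0.0091),
  sample2 = c(0.0097, 0.0088)
)

df <- merge(A, B, by = "circuit_id")

# Each circuit row of B is repeated once per gene in A, so the merged size is
# the sum, over the circuits in B, of the number of genes in that circuit.
genes_per_circuit <- table(A$circuit_id)
nrow(df) == sum(genes_per_circuit[B$circuit_id])   # TRUE

# All genes in the same circuit share exactly the same sample values.
same_42 <- df[df$circuit_id == "hsa04014__42", c("sample1", "sample2")]
nrow(unique(same_42))   # 1

# A more compact representation: keep B as is and store the genes per circuit
# as a named list, instead of copying the sample values onto every gene row.
genes_by_circuit <- split(A$hgnc, A$circuit_id)
genes_by_circuit[["hsa04014__42"]]
```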