Question

Problem with Pathways


7.7 years ago

fp89 ▴ 30

Hi, I have two files (A and B). A is a list of circuit pathways with circuit_id, hgnc and entrez_id. B is an expression matrix with samples (50 normal and 50 tumor) in columns and the significant pathways (1094) in rows. I converted them into two data frames; the first one contains 42300 entries. I was told that with this information, and knowing which circuits show a significant differential activation between tumor and normal samples, I can recover the variants of the genes of each circuit in each sample, but how? I tried to do this with the merge function, but I obtained a data frame of about 23 GB. Can someone help me? Thanks

df <- merge(A, B, all = FALSE)




A

circuit_id  hgnc    entrez_id

hsa04014__42    MAPK1   5594
hsa04014__42    MAPK1   5595
hsa04014__42    MAP2K1  5604
hsa04014__42    MAP2K1  5605
hsa04014__42    RAF1    5894
hsa04014__42    SOS1    6654
hsa04014__42    SOS1    6655
hsa04014__42    CSF1    1435

B

TCGA.E9.A1RB.01A.11R.A157.07    TCGA.E9.A1RB.11A.33R.A157.07    TCGA.E9.A1RD.01A.11R.A157.07    TCGA.E9.A1RD.11A.33R.A157.07    TCGA.E9.A1RF.01A.11R.A157.07    TCGA.E9.A1RF.11A.32R.A157.07    TCGA.E9.A1RH.01A.21R.A169.07    TCGA.E9.A1RH.11A.34R.A169.07    TCGA.E9.A1RI.01A.11R.A169.07    TCGA.E9.A1RI.11A.41R.A169.07

hsa04014__42    0.00839838888865256 0.00970133390996418 0.00737341972835342 0.00863589753817323 0.00817853278260332 0.00781469312097898 0.00814992424121156 0.0087469911010703  0.00964292262765758 0.00835381834272886 0.00893480121733551 0.00813965643801891 0.00821478135537079 0.00830167851561369 0.00636562199937457 0.00713808179967132 0.00941398752408285 0.00778778637233438 0.00711370333108985 0.00767988322689047 0.00841340012022184 0.00924541657793039 0.00771191115534063 0.00772125565766991 0.00938142603686072 0.00862550371463733 0.00904613815697069 0.00768062097231074 0.00848175263005076 0.00784553899099086 0.00773842850170287 0.00797415330573294 0.0078464068475903  0.00798592466619992 0.00836218304696701 0.00897998623363194 0.00718691055590463 0.00829712021511866 0.00874333790867059 0.00778948187438473 0.00751720959312361 0.00836368386581015 0.00708868309001744 0.00803571891697858 0.00659444164527804 0.00865615404281961 0.00844627197807296 0.00792277466852089




Can you elaborate on what exactly you want to do, independent of the pathways? By which key do you want to merge A and B (presumably the hsa ID)?

Also, if you paste code or command output into your question/comment, then highlight the code/output and click on the '101 010' button.

People will be much more likely to help you if you make your question presentable.

Kevin

Reply written 7.7 years ago by Kevin Blighe


I see an issue with your data, especially after comparing B with A. In B you have expression values (I assume), and in A you have the corresponding genes. So far I have only come across expression values for a single entity, up to the gene level. In your case, however, the entity is a pathway circuit (hsa04014__42, from pathway hsa04014). Translating back (based on the original post), MAPK1, MAP2K1, RAF1, SOS1 and CSF1 would all get the same expression values (0.00839838888865256, 0.00970133390996418, etc.). This multi-mapping is probably why your final result ended up as such a big data frame. Please go back and look at your original data.
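To make the multi-mapping concrete, here is a small sketch on toy data. It is not your actual data: the circuit "toy__2", the sample names and the values are made up, and I am assuming that B stores the circuit IDs as row names rather than in a column (that is how the pasted B looks). Under that assumption A and B share no column name, so a plain merge(A, B) has no key and returns every row of A paired with every row of B. With 42300 and 1094 rows that would be 42300 x 1094 = 46,276,200 rows, which might explain a result of many gigabytes.

```r
# Toy stand-ins for A and B; "toy__2", the sample names and the values are made up.
A <- data.frame(
  circuit_id = c("hsa04014__42", "hsa04014__42", "toy__2"),
  hgnc       = c("MAPK1", "RAF1", "SOS1"),
  entrez_id  = c(5594, 5894, 6654),
  stringsAsFactors = FALSE
)

# Assumption: B keeps the circuit IDs as row names, not as a column.
B <- data.frame(
  sample_1  = c(0.0084, 0.0097),
  sample_2  = c(0.0074, 0.0086),
  row.names = c("hsa04014__42", "toy__2")
)

# Columns that merge() would use as the key by default (none here).
shared <- intersect(names(A), names(B))

# With no shared column, merge() returns the Cartesian product of A and B.
cross <- merge(A, B, all = FALSE)

# Merging on the circuit ID instead: one row per gene per circuit, with the
# circuit's values repeated for every gene of that circuit (the multi-mapping).
keyed <- merge(A, B, by.x = "circuit_id", by.y = "row.names", all = FALSE)

print(length(shared))
print(dim(cross))
print(dim(keyed))
```

On the toy data the unkeyed merge gives 3 x 2 = 6 rows, while the keyed merge gives 3 rows, one per row of A. If your B really has a circuit_id column instead of row names, by = "circuit_id" would be the equivalent call.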

Reply written 7.7 years ago by cpad0112