I have 66 samples (Relapse, Non-Relapse). I did a differential gene expression analysis using the limma package and then built a lasso-penalised model based on 45 DEGs, following the instructions prepared here.
These are the results at the minimum λ (lambda):
$`non-Relapse`
46 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 9.97551314
ASPH .
ATP5A1 .
ATP5G3 .
COL24A1 0.05181047
DCAF10 .
DES .
DNMT1 .
EIF2B2 .
ETS2 .
FLJ38773 .
FLNA -0.72130732
HCAR1 -0.08396882
HK2 .
HMGB2 .
HNRNPDL .
HSPA1A 0.05883314
HSPE1 .
ILF3 .
IQGAP3 .
IRF2BPL .
JCHAIN 0.05657491
KPNA2 .
LOC102546294 -0.09527344
MAP7D2 -0.10777351
MED31 .
MSL1 .
OLFM4 .
PCK1 .
PHYKPL .
PROSC .
PTRF .
REG3A .
RNF7 .
SEC22B .
SF1 .
SLC2A3 -0.15121214
SOX9 0.02238456
SRI .
STRN3 .
TM2D1 .
TPT1 .
UQCRFS1 .
ZCCHC8 .
ZNF638 .
ZNF761 .
$Relapse
46 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -9.97551314
ASPH .
ATP5A1 .
ATP5G3 .
COL24A1 -0.05181047
DCAF10 .
DES .
DNMT1 .
EIF2B2 .
ETS2 .
FLJ38773 .
FLNA 0.72130732
HCAR1 0.08396882
HK2 .
HMGB2 .
HNRNPDL .
HSPA1A -0.05883314
HSPE1 .
ILF3 .
IQGAP3 .
IRF2BPL .
JCHAIN -0.05657491
KPNA2 .
LOC102546294 0.09527344
MAP7D2 0.10777351
MED31 .
MSL1 .
OLFM4 .
PCK1 .
PHYKPL .
PROSC .
PTRF .
REG3A .
RNF7 .
SEC22B .
SF1 .
SLC2A3 0.15121214
SOX9 -0.02238456
SRI .
STRN3 .
TM2D1 .
TPT1 .
UQCRFS1 .
ZCCHC8 .
ZNF638 .
ZNF761 .
Sorry for the naive question; I am new to this field. Can anyone help me interpret these coefficients that I got for the genes? I studied this page but couldn't figure out what these dots mean. My second question: which coefficients should I use to select the best predictors, those at min λ (lambda) or those at the λ within 1 standard error of the minimum? I really appreciate any help.
Many thanks, Jean-Karim, for your explanation. I still don't understand the meaning of those dots. Could you please explain more? I am looking for the best predictor genes, i.e. those that can predict which patients are at risk of relapse. Based on the result that I got above, which genes are the best predictors? Many thanks!
The dots indicate features (here genes) that didn't make it into the model (i.e. coefficient is 0). The magnitude of the coefficient indicates the "strength" of the contribution of the corresponding gene. So a greater positive value means a stronger positive influence.
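To make the ranking concrete, here is a minimal sketch in plain Python (not part of the original analysis, which was done in R) that takes the non-zero Relapse coefficients printed above and sorts them by absolute size. The variable names are only illustrative. It assumes the genes are on comparable scales, since glmnet reports coefficients on the original scale of each predictor.

```python
# Relapse-class coefficients copied from the output above. Genes printed as "."
# have a coefficient of exactly 0 and are therefore left out of this dictionary.
relapse_coefs = {
    "COL24A1": -0.05181047,
    "FLNA": 0.72130732,
    "HCAR1": 0.08396882,
    "HSPA1A": -0.05883314,
    "JCHAIN": -0.05657491,
    "LOC102546294": 0.09527344,
    "MAP7D2": 0.10777351,
    "SLC2A3": 0.15121214,
    "SOX9": -0.02238456,
}

# Rank the selected genes by the absolute size of their coefficient.
ranked = sorted(relapse_coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)

for gene, coef in ranked:
    # A positive Relapse coefficient pushes the prediction towards Relapse,
    # a negative one towards non-Relapse.
    direction = "towards Relapse" if coef > 0 else "towards non-Relapse"
    print(f"{gene:14s} {coef:+.4f}  {direction}")
```

Only 9 of the 45 genes appear in the list; the other 36 were dropped by the lasso penalty.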
Thank you for your reply, but I am confused. You say "a greater positive value means a stronger positive influence."
If I'm not mistaken, you mean that a gene with a greater positive coefficient (in my case FLNA, SLC2A3 and MAP7D2, in that order, in the Relapse output) is a better predictor. But @Kevin mentioned in this post that "The best predictors will generally be those that have the smallest (possibly zero) coefficient values". Could you please help me resolve this confusion? I really appreciate your time and help.
Yes
On the face of it, this seems wrong, as a coefficient of 0 means the gene doesn't contribute to the model, but maybe this comment was made with something else in mind.
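A small sketch of why a zero coefficient cannot contribute, written in plain Python rather than R. It uses the Relapse intercept and coefficients from the output above plus one zero-coefficient gene (ASPH). The expression values (8.0, 9.0, 12.0) and the function name `relapse_probability` are made up for illustration, and the formula assumes the usual symmetric two-class softmax, which matches the mirrored signs in the two coefficient tables.

```python
import math

# Relapse-class intercept and coefficients copied from the output above.
RELAPSE_INTERCEPT = -9.97551314
RELAPSE_COEFS = {
    "COL24A1": -0.05181047, "FLNA": 0.72130732, "HCAR1": 0.08396882,
    "HSPA1A": -0.05883314, "JCHAIN": -0.05657491, "LOC102546294": 0.09527344,
    "MAP7D2": 0.10777351, "SLC2A3": 0.15121214, "SOX9": -0.02238456,
    "ASPH": 0.0,  # printed as "." in the output, i.e. exactly zero
}

def relapse_probability(expression):
    """Predicted probability of Relapse for one sample (hypothetical helper)."""
    score = RELAPSE_INTERCEPT + sum(
        coef * expression[gene] for gene, coef in RELAPSE_COEFS.items()
    )
    # The non-Relapse score is the exact negative of the Relapse score, so
    # exp(s) / (exp(s) + exp(-s)) simplifies to a logistic function of 2 * s.
    return 1.0 / (1.0 + math.exp(-2.0 * score))

# Hypothetical expression values, NOT taken from the real data.
baseline = dict.fromkeys(RELAPSE_COEFS, 8.0)
asph_changed = dict(baseline, ASPH=12.0)  # change a zero-coefficient gene
flna_up = dict(baseline, FLNA=9.0)        # change the largest-coefficient gene

print("baseline     :", relapse_probability(baseline))
print("ASPH changed :", relapse_probability(asph_changed))
print("FLNA raised  :", relapse_probability(flna_up))
```

Changing ASPH leaves the prediction untouched, while raising FLNA by one unit increases the predicted probability of relapse.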