Hello everyone,
I am currently working on a university project on thyroid disease. The goal is to understand the influence of parameters (age, pregnancy, TSH, T4...) on the patient's diagnosis (hypothyroid, hyperthyroid...).
I have a large matrix with:
- rows: patients
- columns: parameters
age: continuous.
sex: M,F.
on_thyroxine: f,t.
query_on_thyroxine: f,t.
on_antithyroid_medication: f,t.
thyroid_surgery: f,t.
query_hypothyroid: f,t.
query_hyperthyroid: f,t.
pregnant: f,t.
sick: f,t.
tumor: f,t.
lithium: f,t.
goitre: f,t.
TSH_measured: y,n.
TSH: continuous.
T3_measured: y,n.
T3: continuous.
TT4_measured: y,n.
TT4: continuous.
T4U_measured: y,n.
T4U: continuous.
FTI_measured: y,n.
FTI: continuous.
TBG_measured: y,n.
TBG: continuous.
DIAGNOSTIC (hyperthyroid, hypothyroid...)
- My first question: I am not sure I understand the data. What do the parameters containing "query" mean? Does "query_on_thyroxine" mean that the patient takes thyroxine as a medication (in which case, what does "on_thyroxine" mean?), or that he asked for thyroxine ("t" would mean yes, he asked the doctor for the drug)? I tried to find the related article to understand the data ("See the following for a discussion of relevant experiments and related work: | Quinlan,J.R., Compton,P.J., Horn,K.A., & Lazurus,L. (1986).") but I could not find it.
- My second question: I built networks with several algorithms: ARACNE, PC, HC, MMHC. Each time, I get a different network in which the links between parameters differ. So I don't know what information I can get from these networks, or which one is correct. Which method do you recommend for comparing these networks?
I built a network for each diagnosis:
- hyperthyroid: there are 400 patients
- hypothyroid: there are 160 patients
I want to know whether a diagnosis-specific network differs from the full network simply because of the reduced sample size (9000 --> 400 patients), or whether the diagnosis has a real influence on the network. My idea was to randomly select, for example, 400 patients from the data, then compute the distance between the network built from the 9000 patients and each random 400-patient network, as well as the distance between the 9000-patient network and the network built from my real 400 patients. But I don't know how to compare these distances.
Thank you for your help
Baptiste
"to understand the influence of parameters ... on the diagnostic ..." sounds like a regression/classification problem to me. Alternatively it can be viewed as a feature selection problem.
In such a situation, I would probably start with something like discriminant analysis to understand how separable the diagnoses are. I don't think a network approach is warranted here.
Thank you for your reply,
Yes, you are right, but the problem is that I have to use networks for the project.
Thank you again, Jean-Karim
Using different algorithms to infer a network from your data will almost always give you different networks. Each algorithm has its own way of deciding which nodes to link, so the meaning of an edge usually differs between the networks. The R package minet has a function to compare two networks that may be useful.
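As for comparing the distances from the subsampling idea in the question, one option is to treat the distances obtained from the random subsamples as a null distribution and check where the distance for the real diagnosis group falls in it. Below is a minimal Python sketch of that idea, not a definitive implementation. All names in it (edge_distance, toy_learner, subsample_test) are hypothetical. The distance is simply the number of undirected edges present in only one of the two networks, and toy_learner is a crude stand-in for whatever structure-learning call you actually use; it is not ARACNE, PC, HC or MMHC.

```python
import random
from itertools import combinations


def edge_distance(edges_a, edges_b):
    """Count undirected edges that appear in exactly one of the two networks."""
    return len(edges_a ^ edges_b)


def toy_learner(rows, threshold=0.8):
    """Hypothetical stand-in for a real structure-learning algorithm.

    Links two columns when their values agree in at least `threshold`
    of the rows. Replace this with your actual network inference.
    """
    n_cols = len(rows[0])
    edges = set()
    for i, j in combinations(range(n_cols), 2):
        agreement = sum(r[i] == r[j] for r in rows) / len(rows)
        if agreement >= threshold:
            edges.add(frozenset((i, j)))
    return edges


def subsample_test(all_rows, group_rows, learn, n_random=200, seed=0):
    """Compare the group network to random subsamples of the same size.

    Returns the observed distance, the list of null distances, and an
    empirical p-value: the share of random subsamples whose network is
    at least as far from the full-data network as the group network is.
    """
    rng = random.Random(seed)
    reference = learn(all_rows)
    observed = edge_distance(reference, learn(group_rows))
    null = []
    for _ in range(n_random):
        sample = rng.sample(all_rows, len(group_rows))
        null.append(edge_distance(reference, learn(sample)))
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(d >= observed for d in null)) / (1 + n_random)
    return observed, null, p_value
```

You would call subsample_test(all_patients, hyperthyroid_patients, your_learner), where the two row lists and your_learner are placeholders for your own data and method. A small p-value suggests the diagnosis network differs from the full network more than sample size reduction alone would explain. A large one suggests the difference is compatible with a sample size effect.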