Question

simulation of time-series protein expression data

10.6 years ago

Zara ▴ 20

Hi everybody,

I'm trying to simulate time-series data for protein expression. I have a PPI network for which I want to simulate the data. Besides the proteins, there are other nodes, which we call stimulus nodes. Each of these is connected to one or several proteins (like a drug molecule that affects a protein in our network).

I use formula 1 from this paper to simulate the relationship between protein concentrations.

I need this data to test my network inference algorithm.

Now my question is:

How can I simulate the data for the first time step? For example, can I just generate random values and then apply the formula to get the data for the following time steps (discarding the random first row, of course)?

For example, if I have an activator for EGFR, then EGFR should be expressed when the activator is on and silent (no expression) otherwise. But if I simply fix these values, the data would hardly change over the following time steps. In any case, I can't use a fixed value for the perturbed nodes, since the formula itself does not hold the expression level fixed. I don't know how to calculate the concentration of a node that is connected to an inhibitor or a stimulus node. The formula already accounts for this, but with it an inhibited node can still reach a high value in later time steps.

I also don't know how to set reasonable values for alpha and beta in formula 1 (I solved the formula to get the expression level at time t in terms of the expression levels at time t-1).

Any help would be appreciated.

Thanks :)

simulation R time-series • 3.1k views

updated 3.3 years ago by Ram 45k • written 10.6 years ago by Zara ▴ 20

Ram · Accepted Answer · 2014-11-02

There seem to be two questions here: how to get meaningful initial conditions for your network, and how to choose model parameters for that network.

For the first question, it depends on what you're trying to do. Picking random initial states and running several time steps through your network until you get conditions that are suitable (by whatever metric makes sense for your problem) is one approach. It would look a bit like burn-in in MCMC models, which is a widely used technique. But note that burn-in from random starting points is often seen as something of a hack (here's one vote against, for example); it's something you might try if you don't know what else to do.

Presumably you're going to use your network and the initial conditions to start from some biologically plausible initial state, then perturb the state in some well-characterized way, and then see what the outcome is. So ideally you'd just pick one (or a set of several) initial states that you know are relevant and start from there. (You may still want to run several time steps before imposing the perturbation just to shake out any transients that result from having slightly approximated or otherwise mis-characterized the initial conditions. Otherwise you risk confusing those initial transients with the consequences of the perturbation you're imposing).
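To make the "settle first, then perturb" idea concrete, here is a minimal sketch in plain Python (the same logic ports directly to R). The update rule in `step` is a generic saturating placeholder that I made up for illustration, not formula 1 of the cited paper, and the names `W`, `u_off`, `u_on`, `alpha`, `beta` and `dt` are all assumptions; swap in your own update rule.

```python
import math
import random

def step(x, W, u, alpha=0.5, beta=1.0, dt=0.1):
    """One Euler step of a placeholder model (NOT the paper's formula 1):
    dx_i/dt = beta * tanh(sum_j W[i][j] * x[j] + u[i]) - alpha * x[i]."""
    n = len(x)
    return [
        x[i] + dt * (beta * math.tanh(sum(W[i][j] * x[j] for j in range(n)) + u[i])
                     - alpha * x[i])
        for i in range(n)
    ]

def simulate(x0, W, u_before, u_after, n_burn_in=200, n_steps=50):
    """Run burn-in steps without the perturbation, then record the response to it."""
    x = list(x0)
    for _ in range(n_burn_in):          # these rows are discarded
        x = step(x, W, u_before)
    baseline = list(x)
    trajectory = [list(baseline)]       # row 0 is the settled, unperturbed state
    for _ in range(n_steps):
        x = step(x, W, u_after)
        trajectory.append(x)
    return baseline, trajectory

rng = random.Random(0)
n = 4
W = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(n)]  # hypothetical weights
x0 = [rng.uniform(0.0, 1.0) for _ in range(n)]                   # rough initial guess
u_off = [0.0] * n                    # no stimulus during burn-in
u_on = [1.0] + [0.0] * (n - 1)       # stimulus acting on node 0 only
baseline, trajectory = simulate(x0, W, u_off, u_on)
```

Here the stimulus enters as an input term `u` rather than as a clamped expression value, so the stimulated node still evolves under the model while the stimulus itself stays fixed.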

If you really don't know what sensible initial conditions are, you could try the random approach, but at the very least try a large number of random initial configurations to make sure you cover a sensible range of starting points.
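A rough way to check this is to burn in from many random starts and look at how much the resulting states differ. The sketch below is plain Python under the same caveat as any toy: `step` is a made-up saturating update rule standing in for your real one (it is not the paper's formula 1), and the weights, ranges and counts are arbitrary assumptions.

```python
import math
import random
import statistics

def step(x, W, u, alpha=0.5, beta=1.0, dt=0.1):
    """One Euler step of a placeholder model (NOT the paper's formula 1)."""
    n = len(x)
    return [
        x[i] + dt * (beta * math.tanh(sum(W[i][j] * x[j] for j in range(n)) + u[i])
                     - alpha * x[i])
        for i in range(n)
    ]

def burn_in(x0, W, u, n_burn_in=300):
    """Iterate the model from x0 and return only the final state."""
    x = list(x0)
    for _ in range(n_burn_in):
        x = step(x, W, u)
    return x

rng = random.Random(1)
n_nodes, n_starts = 4, 100
W = [[rng.gauss(0.0, 0.3) for _ in range(n_nodes)] for _ in range(n_nodes)]
u = [0.0] * n_nodes                                   # unperturbed network

starts = [[rng.uniform(-1.0, 1.0) for _ in range(n_nodes)] for _ in range(n_starts)]
ends = [burn_in(x0, W, u) for x0 in starts]

# Per-node spread of the post-burn-in states across all random starts.
spread = [statistics.pstdev([end[i] for end in ends]) for i in range(n_nodes)]
```

A small `spread` suggests the random starts all settle into the same state, so the choice of start hardly matters; a large one suggests several attractors or a burn-in that is too short, and you would want to inspect `ends` before picking starting points.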

As for the second question, good choices for α and β (do you mean ε? There's no β in equation 1 of the cited paper): these are just part of your model for the time evolution of the protein expression, and the paper describes how they were chosen in that case. Most of the derivations had them set to 1, and they were eventually updated to better match the desired properties using conjugate gradient (several relevant R packages are listed in the Optimization task view on CRAN).
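To illustrate what fitting rate parameters by conjugate gradient can look like, here is a toy sketch in plain Python. It is not the paper's procedure: the model is a made-up placeholder, `TRUE_ALPHA`, `TRUE_BETA`, `W` and the stimulus schedule are hypothetical, and the data are noise-free. Because this placeholder is linear in α and β, the least-squares fit is a quadratic problem, so linear conjugate gradient on the normal equations is enough; fitting the weights as well would need a nonlinear optimizer.

```python
import math

def drive(x, W, u):
    """Saturating input to each node: tanh(sum_j W[i][j] * x[j] + u[i])."""
    n = len(x)
    return [math.tanh(sum(W[i][j] * x[j] for j in range(n)) + u[i]) for i in range(n)]

def conjugate_gradient(A, b, max_iter=10, tol=1e-18):
    """Solve A z = b for a symmetric positive-definite A by linear conjugate gradient."""
    n = len(b)
    z = [0.0] * n
    r = list(b)                      # residual b - A z, starting from z = 0
    p = list(r)                      # first search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        step_len = rs / sum(p[i] * Ap[i] for i in range(n))
        z = [z[i] + step_len * p[i] for i in range(n)]
        r = [r[i] - step_len * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return z

# Hypothetical ground truth used to generate the toy data.
TRUE_ALPHA, TRUE_BETA, DT = 0.5, 1.2, 0.1
W = [[0.0, -0.4, 0.0], [0.6, 0.0, 0.0], [0.0, 0.5, 0.0]]
x = [1.0, -0.5, 0.2]

# Accumulate the normal equations of the least-squares fit
#   rate = beta * a - alpha * x,  with a = drive(x, W, u).
A = [[0.0, 0.0], [0.0, 0.0]]
c = [0.0, 0.0]
for t in range(80):
    u = [1.0, 0.0, 0.0] if t >= 40 else [0.0, 0.0, 0.0]   # stimulus switches on at step 40
    a = drive(x, W, u)
    x_next = [x[i] + DT * (TRUE_BETA * a[i] - TRUE_ALPHA * x[i]) for i in range(3)]
    for i in range(3):
        rate = (x_next[i] - x[i]) / DT
        features = (a[i], -x[i])     # coefficients multiplying (beta, alpha)
        for k in range(2):
            c[k] += features[k] * rate
            for l in range(2):
                A[k][l] += features[k] * features[l]
    x = x_next

beta_hat, alpha_hat = conjugate_gradient(A, c)
```

One thing this toy makes visible: at steady state the production and decay terms are proportional, so only the transients (the initial relaxation and the response after the stimulus switches on) carry information about α and β separately. Data sampled only at steady state would not pin them down.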