Random seed in genomics analyses
9.0 years ago
orzech_mag ▴ 230

Dear all,

I have some questions about the "random seed" parameter that can be set in Comparative Marker Selection in GenePattern. I understand that this parameter is related to the random number generation used to produce the permutations, but I do not understand how it works.

For example, in the GSEA software the "random seed" is not a numerical value but "timestamp". So the first question is: what does a timestamp random seed mean?

Second, when one runs Comparative Marker Selection, the random seed is set by default to an enigmatic value, 779948241. So the second question is: what does this value mean?

I do not understand the difference between "timestamp" in GSEA and 779948241 in Comparative Marker Selection. Moreover, if I change this value to, for example, 0, how does that affect my data and the results of the analysis?

Unfortunately, neither manual (GSEA or Comparative Marker Selection) explains this parameter. Please explain it to me, or suggest any literature on the topic that would be understandable for a non-statistician.

Best regards and thanks in advance for any help!

GSEA comparative-marker-selection statistics • 3.0k views
9.0 years ago

In short, the random seed is a number that can be used to reproduce any analysis involving random choices. For example, if you rerun the Comparative Marker Selection analysis with the seed 779948241 and the same data, you will get exactly the same results. If you use any other random seed, you will probably get a different output, because the random numbers used to generate the permutations (or whatever random component the algorithm uses) will be different.
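To make the reproducibility point concrete, here is a minimal Python sketch of a permutation test driven by a seeded generator. It is not the code used by Comparative Marker Selection: the function name `permutation_p_value`, the toy expression values, and the difference-in-means statistic are all assumptions chosen for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=None):
    """Toy two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # every random choice below comes from this seed
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the samples
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            hits += 1
    # add-one correction so the estimated p-value is never exactly 0
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene in two phenotype classes.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.8, 5.0, 4.6, 5.2, 4.7, 5.4]

p_first = permutation_p_value(group_a, group_b, seed=779948241)
p_again = permutation_p_value(group_a, group_b, seed=779948241)
p_other = permutation_p_value(group_a, group_b, seed=12345)

print(p_first == p_again)  # same seed and same data give the same p-value
print(p_first, p_other)    # a different seed may give a slightly different one
```

The seed value itself is not interpreted by the test; it only determines which sequence of shuffles is drawn.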

The random seed is usually a number. In the case of GSEA, they probably convert the timestamp to a number, e.g. to Unix time (the number of seconds since 1 January 1970). It is probably meant as a more readable way to present the seed: it is easier to remember the date and time at which you ran an analysis than a number like 779948241.
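As a rough illustration of what a timestamp seed could look like, the sketch below derives a seed from the system clock and records it so the run can be replayed later. This is a guess at the general idea, not GSEA's actual implementation; the helper name `make_rng` is hypothetical.

```python
import random
import time


def make_rng(seed=None):
    """Return (generator, seed); take the seed from the clock if none is given."""
    if seed is None:
        seed = int(time.time())  # Unix time in whole seconds
    return random.Random(seed), seed


# First run: no seed given, so the current time is used.
rng, used_seed = make_rng()
print("seed used for this run:", used_seed)  # write this down to reproduce the run
first_run = [rng.random() for _ in range(3)]

# Later: pass the recorded seed back in to replay the same random numbers.
replay_rng, _ = make_rng(used_seed)
replay = [replay_rng.random() for _ in range(3)]
print(first_run == replay)
```

The key design point is that the clock-derived seed is reported back to the user; without that record, a timestamp-seeded run cannot be reproduced.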

I wouldn't recommend setting the random seed to 0, because the software may interpret this as meaning that you don't want to specify a seed. It would then use the system settings to generate one, which would make it impossible to reproduce your analysis.

To learn more, you can read about how computers generate pseudo-random sequences. In short, computers cannot make up a random number the way we humans do, so software engineers use deterministic algorithms to generate pseudo-random numbers. These functions take a number as input (the random seed) and generate a series of numbers that looks random but is always the same for the same seed.
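For a feel of how such a function works, here is a sketch of one of the simplest pseudo-random generators, a linear congruential generator. The constants are the widely quoted "Numerical Recipes" values; nothing here is taken from GSEA or GenePattern, and real software typically uses stronger generators (Python's own `random` module, for instance, uses the Mersenne Twister).

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random integers in [0, m) from a linear congruential generator."""
    values = []
    state = seed % m
    for _ in range(n):
        # Each new number is computed from the previous one, so the whole
        # sequence is fixed once the seed is chosen.
        state = (a * state + c) % m
        values.append(state)
    return values


print(lcg(779948241, 3))  # always the same three numbers for this seed
print(lcg(779948242, 3))  # a neighbouring seed gives a different sequence
print(lcg(779948241, 3) == lcg(779948241, 3))
```

Nothing in the arithmetic is random: the output only looks irregular, which is why rerunning with the same seed replays the same "random" choices.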


That's a good answer. Here are some considerations about using the timestamp as the seed:

Another advantage of the timestamp is that every time you run the program, the seed (and the output) will be different. In contrast, if you manually set the seed to 779948241 (or 0, or whatever), the program will use the same random numbers and will produce the same results from identical input. Sometimes it is useful to be able to reproduce exactly the same results, but usually it's better to vary the seed to make sure that your result is robust.
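One way to check robustness in practice is to repeat the analysis with several seeds and look at how much the result moves. The sketch below does this with a toy permutation test; the function names, the data, and the choice of ten seeds are all assumptions for illustration, not part of GSEA or Comparative Marker Selection.

```python
import random
import statistics


def toy_permutation_p_value(group_a, group_b, seed, n_permutations=200):
    """Toy permutation test on the difference in means, driven by one seed."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def p_values_across_seeds(group_a, group_b, seeds):
    """Run the same analysis once per seed and collect the p-values."""
    return {seed: toy_permutation_p_value(group_a, group_b, seed) for seed in seeds}


# Hypothetical expression values for one gene in two phenotype classes.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.8, 5.0, 4.6, 5.2, 4.7, 5.4]

results = p_values_across_seeds(group_a, group_b, seeds=range(1, 11))
spread = max(results.values()) - min(results.values())
print("p-values by seed:", results)
print("spread across seeds:", spread)
```

If the spread is large enough to change your conclusion, increasing the number of permutations is usually the first thing to try; once you are satisfied, fix and report one seed so others can reproduce the final run.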


OK, now I get it. I will leave the default seed value in Comparative Marker Selection (779948241) as it is; I asked only out of curiosity, but it is very nice to understand the basics. Generally, the difficulty with bioinformatic analyses lies in the parameters and, simply, in the mathematics.

Thank you very much for help!
