Hi,
Edit:
I have histone H3 data for two different conditions. For each condition I have a gene set. I plotted the average line plot for each gene set in its respective condition. Now I want to confirm whether the difference I see between the average lines of the two conditions is statistically significant.
To prove that, I did the following steps.
1) I randomly generated two gene sets, one from each condition, and calculated the average for each. I then calculated the Euclidean distance between the two averages. After repeating this 1000 times to get a distribution, I obtained a p-value for the distance between my original gene sets from that distribution using a z-score. My question is this: the distribution I got from the random gene sets is skewed, while calculating a p-value from the z-table assumes that the data are normally distributed. The skewness and kurtosis are 1.166323 and 4.91863, respectively. So I wonder whether the way I am calculating the p-value is OK, or whether I should use another distribution to get the p-value for a skewed distribution.
Currently I am using the z-score, and the resulting p-value is significant.
See the distribution here: http://rpubs.com/parsaniac/277714
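The resampling procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, and all names (`cond_a`, `cond_b`, `profile_distance`, the gene-set indices, the set size) are hypothetical and not taken from the actual analysis. It computes both the z-score based p-value and an empirical p-value taken directly from the resampled distribution; the latter does not rely on the null distribution being normal.

```python
from math import erfc, sqrt

import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (assumption): rows are genes, columns are
# positions along the averaged H3 profile, one matrix per condition.
n_genes, n_bins, set_size = 2000, 50, 100
cond_a = rng.normal(1.0, 0.5, size=(n_genes, n_bins))
cond_b = rng.normal(1.0, 0.5, size=(n_genes, n_bins))

# Hypothetical "original" gene sets, one per condition.
obs_a = np.arange(set_size)
obs_b = np.arange(set_size, 2 * set_size)
cond_b[obs_b] += 0.3  # build a real difference into the simulated data


def profile_distance(a, b, idx_a, idx_b):
    """Euclidean distance between the average profiles of two gene sets."""
    return np.linalg.norm(a[idx_a].mean(axis=0) - b[idx_b].mean(axis=0))


observed = profile_distance(cond_a, cond_b, obs_a, obs_b)

# Null distribution: distances between randomly drawn gene sets.
n_iter = 1000
null = np.empty(n_iter)
for i in range(n_iter):
    rand_a = rng.choice(n_genes, size=set_size, replace=False)
    rand_b = rng.choice(n_genes, size=set_size, replace=False)
    null[i] = profile_distance(cond_a, cond_b, rand_a, rand_b)

# Empirical one-sided p-value: fraction of random distances at least as
# large as the observed one (+1 in numerator and denominator avoids p = 0).
p_empirical = (1 + np.sum(null >= observed)) / (n_iter + 1)

# z-score based p-value for comparison; this one assumes a normal null.
z = (observed - null.mean()) / null.std(ddof=1)
p_z = 0.5 * erfc(z / sqrt(2))

print(f"observed distance: {observed:.3f}")
print(f"empirical p-value: {p_empirical:.4g}")
print(f"z-score p-value:   {p_z:.4g}")
```

With 1000 iterations the smallest empirical p-value that can be reported is 1/1001, so more iterations are needed if finer resolution matters.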
Thanks, Chirag.
Hello Chirag Parsania!
We believe that this post does not fit the main topic of this site.
This has no connection to bioinformatics. Please make an effort to explain it if it exists.
For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.
If you disagree please tell us why in a reply below, we'll be happy to talk about it.
Cheers!
Hi Michael Dondrup,
Statistics always has a direct connection with bioinformatics. I asked this question for one of my NGS data analysis queries, in which I have H3 ChIP data for two different conditions. I want to show statistically that, for a set of genes, the difference in H3 between these two samples is significant. Anyway, thanks for your suggestion. I will ask at the link you have provided.
~Chirag.
It is your duty to make the connection explicit. Not every stats question is relevant to bioinformatics, in the same way that not every programming question is relevant to bioinformatics. An incomplete definition of the application domain is a big problem when applying statistics, and it is a problem with this question. Therefore we need to know the exact setup to be able to judge whether the question can be answered here, or at all.
You can see a misconception here as well:
Short answer: No. p-value != z-score
It is not clear what you are asking here. A p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. For which observations do you calculate which test statistic? Is the (skewed) distribution known, or did you infer it empirically from the data? If there is only the "distribution" you want to assign the p-value to, then there is no such thing as the p-value of a single distribution.
Let me explain more clearly. As I said, I have histone H3 data for two different conditions. For each condition I have a gene set. I plotted the average line plot for each gene set in its respective condition. Now I want to confirm whether the difference I see between the average lines of the two conditions is statistically significant.
To prove that, I did the following steps.
1) I randomly generated two gene sets, one from each condition, and calculated the average for each. I then calculated the Euclidean distance between the two averages. After repeating this 1000 times to get a distribution, I obtained a p-value for the distance between my original gene sets from that distribution using a z-score. My question is this: the distribution I got from the random gene sets is skewed, while calculating a p-value from the z-table assumes that the data are normally distributed. The skewness and kurtosis are 1.166323 and 4.91863, respectively. So I wonder whether the way I am calculating the p-value is OK, or whether I should use another distribution to get the p-value for a skewed distribution.
Hope this explains it well!
Thanks.
You should edit your question to add this information.
You should post your question to Cross Validated: https://stats.stackexchange.com/
I have edited and re-opened the question. Please see my edits to demonstrate how to convey enough bioinformatics context, so that the question can be answered.
Thanks a lot. I really appreciate it :)
Cheers, Chirag.