Hi, I have data with three columns: SNPs, their genes (annotated with Annovar), and a p-value for every SNP. I would like to aggregate the p-values for every gene. I know that there is dependence between the p-values, and I don't know the dependence structure, so Fisher's method won't work. I read about some methods here: https://arxiv.org/pdf/1212.4966.pdf and came upon this:
6.1 A rule of thumb
First we state a crude rule of thumb for choosing r. Since any method based on the observed values of p1, ..., pK would affect the validity of the method (see Subsection 6.3), we have to rely on prior or side information for a suitable choice of r. As a rule of thumb, if there is potentially substantial dependence among the p-values, then we should not use Bonferroni, and the harmonic mean might be a safer choice. If we are certain that the dependence is really strong, then the geometric and the arithmetic means might be an even better option. See Subsection 6.4 for a simulation study illustrating this point.
Based on the article, I would be happy if you could share your knowledge: which technique is best for p-value aggregation in this situation?
LChart Thank you. Could you explain a little more what you mean by "and the primary question is whether you can re-generate that 3-column file given new data, as that would enable you to perform a permutation test"? The data I have is from an experiment and I can't generate new data. The p-values I got for every SNP (which I want to aggregate by gene) were calculated using the Armitage test.
Sure, but can you switch around the labels in your data and re-run the trend test? If so, it means you have access to the underlying genotype and phenotype data, and can either manually run a permutation test (in which case you can select the minimum p-value and use permutations to understand the distribution of the null), or apply SAIGE/SKAT-O.
What do you mean by switching the labels in the data? I had data on patients with mild or severe disease. For each SNP, I counted how many patients were heterozygous, homozygous, or did not have the mutation, and ran the Armitage test on those counts. So what do you mean by switching the labels? Do you find any of the methods in the article helpful? I thought about using the harmonic mean.
Because you have the raw genotype data, you can randomize the patient labels, which forms the basis of a permutation test (https://en.wikipedia.org/wiki/Permutation_test).
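To make the label-shuffling idea concrete, here is a minimal sketch of a gene-level permutation test in plain Python. It assumes genotypes are coded 0/1/2 (copies of the variant allele) and disease status is coded 0/1 (mild/severe). The function names `trend_stat` and `gene_permutation_pvalue` are hypothetical and do not come from any package. The per-SNP statistic uses the N * r^2 form of the Armitage trend test with additive weights, and because every SNP is referred to the same 1-df chi-square distribution, taking the largest statistic in a gene is equivalent to taking the smallest p-value.

```python
import random

def trend_stat(genotypes, labels):
    """Armitage trend statistic with additive weights (0/1/2), computed as
    N * r^2, where r is the genotype/phenotype correlation.
    Hypothetical helper written for illustration only."""
    n = len(labels)
    mean_g = sum(genotypes) / n
    mean_y = sum(labels) / n
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, labels))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_yy = sum((y - mean_y) ** 2 for y in labels)
    if s_gg == 0 or s_yy == 0:
        return 0.0  # monomorphic SNP or a single phenotype class: no signal
    return n * s_gy * s_gy / (s_gg * s_yy)

def gene_permutation_pvalue(snp_genotypes, labels, n_perm=1000, seed=0):
    """Gene-level p-value from the strongest SNP in the gene.

    snp_genotypes: one genotype list per SNP in the gene, each with one
    entry per patient, in the same patient order as `labels`.
    """
    rng = random.Random(seed)
    observed = max(trend_stat(g, labels) for g in snp_genotypes)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels once per round and reusing them for every
        # SNP keeps the correlation (LD) between SNPs intact.
        rng.shuffle(shuffled)
        perm_max = max(trend_stat(g, shuffled) for g in snp_genotypes)
        if perm_max >= observed:
            hits += 1
    # The +1 terms keep the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

You would call `gene_permutation_pvalue` once per gene, passing only the SNPs that Annovar assigned to that gene. The smallest p-value this can return is 1 / (n_perm + 1), so more permutations are needed if you plan to correct across many genes.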
In addition, there are tools already written for performing multi-variant association tests within genes. I have linked two of them.
The article you referenced is not pointing you in the correct direction, I'm afraid.
Following your suggestion, if I want to use the LD-based approach in sumFREGAT: I read that I can obtain the LD information from the 1000 Genomes Project, but it may not be relevant or applicable to my study population. Can I still use it?
LChart I'm sorry for asking more questions, but I'm new to this field. Could you please explain a little more how to use SKAT and SAIGE in my case? I didn't fully understand the online tutorials or how to apply them to my data, for example what input files I should have.