Hi, I have data with three columns: SNPs, their gene according to ANNOVAR, and a p-value for every SNP. I would like to aggregate the p-values for every gene.
snps <- data.frame(
snp_id = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"),
Gene.refGene_ANNOVAR = c("gene1", "gene1", "gene1", "gene1", "gene2", "gene2", "gene2", "gene2"),
p.value = c(0.7703884, 0.9648540, 0.9648540, 0.9648540, 0.54, 0.03, 0.03, 0.8)
)
Above is an example of the data. I read about the SKAT method (https://www.hsph.harvard.edu/skat/) and figured it might do the job. I read about the package here: https://rdrr.io/cran/SKAT/man/SKAT.html and tried to apply it to my data, but got lost as to how to use it correctly:
gene_pvals <- aggregate(p.value ~ Gene.refGene_ANNOVAR, data = df, FUN = function(x) SKAT::SKAT_Null(x)$p.value)
It doesn't work and returns errors. I would be grateful if you could share your knowledge. I don't have information about the correlations between the SNPs, but I know that they are correlated. Do I have enough data to run the SKAT method?
I have no sense of whether SKAT is the right tool for your analysis, but some insight into your errors: did you mean data = snps instead of data = df? Also, SKAT_Null() does not appear to be a valid function in SKAT v2.2.5. Did you mean to call SKAT_Null_Model()?
Would SAIGE-GENE+ be any help here?
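On the SKAT_Null_Model() point: SKAT() takes a genotype matrix plus a null model fitted on the phenotype, not a vector of per-SNP p-values, so it cannot be called inside aggregate() on the p.value column. Here is a minimal sketch of the expected inputs for one gene. The genotype matrix Z, the phenotype y, the sample size, and the allele frequency are all simulated placeholders, not your data.

```r
library(SKAT)

set.seed(1)
n <- 200  # hypothetical number of patients

# Hypothetical genotype matrix for one gene:
# rows = patients, columns = SNPs, entries coded 0/1/2
Z <- matrix(
  rbinom(n * 4, size = 2, prob = 0.2),
  nrow = n,
  dimnames = list(NULL, c("rs1", "rs2", "rs3", "rs4"))
)

# Hypothetical binary phenotype (case/control), no covariates
y <- rbinom(n, size = 1, prob = 0.5)

# Fit the null model once, then test the SNP set against it
obj <- SKAT_Null_Model(y ~ 1, out_type = "D")  # "D" = dichotomous outcome
res <- SKAT(Z, obj)

res$p.value  # one p-value for the whole gene
```

With real data you would build one Z per gene from the 0/1/2 genotypes and reuse the same null model for every gene.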
acvill, can you suggest other tools or methods for this? (Also, for every SNP I have data on how many patients had the heterozygous/homozygous encoding (0, 1, 2).) Thanks.
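If only the per-SNP p-values are available, one option sometimes used is the Cauchy combination test (the idea behind ACAT), which combines p-values without needing the correlation matrix. Below is a minimal base-R sketch with equal weights. The helper name cauchy_combine is hypothetical, and the method is a swapped-in alternative to SKAT, not SKAT itself. Its tolerance of correlation is an approximation that holds best for small p-values, so treat the result as exploratory.

```r
snps <- data.frame(
  snp_id = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"),
  Gene.refGene_ANNOVAR = c("gene1", "gene1", "gene1", "gene1",
                           "gene2", "gene2", "gene2", "gene2"),
  p.value = c(0.7703884, 0.9648540, 0.9648540, 0.9648540, 0.54, 0.03, 0.03, 0.8)
)

# Hypothetical helper: equal-weight Cauchy combination of p-values.
# Each p-value is mapped to a standard Cauchy variate, the variates are
# averaged, and the average is mapped back to a p-value.
# Assumes every p is strictly between 0 and 1.
cauchy_combine <- function(p) {
  stat <- mean(tan((0.5 - p) * pi))
  0.5 - atan(stat) / pi
}

# One combined p-value per gene
gene_pvals <- aggregate(p.value ~ Gene.refGene_ANNOVAR,
                        data = snps, FUN = cauchy_combine)
gene_pvals
```

A single p-value passes through unchanged, which is a quick sanity check on the transform.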
cross-posted: https://bioinformatics.stackexchange.com/questions/20772