Good morning,
I have a data set of patients and their mutated genes in two conditions (recurrence and T0). I want to test whether each gene is mutated more frequently at recurrence or at T0. I am not considering expression, type of mutation, etc. E.g.:
# Example data frame
gene_counts <- data.frame(
  gene = c("A", "B", "C", "N", "T", "X", "Y", "Z"),
  Recurrence = c(10, 5, 0, 100, 1, 0, 1, 0),
  T0 = c(50, 10, 4, 150, 0, 1, 0, 1)
)
Now, my question about assessing this comparison: should I check the counts to decide which test to use (Fisher's exact or chi-square), as follows?
# Function to perform the appropriate test based on counts
perform_test <- function(recurrence, t0) {
  # 2x2 table: this gene vs. all other genes, at Recurrence and at T0
  contingency_table <- matrix(
    c(recurrence, t0,
      sum(gene_counts$Recurrence) - recurrence, sum(gene_counts$T0) - t0),
    nrow = 2
  )
  if (any(contingency_table < 5)) {
    # Use Fisher's test if counts are low
    p.value <- fisher.test(contingency_table)$p.value
  } else {
    # Use Chi-square test otherwise
    p.value <- chisq.test(contingency_table)$p.value
  }
  return(p.value)
}
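# Apply test for each gene
library(dplyr)  # provides %>%, rowwise(), mutate(), ungroup()
gene_counts <- gene_counts %>%
  rowwise() %>%
  mutate(p_value = perform_test(Recurrence, T0)) %>%
  ungroup()
# Adjust the p-values for multiple testing (Benjamini-Hochberg)
gene_counts <- gene_counts %>%
  mutate(adj_p_value = p.adjust(p_value, method = "BH"))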
Or is there a more straightforward way to do this? E.g., is there a good practice or a standardised statistical test for this approach, rather than checking every case?
P.S. Not all the genes are necessarily present in all the samples.
Thank you!!!
rvtests? http://zhanxw.github.io/rvtests/
SKAT? SKAT-O? https://cran.r-project.org/web/packages/SKAT/SKAT.pdf
regenie? https://rgcgithub.github.io/regenie/
Those are really good tools; however, I am not sure they are suitable for my approach, as what I have is the counts per gene per condition within a population, not directly related to specific mutations. Would a permutation test be beneficial in this case?
Yes. I'm not a specialist, but I think the tools above don't use the specific variants; they just look at how many people in the case/control populations have at least one rare allele (missense...) in a gene.
Thank you. The idea is just to compare whether the frequency of a mutated gene in a population differs between two time points. I am struggling to find the best model for this comparison... Hopefully someone here has more experience!
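One simpler option, offered only as a sketch: Fisher's exact test is valid for any cell counts, so it can be applied to every gene without first checking whether counts fall below 5 (the usual rule of thumb for the chi-square test concerns expected counts, not observed ones). The code below reuses the example data from the question and base R only. The helper name fisher_p and the variables total_rec and total_t0 are illustrative and not from the original post. It keeps the question's choice of comparing each gene against all other genes. If the right denominator is the number of patients per condition, or if the same patients are sampled at both time points (paired data, where something like McNemar's test may fit better), the table would need to be built differently.

```r
# Example counts from the question: mutated-gene counts per condition
gene_counts <- data.frame(
  gene = c("A", "B", "C", "N", "T", "X", "Y", "Z"),
  Recurrence = c(10, 5, 0, 100, 1, 0, 1, 0),
  T0 = c(50, 10, 4, 150, 0, 1, 0, 1)
)

# Column totals, computed once instead of inside every call
total_rec <- sum(gene_counts$Recurrence)
total_t0 <- sum(gene_counts$T0)

# Hypothetical helper: Fisher's exact test on the 2x2 table
# "this gene vs. all other genes" by "Recurrence vs. T0"
fisher_p <- function(recurrence, t0) {
  tab <- matrix(
    c(recurrence, total_rec - recurrence,
      t0, total_t0 - t0),
    nrow = 2,
    dimnames = list(c("gene", "other"), c("Recurrence", "T0"))
  )
  fisher.test(tab)$p.value
}

# One test per gene, then Benjamini-Hochberg correction across genes
gene_counts$p_value <- mapply(fisher_p, gene_counts$Recurrence, gene_counts$T0)
gene_counts$adj_p_value <- p.adjust(gene_counts$p_value, method = "BH")
print(gene_counts)
```

Using a single test for all genes also keeps the p-values comparable before the multiple-testing correction, since they all come from the same procedure.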