Hi everyone,
Firstly, I apologize for posting this question here if it's not the appropriate platform.
I'm currently working with Whole Genome Sequencing data (the dataset is very large). For each Single Nucleotide Polymorphism (SNP), the data give the number of individuals in each genotype category and age bracket. The categories are Heterozygous, Homozygous Alternate, and Homozygous Reference, and each category has 12 age brackets ranging from age <30 to age >80. Here's an example of the columns I have:
col1  hetero_age<30  hetero_age30-35  ...  hetero_age>80  homo_age<30  ...  homo_age>80  homo_ref_age<30  ...  homo_ref_age>80
SNP1  (count of individuals in each age bracket, for every category)
Essentially, each SNP has counts of individuals for each category in the 12 age brackets.
From this data, I have reduced it to two categories of allele counts per age group: Minor Allele (2 * Homozygous Alternate + Heterozygous) and Major Allele (sum of all alleles - Minor Allele). The final file looks like this:
col1  Minor_allele_age<30  Minor_allele_age30-35  ...  Minor_allele_age>80  Major_allele_age<30  ...  Major_allele_age>80
SNP1  (allele count in each age bracket)
In the final file, column 1 displays the rsID followed by 12 columns showing the counts of Minor Alleles in different age brackets, and another set of 12 columns representing the counts of Major Alleles in the same age brackets as the Minor Alleles.
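To make the reduction above concrete, here is a minimal R sketch of how the allele counts are derived from the genotype counts. It uses a made-up toy table with 3 age brackets instead of 12; the data frame `geno`, its column names, and all counts are hypothetical, and it assumes the alternate allele is the minor allele, as in my reduction.

```r
# Toy genotype counts (hypothetical values): 2 SNPs, 3 age brackets per category.
geno <- data.frame(
  rsid  = c("rs1", "rs2"),
  het_1 = c(10, 20), het_2 = c(12, 18), het_3 = c(8, 25),   # heterozygous
  alt_1 = c(2, 5),   alt_2 = c(3, 4),   alt_3 = c(1, 6),    # homozygous alternate
  ref_1 = c(88, 75), ref_2 = c(85, 78), ref_3 = c(91, 69)   # homozygous reference
)
n_brackets <- 3  # would be 12 in the real file

# Pull out one numeric matrix per genotype category (rows = SNPs).
het     <- as.matrix(geno[, 2:(1 + n_brackets)])
hom_alt <- as.matrix(geno[, (2 + n_brackets):(1 + 2 * n_brackets)])
hom_ref <- as.matrix(geno[, (2 + 2 * n_brackets):(1 + 3 * n_brackets)])

# Every individual carries two alleles.
minor <- 2 * hom_alt + het
major <- 2 * (het + hom_alt + hom_ref) - minor  # equals 2 * hom_ref + het

colnames(minor) <- paste0("minor_", seq_len(n_brackets))
colnames(major) <- paste0("major_", seq_len(n_brackets))
allele_counts <- data.frame(rsid = geno$rsid, minor, major)
print(allele_counts)
```

With 12 brackets the same code should apply after setting `n_brackets <- 12`, provided the columns are ordered as described above.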
Now, I'm interested in conducting a statistical analysis to compare the counts of Major and Minor Alleles across all age brackets. However, I'm unsure which statistical test would be appropriate for this analysis. In R, I have already run a simple chi-squared test and Fisher's exact test for each SNP, e.g.:
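# One p-value per SNP (row); preallocate the result vector.
p_values <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  # Columns 2-13: minor-allele counts; columns 14-25: major-allele counts.
  minor_counts <- as.numeric(data[i, 2:13])
  major_counts <- as.numeric(data[i, 14:25])
  # 2 x 12 contingency table: allele type (rows) by age bracket (columns).
  contingency_table <- rbind(minor_counts, major_counts)
  # Monte Carlo p-value instead of the asymptotic chi-squared approximation.
  chisq_result <- chisq.test(contingency_table, simulate.p.value = TRUE)
  p_values[i] <- chisq_result$p.value
}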
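The Fisher version follows the same pattern. Below is a minimal, self-contained sketch on a made-up toy table with 3 age brackets instead of 12; the data frame `toy_counts`, its column names, the counts, and the choice of `B = 10000` replicates are all hypothetical and only for illustration.

```r
# Toy allele counts (hypothetical values): 2 SNPs, 3 age brackets.
toy_counts <- data.frame(
  rsid    = c("rs1", "rs2"),
  minor_1 = c(14, 30),   minor_2 = c(18, 26),   minor_3 = c(10, 37),
  major_1 = c(186, 170), major_2 = c(182, 174), major_3 = c(190, 163)
)
n_brackets <- 3  # would be 12 in the real file
minor_cols <- 2:(1 + n_brackets)
major_cols <- (2 + n_brackets):(1 + 2 * n_brackets)

set.seed(1)  # the simulated p-values are random; fix the seed for reproducibility
fisher_p <- numeric(nrow(toy_counts))
for (i in seq_len(nrow(toy_counts))) {
  # 2 x n_brackets table: allele type (rows) by age bracket (columns).
  tab <- rbind(
    as.numeric(toy_counts[i, minor_cols]),
    as.numeric(toy_counts[i, major_cols])
  )
  # For tables larger than 2 x 2, simulate.p.value = TRUE gives a
  # Monte Carlo p-value based on B simulated tables.
  fisher_p[i] <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
}
print(data.frame(rsid = toy_counts$rsid, fisher_p = fisher_p))
```

I used the simulated p-value here because an exact computation on a 2x12 table with large counts can be slow or exceed the default workspace of `fisher.test`.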
Both tests operate on a 2x12 matrix (2 allele types by 12 age brackets). As I am not a statistician, I am uncertain whether these tests are suitable for the type of analysis I am conducting. I would greatly appreciate the guidance of a statistician to recommend a suitable analysis for this type of data.
Thank you in advance for your assistance.