4.3 years ago
MAPK
I have a number of individuals from projects A to L genotyped for one particular variant (rsXXX). Only homozygous reference and heterozygous genotypes are present; there are no homozygous alternate individuals for this variant in my cohorts. I would like to calculate the MAF for each cohort, and also separately for cases and controls, but I am not very familiar with the calculation methods. Could someone please help me calculate this?
cohort <- structure(list(
  Project = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"),
  Homo_Ref_Total_Individuals = c(836L, 1666L, 209L, 16L, 929L, 841L, 252L, 1493L, 568L, 44L, 190L, 2L),
  Homo_Ref_CASES = c(527, 993, 0, 0, 471, 226, 201, 1036, 0, 0, 0, 0),
  Homo_Ref_CONTROLS = c(191, 671, 209, 0, 295, 615, 17, 326, 161, 0, 94, 0),
  Hetero_Total_Individuals = c(5, 10, 2, 0, 12, 8, 6, 23, 1, 0, 1, 0),
  Hetero_CASES = c(2, 6, 0, 0, 5, 1, 4, 21, 0, 0, 0, 0),
  Hetero_CONTROLS = c(3, 4, 2, 0, 5, 7, 0, 2, 1, 0, 0, 0)
), class = "data.frame", row.names = c(NA, -12L))
You know you could print(cohort) and paste that as a table instead of using dput; it is easier to eyeball that way.

I think that you should first resolve why Hom_Ref_CASES + Hom_Ref_CONTROLS does not equal Hom_Ref_Total_Individuals. I mean, how can you explain the data for D (which is easier for the brain because the case and control counts are all zero)? Perhaps 'Hom_Ref_Total_Individuals' is an incorrect label.

Otherwise, once you resolve the discrepancy, the minor allele frequency (MAF) calculation is literally what the term says, i.e., the frequency of your less frequent [minor] allele, but we usually quote this frequency separately for cases and controls (i.e., X% in cases; Y% in controls).
Ultimately, your data, as presented, makes no sense.
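As a rough illustration of the calculation described above, here is a minimal R sketch. It assumes the alternate allele is the minor allele (reasonable here, since heterozygotes are rare and no homozygous alternate genotypes were observed) and that every individual is diploid at this site. The helper name maf is hypothetical, not from any package; the counts are those given for project A in the question.

```r
# Hypothetical helper: minor allele frequency from genotype counts.
# Each heterozygote carries one minor allele, each homozygous alternate
# carries two, and every individual contributes two alleles in total.
maf <- function(hom_ref, het, hom_alt = 0) {
  n_alleles <- 2 * (hom_ref + het + hom_alt)
  minor     <- het + 2 * hom_alt
  # Return NA where no individuals were genotyped (avoids 0/0)
  ifelse(n_alleles > 0, minor / n_alleles, NA_real_)
}

# Project A, using the case and control counts from the question
maf_cases_A    <- maf(hom_ref = 527, het = 2)  # 2 minor alleles out of 1058
maf_controls_A <- maf(hom_ref = 191, het = 3)  # 3 minor alleles out of 388
```

Because the function is vectorised, the same call should also work on whole columns, e.g. maf(cohort$Homo_Ref_CASES, cohort$Hetero_CASES), returning NA for projects with no cases.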
Hi Kevin, it is because I have (Hom_ref_CASES == 2) + (Hom_Ref_CONTROLS == 1) + (unknown == -9) = Hom_Ref_Total. The same applies to Het_Total. So now the question is: do I have to sum the cases (Hom + Het cases) and make that my cohort, or should I just use (Hom_Ref_Total + Het_Total) as my cohort? I am really confused about how to calculate the MAF with this data.
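One possible way to look at the two denominators, sketched in R under the assumption that the totals include individuals with unknown phenotype (-9) as described above: a whole-cohort MAF would use the totals, while case and control MAFs would each use only their own counts. The variable names are illustrative, and the numbers are the project A counts from the question.

```r
# Project A counts from the question (totals include unknown phenotype, -9)
hom_ref <- c(total = 836, cases = 527, controls = 191)
het     <- c(total = 5,   cases = 2,   controls = 3)

# Individuals with unknown phenotype are whatever is left after
# subtracting cases and controls from the total
unknown <- c(
  hom_ref = unname(hom_ref["total"] - hom_ref["cases"] - hom_ref["controls"]),
  het     = unname(het["total"] - het["cases"] - het["controls"])
)

# With no homozygous alternate genotypes, MAF = het / (2 * individuals),
# computed separately for the whole cohort, the cases, and the controls
maf_by_group <- het / (2 * (hom_ref + het))
```

Under this reading, the case and control frequencies are the ones to compare with each other, and the whole-cohort value simply describes everyone genotyped in the project regardless of phenotype.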