Hi everyone,
I'm an MSc student in epidemiology with absolutely no background in genetics or Plink, and I'm in over my head in my genetic epidemiology class, so I apologize in advance if my question seems trivial or isn't explained well.
I have a two-part homework assignment: the first part was quality control/cleaning of a dataset, and this part is a statistical analysis of the data we cleaned earlier. The first part went well; I have the same number of SNPs and participants remaining as in the teacher's solution.
The final dataset contains 500 SNPs and around 15 000 participants. Our SNPs are all 300 kb around the PITX2 gene. This is a case-control study in which we're trying to find the association between SNPs and one specific phenotype using logistic regression. It's not explicitly stated, but from the way things are worded, I have a feeling I'm supposed to find one SNP associated with the phenotype and then discuss it.
We are using Plink, and the teacher gives us the command line for building our model; we basically just have to choose which covariates to include. I did exactly that and went into R to make a Manhattan plot and look at my results. I created a column with the -log10 of my p-values and realized that most of my 500 SNPs are statistically significant. About one-fifth of them have a -log10 p-value of around 40, and not a single one of them stands out from the rest. This is the code I used for my regression:
plink \
--bfile cohorte3rs \
--logistic sex \
--ci 0.95 \
--covar covariables.txt \
--covar-name AGE SEX BMI EDUYRS C1 C2 C3 C4 C5 \
--hide-covar \
--out model1
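A minimal sketch of the -log10 step described above, written in Python rather than the R used for the actual plot. It assumes the results table has SNP and P columns, as in the .assoc.logistic file PLINK 1.x writes. The rows are invented for illustration, and the Bonferroni cut-off for 500 tests is an assumption about what "significant" means here; the assignment may use a different threshold.

```python
import math

# Toy stand-in for a few rows of a PLINK 1.x .assoc.logistic table.
# SNP IDs, positions and p-values are invented for illustration only.
SAMPLE = """\
CHR SNP BP A1 TEST NMISS OR P
4 rs_example1 111000000 A ADD 15000 1.30 1e-40
4 rs_example2 111050000 G ADD 15000 1.02 0.20
4 rs_example3 111100000 T ADD 15000 1.10 NA
"""


def neg_log10_p(text):
    """Return [(snp, -log10(p))] for every row with a usable p-value."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    snp_col, p_col = header.index("SNP"), header.index("P")
    scores = []
    for line in lines[1:]:
        fields = line.split()
        try:
            p = float(fields[p_col])
        except ValueError:
            continue  # skip rows where P is NA (model not fitted)
        if 0.0 < p <= 1.0:
            scores.append((fields[snp_col], -math.log10(p)))
    return scores


def bonferroni_threshold(n_tests, alpha=0.05):
    """-log10 of alpha divided by the number of tests (assumed cut-off)."""
    return -math.log10(alpha / n_tests)


scores = neg_log10_p(SAMPLE)
threshold = bonferroni_threshold(500)  # 500 SNPs in the cleaned dataset
hits = [snp for snp, score in scores if score > threshold]
print(scores, threshold, hits)
```

With 500 tests the assumed cut-off sits at about 4 on the -log10 scale, so a value of around 40 is far above it.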
Now, I'm a total beginner, but that doesn't seem to make sense, so I'm wondering what could have gone wrong.
I'm pretty sure about the covariates I'm using; even when I try changing them a bit, the results don't really change.
Is it possible that I screwed up somewhere in the data cleaning part but was still able to get exactly the same number of remaining SNPs and participants as intended?
Sorry for the long post. If any of you can give me some assistance, that would be greatly appreciated. Thanks!
Thank you for taking the time to answer!
I copy/pasted the version in which I forgot to remove SEX from my covariates. I didn't expect this type of result, since it's not what we're used to seeing in the articles we usually read. I looked up a couple of the SNPs with my highest -log10 p-values on https://genetics.opentargets.org/ and most seem to be strongly associated with my phenotype, so it actually makes sense.
Thanks again!