Hi people,
I have my PLINK files and I would like to simulate a GWAS. This is what I do:
1) Obtain a list of causal SNPs (here, 10 SNPs are selected at random as causal).
awk '{print $2}' plink.map | shuf -n 10 > CAUSAL_LIST
2) Simulate phenotypes from the causal list.
./bin/gcta64 --bfile plink8 --simu-qt --simu-causal-loci CAUSAL_LIST --simu-hsq 0.5 --simu-rep 3 --out qphenotype
3) Insert the simulated phenotypes into the plink.ped file (column 3 of qphenotype.phen is the first of the three simulated replicates).
awk 'FNR==NR{a[NR]=$3;next}{$6=a[FNR]}1' qphenotype.phen plink.ped > temp.ped
cp temp.ped plink.ped
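The awk command in step 3 pairs the two files line by line, so it silently assigns phenotypes to the wrong individuals if the .phen and .ped files are not in the same order. A minimal sketch of a safer variant, matching on family ID and individual ID instead, could look like this. The function name merge_pheno is only illustrative, and the sketch assumes the .phen file has FID and IID in columns 1 and 2 and the first simulated phenotype in column 3.

```shell
# merge_pheno PHEN PED: write PED to stdout with column 6 replaced by the
# phenotype in column 3 of PHEN, matched on FID and IID (not on line order).
merge_pheno() {
    awk 'FNR == NR { pheno[$1 SUBSEP $2] = $3; next }   # first file: store phenotype per FID/IID
         {
             key = $1 SUBSEP $2
             # -9 is the default missing-phenotype code in PLINK .ped files
             $6 = (key in pheno) ? pheno[key] : -9
             print
         }' "$1" "$2"
}
```

It would be used as `merge_pheno qphenotype.phen plink.ped > temp.ped`. Alternatively, PLINK can read the phenotype from a separate file with --pheno (and --mpheno to pick the column), which avoids editing the .ped file at all.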
4) Association analysis.
./bin/plink --noweb --file plink --assoc --allow-no-sex
This generates a new file with the P-values, which I plot as -log10(P) in R. Sorting the P-values in ascending order, I see many false positives: some of my causal SNPs are at the top, but others are not. Applying an FDR threshold of 0.05 does not help much.
My question is: why are not all of my causal SNPs at the top?
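The ranking check described above can be sketched in shell, so that the position of each causal SNP is read off directly rather than by eye. This is only a sketch: the function name causal_ranks is illustrative, and it assumes the PLINK 1.x .qassoc layout for a quantitative trait (SNP ID in column 2, P in column 9) and GNU sort.

```shell
# causal_ranks CAUSAL_LIST QASSOC: print "rank SNP P" for each causal SNP,
# where rank 1 is the smallest P-value in the association output.
causal_ranks() {
    awk 'NR > 1 && $9 != "NA" { print $2, $9 }' "$2" |   # keep SNP and P; drop header and untested SNPs
        LC_ALL=C sort -g -k2,2 |                          # ascending P; -g parses scientific notation
        awk 'FNR == NR { causal[$1]; next }
             ($1 in causal) { print FNR, $1, $2 }' "$1" - # FNR on the piped input is the rank
}
```

Calling `causal_ranks CAUSAL_LIST plink.qassoc` prints one line per causal SNP that was tested. Non-causal SNPs ranked above them are not necessarily spurious in a statistical sense: they may be in linkage disequilibrium with a causal SNP.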
How many SNPs and samples do you have? What is the lowest P you observe?
400 SNPs. The lowest: 3.4·10^-11.
If you do the analysis with 10 new SNPs, do you still get false positives?
Yes, I still have false positives. However, a PCA showed that there was population stratification in my sample, so I have removed this bias. I then ran another GWAS simulation (for the quantitative trait) without it, but now there are no significant SNPs at all. Might this be because of my sample size (120 individuals)?
Yeah, it's possible. 120 individuals is a small dataset for a GWAS. Maybe you could try different options when you simulate the phenotypes.
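A rough back-of-the-envelope calculation supports the sample-size explanation. The sketch below assumes, as a simplification, that the heritability is split equally over the causal SNPs (GCTA actually draws unequal effect sizes) and uses the usual approximation n·r²/(1−r²) for the non-centrality parameter of a 1-df test. The function name expected_chisq is illustrative.

```shell
# expected_chisq N HSQ K: expected 1-df association chi-square for one causal SNP,
# assuming heritability HSQ is split equally over K causal SNPs (a simplification).
expected_chisq() {
    awk -v n="$1" -v hsq="$2" -v k="$3" 'BEGIN {
        r2  = hsq / k                 # variance explained per causal SNP
        ncp = n * r2 / (1 - r2)       # approximate non-centrality parameter
        printf "r2=%.3f ncp=%.2f expected_chisq=%.2f\n", r2, ncp, 1 + ncp
    }'
}
```

With the values from this thread, `expected_chisq 120 0.5 10` prints `r2=0.050 ncp=6.32 expected_chisq=7.32`. A Bonferroni threshold for 400 SNPs (0.05/400 = 1.25·10^-4) corresponds to a chi-square of roughly 14.7, so a typical causal SNP is expected to fall short of it with 120 individuals. Raising --simu-hsq, using fewer causal SNPs, or simulating on more individuals all increase the expected statistic.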