PLINK returns NA values for all minor allele frequencies in my data
10.2 years ago

Hi,

I'm having the following problem with PLINK:

I am using the --freq command to calculate allele frequencies from an input file that was created from 23andMe data. However, all I get in the .frq report is NA for every minor allele frequency:

 CHR          SNP   A1   A2          MAF  NCHROBS
   1   rs12564807    0    A           NA        0
   1    rs3131972    A    G           NA        0
   1  rs148828841    A    C           NA        0
   1   rs12124819    G    A           NA        0
   1  rs115093905    T    G           NA        0
   1   rs11240777    A    G           NA        0
etc...

The same goes for --hwe etc. --missing is the only command that seems to work, so I know that the file is being read correctly.

I don't know what's wrong, because PLINK reads the input files correctly. I suspect it is the allele coding, but I have tried several solutions and none of them work. Has anyone come across a similar issue?

Yorgos

freq plink maf SNP • 7.8k views
ADD COMMENT
0

Can you post your log file?

ADD REPLY
0

Sure!

PLINK v1.90b2i 64-bit (8 Sep 2014)
4 arguments: --file test --freq --set-hh-missing
Hostname:
Working directory: /Users/
Start time: Tue Sep 30 17:26:49 2014

Random number seed: 1412090809
16384 MB RAM detected; reserving 8192 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (592555 variants, 723 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
592555 variants loaded from .bim file.
723 people (232 males, 491 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Calculating allele frequencies... done.
Warning: 206862 het. haploid genotypes present (see plink.hh ).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.986469.
--freq: Allele frequencies written to plink.frq .

End time: Tue Sep 30 17:27:05 2014
ADD REPLY
3

Try adding --nonfounders to the command line. Normally, PLINK's --freq and --hwe exclude all samples with at least one parental ID, so if everyone in your dataset has parental IDs, that would explain your result. (It's necessary to use '0' to indicate an unknown parent.)

(You should also use the most recent build: there was a --nonfounders bug fixed on September 26th.)

If --nonfounders does not fix the problem, let me know.
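To check whether this is the cause before rerunning PLINK, one option is to count how many samples have '0' in both parental ID columns. The sketch below is only an illustration: it assumes the usual leading columns of a .ped or .fam file (family ID, individual ID, father ID, mother ID, sex, phenotype), and the function name and sample lines are made up for the example.

```python
def count_founders(ped_lines):
    """Return (founders, nonfounders) for whitespace-delimited .ped/.fam lines.

    Assumption: columns 3 and 4 hold the father and mother IDs, and a
    sample counts as a founder only when both are the string "0".
    """
    founders = 0
    nonfounders = 0
    for line in ped_lines:
        # Split off only the leading columns; .ped lines can be very long.
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        father, mother = fields[2], fields[3]
        if father == "0" and mother == "0":
            founders += 1
        else:
            nonfounders += 1
    return founders, nonfounders


# Hypothetical sample lines, not taken from the poster's data.
example = [
    "FAM1 IND1 DAD1 MUM1 1 -9",
    "FAM2 IND2 0 0 2 -9",
    "FAM3 IND3 DAD3 0 2 -9",
]
print(count_founders(example))
```

If the founder count comes back as zero, no samples are left for --freq to count by default, which would match a .frq report with NCHROBS of 0 on every row. Rerunning with the same arguments plus --nonfounders (for example `plink --file test --freq --nonfounders`) should then include everyone.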

ADD REPLY
0

I spent the entire morning testing different files and I came to the exact same conclusion:

When I first built the ped file, I assigned a non-zero father and mother to all my individuals, so there were no founder individuals left to be used for allele frequency calculations. I was just about to rebuild the file with 0's for dads and mums, but then I saw your reply: the --nonfounders flag actually worked, so thank you so much!

I don't know if I should laugh or cry, ha ha ha...
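For anyone who would rather rebuild the file than use --nonfounders, the alternative mentioned above (0's for dads and mums) could look roughly like this. It is a minimal sketch, assuming a whitespace-delimited .ped or .fam line with the standard six leading columns; the function name is hypothetical, and it is only appropriate when the parental IDs carry no real pedigree information.

```python
def clear_parents(ped_line):
    """Set the father and mother ID columns (3 and 4) of one line to "0".

    Assumption: the line starts with family ID, individual ID, father ID,
    mother ID, sex and phenotype; anything after that is left as it was.
    """
    stripped = ped_line.rstrip("\n")
    # Split off the six leading columns; keep the genotype part untouched.
    fields = stripped.split(None, 6)
    if len(fields) < 6:
        return stripped  # too short to be a sample line; leave it alone
    fields[2] = "0"
    fields[3] = "0"
    return " ".join(fields)


# Hypothetical sample line, not taken from the poster's data.
print(clear_parents("FAM1 IND1 DAD1 MUM1 1 -9 A A G G\n"))
```

Applying this to every line of a copy of the .ped file (or of the .fam file after conversion to binary format) leaves every sample as a founder, so --freq and --hwe would count them without any extra flag.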

ADD REPLY
0

Did you check your plink.hh file? It says you have a lot of haploid genotypes present. This suggests that your file format might be off.

ADD REPLY
0

I did check it, and I tried different things to solve the problem (including using the --set-hh-missing option and removing the X, Y, XY and mtDNA SNPs), but the problem persists...

Any ideas? :-/

ADD REPLY
