Hi, thanks for taking the time. I am learning PLINK and have a question. I downloaded the example dataset (called hapmap) from the website and found that the number of genotypes in the PED file differs from the number of SNPs in the MAP file. Does anyone know the reason? Thanks a lot.
The PED file has individuals as rows and SNPs as columns, with two columns per SNP (one per allele, i.e., A A instead of AA), while the MAP file has one SNP per row. Furthermore, the first six columns of a PED file are not genotypes. They are:
Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype
wc -l your_map_file.map
awk '{print (NF - 6)/2}' your_ped_file.ped | head -1
The first command (wc) prints the number of lines in the MAP file (= number of SNPs in the MAP). The second (awk) prints the number of columns (NF) in the first line of the PED file minus the 6 leading ID columns, divided by 2 since there are always two columns per SNP. The two numbers should be equal.
If your numbers are actually different, try re-downloading the files.
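The comparison can also be run over every row of the PED file rather than only the first. The following is a minimal sketch, assuming the standard six leading columns listed above and whitespace-separated fields. The function names (count_map_snps, count_ped_snps, check_ped_map) are made up for illustration and are not part of PLINK.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; none of these ship with PLINK.

# Number of SNPs in a MAP file: one non-empty line per SNP.
count_map_snps() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Distinct SNP counts implied by the rows of a PED file.
# Each row has 6 leading columns plus 2 allele columns per SNP,
# so a well-formed file yields exactly one value here.
count_ped_snps() {
    awk 'NF > 0 { print (NF - 6) / 2 }' "$1" | sort -u
}

# Usage: check_ped_map your_ped_file.ped your_map_file.map
# Returns 0 when every PED row implies the same SNP count as the MAP file.
check_ped_map() {
    map_n=$(count_map_snps "$2")
    ped_n=$(count_ped_snps "$1")
    if [ "$ped_n" = "$map_n" ]; then
        echo "OK: $map_n SNPs in both files"
    else
        # Unquoted on purpose: joins several distinct counts onto one line.
        echo "Mismatch: MAP has $map_n SNPs, PED rows imply:" $ped_n
        return 1
    fi
}
```

After sourcing these definitions, run check_ped_map your_ped_file.ped your_map_file.map. If the mismatch message lists more than one number for the PED file, some rows have a different number of columns than others, which usually points to a truncated or corrupted download.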
Thank you so much for your answer. After reading it, I see that my earlier understanding was correct. When I checked again, I found that I had made a silly mistake: I overlooked how many columns there were in Excel and only looked at the first few. Thanks again.
I also wonder if you can help me with another question. I am using another, less common program called AML. Its input file needs SNP information, and there are two codes for this.
I only know the first as follows:
Can you give me some suggestions for the second format?
Sorry, I have never heard of that software, and googling for it doesn't give me any results. Try opening another question; maybe someone else knows.
OK, thanks.