I have a question about the "missing" column (the third column) of the sample file used by SNPTEST.
The SNPTEST webpage says:
The sample file has three parts (a) a header line detailing the names of the columns in the file, (b) a line detailing the types of variables stored in each column, and (c) a line for each individual detailing the information for that individual. Here is an example of the start of a sample file for reference
ID_1 ID_2 missing cov_1 cov_2 cov_3 cov_4 pheno1 bin1
0 0 0 D D C C P B
1 1 0.007 1 2 0.0019 -0.008 1.233 1
2 2 0.009 1 2 0.0022 -0.001 6.234 0
3 3 0.005 1 2 0.0025 0.0028 6.121 1
4 4 0.007 2 1 0.0017 -0.011 3.234 1
5 5 0.004 3 2 -0.012 0.0236 2.786 0
As I understand it, "missing" is each individual's proportion of missing genotype calls (one minus the sample call rate), computed over some set of SNPs.
How does "missing" affect the association results?
With large datasets, the analysis is often split across the 22 autosomes, and the missing value for a given individual then differs from chromosome to chromosome.
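To make the per-chromosome issue concrete, here is a minimal sketch of one way to get a single genome-wide value per individual: weight each chromosome's missing proportion by the number of SNPs on that chromosome. The function name `pooled_missing` and the numbers are made up for illustration, and this is only an assumption about what a sensible combined value would be, not something taken from the SNPTEST documentation.

```python
def pooled_missing(per_chromosome):
    """Combine per-chromosome missing proportions for ONE individual.

    per_chromosome: list of (n_snps, missing_proportion) pairs,
    one pair per chromosome. Hypothetical helper, not part of SNPTEST.
    """
    total_snps = sum(n for n, _ in per_chromosome)
    if total_snps == 0:
        raise ValueError("no SNPs supplied")
    # n * m is the number of missing calls on that chromosome
    total_missing = sum(n * m for n, m in per_chromosome)
    return total_missing / total_snps


# Illustrative numbers only: two chromosomes with different SNP counts
genome_wide = pooled_missing([(1000, 0.01), (3000, 0.002)])
print(round(genome_wide, 6))
```

A plain average of the per-chromosome values would over-weight small chromosomes, which is why the sketch weights by SNP count.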
If "missing" does affect the results, which value should we use?
If it does not, why does SNPTEST require it for the analysis?
Hi, did you manage to work this out? I don't know how to calculate the "missing" value when creating a sample file.
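In case it helps frame the calculation: assuming "missing" is simply the fraction of SNPs with no genotype call for each individual, a minimal sketch looks like the following. The data layout (a dict of per-sample genotype lists with `None` for a missing call) and the name `missing_proportion` are assumptions for illustration; real genotype files would need to be parsed first.

```python
def missing_proportion(genotypes):
    """Fraction of SNPs with no genotype call for one individual.

    genotypes: list of calls for one sample, with None marking a
    missing call. Hypothetical helper, not part of SNPTEST.
    """
    if not genotypes:
        raise ValueError("no SNPs supplied")
    n_missing = sum(1 for g in genotypes if g is None)
    return n_missing / len(genotypes)


# Toy data: sample ID -> genotype calls (0/1/2 allele counts, None = missing)
calls = {
    "1": [0, 1, None, 2],
    "2": [0, 0, 1, 2],
}
missing = {sample: missing_proportion(g) for sample, g in calls.items()}
print(missing)
```

The resulting values would go in the third column of the sample file, one per individual, in the same order as the ID columns.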