Question

How does the "missing" column (3rd column) of the SNPTEST sample file affect the results?

11.1 years ago

lanjinglingxiaoni ▴ 20

I have a question about the "missing" column (3rd column) of the sample file for SNPTEST.

The SNPTEST webpage says:

The sample file has three parts (a) a header line detailing the names of the columns in the file, (b) a line detailing the types of variables stored in each column, and (c) a line for each individual detailing the information for that individual. Here is an example of the start of a sample file for reference

ID_1 ID_2 missing cov_1 cov_2 cov_3 cov_4 pheno1 bin1
0 0 0 D D C C P B
1 1 0.007 1 2 0.0019 -0.008 1.233 1
2 2 0.009 1 2 0.0022 -0.001 6.234 0
3 3 0.005 1 2 0.0025 0.0028 6.121 1
4 4 0.007 2 1 0.0017 -0.011 3.234 1
5 5 0.004 3 2 -0.012 0.0236 2.786 0
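To make the three-part layout concrete, here is a minimal Python sketch that splits such a file into the column names, the column types, and one record per individual. The function name `read_sample_file` is made up for illustration and is not part of SNPTEST; the sketch assumes whitespace-separated fields, as in the example above.

```python
from io import StringIO


def read_sample_file(handle):
    """Parse a SNPTEST-style sample file into (names, types, rows).

    Hypothetical helper: line 1 holds column names, line 2 column types,
    and every later line one individual. Fields are whitespace-separated.
    """
    lines = [ln.split() for ln in handle if ln.strip()]
    names, types, records = lines[0], lines[1], lines[2:]
    if len(names) != len(types):
        raise ValueError("header line and type line differ in length")
    rows = []
    for fields in records:
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
        rows.append(dict(zip(names, fields)))
    return names, types, rows


# First lines of the example sample file quoted above
example = StringIO(
    "ID_1 ID_2 missing cov_1 cov_2 cov_3 cov_4 pheno1 bin1\n"
    "0 0 0 D D C C P B\n"
    "1 1 0.007 1 2 0.0019 -0.008 1.233 1\n"
    "2 2 0.009 1 2 0.0022 -0.001 6.234 0\n"
)
names, types, rows = read_sample_file(example)

# The third column is the per-individual "missing" value
missing = [float(r["missing"]) for r in rows]
```

Here `names[2]` is `"missing"`, and `missing` holds one value per individual in file order.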

As I understand it, "missing" is the proportion of missing genotype data for each sample (i.e. 1 minus the sample call rate) across some set of SNPs.

How does "missing" affect the association results?

When handling large datasets, you often split the data into the 22 autosomes, and the "missing" value for a given sample then differs from chromosome to chromosome.
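If a single value per sample is wanted, one option is to pool the per-chromosome counts instead of averaging the per-chromosome proportions. The sketch below assumes that "missing" means the proportion of missing genotype calls for an individual; the function name and the counts are hypothetical, and it does not show whether SNPTEST actually uses the value in its tests.

```python
def combined_missing(per_chrom):
    """Pool per-chromosome counts into one missing proportion.

    per_chrom: iterable of (n_missing_calls, n_snps) pairs for ONE sample,
    one pair per chromosome. Assumes "missing" = missing calls / SNPs.
    """
    per_chrom = list(per_chrom)
    total_missing = sum(m for m, _ in per_chrom)
    total_snps = sum(n for _, n in per_chrom)
    if total_snps == 0:
        raise ValueError("no SNPs supplied")
    return total_missing / total_snps


# Hypothetical counts for one sample on three chromosomes
counts = [(70, 10000), (45, 5000), (8, 2000)]

# Pooled value: each chromosome is weighted by its number of SNPs
genome_wide = combined_missing(counts)

# Unweighted mean of per-chromosome proportions, for comparison
naive_mean = sum(m / n for m, n in counts) / len(counts)
```

The two numbers differ whenever chromosomes contribute different numbers of SNPs, so the pooled value is the one that matches a missing proportion computed on the unsplit data.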

If "missing" does affect the results, which value should we use?

If "missing" does not affect the results, why does SNPTEST require it for the analysis?

sample-file snptest missing imputation gwas • 3.5k views

updated 3.7 years ago by Ram 45k • written 11.1 years ago by lanjinglingxiaoni ▴ 20

Hi, did you manage to calculate this? I don't know how to calculate the "missing" value when creating a sample file.

7.8 years ago by jfertaj ▴ 110