Does anyone know of, or can anyone point me to, a resource on converting SNPTEST dosage data files (GEN/SAMPLE) so that they work in R/SAS? PLINK can read SNPTEST dosage format data, but it seems that it can only perform association tests on it, whereas I would like to run a multinomial logistic regression.
Thanks,
-Joey
EDIT: added example data. I was given two sets of data: a) HapMap2 imputed: for each chromosome, a series of files in standard SNPTEST format (GEN/SAMPLE) plus a *.mlinfo file, i.e. 66 files in total.
The *.mlinfo file looks like the following:
SNP POS A1 A2 REF_FREQ RSQ
rs10047182 4434181 A G 0.117476853526221 0.98222786900009
rs1009345 3576288 A G 0.395093490054250 0.389054499338887
b) 1000 Genomes imputed dataset: IMPUTE v2 was used to generate the files. For each chromosome, I have around 40-50 chunks, depending on the number of SNPs in each.
I have a chunk1_info file which has the following:
snp_id rs_id position exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- rs58108140 10583 0.125 0.025 0.765 0 -1 -1 -1
--- rs3877545 11508 1.000 0.000 1.000 0 -1 -1 -1
An infobysample file:
concord_type0 r2_type0
0.949 0.915
0.949 0.936
and the SNP information contained in each of the chunks:
--- rs4912140 20001071 T G 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0.004 0.595 0.401 0 ........
I guess what I want is a file similar to the one produced by the --recodeA option in PLINK. I could then use that *.raw file along with covariates to run a bunch of other models (multinomial logit or a Cox proportional hazards model).
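To make the target concrete, here is a minimal Python sketch of that kind of conversion. It assumes the usual GEN layout: five leading columns (SNP ID, rs ID, position, allele A, allele B) followed by three genotype probabilities per sample in the order AA, AB, BB. It also assumes the SAMPLE file has two header lines with the individual ID in the second column. The function names are made up for illustration, the example lines are invented rather than taken from the real data, and counting allele B as the dosage allele is an arbitrary choice (PLINK's --recodeA counts its own reference allele, which may differ).

```python
def read_sample_ids(sample_lines):
    """Return individual IDs from SAMPLE-file lines.

    Assumption: two header lines (column names, then column types),
    with the individual ID in the second column.
    """
    body = [ln for ln in sample_lines if ln.strip()][2:]
    return [ln.split()[1] for ln in body]


def gen_line_to_dosages(line):
    """Convert one GEN line to expected allele-B counts per sample.

    Dosage = P(AB) + 2 * P(BB). A sample whose three probabilities
    are all zero is treated as missing (None).
    """
    fields = line.split()
    rs_id, allele_b = fields[1], fields[4]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError(f"{rs_id}: probability count is not a multiple of 3")
    dosages = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        if p_aa + p_ab + p_bb == 0:
            dosages.append(None)
        else:
            dosages.append(p_ab + 2 * p_bb)
    return rs_id, allele_b, dosages


def gen_to_sample_major(gen_lines, sample_ids):
    """Transpose SNP-major GEN lines into one row per sample.

    Returns a list of rows; the first row is the header
    (IID, then one <rsid>_<counted allele> column per SNP).
    """
    header = ["IID"]
    columns = []
    for line in gen_lines:
        if not line.strip():
            continue
        rs_id, allele_b, dosages = gen_line_to_dosages(line)
        if len(dosages) != len(sample_ids):
            raise ValueError(f"{rs_id}: sample count mismatch")
        header.append(f"{rs_id}_{allele_b}")
        columns.append(dosages)
    rows = [header]
    for j, iid in enumerate(sample_ids):
        values = ["NA" if col[j] is None else round(col[j], 3) for col in columns]
        rows.append([iid] + values)
    return rows


# Invented example data (not from the real files).
sample_lines = [
    "ID_1 ID_2 missing",
    "0 0 0",
    "fam1 ind1 0",
    "fam2 ind2 0",
    "fam3 ind3 0",
]
gen_lines = [
    "--- rsA 100 T G 0 0 1 0 1 0 0.004 0.595 0.401",
    "--- rsB 200 A C 1 0 0 0 0 0 0 0 1",
]
table = gen_to_sample_major(gen_lines, read_sample_ids(sample_lines))
for row in table:
    print("\t".join(str(v) for v in row))
```

The resulting table has one row per individual and one dosage column per SNP, so it can be written out as tab-delimited text and merged with covariates in R or SAS. This version holds a whole chunk in memory, which may not be practical for large chunks; filtering SNPs first (for example on the info or RSQ column) would reduce the size.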
Thanks,
Joey
It could be helpful if you posted a small example of what this file format looks like, and perhaps also what you want to convert it to ("so that they work in R" is not very specific - R is pretty flexible and does not require strictly specified file formats).
Just added a link to our GWASToolKit on GitHub: https://github.com/swvanderlaan/GWASToolKit.