Hello, I want to do an association test with plink.
I have two VCF files: one contains all common variants for the case population, the other all common variants for the control population. I want to feed these VCFs to PLINK, but I don't understand what input PLINK needs for the --assoc command. Is there any way I can do this? Or how can I reformat the VCF files so that I can run plink --assoc on them?
Thank you!
Thanks a lot. Am I correct to think MySampleListing.list should be like:
?
And how about the FAM file? I only have phenotype (condition) information, but I put it in the FAMILY ID column.
Your sort file (MySampleListing.list) looks okay. Those IDs should all be in your VCF. If not, a warning or error will be thrown (I believe).
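Your FAM has too much information. The minimum needed is the six standard columns: Family ID, Individual ID (IID), Paternal ID, Maternal ID, Sex, and Phenotype.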
Most of these can be left as -9 or 0. An association test just looks at Phenotype, and uses the IID to match up to your PLINK dataset.
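As a rough illustration of such a minimal FAM file, here is a small Python sketch that builds the six columns from a list of sample IDs and case/control labels. The function name `fam_lines`, the "case"/"control" labels, and the choice to reuse the sample ID as the family ID are assumptions made for this example, not something PLINK requires. It uses the usual PLINK coding of 1 = control, 2 = case, and -9 = missing phenotype.

```python
from typing import Iterable, List, Tuple


def fam_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Build six-column FAM lines from (sample_id, status) pairs.

    `status` is expected to be "case" or "control" (hypothetical labels);
    anything else is written as a missing phenotype (-9).
    """
    codes = {"control": "1", "case": "2"}  # PLINK case/control coding
    lines = []
    for sample_id, status in samples:
        phenotype = codes.get(status, "-9")
        # Columns: FID, IID, paternal ID, maternal ID, sex, phenotype.
        # Assumption: FID is set equal to the IID; parents and sex are unknown (0).
        lines.append("\t".join([sample_id, sample_id, "0", "0", "0", phenotype]))
    return lines


example = fam_lines([("S1", "case"), ("S2", "control")])
print("\n".join(example))
```

The family ID written here has to agree with whatever family ID your PLINK dataset actually contains, so adjust that column if your import assigned something else. Also, as far as I know, PLINK 1.9 treats the phenotype as missing for samples with unknown sex unless --allow-no-sex is given, so that is worth checking if the phenotype column seems to be ignored.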
When you input the data to PLINK, you can double-check the sample order of the new dataset by writing the dataset out again and looking at the sample order in the output TFAM file, e.g.,
--recode transpose
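With --recode transpose, PLINK writes a .tped/.tfam pair, and the .tfam lists the samples in dataset order. A minimal sketch of comparing that order with the FAM file you plan to use could look like the following. The helper names `sample_ids` and `first_mismatch` and the inline example contents are made up for illustration; in practice you would read the two files from disk.

```python
from typing import List, Optional


def sample_ids(fam_text: str) -> List[str]:
    """Return the IIDs (second column) in file order from .fam/.tfam content."""
    ids = []
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def first_mismatch(a: List[str], b: List[str]) -> Optional[int]:
    """Return the first index where the two orders differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one list is a prefix of the other
    return None


# Hypothetical file contents standing in for the exported .tfam and your .fam.
exported_tfam = "S1 S1 0 0 0 -9\nS2 S2 0 0 0 -9\nS3 S3 0 0 0 -9\n"
planned_fam = "S1 S1 0 0 0 2\nS3 S3 0 0 0 1\nS2 S2 0 0 0 2\n"

position = first_mismatch(sample_ids(exported_tfam), sample_ids(planned_fam))
if position is None:
    print("Sample order matches.")
else:
    print(f"Sample order differs starting at line {position + 1}.")
```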
I get an error that says 'Error: --indiv-sort file does not contain all loaded sample IDs.' Here is what my VCF looks like:
Also, I removed the --indiv-sort option just to see what would happen and continued with the pipeline. In the last command, for the association analysis, I used this .fam:
(30 lines, one for each sample in the VCF)
I got this message: "No phenotypes present" "Skipping --assoc/--model since less than two phenotypes are present."
For good reason, PLINK strictly enforces that the samples in the VCF match those in the sort file specified with --indiv-sort file. Also, I believe the sort file should be tab-delimited. If you only have 30 samples, you should be able to resolve the discrepancy quite quickly.
Alternatively, you can indeed skip the --indiv-sort file part and proceed from there. In this case, it is absolutely essential that the sample order in your FAM file is exactly the same as the sample order in your PLINK dataset. Do not make any assumptions in this regard, because PLINK makes no checks.
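One way to track down the discrepancy is to compare the sample IDs in the VCF header with the IDs in the sort file. The Python sketch below does that on small inline examples. The function names are hypothetical, and it assumes that the individual ID is the last whitespace-separated field on each line of the sort file. It compares the raw IDs only and does not model how PLINK assigns family and individual IDs when it imports a VCF.

```python
from typing import List, Set, Tuple


def vcf_sample_ids(vcf_text: str) -> List[str]:
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed (CHROM ... FORMAT);
            # everything after them is a sample ID.
            return line.split("\t")[9:]
    return []


def sort_file_ids(sort_text: str) -> List[str]:
    """Return IDs from the sort file (assumption: IID is the last field per line)."""
    ids = []
    for line in sort_text.splitlines():
        fields = line.split()
        if fields:
            ids.append(fields[-1])
    return ids


def compare_ids(vcf_ids: List[str], sort_ids: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the VCF, IDs only in the sort file)."""
    return set(vcf_ids) - set(sort_ids), set(sort_ids) - set(vcf_ids)


# Hypothetical contents standing in for your VCF and MySampleListing.list.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n"
)
sort_text = "S1\tS1\nS3\tS3\n"

missing, extra = compare_ids(vcf_sample_ids(vcf_text), sort_file_ids(sort_text))
print("In VCF but not in sort file:", sorted(missing))
print("In sort file but not in VCF:", sorted(extra))
```

Any ID reported in the first list is a candidate cause of the "does not contain all loaded sample IDs" error.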