I have a large vcf file that contains both case and control samples. I am planning to feed the vcf into plink's --assoc function, together with the --fam parameter pointing to a .fam file that specifies which samples are cases and which are controls. I want to make a .fam file with the case and control samples labelled in the phenotype value column. How can I assign case/control status to the different samples in the .fam for my merged vcf? The vcf itself doesn't indicate whether a sample is a case or a control. I have two tsv files, one listing the control samples and one listing the case samples. Can I use these two files to specify in the .fam file which samples are cases and which are controls?
Could I also treat samples carrying a specific gene mutation as cases and samples without the mutation as controls, and run an association test based on that grouping?
Thanks, this looks pretty good. The thing is, I only have the IID listed in the case and control .tsv. I think the FID and IID in my .fam are the same for each sample.
Also, I need the three fields between IID and the phenotype value (all should be 0). How can I add three columns of 0 between IID and the phenotype value?
I tried this, but plink log tells me that all the samples detected are control samples. Is it because I don't have the values for ID of father and mother in the file?
You don't need the file to be a .fam when you are working with a vcf; you only really need a phenotype file, with the format
FID IID Phenotype
If your tsv only contains the IID, you can do
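A minimal Python sketch of one way to build such a phenotype file from the two tsv lists. It assumes each tsv has one sample per line with the IID in the first column and no header row, that FID and IID are identical (as the asker suspects), and that phenotypes use plink's default coding of 1 = control and 2 = case. The function names and file names are hypothetical.

```python
from pathlib import Path


def read_ids(path):
    """Return the first whitespace-separated field of every non-empty line."""
    ids = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            ids.append(fields[0])
    return ids


def build_pheno_lines(case_ids, control_ids):
    """Build 'FID IID Phenotype' rows, reusing the IID as the FID (assumption)."""
    lines = []
    for iid in control_ids:
        lines.append(f"{iid} {iid} 1")  # 1 = control under plink's default coding
    for iid in case_ids:
        lines.append(f"{iid} {iid} 2")  # 2 = case under plink's default coding
    return lines


def write_pheno(case_tsv, control_tsv, out_path):
    """Read both ID lists and write the phenotype file; returns the rows written."""
    lines = build_pheno_lines(read_ids(case_tsv), read_ids(control_tsv))
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling `write_pheno("cases.tsv", "controls.tsv", "pheno.txt")` would produce a file that can be passed to plink with `--pheno pheno.txt`. If the phenotypes are coded 0/1 instead, plink by default reads 0 as missing and 1 as control, which is one possible reason for a log that reports only controls; the `--1` flag switches plink to 0 = control, 1 = case. Since a vcf carries no sex information, `--allow-no-sex` may also be needed so that the phenotypes are not treated as missing.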
Or had you already converted your vcf into a binary plink file?
In that case, you can do
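As a hedged illustration for the bfile case, the sketch below rewrites the phenotype column of an existing .fam directly. A .fam row has six columns (FID, IID, father ID, mother ID, sex, phenotype), so the three columns between IID and the phenotype are already present after conversion. The function name is hypothetical, rows are matched on IID, and the coding is again assumed to be 1 = control, 2 = case, with -9 for samples in neither list.

```python
def set_fam_phenotypes(fam_lines, case_ids, control_ids):
    """Return .fam rows with column 6 set from the case/control ID lists.

    Rows are matched on IID (column 2), so the order of the ID lists
    does not need to follow the order of the .fam file.
    """
    cases, controls = set(case_ids), set(control_ids)
    updated = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns in .fam row, got: {line!r}")
        iid = fields[1]
        if iid in cases:
            fields[5] = "2"   # case
        elif iid in controls:
            fields[5] = "1"   # control
        else:
            fields[5] = "-9"  # missing phenotype
        updated.append(" ".join(fields))
    return updated
```

An alternative that leaves the .fam untouched is to keep the phenotypes in a separate file and pass it with `--pheno` next to `--bfile`; plink then takes the phenotype from that file instead of from the .fam.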
Yeah, I had converted the vcf into a bfile prior. Thanks.
So it doesn't matter if the phenotype file lists the samples in a different order than the bfile?
Yup, plink is smart about this: it matches samples by FID and IID automatically, so the row order doesn't matter.