Hi, I'm new to GWAS and PLINK. I have data for 4 chromosomes, with more than 3,000 individuals and nearly 200,000 SNPs. I have already cleaned the data with this command (run once per chromosome):
plink --file chr5 --make-bed --out chr5 --geno 0.05 --maf 0.05 --mind 0.025 --hwe 0.001
My first question: is it still relevant to run a PCA to identify outliers even after filtering on HWE? My second question: the command leaves a different number of individuals for each chromosome, something like:
# chr 2
52627 variants and 3027 people pass filters and QC.
# chr 5
41022 variants and 3032 people pass filters and QC.
# chr 13
24171 variants and 3004 people pass filters and QC.
# chr 16
19664 variants and 3048 people pass filters and QC.
Should I keep only the individuals that are present in all files? Or is it OK to keep some individuals for one chromosome who are absent from another? Thanks!
I would merge everything into one binary (bed) fileset first, then do the QC, etc.
How do you merge all the files into one bed file? With cat? And do you do this for the bed, bim and fam files? It would be very helpful if you could be a little more specific. Thanks!
PLINK is very well documented; see "Merge multiple filesets" in the documentation.
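To be a bit more specific: cat will not work, because .bed files are binary and each one starts with its own header bytes. A minimal sketch of the merge with PLINK 1.9 follows, assuming you first have unfiltered binary filesets for each chromosome. The prefixes chr2_raw, chr5_raw, chr13_raw and chr16_raw and the output names merged and merged_qc are placeholders, not names from your data. The QC thresholds are the ones from your original command. Running --mind once on the merged fileset is also what gives you a single, consistent set of individuals, since the filter is then evaluated across all four chromosomes instead of separately per file.

```shell
# Remaining filesets, one prefix per line (the first one is passed via --bfile).
# All *_raw prefixes are placeholder names for unfiltered bed/bim/fam filesets.
printf '%s\n' chr5_raw chr13_raw chr16_raw > merge_list.txt

# Only run PLINK if it is on PATH and the first fileset exists.
if command -v plink >/dev/null 2>&1 && [ -f chr2_raw.bed ]; then
  # Merge the four chromosomes into one fileset called "merged".
  plink --bfile chr2_raw --merge-list merge_list.txt --make-bed --out merged
  # Apply the QC filters once, on the merged data.
  plink --bfile merged --geno 0.05 --maf 0.05 --mind 0.025 --hwe 0.001 \
        --make-bed --out merged_qc
else
  echo "plink or chr2_raw.bed not found; skipping merge" >&2
fi
```

The unfiltered binaries can be made from your ped/map files with the same command you used, minus the filter flags, e.g. plink --file chr5 --make-bed --out chr5_raw.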
If you are performing a single-SNP analysis, it is generally not necessary to have the same samples for each variant.
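If you do want the same individuals in every per-chromosome file anyway, one way is to intersect the .fam files and pass the result to --keep. This is only a sketch: common_samples is a helper name made up for this post, and it assumes each .fam has the family ID and individual ID in the first two columns, which is the standard layout.

```shell
# Print the "FID IID" pairs that occur in every .fam file given as an argument.
common_samples() {
  awk -v n="$#" '
    {
      key = $1 " " $2
      # Count each sample at most once per file.
      if (!((FILENAME, key) in seen)) { seen[FILENAME, key] = 1; count[key]++ }
    }
    END { for (k in count) if (count[k] == n) print k }
  ' "$@" | sort
}

# Example usage (not run here):
#   common_samples chr2.fam chr5.fam chr13.fam chr16.fam > keep.txt
#   plink --bfile chr5 --keep keep.txt --make-bed --out chr5_common
```

The --keep file just needs the family ID and individual ID on each line, so the function's output can be used as-is.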