I have virus sequences from different geographic regions, and I have performed a population structure analysis on them using the STRUCTURE software. It gives me an optimal K (Kopt) of 3. Now I want to perform PCA on these sequences using Eigensoft, but I only have VCF files and no case/control data. How should I use the VCF files as input?
The important error here is "Error: data.bim cannot contain multiallelic variants".
Use --make-pgen/--pfile instead of --make-bed/--bfile when working with multiallelic variants.
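For reference, the VCF-to-PGEN conversion can be scripted. The sketch below is a minimal Python helper that only builds the plink2 command line, assuming plink2 is on your PATH. The input name `NV.vcf` is a hypothetical placeholder, and the output prefix `NV` is chosen to match the log later in this thread. The optional `--allow-extra-chr` flag is included on the assumption that your contig names (for example a viral reference accession) may not be standard chromosome codes; leave it off if plink2 accepts your VCF without it.

```python
import subprocess
from typing import List


def vcf_to_pgen_cmd(vcf_path: str, out_prefix: str, allow_extra_chr: bool = False) -> List[str]:
    """Build a plink2 command that converts a VCF into a .pgen/.pvar/.psam fileset.

    PGEN (unlike .bed) can hold multiallelic variants, which is why
    --make-pgen is used here instead of --make-bed.
    """
    cmd = ["plink2", "--vcf", vcf_path, "--make-pgen", "--out", out_prefix]
    if allow_extra_chr:
        # Accept non-standard contig names (e.g. a viral reference accession).
        cmd.append("--allow-extra-chr")
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if plink2 exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run(...) on it once the file names are correct.
    print(" ".join(vcf_to_pgen_cmd("NV.vcf", "NV")))
```

The same command can of course be typed directly in a terminal; wrapping it only makes it easier to reuse the prefix consistently in later steps.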
I used the pgen command and it gave me three files: .pgen, .pvar, and .psam. How can I use these files to plot a PCA? Please guide me further.
--vcf: 4690 variants scanned.
--vcf: NV-temporary.pgen + NV-temporary.pvar.zst + NV-temporary.psam written.
822 samples (0 females, 0 males, 822 ambiguous; 822 founders) loaded from NV-temporary.psam.
4690 variants loaded from NV-temporary.pvar.zst.
Note: No phenotype data present.
Writing NV.psam ... done.
Writing NV.pvar ... done.
Writing NV.pgen ... done.
End time: Mon Jan 25 10:24:45 2021
I used this command for PCA (plink2 --pfile file --PCA), and it gave me the error "failed to open .psam file".
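A likely cause: `--pfile` takes the fileset prefix, so `--pfile file` makes plink2 look for `file.pgen`, `file.pvar` and `file.psam`, whereas the log above wrote `NV.pgen`, `NV.pvar` and `NV.psam`. The documented flag is also lowercase, `--pca`. The sketch below is a hypothetical Python helper (the function names are mine, not part of plink2) that checks the three files exist before building the PCA command. It assumes an uncompressed `.pvar`, as in the log's final output, and asks for plink2's default of 10 PCs.

```python
import os
from typing import List

PFILE_EXTS = (".pgen", ".pvar", ".psam")


def missing_pfile_parts(prefix: str) -> List[str]:
    """Return the fileset members that `plink2 --pfile <prefix>` would fail to open."""
    return [prefix + ext for ext in PFILE_EXTS if not os.path.exists(prefix + ext)]


def pca_cmd(prefix: str, n_pcs: int = 10) -> List[str]:
    """Build a plink2 PCA command; output goes to <prefix>.eigenvec and <prefix>.eigenval."""
    missing = missing_pfile_parts(prefix)
    if missing:
        # Same situation as the "failed to open .psam" error, caught before running plink2.
        raise FileNotFoundError("missing: " + ", ".join(missing))
    return ["plink2", "--pfile", prefix, "--pca", str(n_pcs), "--out", prefix]
```

With the files from the log, `pca_cmd("NV")` gives `plink2 --pfile NV --pca 10 --out NV`, while `pca_cmd("file")` raises and lists the three missing `file.*` paths.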
Did you try googling the error message?
Thanks, that solved it. I now have two files, .eigenvec and .eigenval. I am confused because my .psam file has no sex (male/female) information. Will this create any bias in the result?
Also, please guide me on how I can now use the .eigenvec and .eigenval files in R to plot the PCA.
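The question asks about R; as a sketch of the same steps in Python instead, the code below assumes plink2's .eigenvec layout (a header line starting with `#`, such as `#FID IID PC1 PC2 ...` or `#IID PC1 ...`, followed by whitespace-separated rows) and a .eigenval file with one eigenvalue per line. The percentages it puts on the axes are each PC's share of the computed PCs only (10 by default), not of the total genetic variance, so treat them as relative. Colouring by your STRUCTURE clusters (K=3) needs a sample-to-cluster mapping that plink2 does not produce; `clusters` below is a hypothetical dict you would build yourself, for example from each sample's highest STRUCTURE membership. matplotlib is assumed to be installed for the plotting function.

```python
from typing import Dict, List, Optional, Tuple


def read_eigenvec(path: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a plink2 .eigenvec file into (PC names, {IID: [PC values]})."""
    with open(path) as fh:
        header = fh.readline().lstrip("#").split()
        # Columns named PC1, PC2, ... are the PCs; everything before them is an ID column.
        first_pc = next(i for i, col in enumerate(header) if col.startswith("PC"))
        rows: Dict[str, List[float]] = {}
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            # Assumes IID is the column just before PC1 (true for "#FID IID" and "#IID" headers).
            rows[fields[first_pc - 1]] = [float(x) for x in fields[first_pc:]]
    return header[first_pc:], rows


def read_eigenval(path: str) -> List[float]:
    """Read one eigenvalue per line."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


def percent_of_computed(eigenvals: List[float]) -> List[float]:
    """Each PC's share of the sum of the computed eigenvalues (not of total variance)."""
    total = sum(eigenvals)
    return [100.0 * v / total for v in eigenvals]


def plot_pc1_pc2(eigenvec_path: str, eigenval_path: str,
                 clusters: Optional[Dict[str, str]] = None,
                 out_png: str = "pca.png") -> None:
    """Scatter PC1 vs PC2, one colour per cluster label, and save to a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # write to file without needing a display
    import matplotlib.pyplot as plt

    _, rows = read_eigenvec(eigenvec_path)
    pct = percent_of_computed(read_eigenval(eigenval_path))
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for sample_id, vals in rows.items():
        label = clusters.get(sample_id, "unassigned") if clusters else "all samples"
        groups.setdefault(label, []).append((vals[0], vals[1]))

    fig, ax = plt.subplots()
    for label, points in sorted(groups.items()):
        xs, ys = zip(*points)
        ax.scatter(xs, ys, s=10, label=label)
    ax.set_xlabel(f"PC1 ({pct[0]:.1f}% of computed PCs)")
    ax.set_ylabel(f"PC2 ({pct[1]:.1f}% of computed PCs)")
    ax.legend()
    fig.savefig(out_png, dpi=150)
```

Usage, with the file names from this thread: `plot_pc1_pc2("NV.eigenvec", "NV.eigenval")` writes `pca.png` with all samples in one colour; passing `clusters={"sample1": "K1", ...}` splits them by your STRUCTURE assignment.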
Please guide me. I have little knowledge of plink and PCA.