Hi, I have a raw VCF file containing around 40,000 SNPs. I want to read it into RStudio and split it into two files based on SNP position. I have done this, and the two subsets currently exist as two separate, unsaved data frames. Next, I want to run a PCA and an Fst analysis on these two SNP sets in R itself. How do I go about it: which packages should I use and, more importantly, what format do the files need to be converted to? Or can the analyses be run on this kind of data frame directly? The SNP positions are in the second column, and the per-individual SNP data (GT:PL:GQ) are in the columns that follow. This is what the file looks like (header and first row shown):
CHROM       POS  REF  ALT  QUAL  FORMAT    Indiv 1         Indiv 2          Indiv 3
EU153401.1  209  A    G    999   GT:PL:GQ  1/1:70,12,0:31  1/1:132,15,0:34  1/1:72,12,0:31
I also have a heterozygosity matrix for each of these two files (data coded as 0, 1, 2). Can this be used for PCA or Fst?
I am new to R and need some help urgently. Thanks a lot.
This previous answer will probably help you: A: Pca From Vcf Files
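In case it helps, here is a minimal base-R sketch of one possible route from a data frame like yours to a PCA. It is an illustration under assumptions, not the method from the linked answer. The data frame `snps`, the column names `Indiv1` to `Indiv3`, and the helper `gt_to_dosage` are all made up. The code assumes biallelic SNPs, converts the GT field to an ALT-allele count (0, 1, 2), fills missing calls with the SNP mean, and runs `prcomp`. If your 0/1/2 matrix is already coded this way, you could probably skip the conversion and start at the transpose step.

```r
# Hypothetical toy data mimicking the layout in the question:
# one row per SNP, one "GT:PL:GQ" string per individual.
snps <- data.frame(
  CHROM  = "EU153401.1",
  POS    = c(209, 350, 512, 780),
  Indiv1 = c("1/1:70,12,0:31",  "0/1:20,0,35:20", "0/0:0,12,90:15", "0/1:30,0,40:30"),
  Indiv2 = c("1/1:132,15,0:34", "0/0:0,9,80:12",  "0/1:25,0,30:25", "1/1:90,10,0:13"),
  Indiv3 = c("1/1:72,12,0:31",  "0/1:18,0,22:18", "1/1:60,9,0:12",  "./.:0,0,0:0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "GT:PL:GQ" strings into ALT-allele counts
# (0, 1 or 2), returning NA for missing calls. Assumes biallelic, diploid SNPs.
gt_to_dosage <- function(x) {
  gt <- sub(":.*$", "", x)            # keep only the GT field
  alleles <- strsplit(gt, "[/|]")     # split unphased (/) or phased (|) calls
  vapply(alleles, function(a) {
    if (length(a) != 2 || any(a == ".")) return(NA_real_)
    as.numeric(sum(a != "0"))         # count non-reference alleles
  }, numeric(1))
}

sample_cols <- c("Indiv1", "Indiv2", "Indiv3")
dosage <- sapply(snps[sample_cols], gt_to_dosage)  # SNPs in rows, individuals in columns

# PCA expects individuals in rows and SNPs in columns.
geno <- t(dosage)

# Replace missing genotypes with the per-SNP mean (a simple, common choice).
for (j in seq_len(ncol(geno))) {
  miss <- is.na(geno[, j])
  geno[miss, j] <- mean(geno[!miss, j])
}

# Drop SNPs with no variation; they carry no information for PCA.
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

pca <- prcomp(geno, center = TRUE, scale. = FALSE)
```

You could then plot the first two components with `plot(pca$x[, 1], pca$x[, 2])`. Fst also needs a population label for each individual, which your post does not mention. Packages such as vcfR, adegenet, hierfstat and SNPRelate are commonly used for this kind of analysis and can work from the VCF itself, so it may be worth saving your two subsets back to VCF rather than keeping them only as data frames.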