Hello,
I have a sync file generated with PoPoolation2 that looks like this:
Contig Position Ref Pool1 Pool2 Pool3 Pool4
SCAFOLD1 11722 A 330:0:0:0:0:0 315:0:0:0:0:0 334:0:0:0:0:0 111:0:0:0:0:0
SCAFOLD1 11723 T 0:330:0:0:0:0 0:316:0:0:0:0 0:334:0:0:0:0 0:111:0:0:0:0
SCAFOLD1 11725 T 0:327:0:0:0:0 0:314:0:0:0:0 0:329:0:0:0:0 0:111:0:0:0:0
SCAFOLD1 11726 A 330:0:0:0:0:0 314:0:0:0:0:0 332:0:0:0:0:0 111:0:0:0:0:0
Each cell contains the allele counts for each base in the order A:T:C:G:N:del (e.g. 330:0:0:0:0:0 means 330 reads supporting A and none for the other categories).
I would like to perform a genetic PCA on this dataset, just as one would on a 012 file extracted with VCFtools. I guess one could convert the sync file to a single value per cell by summing the counts of the non-reference alleles, and work from that.
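A minimal sketch of that conversion in Python, in case it helps the discussion. It assumes whitespace-separated columns and the A:T:C:G:N:del cell order. The function names `nonref_freq` and `sync_to_matrix` are made up for illustration and are not part of PoPoolation2. One deliberate deviation from the idea above: it returns the non-reference allele frequency rather than the raw count, on the assumption that pools with different read depths should be put on the same scale before a PCA.

```python
BASES = "ATCG"  # first four fields of a sync cell; N and del are ignored here


def nonref_freq(cell, ref):
    """Non-reference allele frequency for one pool, or None if depth is zero."""
    counts = [int(c) for c in cell.split(":")]
    acgt = dict(zip(BASES, counts[:4]))  # keep only A, T, C, G counts
    depth = sum(acgt.values())
    if depth == 0:
        return None  # no usable reads at this site in this pool
    return (depth - acgt[ref.upper()]) / depth


def sync_to_matrix(lines):
    """Turn sync lines into (contig, position, [freq per pool]) tuples."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and a header row such as "Contig Position Ref ..."
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        contig, pos, ref = fields[:3]
        if ref.upper() not in BASES:
            continue  # reference base is N or otherwise unusable
        freqs = [nonref_freq(cell, ref) for cell in fields[3:]]
        rows.append((contig, int(pos), freqs))
    return rows


if __name__ == "__main__":
    example = [
        "Contig Position Ref Pool1 Pool2 Pool3 Pool4",
        "SCAFOLD1 11722 A 330:0:0:0:0:0 315:0:0:0:0:0 334:0:0:0:0:0 111:0:0:0:0:0",
    ]
    for row in sync_to_matrix(example):
        print(row)
```

The result has one row per site and one value per pool, so for a PCA with pools as samples it would need to be transposed. Sites where every pool has the same frequency, like the four rows shown above, carry no information for the PCA and can be dropped, as can sites with a missing value.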
Does anybody have experience with that? Any opinion/comment would be very helpful.
Thanks!
Hi, did you find out how to perform the PCA? I also obtained a sync file using PoPoolation2 and a VCF using GATK, and I have been trying to perform a PCA with either file, but with no success yet. Thank you,
Natalia
I managed it by following your method. Thanks a million!