Hi,
I would like to carry out visualisations (PCA, clustering, etc.) based on SNP calls in several files, with the aim of seeing how the samples in each file differ from each other. Some of the files originate from Affymetrix arrays (from CEL files in Axiom Analysis Suite; 0, 1, 2, -1 format) and others from sequencing (VCF converted to plink files; 0, 1, 2 format). I have merged the SNP calls from all files, but I am wondering whether I will be comparing apples with apples rather than apples with oranges. My understanding of the SNP call formats is:
Haploid VCF - Sequencing:
0: Ref
1: first alternative allele
2: second alternative allele
Diploid organism - Affymetrix:
0: AA (HOM REF)
1: AB (HET)
2: BB (HOM VAR)
-1: missing/No call
Since there is no "-1" in the sequencing format, I can set -1 in the combined data to NA. Even then, I'm not sure that the remaining 0/1/2 codes allow a fair comparison, because they mean different things in the haploid and diploid formats. Can anyone suggest how to transform the data to enable a fair comparison and visualisation of the SNP calls from different origins? Thanks
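For reference, the -1 to NA step mentioned above could be sketched as follows. This is only a minimal illustration in plain Python, assuming the merged calls are held as a samples-by-SNPs matrix. The names `recode_missing`, `call_rate` and the toy `merged` data are hypothetical, and the sketch does not address the haploid/diploid question itself.

```python
import math

MISSING_CODE = -1  # Affymetrix no-call code, as described in the post


def recode_missing(calls, missing_code=MISSING_CODE):
    """Return a copy of a samples-x-SNPs matrix with no-calls replaced by NaN."""
    return [
        [math.nan if g == missing_code else float(g) for g in row]
        for row in calls
    ]


def call_rate(row):
    """Fraction of non-missing calls for one sample."""
    called = [g for g in row if not math.isnan(g)]
    return len(called) / len(row)


# Hypothetical toy data: rows are samples, columns are SNPs.
merged = [
    [0, 1, 2, -1],  # array-derived sample with one no-call
    [2, 0, 1, 0],   # sequencing-derived sample
]

cleaned = recode_missing(merged)
rates = [call_rate(row) for row in cleaned]
```

Checking the per-sample call rate after recoding is one simple way to see whether missingness differs systematically between the two data sources before running PCA or clustering.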
Thanks for your advice @Kevin Blighe