Visualising Affymetrix and sequencing SNP calls
4.8 years ago
robjohn70000 ▴ 160

Hi,

I would like to carry out visualisations (PCA, clustering, etc.) based on SNP calls in several files, with the aim of seeing how the samples in each file differ from each other. Some of the files originate from Affymetrix (from CEL files in Axiom Analysis Suite; 0, 1, 2, -1 format) and others from sequencing (VCF converted to PLINK files; 0, 1, 2 format). I have merged the SNP calls from all files, but I am wondering whether I will be comparing apples with apples rather than apples with oranges. My understanding of the SNP call formats is:

Haploid VCF - Sequencing:

0: Ref
1: first alternative allele
2: second alternative allele

Diploid organism - Affymetrix:

0: AA (HOM REF)
1: AB (HET)
2: BB (HOM VAR)
-1: missing/No call

Since there is no "-1" in the sequencing format, I can set -1 in the combined data to NA; even then, I'm not sure that the remaining common coding (0, 1, 2) allows a fair comparison, because of the haploid/diploid difference. Can anyone suggest how to transform the data to enable a fair comparison and visualisation of SNP calls from different origins? Thanks
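
For illustration, recoding the Affymetrix no-call value as missing could look roughly like the sketch below. It is plain Python rather than R, and the function name `recode_missing` and the example values are made up for this post; it only assumes that -1 means "no call", as listed above.

```python
import math

# Assumption: -1 is the Affymetrix no-call code, as described in the post.
MISSING_CODE = -1


def recode_missing(calls):
    """Return a new list in which every no-call (-1) is replaced by NaN."""
    return [math.nan if c == MISSING_CODE else float(c) for c in calls]


# Hypothetical calls for one sample across five SNPs.
affy_calls = [0, 1, 2, -1, 1]
recoded = recode_missing(affy_calls)
print(recoded)
```

Using NaN rather than a numeric placeholder keeps the missing calls from being treated as a fourth genotype class in downstream steps such as PCA, which will typically need them imputed or filtered first.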

snp next-gen R • 849 views
4.7 years ago

I think that you need to be careful, as I am not confident that the Affymetrix AB encoding is the same as the PLINK 012 encoding.

You will likely have to determine which is the A and B allele on the Affymetrix platform, and then go variant-by-variant to align these to the data in your original VCF.
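
As a rough sketch of what that variant-by-variant alignment might look like (in Python, purely for illustration): the function name `align_affy_call` and the example alleles are hypothetical, and it assumes biallelic SNPs, that the array annotation gives the A and B bases on the same strand as the VCF, and that the sequencing-derived calls count copies of the ALT allele. None of those assumptions is guaranteed for your data, so please check them first.

```python
def align_affy_call(call, allele_a, allele_b, ref, alt):
    """Convert an Affymetrix call (0=AA, 1=AB, 2=BB, -1=no call) into a
    count of ALT alleles (0, 1, 2), or None if it cannot be aligned.

    Assumes a biallelic SNP and that A/B bases are reported on the same
    strand as the VCF REF/ALT bases.
    """
    if call == -1:
        return None  # no call stays missing
    if call not in (0, 1, 2):
        raise ValueError(f"unexpected Affymetrix call: {call}")
    if (allele_a, allele_b) == (ref, alt):
        return call  # B is ALT, so the B count is already the ALT count
    if (allele_a, allele_b) == (alt, ref):
        return 2 - call  # B is REF, so flip the count
    return None  # alleles do not match (e.g. strand or multi-allelic issue)


# Hypothetical SNP where the array's B allele is the VCF reference allele.
print(align_affy_call(2, "G", "A", ref="A", alt="G"))
```

Variants that come back as None are worth inspecting by hand, since a mismatch may indicate a strand flip or a site that is not biallelic in one of the datasets.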

You will have to explore various options.

Kevin


Thanks for your advice @Kevin Blighe
