My goal is to convert Illumina genotype data to the forward/reverse strand format used by Affymetrix. I have 10K, 60K, 80K and 660K data from both sources.
I have the following types of files:
***.ref
snp CHROMOSOME POSITION COUNT_ALLELE OTHER_ALLELE
snp1 17 190170 G A
snp2 17 469495 A G
--- One line for each SNP
***.dat
pig_id snp1 snp2 snp3 snp4 --- snp58448
11641206 1 0 1 0 0
11324561 1 0 2 0 0
14561322 2 1 1 0 1
13513507 0 0 2 0 0
--- One line per animal, one column per SNP
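For reference, a minimal sketch of how these two files could be loaded in Python, assuming both are whitespace-delimited with a single header line, as in the excerpts above. The function names `read_ref` and `read_dat` are only illustrative, and the 0/1/2 genotype codes are kept as plain integers without interpreting them.

```python
def read_ref(handle):
    """Return {snp_name: (chromosome, position, count_allele, other_allele)}."""
    snps = {}
    next(handle, None)  # skip the header line
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        name, chrom, pos, count_allele, other_allele = fields
        snps[name] = (chrom, int(pos), count_allele, other_allele)
    return snps


def read_dat(handle):
    """Return (snp_names, {pig_id: [genotype codes]})."""
    header = next(handle, "").split()
    snp_names = header[1:]  # first column is pig_id
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genotypes[fields[0]] = [int(code) for code in fields[1:]]
    return snp_names, genotypes
```

Both functions take an open file handle, e.g. `with open("my.ref") as fh: snps = read_ref(fh)` (the file name is a placeholder).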
I also have an Illumina HD annotation file which looks as follows:
Illumina Inc.
[Heading]
Descriptor File Name GGP HD Porcine.csv
Assay Format Infinium HD Ultra
Date Manufactured 5/15/2013
Loci Count 68516
[Assay]
IlmnID Name IlmnStrand SNP AddressA_ID AlleleA_ProbeSeq AddressB_ID AlleleB_ProbeSeq GenomeBuild Chr MapInfo Ploidy Species Source SourceVersion SourceStrand SourceSeq TopGenomicSeq BeadSetID
LD-Porcine80K_ALGA0000022-0_T_F_2164561890-0_T_F_2165597341 ALGA0000022 TOP [A/G] 19808437 *seq* 10.2 1 865364 diploid Sus scrofa rs80958395 0 TOP *seq* [A/G] 877
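A rough sketch for pulling the SNP rows out of this annotation file, assuming the actual .csv is comma-separated (the excerpt above is shown with spaces) and that the SNP table starts right after the `[Assay]` line and runs until the next bracketed section or the end of the file. `read_assay_section` and `parse_snp_alleles` are hypothetical helper names.

```python
import csv


def read_assay_section(handle):
    """Yield one dict per SNP row from the [Assay] section of the file."""
    reader = csv.reader(handle)
    # Skip everything up to and including the [Assay] marker line.
    for row in reader:
        if row and row[0].strip() == "[Assay]":
            break
    else:
        return  # no [Assay] section found
    header = next(reader, None)  # IlmnID, Name, IlmnStrand, SNP, ...
    if header is None:
        return
    for row in reader:
        if not row or row[0].startswith("["):
            break  # blank line or next section ends the table
        yield dict(zip(header, row))


def parse_snp_alleles(snp_field):
    """Turn a SNP field such as '[A/G]' into the tuple ('A', 'G')."""
    first, second = snp_field.strip().strip("[]").split("/")
    return first, second
```

Each yielded row can then be looked up by column name, for example `row["Name"]`, `row["IlmnStrand"]` and `parse_snp_alleles(row["SNP"])`.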
When I compare SNPs from the Illumina annotation file with the Affymetrix annotation file, I find inconsistencies in the allele calls.
For example, after manually checking some SNPs I find the following:
Illumina TOP A/G is sometimes called T/C in Affymetrix, and sometimes A/G.
Illumina TOP A/C is (sometimes?) called T/G in Affymetrix.
Illumina BOT T/G is (sometimes?) called A/C in Affymetrix.
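Since A pairs with T and C pairs with G, A/G versus T/C (and A/C versus T/G) are complementary allele pairs, which would be consistent with the same SNP being reported on opposite strands. A small sketch of a check along those lines, assuming each allele pair is given as two single-letter bases; the function name and the returned labels are made up for illustration. A/T and C/G SNPs are their own complement, so the alleles alone cannot tell the strands apart for those.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_allele_pairs(illumina, affymetrix):
    """Classify how two allele pairs for the same SNP relate to each other."""
    ilmn = frozenset(base.upper() for base in illumina)
    affy = frozenset(base.upper() for base in affymetrix)
    ilmn_complement = frozenset(COMPLEMENT[base] for base in ilmn)

    if ilmn == ilmn_complement:
        # A/T or C/G: the pair looks identical on both strands.
        return "ambiguous" if affy == ilmn else "mismatch"
    if affy == ilmn:
        return "same_strand"
    if affy == ilmn_complement:
        return "opposite_strand"
    return "mismatch"


print(compare_allele_pairs(("A", "G"), ("T", "C")))  # opposite_strand
print(compare_allele_pairs(("A", "G"), ("A", "G")))  # same_strand
print(compare_allele_pairs(("T", "G"), ("A", "C")))  # opposite_strand
```

Running this over all SNPs shared by the two annotation files would show how many fall into each class, rather than relying on a handful of manual checks.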
I hope somebody will be able to help me out here.