I used an Illumina iScan microarray, designed for humans and based on hg19, to genotype macaques. Following earlier advice on Biostars, I converted the data from Illumina's proprietary IDAT format to PLINK format and then exported it to VCF, so I now have a VCF with hg19 coordinates.
Recently, we sequenced a bunch of macaque whole genomes. We'd like to combine the iScan and whole-genome VCFs for downstream analysis, but the sequencing reads were mapped to the macaque reference genome, not hg19!
I now need to convert the iScan VCF from hg19 coordinates to macaque reference genome coordinates. I have struggled with this for weeks and I'm at a loss as to how to move forward.
I tried Picard's LiftoverVcf, knowing that it isn't meant for cross-species liftover. It doesn't work at all, because the reference alleles in my VCF don't match the bases in the target (macaque) reference. My understanding is also that PLINK determines the reference allele from the genotyped sample itself (i.e. the iScan data), not from a reference genome.
To accomplish this, I think I need to write scripts to do the following:
1) Manually replace the CHROM, POS and ID fields in the VCF, converting them from hg19 to macaque using the information in the UCSC liftover chain file.
2) Manually look up the reference allele at each lifted position in the macaque reference genome. The reference only supplies REF; the alternative allele has to come from the array genotypes.
3) Re-code the VCF, forcing REF to match the base looked up in the macaque reference and updating ALT and the sample genotypes accordingly.
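Steps 1 and 2 can be sketched in plain Python. This assumes the standard UCSC chain format (a `chain` header line followed by `size dt dq` lines and a final `size` line), with hg19 as the chain's target ("t") side as in UCSC's hg19-to-macaque chain files, and it picks the highest-scoring chain when several overlap a site. All function and class names are hypothetical. In practice UCSC's `liftOver` on a BED file of SNP positions does the coordinate step; either way, the returned strand is what tells you whether the array alleles need complementing before they are compared with the macaque base.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One ungapped aligned block from a chain (hypothetical structure)."""
    t_name: str   # source (hg19) chromosome
    t_start: int  # 0-based, half-open on the source side
    t_end: int
    q_name: str   # destination (macaque) chromosome
    q_start: int  # 0-based, in the coordinates of q_strand
    q_size: int
    q_strand: str
    score: float


def parse_chain(lines):
    """Flatten UCSC chain-format lines into a list of ungapped blocks."""
    blocks = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            score = float(fields[1])
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size, q_strand = fields[7], int(fields[8]), fields[9]
            q_pos = int(fields[10])
            continue
        size = int(fields[0])
        blocks.append(Block(t_name, t_pos, t_pos + size, q_name, q_pos, q_size, q_strand, score))
        if len(fields) == 3:
            # "size dt dq": advance past this block and the gaps that follow it
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return blocks


def lift_position(blocks, chrom, pos0):
    """Map a 0-based source position to (chrom, pos0, strand), or None if unaligned."""
    # Linear scan for clarity; a whole array would need an interval index.
    hits = [b for b in blocks if b.t_name == chrom and b.t_start <= pos0 < b.t_end]
    if not hits:
        return None
    best = max(hits, key=lambda b: b.score)  # assumption: best-scoring chain wins
    q = best.q_start + (pos0 - best.t_start)
    if best.q_strand == "-":
        q = best.q_size - 1 - q  # reverse-strand coordinate -> forward strand
    return best.q_name, q, best.q_strand


def read_fasta(lines):
    """Load FASTA records into {name: sequence}; too memory-hungry for a full genome."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def lift_site(blocks, genome, chrom, pos1):
    """Lift a 1-based VCF POS; return (chrom, pos1, strand, macaque REF base) or None."""
    hit = lift_position(blocks, chrom, pos1 - 1)
    if hit is None:
        return None
    q_chrom, q0, strand = hit
    return q_chrom, q0 + 1, strand, genome[q_chrom][q0].upper()
```

Sites that return `None` fall in gaps or unaligned regions and have no macaque position, so they would be dropped rather than guessed.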
Does this sound like the correct approach, or am I misunderstanding or over-complicating things here? I have never done any cross-species or liftover work before and this is more challenging than I can normally handle. I would appreciate any suggestions on how to tackle this.