Hi
I am trying to align my genetic data (microarray chip, in PLINK binary format) to the forward strand. I took the reference genome GRCh38.primary_assembly.genome.fa and built a Python dictionary with the chromosome as the key and its sequence as the value. For each SNP, I used the base-pair position from the bim file to retrieve the reference base from that chromosome's sequence. I added it to the bim file as a separate column and aligned my A1 and A2 alleles to it. I believe this has aligned my genetic data to the forward strand.

Next, I lifted the data down to hg19 and checked whether my SNPs (or at least most of them) are on the forward strand. I used the same method as with the hg38 FASTA: retrieve the reference base from the hg19 FASTA file and check whether A1 or A2 is identical to it. To be clear, I only did all of this with non-palindromic SNPs, in both hg38 and hg19.

However, I see many non-palindromic SNPs where neither A1 nor A2 matches the reference base retrieved from the hg19 FASTA. What could be wrong here?
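For reference, here is a minimal sketch of the steps described above (FASTA to dictionary, reference-base lookup by bim position, strand alignment, and the concordance check). It assumes a plain-text FASTA, that the chromosome name is the first whitespace-separated token of each header, and the standard six-column .bim layout. The `chrom_prefix` parameter is hypothetical, for cases where the .bim says `1` but the FASTA header says `chr1`. One detail worth checking in your own code: .bim positions are 1-based, while Python string indexing is 0-based.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele: str) -> str:
    # Non-ACGT codes (e.g. PLINK's missing allele "0") are returned unchanged.
    return COMPLEMENT.get(allele, allele)


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Read FASTA into {name: sequence}; name = first token after '>'."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    # Uppercase so soft-masked (lowercase) bases compare equal to alleles.
    return {k: "".join(v).upper() for k, v in chunks.items()}


def read_bim(handle: TextIO) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, bp, A1, A2) from the six .bim columns."""
    for line in handle:
        if line.strip():
            chrom, _snp, _cm, bp, a1, a2 = line.split()
            yield chrom, int(bp), a1, a2


def ref_base(genome: Dict[str, str], chrom: str, pos: int) -> Optional[str]:
    """Reference base at a 1-based .bim position (None if unavailable).

    Python strings are 0-based, so the base at position `pos` is seq[pos - 1].
    """
    seq = genome.get(chrom)
    if seq is None or not 1 <= pos <= len(seq):
        return None
    return seq[pos - 1]


def is_palindromic(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands."""
    return a1 in COMPLEMENT and COMPLEMENT[a1] == a2


def align_to_forward(a1: str, a2: str, ref: str) -> Optional[Tuple[str, str]]:
    """Return (A1, A2) on the FASTA's forward strand, or None if undecidable."""
    if is_palindromic(a1, a2):
        return None  # strand cannot be inferred from the reference base alone
    if ref in (a1, a2):
        return a1, a2  # already on the forward strand
    flipped = (complement(a1), complement(a2))
    if ref in flipped:
        return flipped  # reverse strand: flip both alleles
    return None  # e.g. ref is 'N' or the alleles are not A/C/G/T


def concordance(genome: Dict[str, str],
                rows: Iterable[Tuple[str, int, str, str]],
                chrom_prefix: str = "") -> float:
    """Fraction of non-palindromic SNPs whose A1 or A2 equals the ref base."""
    checked = matched = 0
    for chrom, pos, a1, a2 in rows:
        ref = ref_base(genome, chrom_prefix + chrom, pos)
        if ref is None or is_palindromic(a1, a2):
            continue
        checked += 1
        matched += ref in (a1, a2)
    return matched / checked if checked else float("nan")


if __name__ == "__main__":
    # Toy genome: chr1 = ACGTACGTAC (1-based positions 1..10).
    genome = read_fasta(io.StringIO(">chr1 1\nACGTA\ncgtac\n"))
    bim = io.StringIO("1\trs1\t0\t1\tA\tG\n1\trs2\t0\t2\tG\tT\n")
    rows = list(read_bim(bim))
    print(ref_base(genome, "chr1", 1))                     # A
    print(align_to_forward("T", "G", "A"))                 # ('A', 'C')
    print(concordance(genome, rows, chrom_prefix="chr"))   # 0.5
```

A non-palindromic SNP's two alleles plus their complements cover all four bases. As a result, `align_to_forward` only returns None for palindromic SNPs, an `N` reference base, or non-ACGT allele codes. A valid reference base always matches one strand or the other.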
Also, if there is a better approach to aligning genotype chip data to the forward strand, I would appreciate any comments.
Thanks in advance!