Hi all, first of all I apologise if this has been asked in previous posts, but I have not been able to find a solution to my problem.
I have a data frame with a list of SNPs, their locations, and their Reference (REF) and Alternate (ALT) alleles. In addition, I have the phased genotypes for a number of individuals.
Example:
SNP CHR POS REF ALT ID1 ID2 ID3
rs2754554 1 8656 A C 0|1 0|0 1|1
rs1111786 16 975544 T A 0|0 0|1 1|0
rs986355 7 75987 G T 1|1 0|1 1|1
rs2256743 21 442324 G C 1|0 0|1 0|1
The example has only 4 SNPs and 3 individuals, but the real list is much larger. I would like to replace each genotype code with the corresponding alleles, based on the REF and ALT columns:
Desired output:
SNP CHR POS REF ALT ID1 ID2 ID3
rs2754554 1 8656 A C A|C A|A C|C
rs1111786 16 975544 T A T|T T|A A|T
rs986355 7 75987 G T T|T G|T T|T
rs2256743 21 442324 G C C|G G|C G|C
The output is based on my understanding that 0 means the reference allele and 1 means the alternate allele. Any help is highly appreciated.
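To make the rule concrete, here is a minimal sketch in plain Python of the 0 = REF, 1 = ALT substitution described above. It is only an illustration, not a definitive solution: the function names `decode_genotype` and `decode_table` are made up for this example, and it assumes whitespace-separated columns, biallelic SNPs, and that every column after ALT holds a phased genotype.

```python
def decode_genotype(gt, ref, alt):
    """Translate a phased genotype such as '0|1' into alleles such as 'A|C'."""
    alleles = {"0": ref, "1": alt}
    # Assumption: any code other than 0 or 1 is passed through unchanged.
    return "|".join(alleles.get(code, code) for code in gt.split("|"))


def decode_table(text):
    """Return the table as a list of rows with genotype codes replaced by alleles."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    ref_i, alt_i = header.index("REF"), header.index("ALT")
    first_sample = alt_i + 1  # assumption: sample columns start right after ALT
    rows = [header]
    for line in lines[1:]:
        fields = line.split()
        ref, alt = fields[ref_i], fields[alt_i]
        decoded = [decode_genotype(gt, ref, alt) for gt in fields[first_sample:]]
        rows.append(fields[:first_sample] + decoded)
    return rows


example = """
SNP CHR POS REF ALT ID1 ID2 ID3
rs2754554 1 8656 A C 0|1 0|0 1|1
rs1111786 16 975544 T A 0|0 0|1 1|0
rs986355 7 75987 G T 1|1 0|1 1|1
rs2256743 21 442324 G C 1|0 0|1 0|1
"""

for row in decode_table(example):
    print(" ".join(row))
```

The same per-row lookup should carry over to a data frame: for each SNP, map 0 to that row's REF and 1 to that row's ALT in every sample column.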
Hi, thanks for your solutions to the problem. Sorry, the dots are there by mistake, so please disregard them; the rest is correct. Thanks!