Hi guys,
I want to generate ancestral alleles using Gallus varius (GV). I have a .vcf file for each of GV and the domestic chicken, both mapped against Galgal4. At any position present in both .vcf files, the REF allele is expected to be the same, so I want to work only on the ALT field. If the domestic and GV ALT alleles at the same position are the same, that single allele should be selected; if they differ, nothing should be done. All the selected ALT alleles should be written to new files, first file.vcf and then file.fasta. My idea is laid out in the table below.
Thank you.
Position  REF allele  Dom ALT allele  GV ALT allele  Decision
1         A           T               T              Select T
2         G           C               C              Select C
3         T           A               G              Don’t select
4         C           G               T,C            Don’t select
5         A           T               T              Select T
Looks like you've got the algorithm generally worked out. Do you need help implementing it?
What is the phylogeny of these three Gallus species?
If you mapped against Galgal4, then REF will be the base seen in Galgal4.
As for how to code this, I would recommend learning some Perl, Python or awk. Programming this does not need a complicated algorithm, but I think it is too complicated to explain in full on Biostars. I would use Perl and make three hashes: one with the variants from GV, one with the variants from domestic, and one with the REF sites. Then iterate over the hashes, apply the conditions you stated and print the result.
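For what it's worth, here is a rough sketch of the hash idea, written in Python with dicts standing in for the Perl hashes. It assumes plain tab-separated VCF text with CHROM, POS, REF and ALT in columns 1, 2, 4 and 5. The function names `read_variants` and `select_shared_alt` and the chromosome name `chr1` are made up for illustration, and REF is stored next to ALT instead of in a third hash. The example data is the table from the question.

```python
def read_variants(lines):
    """Map (chrom, pos) -> (ref, alt) from VCF text lines, skipping headers."""
    variants = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        variants[(chrom, pos)] = (ref, alt)
    return variants


def select_shared_alt(dom, gv):
    """Return (chrom, pos, ref, alt) for sites where both samples share one ALT."""
    selected = []
    # Only positions present in both files can be compared.
    for key in sorted(dom.keys() & gv.keys()):
        dom_ref, dom_alt = dom[key]
        gv_ref, gv_alt = gv[key]
        if dom_ref != gv_ref:
            continue  # REF should match; skip the site if it does not
        if "," in dom_alt or "," in gv_alt:
            continue  # more than one ALT allele: "don't select"
        if dom_alt == gv_alt:
            selected.append((key[0], key[1], dom_ref, dom_alt))
    return selected


# The five positions from the question, on a hypothetical chromosome "chr1".
dom_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1\t.\tA\tT",
    "chr1\t2\t.\tG\tC",
    "chr1\t3\t.\tT\tA",
    "chr1\t4\t.\tC\tG",
    "chr1\t5\t.\tA\tT",
]
gv_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1\t.\tA\tT",
    "chr1\t2\t.\tG\tC",
    "chr1\t3\t.\tT\tG",
    "chr1\t4\t.\tC\tT,C",
    "chr1\t5\t.\tA\tT",
]

selected = select_shared_alt(read_variants(dom_lines), read_variants(gv_lines))
for chrom, pos, ref, alt in selected:
    print(chrom, pos, ref, alt, sep="\t")
```

On real data you would pass open file handles instead of the lists, e.g. `read_variants(open("domestic.vcf"))` (file name assumed). The sketch does not treat indels or a missing ALT (`.`) specially, and writing the selected sites out as file.vcf and file.fasta is left out.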