Changing the reference allele
4.2 years ago
evoecogen ▴ 30

Hello,

I have a population genomics dataset of hundreds of individuals of species A, mapped to the reference genome of species A, plus several individuals from a few outgroup species. (This resembles a typical human population dataset with a few chimps, gorillas and orangutans.) Currently the reference allele comes from one population of species A. What tool can I use to determine the ancestral alleles for A and recode the VCF accordingly? The goal is to determine and compare the history of certain alleles across all populations of A. Thanks!

population genomics reference ancestral allele • 1.0k views

I don't think this is a valid operation. A VCF file should show differences from one specific reference genome. You're better off creating separate per-population VCF files. Also, remember that the ALT allele need not be the minor allele; it is simply the non-reference allele seen in the samples. Subgroups in which the ALT allele is the more common one (frequency >0.5 within that subgroup) are therefore a well-known observation. Even within the same species, populations do not have to share a REF allele as their "normal".

For example, certain alleles in the human genome are seen more commonly in Asian or Norwegian or African populations, but they're still ALT because the reference genome was not constructed with them.
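To make the point about ALT not being the minor allele concrete, here is a minimal sketch that computes the ALT allele frequency per population at a single biallelic site. The function name, sample names and population labels are all made up for illustration; it only parses plain GT strings and is not tied to any particular VCF library.

```python
from collections import defaultdict


def alt_frequency_by_population(genotypes, populations):
    """Return the ALT allele frequency per population at one biallelic site.

    genotypes:   dict of sample -> VCF GT string, e.g. "0/1", "1|1" or "./."
    populations: dict of sample -> population label (hypothetical labels)
    """
    alt = defaultdict(int)
    called = defaultdict(int)
    for sample, gt in genotypes.items():
        pop = populations[sample]
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not counted
            called[pop] += 1
            if allele == "1":
                alt[pop] += 1
    return {pop: alt[pop] / called[pop] for pop in called}


# Hypothetical example: ALT is the major allele in popX and absent in popY.
genotypes = {"s1": "1/1", "s2": "0/1", "s3": "0/0", "s4": "0/0"}
populations = {"s1": "popX", "s2": "popX", "s3": "popY", "s4": "popY"}
freqs = alt_frequency_by_population(genotypes, populations)
print(freqs)  # {'popX': 0.75, 'popY': 0.0}
```

Here ALT has frequency 0.75 in popX even though it is still labelled ALT, simply because the reference carries the other allele.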


I have definitely seen this done in human popgen papers, but they do not describe the specifics! It should be possible to determine the ancestral allele for A from the outgroups. My problem right now is that the reference comes from an arbitrary population of A, at the edge of its distribution, so the populations that are geographically most distant from the reference appear the most derived. BTW, I suspect that making individual VCFs per population would be much less accurate (I use GATK, which encourages calling genotypes of all samples together).
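The idea of polarizing by outgroups can be sketched roughly as follows. This is only an illustration under strong assumptions, not the method used in any particular paper: sites are biallelic, the ancestral allele is accepted only when every called outgroup allele agrees (simple parsimony), and only the GT strings are recoded. The function names are hypothetical. In a real VCF the REF/ALT columns and any allele-specific fields (AC, AF, AD, PL and so on) would need to be swapped consistently as well; a less invasive alternative is to leave REF untouched and record the inferred allele in the `AA` (ancestral allele) INFO key that the VCF specification reserves for this purpose.

```python
def infer_ancestral(outgroup_gts):
    """Return "0" (REF) or "1" (ALT) if all called outgroup alleles agree.

    Returns None when the outgroups disagree or have no calls, in which
    case the site should be left unpolarized (assumption: strict parsimony).
    """
    alleles = set()
    for gt in outgroup_gts:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                alleles.add(allele)
    if len(alleles) == 1:
        allele = alleles.pop()
        return allele if allele in ("0", "1") else None
    return None


def polarize_gt(gt, ancestral):
    """Recode a biallelic GT string so that 0 = ancestral and 1 = derived."""
    if ancestral == "0":
        return gt  # REF is already ancestral, nothing to do
    swap = {"0": "1", "1": "0"}
    # Separators ("/" or "|") and missing calls (".") pass through unchanged.
    return "".join(swap.get(ch, ch) for ch in gt)


# Hypothetical site: both called outgroups are homozygous ALT.
outgroups = ["1/1", "1/1", "./."]
ingroup = ["0/0", "0/1", "1|1", "./."]

ancestral = infer_ancestral(outgroups)
polarized = [polarize_gt(gt, ancestral) for gt in ingroup]
print(ancestral, polarized)  # 1 ['1/1', '1/0', '0|0', './.']
```

With this recoding, individuals carrying the original REF allele at this site are the ones that show up as derived, regardless of which population the reference was built from.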


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.


"I use GATK, which encourages calling genotypes of all samples together"

GATK is best suited for human analyses. You seem to be working on a non-model organism, so following GATK's Best Practices is not the best course of action here. Remember, GATK assumes that your reference genome is as stable as the human reference genomes. You cannot joint-genotype samples that were called against different reference genomes.
