Question

How to polarize ancestral versus derived alleles?


Alice, 7.1 years ago

Hello Biostars,

I am trying to get into popgen analysis with ANGSD and am currently working on some summary statistics. This made me think a bit more about allele polarization for the D-statistic, f3 and f4 statistics, the SFS and other analyses.
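One way to see which of these analyses actually depend on polarization: f3 and f4 are averages of products of allele-frequency differences, so counting the other allele at a site (p → 1 − p in every population) flips the sign of both factors and leaves that site's term unchanged. Because this holds site by site, the choice of which allele to count may even differ between sites. The unfolded SFS, by contrast, changes when alleles are relabeled. A minimal sketch with toy frequencies; `f3`, `f4` and `flip` are illustrative helpers, and `f3` here omits the finite-sample bias correction that real estimators apply:

```python
def f4(pA, pB, pC, pD):
    """f4(A, B; C, D): mean over sites of (pA - pB) * (pC - pD)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pA, pB, pC, pD)]
    return sum(terms) / len(terms)


def f3(pC, pA, pB):
    """f3(C; A, B): mean over sites of (pC - pA) * (pC - pB).

    Uncorrected for finite sample size (illustration only).
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(pC, pA, pB)]
    return sum(terms) / len(terms)


def flip(freqs):
    """Count the other allele instead: p -> 1 - p at every site."""
    return [1.0 - p for p in freqs]


# Toy frequencies of one arbitrarily chosen allele at four sites (made up).
pA = [0.10, 0.50, 0.90, 0.30]
pB = [0.20, 0.40, 0.70, 0.30]
pC = [0.60, 0.50, 0.20, 0.80]
pD = [0.50, 0.10, 0.30, 0.60]

# Each pair of values agrees up to floating-point rounding.
print(f4(pA, pB, pC, pD), f4(flip(pA), flip(pB), flip(pC), flip(pD)))
print(f3(pC, pA, pB), f3(flip(pC), flip(pA), flip(pB)))
```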

To estimate the D-statistic, ANGSD asks for an ancestral FASTA file. However, I am not sure what kind of FASTA it should be. If all my BAMs are aligned to, say, hg19, but the outgroup is chimp, should I provide the PanTro reference genome? In that case the coordinates would differ, because the BAMs are aligned to hg19.
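Whatever tool computes it, a site-pattern D-statistic compares each sample's base with the ancestral base at the same position, so the ancestral FASTA has to share the BAMs' coordinate system (here hg19). A minimal sketch of the textbook ABBA/BABA count, assuming one base per sample per site and an ancestral base already taken from an outgroup sequence in hg19 coordinates; `classify` and `d_statistic` are hypothetical helpers, not ANGSD's implementation (which works from reads and adds a block jackknife):

```python
def classify(h1, h2, h3, anc):
    """Return 'ABBA', 'BABA' or None for one site.

    A is the allele carried by the ancestral (outgroup) base, B the other one.
    """
    bases = {h1, h2, h3, anc}
    if "N" in bases or len(bases) != 2:
        return None  # missing data, or not biallelic across the four
    if h3 == anc:
        return None  # H3 must carry the derived allele B
    if h1 == anc and h2 == h3:
        return "ABBA"
    if h2 == anc and h1 == h3:
        return "BABA"
    return None


def d_statistic(sites):
    """D = (nABBA - nBABA) / (nABBA + nBABA) over (h1, h2, h3, anc) tuples."""
    n_abba = n_baba = 0
    for h1, h2, h3, anc in sites:
        pattern = classify(h1, h2, h3, anc)
        if pattern == "ABBA":
            n_abba += 1
        elif pattern == "BABA":
            n_baba += 1
    total = n_abba + n_baba
    return (n_abba - n_baba) / total if total else float("nan")


# Toy sites: (H1, H2, H3, ancestral base at the same hg19 position).
sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("C", "T", "T", "C"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("T", "T", "C", "T"),  # AABA: uninformative
    ("A", "N", "A", "G"),  # missing data: skipped
]
print(d_statistic(sites))
```

If the ancestral base were read from a FASTA in a different coordinate system (e.g. panTro positions looked up as if they were hg19), every pattern above would be assigned against the wrong base.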

Or should I convert the PanTro BAM file aligned to hg19 into some kind of consensus FASTA? Or, finally, I could realign all BAMs to the chimp genome and then use these realigned BAMs together with PanTro for the analysis. What is the best way to do this?
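Converting chimp reads aligned to hg19 into a consensus sequence gives an outgroup FASTA that is already in hg19 coordinates. The idea can be sketched as a per-position majority call. This is a simplified illustration under assumed inputs (a hypothetical `pileup` dict mapping 0-based hg19 positions to the read bases observed there, which in practice you would fill from the BAM with a pileup tool), not ANGSD's `-doFasta` implementation; it ignores indels and base qualities:

```python
from collections import Counter

VALID = {"A", "C", "G", "T"}


def consensus_base(bases, min_depth=3):
    """Majority base among reads covering one position, or 'N'.

    'N' is returned when depth is below min_depth or when the two most common
    bases are tied, so ambiguous positions are not used for polarization.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID)
    if sum(counts.values()) < min_depth:
        return "N"
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return "N"
    return top[0][0]


def consensus_sequence(pileup, length, min_depth=3):
    """Consensus string in the coordinates the reads were aligned to (e.g. hg19).

    Positions absent from the pileup (no coverage) become 'N'.
    """
    return "".join(consensus_base(pileup.get(i, []), min_depth) for i in range(length))


# Toy pileup for the first five positions of one chromosome (made up).
pileup = {
    0: list("AAAG"),  # clear majority
    1: list("CC"),    # too shallow
    2: list("GGTT"),  # tie
    3: list("ttt"),   # lowercase is accepted
}
print(consensus_sequence(pileup, 5))
```

Writing the resulting string under the same chromosome name as the hg19 reference keeps position i in the consensus aligned with position i in the BAMs.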

I guess it would be better to use a real outgroup to polarize alleles (especially for the SFS), but some papers (such as this one) use a non-outgroup reference instead and rely on the folded SFS without apparent problems.
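The folded SFS sidesteps polarization: without knowing the ancestral state, a site where one allele is present in i of n copies cannot be told apart from one where it is present in n − i copies, so those classes are merged. A minimal sketch, assuming the unfolded SFS is a list of n + 1 site counts indexed by the number of derived copies; `fold_sfs` is an illustrative helper:

```python
def fold_sfs(unfolded):
    """Fold an unfolded SFS (entries for 0..n derived copies).

    Entry i of the result counts sites whose minor allele is present in i
    copies, for i = 0..n // 2. Classes i and n - i are merged; for even n the
    middle class n / 2 is its own partner and is counted once.
    """
    n = len(unfolded) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(unfolded):
        folded[min(i, n - i)] += count
    return folded


# n = 4 chromosomes: classes (0, 4) and (1, 3) merge, class 2 stays alone.
print(fold_sfs([10, 5, 3, 2, 1]))
# n = 3 chromosomes: classes (0, 3) and (1, 2) merge.
print(fold_sfs([8, 4, 2, 1]))
```

The trade-off is that folding throws away exactly the information a mispolarized reference would corrupt, so it is robust, but it can no longer distinguish rare derived alleles from high-frequency derived alleles.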

In general, is there an optimal strategy for this kind of popgen decision making?


Apologies that no one else has responded. This is a very specific type of analysis, but a very interesting one, I must admit.

From what I can see, ANGSD could accept a BAM aligned to hg19 and another aligned to Pan troglodytes; however, this is not necessarily the correct way to run the program, since positions in the two references do not correspond.

I noticed this recent study, which appears to have run ANGSD separately on 3 different species: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788117/

Kevin Blighe, 7.1 years ago


Thanks! Yeah, it is not an easy question. I ended up aligning the chimp data to hg19. The other part of my question is very theoretical: I looked through the literature to see what people do, and they do whatever their data allow. Some, for example, do not have a sequenced outgroup, so they just use a reference.

Alice, 7.1 years ago


Hi Alice, I am computing an unfolded SFS, but I don't know how to use a real outgroup to polarize alleles. Can you give me some suggestions?

qing, 3.1 years ago
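Polarizing with a real outgroup amounts to reading the outgroup base at each site (from an outgroup FASTA in the same coordinates as the BAMs) and counting the allele that differs from it as derived; in ANGSD this is the role of the `-anc` FASTA when estimating the SFS. A minimal sketch on called genotypes, which is a simplification (ANGSD works from genotype likelihoods); `derived_count` and `unfolded_sfs` are hypothetical helpers, and sites where the outgroup base is missing or matches neither allele are simply dropped:

```python
def derived_count(ref, alt, alt_count, n, anc):
    """Number of derived allele copies at a biallelic site, or None.

    alt_count is how many of the n sampled chromosomes carry alt, and anc is
    the outgroup base at the same position.
    """
    anc = anc.upper()
    if anc == ref:
        return alt_count       # alt is the derived allele
    if anc == alt:
        return n - alt_count   # ref is the derived allele
    return None                # outgroup missing or uninformative


def unfolded_sfs(sites, n):
    """Entry i = number of sites with i derived copies, for i = 0..n."""
    sfs = [0] * (n + 1)
    for ref, alt, alt_count, anc in sites:
        d = derived_count(ref, alt, alt_count, n, anc)
        if d is not None:
            sfs[d] += 1
    return sfs


# Toy data, n = 4 chromosomes: (ref, alt, alt_count, outgroup base).
sites = [
    ("A", "G", 1, "A"),  # alt derived: 1 copy
    ("C", "T", 3, "T"),  # ref derived: 4 - 3 = 1 copy
    ("G", "A", 2, "G"),  # alt derived: 2 copies
    ("T", "C", 4, "C"),  # ref derived: 0 copies
    ("A", "C", 1, "N"),  # outgroup missing: dropped
    ("G", "T", 2, "C"),  # outgroup matches neither allele: dropped
]
print(unfolded_sfs(sites, 4))
```

This assumes the outgroup base is the true ancestral state; where the outgroup itself carries a derived allele, the site lands in the mirrored class, which is one reason a single close outgroup still leaves some polarization error. If the reference genome is used as the "ancestral" sequence instead, the derived allele is just the non-reference allele, which is why such studies fold the SFS.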