Hello, I am interested in using the D-statistic to test for ancient admixture versus incomplete lineage sorting (ILS) between different species. I would like to use ANGSD, and also Dfoil (to infer the direction of gene flow).
I have a question about the type of input data needed.
Most studies compute D and Dfoil statistics from whole-genome sequence data or RADseq, but can the D-statistic also be used with exome-capture data? I have thousands of CDS sequences; does it make sense to concatenate them (~3 Mbp) and compute the D-statistic on the concatenation in sliding windows?
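To make the windowing question concrete, here is a minimal Python sketch of what I have in mind. It assumes the standard ABBA-BABA definition, D = (nABBA - nBABA) / (nABBA + nBABA), applied to per-window site-pattern counts. The function names and the example counts are hypothetical, and the standard error uses a simple unweighted delete-one jackknife over windows, which is not necessarily what ANGSD does internally.

```python
import math
from typing import List, Tuple

def d_statistic(abba: float, baba: float) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA) from site-pattern counts."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / total

def jackknife_d(blocks: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Overall D and an unweighted delete-one jackknife standard error.

    blocks: (abba, baba) counts per window, e.g. per window of
    concatenated CDS sequence (hypothetical input layout).
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks for a jackknife")
    sum_abba = sum(a for a, _ in blocks)
    sum_baba = sum(b for _, b in blocks)
    d_all = d_statistic(sum_abba, sum_baba)
    # Recompute D with each block left out in turn.
    loo = [d_statistic(sum_abba - a, sum_baba - b) for a, b in blocks]
    mean_loo = sum(loo) / n
    var = (n - 1) / n * sum((d - mean_loo) ** 2 for d in loo)
    return d_all, math.sqrt(var)

if __name__ == "__main__":
    # Made-up counts for three windows, for illustration only.
    d, se = jackknife_d([(12, 7), (9, 10), (15, 6)])
    print(f"D = {d:.3f}, SE = {se:.3f}")
```

My worry is that with only ~3 Mbp of concatenated CDS there may be too few ABBA/BABA sites per window, and that neighbouring exons are not independent, so the jackknife error could be underestimated.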
For the input, I mapped reads to a reference genome, which will be the outgroup in my D-statistic test, and obtained sorted BAM files. Do I need to run SNP calling (e.g. with GATK, including base recalibration and removal of indels and duplicate reads) before running the D-statistic in ANGSD?
Thanks for any advice.