Is there any documented information on how many of the ESP/1000 Genomes samples were included in the ExAC data release? I was under the impression that all samples were included, but when I try to annotate a few SNPs I see some discordance in the allele frequencies.
For example, this exonic SNP has an allele frequency of 0.119 in the 1000 Genomes Phase 3 dataset but cannot be found in the ExAC data:
5 131705587 rs13180043 C T 100 PASS AF=0.11901
Another example, present in both ESP and 1000 Genomes but not in ExAC:
1 156108976 rs7339 G C . PASS DBSNP=dbSNP_52;EA_AC=246,2936;AA_AC=561,823;TAC=807,3759;
1 156108976 rs7339 G C 100 PASS AF=0.185304;
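The ESP record for rs7339 reports allele counts rather than a frequency, so it helps to convert it before comparing it with the 1000 Genomes AF. Below is a minimal sketch of that conversion. It assumes the ESP count fields (EA_AC, AA_AC, TAC) list the alternate allele count first and the reference count last, which should be verified against the header of the ESP VCF. The function names `parse_info` and `esp_alt_frequency` are made up for this illustration.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'A=1;B=2,3;' into a dict of strings."""
    fields = {}
    for item in info.strip().strip(";").split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def esp_alt_frequency(info, key="TAC"):
    """Alternate allele frequency from an ESP count field.

    Assumption: the field is ordered 'alt_count,ref_count' (biallelic site).
    """
    alt, ref = (int(x) for x in parse_info(info)[key].split(","))
    return alt / (alt + ref)


# INFO columns copied from the two rs7339 records above.
esp_info = "DBSNP=dbSNP_52;EA_AC=246,2936;AA_AC=561,823;TAC=807,3759;"
kg_info = "AF=0.185304;"

esp_af = esp_alt_frequency(esp_info)            # all ESP samples
esp_ea_af = esp_alt_frequency(esp_info, "EA_AC")  # European American subset
esp_aa_af = esp_alt_frequency(esp_info, "AA_AC")  # African American subset
kg_af = float(parse_info(kg_info)["AF"])

print(f"ESP overall AF:  {esp_af:.4f}")
print(f"ESP EA / AA AF:  {esp_ea_af:.4f} / {esp_aa_af:.4f}")
print(f"1000 Genomes AF: {kg_af:.4f}")
```

Under that ordering assumption the overall ESP frequency is 807 / (807 + 3759), which is in the same range as the 1000 Genomes value, so the variant is common in both sets.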
I'd like to know what kind of overlap exists between these three sample sets and, if possible, which capture regions were used for the ExAC data.
Is there any source for your assumption that all 1000g/ESP samples were included in ExAC? The About page speaks of an analysis from scratch, which would imply that the results are independent of 1000g or ESP.
Check under contributing projects: http://exac.broadinstitute.org/about
Even if they did the variant calling from scratch, if a sample set was included you would expect to see a SNP that has a high enough allele frequency in that population.
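To put a rough number on that expectation, here is a back-of-the-envelope sketch of the chance that a variant goes completely unobserved. It assumes every sample is diploid, that alleles are drawn independently at the population frequency, and that the site is covered and callable in every sample. The sample sizes are hypothetical, not the actual ExAC cohort sizes, and `prob_variant_absent` is a name made up for this example.

```python
def prob_variant_absent(allele_freq, n_samples):
    """Probability that no alternate allele is seen in n diploid samples.

    Assumes 2 * n_samples independent chromosomes, each carrying the
    alternate allele with probability allele_freq.
    """
    return (1.0 - allele_freq) ** (2 * n_samples)


# AF of rs13180043 in 1000 Genomes Phase 3, from the record quoted above.
af = 0.11901

# Hypothetical numbers of included samples.
for n in (10, 100, 1000):
    print(f"n={n:5d}  P(variant absent) = {prob_variant_absent(af, n):.3e}")
```

Under these assumptions the probability drops off very quickly with sample size, so a complete absence would more likely point to something else, such as the site lying outside the capture targets or being removed by filtering.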
That makes sense. I guess you could always email them for specific details or check if they have a preprint that you could read.