Hi everyone,
This is meant to be more of a discussion, but I was wondering about people's thoughts on the increasing use of the ESP6500 data for filtering variants in the same way we use the 1000Genomes data. I certainly use both as minor allele frequency filters when the populations are relevant, but the more I think about it, the trickier it seems. While the bulk of the ESP individuals aren't expected to be harbouring a severe Mendelian disease, some of the individuals and pedigrees were explicitly selected because they do have a family history of a disorder that is probably Mendelian. Some of these cohorts are small, and the likelihood of all individuals in a cohort carrying the same disease-causing mutation is smaller still, so the expected minor allele frequencies won't be inflated by much, but it is a possibility. The hematological malignancies cohort, for instance, includes some individuals with quite rare, but very severe, disorders that are rare in families, like MDS.
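To put rough numbers on the inflation argument, here is a minimal sketch in Python. It assumes the pooled ESP6500 set is about 6,503 diploid individuals (the figure usually quoted for that release), that affected individuals are heterozygous carriers, and that the filter is a simple "drop anything above this frequency" cutoff. The function names and the 0.1% and 1% thresholds are made up for illustration; they are not anything defined by ESP or the Exome Variant Server.

```python
import math

# Assumed pooled sample size for ESP6500; adjust if you filter on a
# single population's frequency, where the denominator is smaller.
ESP_INDIVIDUALS = 6503


def pooled_maf(het_carriers, n_individuals, hom_carriers=0):
    """Allele frequency of a variant in a pooled set of diploid individuals."""
    alt_alleles = het_carriers + 2 * hom_carriers
    return alt_alleles / (2 * n_individuals)


def min_carriers_to_exceed(threshold, n_individuals):
    """Smallest number of heterozygous carriers whose pooled frequency
    is strictly greater than the given threshold."""
    return math.floor(threshold * 2 * n_individuals) + 1


if __name__ == "__main__":
    # Frequency contributed by a handful of carriers from one small cohort.
    for carriers in (1, 5, 10):
        print(carriers, "carriers ->", pooled_maf(carriers, ESP_INDIVIDUALS))

    # How many carriers it takes before a variant would be filtered out.
    for threshold in (0.001, 0.01):
        print("MAF >", threshold, "needs",
              min_carriers_to_exceed(threshold, ESP_INDIVIDUALS), "carriers")
```

Under these assumptions it takes 14 heterozygous carriers of the same variant to push it past a 0.1% cutoff and 131 to pass a 1% cutoff, which fits the intuition that a small disease cohort is unlikely to sink a causal variant at typical thresholds. The margin shrinks quickly, though, if you filter on a per-population frequency or on "present at all in ESP".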
Given that all of the phenotype information is only available through dbGaP, and you need to request permission for each study individually, this is at least potentially problematic and could lead people astray. More and more, I think that for large studies like these, at least some basic phenotype and demographic information really needs to be readily accessible.