8.4 years ago
mamillerpa
How would I write an SQL query to retrieve all of the frequencies, from all providers, such as those shown on a page like http://useast.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=16:89919209-89920209;v=rs1805007;vdb=variation;vf=1232953 ?
Connection info: http://useast.ensembl.org/info/data/mysql.html
I'm especially interested in the 1000 genomes frequencies.
I tried the query below (among many others), but it doesn't seem to include the 1000 Genomes Phase 3 data. For example, the frequency for '1000GENOMES:phase_3:TSI' (population ID 373537) should be 0.023:
SELECT *
FROM allele
LEFT JOIN population ON allele.population_id = population.population_id
LEFT JOIN variation ON allele.variation_id = variation.variation_id
WHERE variation.name = 'rs1805007';
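For reference, here is a minimal Python sketch of the same query, narrowed to population names that start with `1000GENOMES:phase_3:` and returning only the population name and frequency. It assumes only the `allele`, `population` and `variation` columns the query above already touches (plus `allele.frequency` and `population.name`). The helper name `fetch_population_frequencies` and its parameters are hypothetical, and it accepts any DB-API 2.0 connection. It cannot recover data the database does not hold: if the phase 3 rows are missing from `allele`, as described above, it simply returns an empty list.

```python
# Sketch: the allele/population/variation query from the question, narrowed to
# one population-name prefix. Works with any DB-API 2.0 connection.

# {ph} is replaced by the driver's placeholder: "%s" for PyMySQL/MySQLdb,
# "?" for sqlite3.
QUERY_TEMPLATE = """
SELECT population.name, allele.frequency
FROM allele
JOIN population ON allele.population_id = population.population_id
JOIN variation ON allele.variation_id = variation.variation_id
WHERE variation.name = {ph}
  AND population.name LIKE {ph}
ORDER BY population.name
"""


def fetch_population_frequencies(conn, rs_id,
                                 population_prefix="1000GENOMES:phase_3:",
                                 placeholder="%s"):
    """Return a list of (population name, frequency) rows for one variant.

    Hypothetical helper: conn is any DB-API 2.0 connection, and placeholder
    must match that driver's paramstyle.
    """
    sql = QUERY_TEMPLATE.format(ph=placeholder)
    # In LIKE, "%" matches any trailing text; note that "_" also matches any
    # single character, so "phase_3" is a slightly looser match than literal.
    pattern = population_prefix + "%"
    cur = conn.cursor()
    try:
        # Values are passed as parameters rather than pasted into the SQL.
        cur.execute(sql, (rs_id, pattern))
        return list(cur.fetchall())
    finally:
        cur.close()
```

Plain `JOIN`s are used here because both `WHERE` conditions already discard the NULL-padded rows a `LEFT JOIN` would keep. To run it against the public server, pass a connection from a MySQL driver such as PyMySQL (for example `pymysql.connect(host="ensembldb.ensembl.org", user="anonymous", port=..., database=...)`), taking the port and the `homo_sapiens_variation_*` database name for your release from the connection info page linked above; with `sqlite3`, pass `placeholder="?"`.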
- Do I need to learn about subsnps?
- Is the data masked for privacy? There sure are a lot of NULLs.
- Do I just need to keep staring at the Ensembl ERD? Maybe I need to look in some of the sample or individual tables? http://useast.ensembl.org/info/docs/api/variation/variation_schema.html
Please do not post identical questions to Ensembl dev and to BioStars. This is stated in the dev posting guidelines and in various guidelines to forum posting found on BioStars. Ensembl provides user support both via BioStars and via the Ensembl dev list and we ask users to please not send the same issue to both. All of the questions eventually come to the same set of people and it is generally a better use of time to answer the questions only once.
Sorry about that, Emily. Am I supposed to withdraw one of the questions now?
I just looked at the BioStars guidelines and do see a request for no crossposting. I don't see it on the Ensembl dev list guidelines page you linked.