Hello,
My problem is reaching the Clinical Assertion records under ClinVar variation records. Specifically, I am trying to retrieve the submission(s) and associated PubMed IDs for some variants (some people prefer to call them alleles).
Take the following variant page as an example:
http://www.ncbi.nlm.nih.gov/clinvar/variation/48074/
Suppose we also identify a small variant by a key of the form genomeversion_chromosome_position_refbase_altbase.
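A minimal Python sketch of that key format, assuming exactly five underscore-separated fields and that no single field contains an underscore itself. The names `SmallVariant`, `parse_variant_key` and `to_variant_key` are hypothetical helpers for illustration, not part of any NCBI tool.

```python
from typing import NamedTuple


class SmallVariant(NamedTuple):
    """Fields of a genomeversion_chromosome_position_refbase_altbase key."""
    genome_version: str
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> SmallVariant:
    """Split a variant key into its five fields."""
    # Assumes no field itself contains "_" (e.g. GRCh37, 13, X, plain bases).
    parts = key.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 '_'-separated fields, got {len(parts)}: {key!r}")
    genome_version, chromosome, position, ref, alt = parts
    # Normalise bases to upper case so keys compare consistently.
    return SmallVariant(genome_version, chromosome, int(position), ref.upper(), alt.upper())


def to_variant_key(variant: SmallVariant) -> str:
    """Inverse of parse_variant_key."""
    return "_".join([variant.genome_version, variant.chromosome,
                     str(variant.position), variant.ref, variant.alt])
```

For a hypothetical key such as `GRCh37_13_32914438_T_C` (illustrative only, not the coordinates of record 48074), `parse_variant_key` gives the assembly, chromosome and position needed for the coordinate search, while ref/alt stay available for filtering the hits afterwards.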
ClinVar provides several FTP dumps, but for a couple of reasons we prefer not to use them; instead, we try to fetch JSON output from eSearch, eSummary or eFetch.
For the variant definition above, one can use the eSearch API with the chromosomal coordinates to retrieve the variation ID as follows (this is a real variation record, so following the links would work):
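A minimal Python sketch of the eSearch step, using only the standard library. The ClinVar search field tags `[chr]`, `[chrpos37]` and `[chrpos38]` are assumptions here and should be checked against ClinVar's advanced-search field list; `esearch_url`, `variation_ids` and `fetch_json` are hypothetical helper names. The `db`, `term` and `retmode` parameters and the `esearchresult.idlist` field of the JSON reply are standard eSearch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# ASSUMPTION: ClinVar field tags for chromosome and per-assembly position;
# verify them against ClinVar's advanced-search field list.
POSITION_FIELD = {"GRCh37": "chrpos37", "GRCh38": "chrpos38"}


def esearch_url(genome_version: str, chromosome: str, position: int) -> str:
    """Build an eSearch URL asking ClinVar for records at one chromosomal position."""
    field = POSITION_FIELD[genome_version]
    term = f"{chromosome}[chr] AND {position}[{field}]"
    params = {"db": "clinvar", "term": term, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def variation_ids(esearch_json: dict) -> list:
    """Pull the UID list (ClinVar variation IDs) out of an eSearch JSON reply."""
    return list(esearch_json["esearchresult"]["idlist"])


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

Usage would be `variation_ids(fetch_json(esearch_url("GRCh37", "13", 32914438)))`. A position-only search can return every record overlapping that base, so the ref/alt bases from the variant key still need to be checked against the returned records.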
The variation ID fetched from that JSON is then used to retrieve the variation record from ClinVar: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=clinvar&id=48074&retmode=json
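A minimal sketch of that eSummary call and of reading the SCV accessions out of the reply. It assumes the usual eSummary JSON layout, where each record sits under `result` keyed by its UID as a string, and that `supporting_submissions` is a dictionary holding `scv` and `rcv` lists; the helper names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(variation_id) -> str:
    """Build the eSummary URL for one ClinVar variation ID."""
    params = {"db": "clinvar", "id": variation_id, "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


def supporting_scvs(esummary_json: dict, variation_id) -> list:
    """Return the SCV accessions listed under supporting_submissions."""
    # eSummary JSON keys each record by its UID (as a string) under "result".
    record = esummary_json["result"][str(variation_id)]
    # ASSUMPTION: supporting_submissions is a dict holding "scv" and "rcv" lists.
    submissions = record.get("supporting_submissions", {})
    return list(submissions.get("scv", []))
```

This only recovers the accession list described in the post; the summary itself does not carry per-submission details, which is why a further request is needed for those.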
The problem is that the latter only gives me a supporting_submissions dictionary key, which contains IDs like SCV00000567 and nothing else. This is where I am stuck: how can I go further and fetch the details of these submissions, such as the pathogenicity, the variation, or the supporting PubMed IDs?
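One possible route, sketched under assumptions: fetch the full record with eFetch and read the per-submission (SCV) elements from the XML. The `rettype=vcv` parameter and the element names `ClinicalAssertion`, `ClinVarAccession` and `Citation/ID[@Source="PubMed"]` reflect my understanding of ClinVar's VCV XML and are assumptions to verify against the current ClinVar E-utilities notes and XML schema. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_vcv_url(variation_id) -> str:
    """Build an eFetch URL for the full ClinVar record of one variation ID."""
    # ASSUMPTION: rettype="vcv" returns VCV (VariationArchive) XML; supported
    # rettypes for ClinVar have changed over time, so check the current docs.
    params = {"db": "clinvar", "id": variation_id, "rettype": "vcv"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def pubmed_ids_by_scv(vcv_xml: str) -> dict:
    """Map each SCV accession in a VCV XML document to the PubMed IDs it cites."""
    root = ET.fromstring(vcv_xml)
    result = {}
    # ASSUMPTION: each submission is a <ClinicalAssertion> with a child
    # <ClinVarAccession Accession="SCV..."> and nested <Citation><ID Source="PubMed">.
    for assertion in root.iter("ClinicalAssertion"):
        accession = assertion.find("ClinVarAccession")
        if accession is None or accession.get("Accession") is None:
            continue
        pmids = []
        for citation in assertion.iter("Citation"):
            for id_el in citation.findall("ID"):
                if id_el.get("Source") == "PubMed" and id_el.text:
                    pmids.append(id_el.text.strip())
        # Keep first-seen order but drop duplicate PMIDs.
        result[accession.get("Accession")] = list(dict.fromkeys(pmids))
    return result
```

The downloaded XML text would be passed to `pubmed_ids_by_scv`. The submitter's classification (pathogenicity) lives in the same `ClinicalAssertion` element, but its element name has differed between schema versions, so it is left out of this sketch.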
Thanks for any comments or suggestions.
Just wanted to point out that the FTP VCFs and the variants on the ClinVar web page are not in sync, and the FTP files contain far fewer variants. If you don't want to use the NCBI downloads, you can also download the data from UCSC, which I think is more comprehensive, better curated, in sync with the web version, and gives all the information you need.