I have a list of genes and I need to extract all the reported mutations in those genes. It would be great if someone could suggest the best way to do this accurately using open source tools.
You can try BioMart through any of its access routes (web, APIs, or the R package on Bioconductor). The variation dataset of BioMart lets you enter a list of Ensembl Gene IDs as filters (you will need to convert your genes to that format if they are not already in it); then, as attributes, you can choose variant IDs and phenotype descriptions. You will get the mutations from COSMIC and HGMD, plus the SNPs and short indels from dbSNP.
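As a rough illustration of the API route, here is a minimal Python sketch that builds a BioMart XML query for a list of Ensembl Gene IDs and posts it to the mart service. The endpoint URL, the dataset name `hsapiens_snp`, the filter name `ensembl_gene`, and attribute names such as `refsnp_id` and `phenotype_description` are assumptions based on how the Ensembl variation mart is commonly queried; check them against the attribute and filter lists shown in the BioMart web interface before relying on the output. The function names are made up for this example.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed endpoint of the Ensembl BioMart service; verify before use.
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def build_variation_query(gene_ids, attributes, dataset="hsapiens_snp"):
    """Return a BioMart XML query string for the given Ensembl Gene IDs.

    The dataset and filter names are assumptions (see the text above).
    """
    if not gene_ids:
        raise ValueError("gene_ids must not be empty")
    # quoteattr() escapes the value and wraps it in double quotes.
    attr_xml = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    filter_value = ",".join(gene_ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" '
        'header="1" uniqueRows="1">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="ensembl_gene" value={quoteattr(filter_value)}/>'
        f"{attr_xml}"
        "</Dataset></Query>"
    )


def fetch_variants(gene_ids, attributes, timeout=300):
    """POST the query and return the TSV response as a list of rows."""
    query = build_variation_query(gene_ids, attributes)
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    with urllib.request.urlopen(BIOMART_URL, data=data, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    # The first row is the header because the query sets header="1".
    return [line.split("\t") for line in text.splitlines() if line]
```

With around 200 genes it is probably safer to call `fetch_variants` in batches of a few dozen IDs rather than in a single request, since large variation queries can be slow or time out.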
Alternatively, you can use the Open Targets batch search tool to get the diseases (and pathways and drugs) associated with your genes. By exploring the results further, you can find the mutations (or variants) linking those genes to their associated diseases (such as the skeletal disorders you are interested in).
Since you are familiar with Python, programmatic access to Open Targets may be the best choice for you: you can get the diseases and the association scores for your genes, based on the genetic variants (or mutations) they carry. We have a Python client for easier communication with our REST API.
If this is useful but you get stuck along the way, just shout.
I will explore all these options. Thank you so much.
I would be more specific about the goal of your task. What kind of mutations do you want to extract? All variations, or only those related to diseases? Any disease? Any kind of mutation (SNPs, indels, structural variants, etc.)? It would also help to know the scale of the task (how many genes you have) and what skills you have (web, R/Bioconductor, Python). As it stands, the question is too generic to answer in a helpful way.
I want to extract all the SNVs and indels that are known to cause different groups of skeletal disorders. I have the list of genes for these conditions, approximately 200 genes. I am good at Perl and Python, but I don't know whether writing a script will give me accurate results.
I doubt any script will allow you to do accurate web scraping in one step. I would suggest you scrape the web for as much as you can find in the first instance, and then devise a second automated step to curate your database as best as possible. You’ll almost certainly have to check at least some of it by hand though.
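To make the second, automated curation step a bit more concrete, here is a small Python sketch under stated assumptions. It supposes the first pass produced one dictionary per row with the hypothetical keys `gene`, `variant_id`, `variant_class` and `phenotype`; none of these names come from a particular database. It drops variant classes other than SNVs and indels, merges duplicate rows for the same variant, and splits the result into variants whose phenotype text matches your disorder keywords and variants that still need checking by hand.

```python
def curate(records, keywords,
           allowed_classes=("SNV", "insertion", "deletion", "indel")):
    """Split raw variant rows into (confirmed, needs_review) lists.

    records: iterable of dicts with the hypothetical keys
             'gene', 'variant_id', 'variant_class', 'phenotype'.
    keywords: disorder terms matched case-insensitively as substrings.
    """
    merged = {}
    for rec in records:
        # Keep only the variant classes of interest.
        if rec.get("variant_class") not in allowed_classes:
            continue
        # Merge rows that describe the same variant in the same gene.
        key = (rec["gene"], rec["variant_id"])
        entry = merged.setdefault(key, {
            "gene": rec["gene"],
            "variant_id": rec["variant_id"],
            "variant_class": rec["variant_class"],
            "phenotypes": set(),
        })
        phenotype = (rec.get("phenotype") or "").strip()
        if phenotype:
            entry["phenotypes"].add(phenotype)

    lowered = [k.lower() for k in keywords]
    confirmed, needs_review = [], []
    for entry in merged.values():
        # A variant is confirmed if any of its phenotypes mentions a keyword.
        hit = any(k in p.lower() for p in entry["phenotypes"] for k in lowered)
        (confirmed if hit else needs_review).append(entry)
    return confirmed, needs_review
```

Substring matching on phenotype text is crude, so the `needs_review` list is where the manual checking would start; the `confirmed` list is worth spot-checking as well.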