I am working on an automatic extraction tool that should list all SNPs published for a given disease (all associations, whether negative or positive). The aim is to help scientists target specific SNPs or loci for a particular disease.
I am researching the feasibility of such a system. Would it be helpful, and how else do scientists identify SNPs?
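For illustration only, the core extraction step of such a tool could start as simply as pulling rsIDs out of abstract text and grouping them by disease. This is a minimal sketch under stated assumptions: the `(disease, text)` pairs are assumed to come from whatever literature search you run, `extract_rsids` and `index_by_disease` are hypothetical names, and the sketch does not attempt to tell positive associations from negative ones.

```python
import re
from collections import defaultdict

# rsIDs look like "rs" followed by digits; \b avoids matching inside longer tokens.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def extract_rsids(text: str) -> list[str]:
    """Return unique rsIDs in order of first appearance, normalised to a lowercase 'rs' prefix."""
    return list(dict.fromkeys("rs" + m[2:] for m in RSID_PATTERN.findall(text)))


def index_by_disease(abstracts: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each (hypothetical) disease label to the set of rsIDs mentioned in its abstracts."""
    index: dict[str, set[str]] = defaultdict(set)
    for disease, text in abstracts:
        index[disease].update(extract_rsids(text))
    return dict(index)
```

A real tool would also need to decide, per mention, whether the paper reports an association or its absence, which a regex cannot do.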
I may be looking at this wrong, but if I had a sequence from a diseased patient, I would first identify the SNPs in it and then use snpEff to annotate each SNP's predicted effect (synonymous/non-synonymous). This would tell me the locus and the gene it falls in. I could then go to a SNP-specific database to look for any SNPs related to a disease (there are highly specific databases for these studies, e.g. sheephapmap). If nothing has been published, I'd then move on to proteomics, to see which part of the protein the SNP affects and what its possible disease mechanism is. If it's synonymous or in a non-coding region, I'd then turn to population-scale models, i.e. is there a relationship between the presence of the SNP and the presence of the disease in the population?
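The snpEff step in that workflow can be illustrated by reading the `ANN` field that snpEff writes into the VCF INFO column and bucketing each variant as synonymous, non-synonymous or other. This is a rough sketch, assuming the standard `ANN` sub-field order (Allele | Annotation | Impact | Gene_Name | ...); the helper names and the small set of Sequence Ontology terms treated as non-synonymous are illustrative choices, and the sketch simply takes the first annotation listed for each variant.

```python
# First sub-fields of a snpEff ANN entry, in order (assumed from the ANN format spec).
ANN_FIELDS = ["allele", "annotation", "impact", "gene_name", "gene_id",
              "feature_type", "feature_id", "transcript_biotype", "rank",
              "hgvs_c", "hgvs_p"]

# Illustrative subset of protein-altering terms; extend as needed.
NON_SYNONYMOUS = {"missense_variant", "stop_gained", "stop_lost", "start_lost"}


def parse_ann(info: str) -> list[dict[str, str]]:
    """Parse the ANN=... entry of a VCF INFO column into one dict per annotation."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            return [dict(zip(ANN_FIELDS, entry.split("|")))
                    for entry in item[len("ANN="):].split(",")]
    return []


def coding_class(annotation: str) -> str:
    """Bucket a (possibly '&'-joined) snpEff annotation into a coarse class."""
    terms = set(annotation.split("&"))
    if terms & NON_SYNONYMOUS:
        return "non-synonymous"
    if "synonymous_variant" in terms:
        return "synonymous"
    return "non-coding/other"


def classify_vcf(lines: list[str]) -> list[tuple[str, int, str, str]]:
    """Return (chrom, pos, gene, class) for each annotated record, using its first annotation."""
    results = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        anns = parse_ann(cols[7])  # column 8 of a VCF record is INFO
        if anns:
            first = anns[0]
            results.append((cols[0], int(cols[1]), first.get("gene_name", ""),
                            coding_class(first.get("annotation", ""))))
    return results
```

Variants landing in the "synonymous" or "non-coding/other" buckets are the ones that, in the workflow above, would go on to population-scale analysis.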
If I understand correctly, what you're trying to do is allow someone to search for a disease and get a list of SNPs. That would be a rather backward approach to identifying new SNPs: your search tool can only return SNPs we already know about, so if you used those results as criteria to search your sequence data, it would never return new SNPs. What most people do is identify their SNPs (.vcf file), annotate them, and then search for genes/diseases etc. This gives you a list of both known and new SNPs from your sequence data, without introducing bias when looking for new SNPs.
In short, I don't think your system would be helpful for identifying new SNPs.
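One simple way to see the "known vs. new" split described above is to look at the ID column of an annotated VCF. This is a minimal sketch, assuming the ID column has already been filled with dbSNP rsIDs by your annotation step and that `.` marks a variant with no catalogued ID; `split_known_novel` is a hypothetical helper. Note that "known" here only means "already catalogued", not "already linked to a disease".

```python
def split_known_novel(lines: list[str]):
    """Split VCF records into known (has an rsID) and novel (no rsID) variants."""
    known, novel = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        record = (chrom, int(pos), ref, alt)
        # The ID column may hold several ';'-separated IDs, or '.' when missing.
        rsids = [i for i in vid.split(";") if i.startswith("rs")]
        if rsids:
            known.append((rsids[0],) + record)
        else:
            novel.append(record)
    return known, novel
```

The known list can then be looked up in disease databases, while the novel list is exactly what a disease-first search would never surface.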
Your first paragraph can be replaced by one tool, namely VEP from Ensembl, except for the disease presence in a population. I wonder where you could get that information from?
On the other hand, @nohaseddik: IMHO, your approach sounds more like GWAS, ClinVar and COSMIC. Personally, I think your approach is valid as long as you bring something new beyond what those resources already do, and have a plan for keeping it up to date.
In principle, yes! However, one can extend the information retrieved by VEP to include other databases (where the mutation was found) and the status of the mutation at the protein level too.
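As a rough illustration of that kind of extension, the sketch below fetches one variant from the public Ensembl REST VEP endpoint and keeps only the most severe consequence, the protein-level change and the IDs of co-located variants (which is where entries from other databases show up). It assumes the JSON field names `most_severe_consequence`, `transcript_consequences` (with `gene_symbol`, `amino_acids`, `protein_start`) and `colocated_variants` (with `id`); please check them against a live response before relying on this. `fetch_vep` and `summarize` are hypothetical helpers, and `fetch_vep` needs network access.

```python
import json
import urllib.request

# Ensembl REST VEP lookup by variant ID (e.g. an rsID), human only in this sketch.
VEP_URL = "https://rest.ensembl.org/vep/human/id/{}"


def fetch_vep(rsid: str) -> list[dict]:
    """Query Ensembl REST VEP for one variant ID and return the parsed JSON list."""
    req = urllib.request.Request(VEP_URL.format(rsid),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(record: dict) -> dict:
    """Keep the consequence, protein-level changes and co-located database IDs."""
    protein_changes = sorted({
        f'{tc["gene_symbol"]}:{tc["amino_acids"]}@{tc["protein_start"]}'
        for tc in record.get("transcript_consequences", [])
        if {"gene_symbol", "amino_acids", "protein_start"} <= tc.keys()
    })
    colocated_ids = sorted(v["id"] for v in record.get("colocated_variants", []) if "id" in v)
    return {
        "input": record.get("input"),
        "most_severe": record.get("most_severe_consequence"),
        "protein_changes": protein_changes,
        "colocated_ids": colocated_ids,
    }
```

Usage would be something like `[summarize(r) for r in fetch_vep("rs7412")]`; the summaries could then be joined with any additional databases you care about.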
I was (marginally) involved in a somewhat similar study. The question in that case was: imagine I get 20 positive findings from a GWAS; which one should I prioritize for follow-up?
It turned out that the pieces of prior knowledge that were most helpful were (in order):
1) The SNP was previously identified by MORE than one GWA study for the same (or a related) phenotype.
2) The SNP is in a functional protein domain.
3) The SNP has been associated with the phenotype in a functional model.
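Those three criteria can be turned into a simple ranking of GWAS hits. This is only a sketch under one assumption: that the criteria are applied strictly in the order listed (a lexicographic sort), which may not be how the study actually weighted them. The `GwasHit` fields are hypothetical and would have to be filled from your own lookups.

```python
from dataclasses import dataclass


@dataclass
class GwasHit:
    rsid: str
    n_gwas_reports: int         # GWA studies reporting this SNP for the same/related phenotype
    in_functional_domain: bool  # SNP lies in an annotated functional protein domain
    functional_model: bool      # association supported by a functional model


def priority_key(hit: GwasHit) -> tuple[bool, bool, bool]:
    # Criteria in the listed order; tuples compare element by element.
    return (hit.n_gwas_reports > 1, hit.in_functional_domain, hit.functional_model)


def prioritise(hits: list[GwasHit]) -> list[GwasHit]:
    """Highest-priority hits first; ties keep their input order (sorted is stable)."""
    return sorted(hits, key=priority_key, reverse=True)
```

A weighted score would be the obvious alternative if, say, a strong functional model should outrank a single replication.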
The paper is freely available here, and if I remember correctly we also published scripts for the information-retrieval steps.
VEP does the same job as snpEff, so the choice mostly depends on what other tools you are using (for compatibility).
Oh cool, that's good to know. I may have to start using VEP!