Hi, I'm working on a project where we need to check patients' DNA sequences (for a single gene), find all differences from the reference sequence, and then check their effects. First I'll compare each sequence to its reference sequence and find the differences. Then I need to check whether these variations have been reported before.
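The comparison step itself is simple once the patient read has been aligned to the reference. Here is a minimal sketch. It assumes both strings already cover the same region with no gaps (e.g. after trimming the low-quality ends of the trace and aligning), so indels are out of scope. The `find_snvs` and `SNV` names are hypothetical:

```python
from typing import List, NamedTuple


class SNV(NamedTuple):
    pos: int  # 1-based position within the reference sequence
    ref: str  # reference base
    alt: str  # base observed in the patient


def find_snvs(reference: str, patient: str) -> List[SNV]:
    """List single-nucleotide differences between two pre-aligned, gap-free sequences."""
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned to the same length")
    snvs = []
    for i, (r, p) in enumerate(zip(reference.upper(), patient.upper()), start=1):
        # Skip matching positions and uncalled bases (N) from the trace
        if r != p and p != "N":
            snvs.append(SNV(i, r, p))
    return snvs


print(find_snvs("ACGTACGT", "ACGAACNT"))  # → [SNV(pos=4, ref='T', alt='A')]
```

Heterozygous positions in Sanger data usually come out as IUPAC ambiguity codes (R, Y, ...). This sketch reports them as-is, so you may want to expand them into the two underlying alleles before looking them up.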
I know that some variants with more than 1% frequency have been reported, and I can use dbSNP for this purpose, but I need more. I want to check whether a single nucleotide variation between the reference and my patient's sequence is a rare variant that has been reported before, and if so, whether there is any information on its effects.
I've heard about some gene-specific databases. Are there any other valuable databases? Ideally they should be accessible programmatically (e.g. in R).
My sequences come from Sanger sequencing, and the .ab1 files are available.
Thanks, all.
I'm doing this for a hospital, and they currently check the variants by hand online :D but we believe we need more databases than the ones you mentioned. For example, the 1000 Genomes and ExAC projects are very good, but there are also gene-specific databases with more information on certain regions. Isn't there any database that has them all together?
The population databases I listed are primarily used for filtering based on allele frequencies. Unfortunately, data silos are a thing, so you can't easily get all databases in one place, the various locus-specific databases being a good example. You probably want to annotate with ClinVar at least, as it is probably the most comprehensive clinical database of variants out there. HGMD Pro is good, but even with a subscription I don't know whether you can get a local download. GEMINI does annotate with ClinVar and OMIM.
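Filtering on allele frequency comes down to a threshold check. Here is a minimal sketch. It assumes you have already pulled per-allele frequencies for your gene out of the population databases into a dict keyed by (position, ref, alt). The dict contents, the `classify_by_frequency` name and the example frequencies are all hypothetical. Note that "absent" only means the allele isn't in the data you loaded. It does not necessarily mean the variant is novel.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (position, reference base, alternate base)
AlleleKey = Tuple[int, str, str]

COMMON_AF = 0.01  # the 1% cut-off mentioned in the question


def classify_by_frequency(key: AlleleKey, freqs: Dict[AlleleKey, float],
                          threshold: float = COMMON_AF) -> str:
    """Label an allele as common, rare, or absent from the loaded population data."""
    af: Optional[float] = freqs.get(key)
    if af is None:
        return "absent"  # not observed in the frequency data you loaded
    return "common" if af >= threshold else "rare"


# Made-up frequencies, for illustration only
freqs = {(4, "T", "A"): 0.23, (120, "C", "T"): 0.0004}
print(classify_by_frequency((4, "T", "A"), freqs))    # → common
print(classify_by_frequency((120, "C", "T"), freqs))  # → rare
print(classify_by_frequency((7, "G", "A"), freqs))    # → absent
```

The "rare" and "absent" alleles are then the ones worth checking against clinical and locus-specific sources.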
I just saw that you are working with Sanger sequencing output. If you want to annotate your variants automatically with all of this information, you'll likely need to code something custom yourself: identify all of the databases you can access, try to download their data, and set it up in a custom local database. You'll also want to set up some sort of automated process or reminder for keeping them updated.
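A local database like this can start out very small. Here is a minimal sketch using SQLite, assuming you have already converted each downloaded source into (position, ref, alt, source, info) rows in a common coordinate system. The table layout and the function names are hypothetical, and so are the example rows. Using `INSERT OR REPLACE` means re-running the loader after a source is updated simply overwrites the old annotations.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per (variant, source) annotation
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    pos    INTEGER NOT NULL,  -- coordinate in the reference you standardise on
    ref    TEXT    NOT NULL,
    alt    TEXT    NOT NULL,
    source TEXT    NOT NULL,  -- e.g. ClinVar, a population DB, a locus-specific DB
    info   TEXT,              -- free-text effect / significance from that source
    PRIMARY KEY (pos, ref, alt, source)
)
"""

Row = Tuple[int, str, str, str, str]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load_source(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    # The connection context manager commits the batch (or rolls it back on error)
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?, ?)", rows)


def lookup(conn: sqlite3.Connection, pos: int, ref: str, alt: str) -> List[Tuple[str, str]]:
    """Return (source, info) pairs for one variant from every loaded source."""
    cur = conn.execute(
        "SELECT source, info FROM annotation "
        "WHERE pos = ? AND ref = ? AND alt = ? ORDER BY source",
        (pos, ref, alt))
    return cur.fetchall()


conn = open_db()
load_source(conn, [(120, "C", "T", "SourceA", "example entry A"),
                   (120, "C", "T", "SourceB", "example entry B")])
print(lookup(conn, 120, "C", "T"))
```

Pointing `open_db` at a file path instead of `:memory:` makes the database persist between runs. Your update job can then reload each source on whatever schedule it is released.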
Alternatively, you can convert your Sanger output into a BED or VCF file containing just the variant(s) rather than the whole sequence, and use that with GEMINI, snpEff, VEP, annotation packages in R, or whatever works best for you.
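Writing such a VCF by hand only takes a few lines. Here is a minimal sketch that produces a sites-only VCF 4.2 (no sample columns) from (position, ref, alt) tuples. It makes several assumptions: `gene_start` is the hypothetical genomic coordinate of position 1 of your reference sequence, that reference lies on the plus strand, and it was taken verbatim from the same genome build your annotation tools use. A minus-strand reference would need reverse-complementing first, which this sketch does not do. The function name and coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def snvs_to_vcf(snvs: Iterable[Tuple[int, str, str]], chrom: str, gene_start: int) -> str:
    """Build a minimal sites-only VCF 4.2 from gene-relative (pos, ref, alt) SNVs."""
    lines = [
        "##fileformat=VCFv4.2",
        f"##contig=<ID={chrom}>",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # VCF records should be sorted by position within a contig
    for pos, ref, alt in sorted(snvs):
        genomic_pos = gene_start + pos - 1  # shift 1-based gene position to genome
        # ID, QUAL, FILTER and INFO are left as "." (missing)
        lines.append(f"{chrom}\t{genomic_pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


print(snvs_to_vcf([(10, "G", "C"), (4, "T", "A")], "chr7", 1000), end="")
```

From there the file can go through VEP or snpEff like any other VCF.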