Question

Need a database for variants with lower frequency than common SNPs


8.7 years ago

Amirosein ▴ 70

Hi, I'm working on a project where we need to check patients' DNA sequences (for a single gene), find all differences from the reference sequence, and then check their effects. First I'll compare each sequence to its reference sequence and find the differences. Then I need to check whether these variations have been reported before.

I know that variants with a frequency above 1% have been reported and that I can use dbSNP for them, but I need more. I want to check whether a single-nucleotide variation between the reference and my patient's sequence is a rare variant that has been reported before. If it has been reported, is there any information on its effects?

I have heard about gene-specific databases. Are there any other valuable databases? Ideally they would be accessible programmatically (e.g. in R).

My sequences come from Sanger sequencing, and the .ab1 files are available.

thanks all

SNP variants • 1.7k views

Updated 8.7 years ago by DG 7.3k • written 8.7 years ago by Amirosein ▴ 70

score 1 · Answer 1 · 2016-03-17


8.7 years ago

DG 7.3k

The standard databases for variant frequency that you should check against, at the very least, given the current state of the field are the 1000 Genomes Project, the Exome Sequencing Project data (Exome Variant Server), and the Exome Aggregation Consortium data (ExAC). ExAC in particular contains data from over 60,000 whole-exome sequencing samples from a number of populations. The UK10K data are also coming online, with many more on the horizon. GEMINI is a program that makes annotating with these sources easy; it also loads the data into a small database that can be searched on a variety of parameters. Otherwise you can download the individual VCF files for many of these projects and annotate them with whatever tools you prefer.
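If you go the "download the VCF and annotate yourself" route, the frequency filter itself is simple. Here is a minimal Python sketch, assuming the file stores allele frequency under the INFO key `AF` (the key reserved for this in the VCF specification; a given resource may use other or additional keys, so check its header). The records and the 1% cutoff are made up for illustration, and `parse_info` and `rare_variants` are hypothetical helper names.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=3;AF=0.0004;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag keys carry no value
    return info


def rare_variants(vcf_lines, max_af=0.01):
    """Yield (chrom, pos, ref, alt, af) for each allele with AF below max_af."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, _filter, info = fields[:8]
        af_values = parse_info(info).get("AF")
        if af_values is None or af_values is True:
            continue  # no frequency recorded for this site
        # Multi-allelic sites list one AF per ALT allele, comma-separated.
        for alt, af in zip(alts.split(","), af_values.split(",")):
            if af != "." and float(af) < max_af:
                yield chrom, int(pos), ref, alt, float(af)


# Made-up records for illustration only; not real population data.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\tAC=5;AF=0.0004",
    "1\t2000\t.\tC\tT,G\t.\tPASS\tAF=0.25,0.002",
    "1\t3000\t.\tG\tA\t.\tPASS\tDB",
]

rare = list(rare_variants(EXAMPLE_VCF))
for record in rare:
    print(record)
```

For a single gene the downloaded files can be cut down to the region of interest first, so a plain line-by-line pass like this is usually fast enough.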



I'm doing this for a hospital; they currently check the variants by hand online :D But we believe we need more databases than the ones you listed. For example, the 1000 Genomes and ExAC projects are very good, but there are gene-specific databases with more information on some regions. Isn't there a database that brings them all together?

8.7 years ago by Amirosein ▴ 70


The population databases I listed are used primarily for filtering on allele frequency. Unfortunately data silos are a thing, so there is no easy way to get all of the locus-specific databases, for instance, in one place. You probably do want to annotate with ClinVar at least, as it is probably the most comprehensive clinical database of variants out there. HGMD Pro is good, but I don't know whether a subscription lets you get a local download. GEMINI does annotate with ClinVar and OMIM.

8.7 years ago by DG 7.3k


Just saw that you are working with Sanger sequencing output. If you want to annotate your variants automatically with all of this information, you will likely need to code something custom yourself. Identify all of the databases you can access, try to download the data, and load it into a custom local database. You'll also want to set up some sort of automated process or reminder for keeping the local copies up to date.
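A rough idea of what such a custom local database could look like, using Python's built-in `sqlite3` module. This is only a sketch: the table layout, the source labels (`population_db`, `clinical_db`), the annotation strings, and the helper names `build_db` and `lookup` are all assumptions for illustration, and the records are made up.

```python
import sqlite3


def build_db(records, path=":memory:"):
    """Create (or refresh) a table of known variants keyed by source and allele."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS known_variants (
               source     TEXT NOT NULL,
               chrom      TEXT NOT NULL,
               pos        INTEGER NOT NULL,
               ref        TEXT NOT NULL,
               alt        TEXT NOT NULL,
               annotation TEXT,
               PRIMARY KEY (source, chrom, pos, ref, alt))"""
    )
    # INSERT OR REPLACE lets a later re-download overwrite stale rows.
    conn.executemany(
        "INSERT OR REPLACE INTO known_variants VALUES (?, ?, ?, ?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def lookup(conn, chrom, pos, ref, alt):
    """Return (source, annotation) rows for one variant; empty list if unreported."""
    cur = conn.execute(
        "SELECT source, annotation FROM known_variants "
        "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ? ORDER BY source",
        (chrom, pos, ref, alt),
    )
    return cur.fetchall()


# Made-up rows standing in for data parsed from downloaded files.
EXAMPLE_RECORDS = [
    ("population_db", "1", 1000, "A", "G", "AF=0.0004"),
    ("clinical_db", "1", 1000, "A", "G", "significance=uncertain"),
    ("population_db", "1", 2000, "C", "T", "AF=0.25"),
]

conn = build_db(EXAMPLE_RECORDS)
print(lookup(conn, "1", 1000, "A", "G"))  # reported in both example sources
print(lookup(conn, "1", 1500, "G", "C"))  # not in the local database
```

Keeping the source name in the primary key means each database can be reloaded on its own schedule without touching rows from the others.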

Or you can convert your Sanger output into a BED or VCF file containing just the variant(s) rather than the whole sequence, and use that with GEMINI, SnpEff, VEP, annotation packages in R, or whatever works best for you.
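The conversion step could be sketched like this in Python. It assumes the .ab1 trace has already been base-called and aligned to the reference so that both strings have the same length (substitutions only, no indels), and that you know the genomic coordinate of the first aligned base. The chromosome name, start position, and sequences below are hypothetical, as are the helper names `find_substitutions` and `to_vcf`.

```python
def find_substitutions(ref_seq, sample_seq, chrom, start):
    """List (chrom, pos, ref, alt) for single-base differences.

    Assumes the two sequences are already aligned and equal in length;
    `start` is the 1-based coordinate of the first base.
    """
    if len(ref_seq) != len(sample_seq):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (r, s) in enumerate(zip(ref_seq.upper(), sample_seq.upper())):
        # Skip N and other ambiguity codes; they need separate handling.
        if r != s and r in "ACGT" and s in "ACGT":
            variants.append((chrom, start + offset, r, s))
    return variants


def to_vcf(variants):
    """Render the variants as a minimal sites-only VCF (no sample columns)."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in variants:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# Hypothetical aligned sequences and coordinates, for illustration only.
variants = find_substitutions("ACGTACGT", "ACGAACNT", chrom="1", start=100)
vcf_text = to_vcf(variants)
print(vcf_text)
```

Heterozygous positions in Sanger data often come out of base-calling as IUPAC ambiguity codes (R, Y, and so on), which this sketch skips, so a real pipeline would need to resolve those before writing the file.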

8.7 years ago by DG 7.3k