Question

SNPs Present in dbSNP but Absent in 1000 Genomes and ESP Database

2

12.1 years ago

michealsmith ▴ 800

Which database would you choose for filtering out common SNPs in order to find the rare ones that may be disease-causing? I have been using 1000 Genomes as well as the ESP (Exome Sequencing Project) database. (ESP is derived from exome data of about 6500 individuals, which is fairly large.) Both databases also contain MAF. I did not initially use dbSNP, because it contains essentially everything, so filtering against it is less permissive.

But I found something interesting today: there are some SNPs, for example rs73979896: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=308088757&c=chr17&o=21319207&t=21319208&g=snp135Common&i=rs73979896

This SNP is nonsynonymous and present in dbSNP 135 with a very high MAF of 49%, derived from around 2204 alleles; however, it is absent from both 1000 Genomes (April 2012) and ESP-6500 (the latest version, with exome data from 6500 individuals)! If this is really a true SNP with MAF=49%, how can it NOT be captured in ESP, which has data from 6500 people? This is very confusing.

dbsnp • 5.3k views

ADD COMMENT • link updated 12.1 years ago by Laura ★ 1.8k • written 12.1 years ago by michealsmith ▴ 800

score 4 · Answer 1 · 2012-10-31

4

12.1 years ago

Laura ★ 1.8k

This SNP was part of the original 1000 Genomes call set:

tabix -h http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wgs.union_vqsr2b.20101123.snps.low_coverage.sites.vcf.gz 17:21319208-21319208

If you look at the reference genome, this is a patched part of the reference; in addition, this site does not fall within our strict accessibility mask:

http://browser.1000genomes.org/Homo_sapiens/Location/View?r=17%3A21319208-21319208#r=17:21316709-21321708

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/accessible_genome_masks/README_20120824_accessibility_mask_bed_files
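To make the mask check concrete, here is a minimal sketch of testing whether a 1-based VCF position falls inside a set of BED intervals. BED coordinates are 0-based and half-open, so position 21319208 corresponds to the interval [21319207, 21319208). The function name `in_mask` and the example intervals are hypothetical and only for illustration; the real intervals come from the mask BED files described in the README above.

```python
from bisect import bisect_right

def in_mask(intervals, pos_1based):
    """Return True if a 1-based position lies inside any BED interval.

    intervals: sorted, non-overlapping (start, end) tuples on one chromosome,
    in BED convention (0-based start, exclusive end).
    """
    pos0 = pos_1based - 1  # convert the VCF-style position to 0-based
    starts = [start for start, _ in intervals]
    # Index of the last interval whose start is <= pos0, or -1 if none.
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]

# Made-up intervals for illustration only; NOT taken from the real mask.
example_mask = [(21_000_000, 21_300_000), (21_320_000, 21_400_000)]
print(in_mask(example_mask, 21_319_208))
```

A site that returns False here would be outside the accessible regions, which is one reason a call at that site could be dropped regardless of its allele frequency.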

It might be a false negative on our part; it might also be a false positive from the other groups who have called it.

ADD COMMENT • link 12.1 years ago by Laura ★ 1.8k

0

Thanks. I'm curious why this SNP was filtered out later. Because of low coverage? Also, I checked several BAM files of unrelated individuals NOT from 1000 Genomes, and this SNP does exist there. However, in BAM files from 1000G, for example NA12878 and NA12889, this SNP is not there. The problem is that this SNP has MAF=50%; it is a common allele, not a rare one. Different groups should be consistent for common SNPs, right? That's where I'm confused.

ADD REPLY • link 12.1 years ago by michealsmith ▴ 800

0

Also, what's special about a genome patch in terms of calling variants? Are genome patches supposed to be regions holding many mutations?

ADD REPLY • link 12.1 years ago by michealsmith ▴ 800

0

Patched regions of the assembly are more likely to be regions with highly repetitive sequences or that are otherwise hard to assemble and thus to map to. That could be part of the issue here as well.

I do use dbSNP as well as 1000G and ESP MAFs, but I tend to stick to older dbSNP versions. For newer versions I would want to go by the estimated MAF and not simple presence/absence, as I've seen entries in dbSNP with no population data at all that were seen in, say, only one individual.
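As a rough illustration of filtering on estimated MAF rather than presence/absence, here is a minimal sketch. The function name `classify`, the source labels, and the 1% threshold are all assumptions chosen for the example, not values from any of the databases; a missing frequency is represented as `None` so that "no population data" is kept separate from "rare".

```python
def classify(mafs, threshold=0.01):
    """Classify a variant from per-source MAF estimates.

    mafs: dict mapping a source name to its MAF (float), or None when that
    source has no frequency data for the variant.
    threshold: assumed cutoff for "common"; 1% is only an example value.
    """
    observed = [m for m in mafs.values() if m is not None]
    if not observed:
        # Present somewhere, but with no frequency estimate to filter on.
        return "no_frequency_data"
    # Treat the variant as common if any source reports MAF >= threshold.
    return "common" if any(m >= threshold for m in observed) else "rare"

# rs73979896 as described in the question: MAF of about 0.49 in dbSNP 135,
# no entry in 1000 Genomes or ESP-6500.
rs73979896 = {"dbSNP135": 0.49, "1000G": None, "ESP6500": None}
print(classify(rs73979896))
```

With this scheme a variant that is merely listed in dbSNP without population data is flagged for a closer look instead of being silently treated as common or rare.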

ADD REPLY • link 12.1 years ago by DG 7.3k

score 1 · Answer 2 · 2012-10-31

1

12.1 years ago

JC 13k

I don't know why 1000G and ESP are missing this SNP; it could be a low-coverage area or any of many other factors. But Kaviar has it: http://db.systemsbiology.net/kaviar/cgi-pub/Kaviar2.pl?chr=chr17&pos=21319207

ADD COMMENT • link 12.1 years ago by JC 13k