SNPs present in dbSNP but absent in 1000 Genomes and ESP databases
12.1 years ago
michealsmith ▴ 800

Which database would you choose to filter out common SNPs and find the rare ones that may be disease-causing? I have been using 1000 Genomes as well as the ESP (Exome Sequencing Project) database. (ESP is derived from exome data of about 6500 individuals, which is fairly large.) Both databases also contain MAF. I don't use dbSNP initially, because it simply contains everything, which makes it a less permissive filter.

But I found something interesting today. There are some SNPs, for example rs73979896: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=308088757&c=chr17&o=21319207&t=21319208&g=snp135Common&i=rs73979896

This SNP is nonsynonymous and present in dbSNP 135 with a very high MAF of 49%, derived from around 2204 alleles. However, it is absent from both 1000 Genomes (April 2012) and ESP-6500 (the latest version, with exome data from 6500 individuals)! If this is really a true SNP with MAF=49%, how can it NOT be captured in ESP with data from 6500 people? This is very confusing.
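To put a number on that intuition, here is a rough back-of-the-envelope sketch. It assumes the site is actually covered and callable, that chromosomes are sampled independently, and that the ESP cohort shares the allele frequency reported in dbSNP; none of these is guaranteed. The function name `log10_prob_absent` is made up for this illustration.

```python
import math

def log10_prob_absent(maf: float, n_individuals: int) -> float:
    """log10 of the chance that no sampled chromosome carries the minor allele.

    Assumes independent sampling of 2 * n_individuals chromosomes, each
    carrying the allele with probability `maf` (a simplifying assumption).
    """
    n_chromosomes = 2 * n_individuals
    return n_chromosomes * math.log10(1.0 - maf)

maf = 0.49            # MAF reported in dbSNP 135 for rs73979896
n_individuals = 6500  # approximate ESP-6500 sample size

# Work in log10 space because the probability itself underflows a float.
print(f"log10 P(allele unseen) = {log10_prob_absent(maf, n_individuals):.1f}")
```

Under these assumptions the probability is vanishingly small, so pure sampling chance cannot explain the absence. Something systematic (coverage, mappability, or filtering) is the more plausible explanation.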

dbsnp • 5.3k views
12.1 years ago
Laura ★ 1.8k

This SNP was part of the original 1000 Genomes call set:

tabix -h http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wgs.union_vqsr2b.20101123.snps.low_coverage.sites.vcf.gz 17:21319208-21319208

If you look at the reference genome, this is a patched part of the reference. In addition, this site does not fall within our strict accessibility mask:

http://browser.1000genomes.org/Homo_sapiens/Location/View?r=17%3A21319208-21319208#r=17:21316709-21321708

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/accessible_genome_masks/README_20120824_accessibility_mask_bed_files

It might be a false negative on our part, but it might also be a false positive on the part of the other groups who have called it.
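As a minimal sketch of how one could check a site against an accessibility mask distributed as a BED file: the intervals below are invented for illustration and are NOT taken from the real mask. The chromosome naming ("17" versus "chr17") is also an assumption, so check it against the actual file. `parse_bed` and `in_mask` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, 0-based start, exclusive end)

def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping comments and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

def in_mask(intervals: List[Interval], chrom: str, pos_1based: int) -> bool:
    """True if a 1-based position (as in VCF) lies inside any BED interval."""
    pos0 = pos_1based - 1  # BED coordinates are 0-based, half-open
    return any(c == chrom and s <= pos0 < e for c, s, e in intervals)

# Toy intervals, invented for this example (not real mask content).
toy_mask = ["17\t21319000\t21319100", "17\t21319300\t21319400"]
intervals = parse_bed(toy_mask)
print(in_mask(intervals, "17", 21319208))
```

A linear scan is fine for checking a handful of sites. For genome-wide filtering, an indexed approach (for example tabix on a bgzipped BED file) would scale better.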


Thanks. I'm curious why this SNP was filtered out later. Was it because of low coverage? I also checked several BAM files of unrelated individuals NOT from 1000 Genomes, and this SNP does exist there. However, in BAM files from 1000G, for example NA12878 and NA12889, the SNP is not there. The problem is that this SNP has MAF=50%; it is a common allele, not a rare one. Different groups should be consistent for common SNPs, right? That's where I'm confused.


Also, what's special about genome patches in terms of calling variants? Are patched regions supposed to be regions holding many mutations?


Patched regions of the assembly are more likely to be regions with highly repetitive sequences or that are otherwise hard to assemble and thus to map to. That could be part of the issue here as well.

I do use dbSNP as well as 1000G and ESP MAFs, but I tend to stick to older dbSNP versions. For newer versions I would go by the estimated MAF rather than simple presence/absence, as I've seen entries in dbSNP with no population data at all that were seen in, say, only one individual.
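A minimal sketch of that idea, filtering on reported MAF instead of mere presence in a database. The 1% threshold, the variant names, and the frequency values are all made up for illustration, and `is_candidate_rare` is a hypothetical helper, not part of any tool mentioned here.

```python
from typing import Dict, Optional

def is_candidate_rare(mafs: Dict[str, Optional[float]],
                      threshold: float = 0.01) -> bool:
    """Keep a variant unless some database reports a MAF above the threshold.

    None means the variant is absent from that database or listed without
    frequency data, so presence alone never removes a variant.
    """
    known = [m for m in mafs.values() if m is not None]
    return all(m <= threshold for m in known)

# Hypothetical annotations: database name -> reported MAF (or None).
variants = {
    "var_common":  {"dbsnp": 0.49,  "1000g": None,  "esp": None},
    "var_no_freq": {"dbsnp": None,  "1000g": None,  "esp": None},
    "var_rare":    {"dbsnp": 0.002, "1000g": 0.001, "esp": None},
}

kept = [name for name, mafs in variants.items() if is_candidate_rare(mafs)]
print(kept)
```

Note that a variant like the one discussed in this thread (high MAF in dbSNP only) would be removed by this rule, whereas a filter that consults only 1000G and ESP would let it through.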

12.1 years ago
JC 13k

I don't know why 1000G and ESP are missing this SNP. It could be a low-coverage area or any of many other factors, but Kaviar has it: http://db.systemsbiology.net/kaviar/cgi-pub/Kaviar2.pl?chr=chr17&pos=21319207
