Does the 1000 Genomes Project Cover GC-Rich Regions?
12.0 years ago
michealsmith ▴ 800

I'm looking for rare variants in whole-genome sequencing data. I found a "rare" SNP in my patient sample that is absent from every database I checked, including the latest 1000 Genomes release and exome sequencing databases. However, when I looked at this position in 4 randomly chosen control whole-genome sequences from 1000G, it turned out to lie within a GC-rich region that is barely covered by any reads (in my data, the sequencer gets through this GC-rich region and coverage is good).

So I can't tell whether the SNP I found is really rare, or is a common one that NGS missed in 1000G because PCR simply cannot get through the GC-rich region.

But 1000G has a huge number of samples and calls SNPs/indels from all of them jointly, so it should be almost impossible for a given region not to be covered by any reads, right?

So should I trust the 1000 Genomes SNP/indel calls in GC-rich regions?

1000genomes • 2.9k views

Because of various filters, 1000G will miss a small fraction of common SNPs; that can hardly be avoided. Checking the unfiltered SNP calls is a better way to confirm whether yours is really rare. I do not know whether the unfiltered calls are still available.

Don't trust the indels. 1000G still has a lot of trouble with them, and they are working hard to improve indel calling.


Which sequencer did you use for your data?


The sequencer is a HiSeq 2000.

12.0 years ago
JC 13k

All sequencing technologies have problems in high-GC regions, so calling variants there is hard. I don't know how you are verifying your variants, but the 1000G VCF reports the total number of reads used to call a variant (DP=N), so you can filter by a depth threshold. Also, if your variants are exonic, you can check them in ESP6500: http://evs.gs.washington.edu/EVS/
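To illustrate the DP idea, here is a minimal Python sketch that pulls DP out of the INFO column of a VCF data line and compares it with a threshold. The function names, the threshold, and the example record are made up for illustration; they are not taken from the 1000G files. Keep in mind that a sites-only VCF lists only variant positions, so this can tell you about depth at reported variants near your SNP, not at the missing position itself.

```python
def info_depth(vcf_line):
    """Return the integer DP value from the INFO column (8th field), or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def well_covered(vcf_line, min_depth):
    """True if the record reports DP and it is at least min_depth (hypothetical cutoff)."""
    depth = info_depth(vcf_line)
    return depth is not None and depth >= min_depth


# Made-up record for illustration only; not a real 1000G site.
example = "1\t12345\t.\tG\tA\t100\tPASS\tAC=2;AN=2184;DP=3500"
no_dp = "1\t12346\t.\tC\tT\t100\tPASS\tAC=1;AN=2184"
```

What counts as "enough" depth depends on how many samples were pooled into DP, so the cutoff is something you would have to pick for your own data.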

A simple way to integrate the various sources is with ANNOVAR.


Thanks. I'm using ANNOVAR, and right now I filter against both 1000G and ESP6500 with a MAF cutoff of 0.01. I agree with lh3 that we should use the unfiltered version, because I have come across many well-studied common SNPs that are absent from the 1000G database, probably due to various types of filtering.
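A rough Python sketch of that filtering logic, keeping "absent from every database" separate from "observed but below the cutoff", since the whole thread is about absent not necessarily meaning rare. This is not how ANNOVAR does it internally; `classify`, the dictionary layout, and the frequencies are hypothetical, and only the 0.01 cutoff comes from the comment above.

```python
MAF_CUTOFF = 0.01  # the cutoff mentioned above


def classify(freqs, cutoff=MAF_CUTOFF):
    """Classify a variant from per-database allele frequencies.

    freqs maps a database name to a frequency, or to None when the
    variant is not listed in that database (hypothetical layout).
    """
    observed = [f for f in freqs.values() if f is not None]
    if not observed:
        # Not seen anywhere: could be truly rare, or filtered out / poorly covered.
        return "absent"
    if max(observed) < cutoff:
        return "rare"
    return "common"


# Made-up frequencies for illustration.
print(classify({"1000G": None, "ESP6500": None}))   # absent
print(classify({"1000G": 0.002, "ESP6500": None}))  # rare
print(classify({"1000G": None, "ESP6500": 0.2}))    # common
```

Variants that come back as "absent" are the ones worth re-checking against unfiltered calls or coverage before calling them rare.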
