Question

Does the 1000 Genomes Project Cover GC-Rich Regions?

0

12.0 years ago

michealsmith ▴ 800

I'm looking for rare variants in whole-genome sequencing data. I found a "rare" SNP in my patient sample that is not in any database, including the latest 1000 Genomes and exome sequencing databases. However, when I checked this position in four randomly chosen control whole genomes from 1000G, it turned out to lie within a GC-rich region that is barely covered by any reads (in my data, the sequencer reads through this GC-rich region with good coverage).

So I'm not sure whether the SNP I found is really rare, or whether it is a common one that 1000G missed because PCR simply cannot amplify the GC-rich region.

But 1000G has a huge number of samples and calls SNPs/indels from all of them jointly, so it should be almost impossible for a given region to be covered by no reads at all, right?

So should I trust the 1000 Genomes SNP/indel database in GC-rich regions?
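One way to put a number on "GC-rich" is to compute the GC fraction of the reference sequence in a window around the candidate SNP. The following is only a minimal sketch: the function names, the toy reference string, and the choice of flank size are assumptions made for illustration, not part of any 1000G tooling.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C; 0.0 if none are called."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def window_gc(reference, pos, flank):
    """GC fraction of the window pos +/- flank (pos is 1-based, as in a VCF)."""
    start = max(0, pos - 1 - flank)          # convert to a 0-based slice start
    end = min(len(reference), pos + flank)   # slice end is exclusive
    return gc_fraction(reference[start:end])


# Toy reference sequence (hypothetical), SNP at 1-based position 6.
toy_reference = "AAAAGCGCAAAA"
print(window_gc(toy_reference, pos=6, flank=1))
```

Comparing this value with the read depth at the same position in several samples would show whether the low coverage tracks GC content or something else.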

1000genomes • 2.9k views

ADD COMMENT • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by michealsmith ▴ 800

2

Due to various filtering steps, 1000G will miss a small fraction of common SNPs, which can hardly be avoided. Checking the unfiltered SNPs is a better way to confirm whether it is really rare. I do not know if the unfiltered calls are still available.

Don't trust indels. 1000G still has a lot of trouble with them. They are trying hard to improve indel calling.

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by lh3 33k

0

Which sequencer did you use for your data?

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by Raony Guimarães ★ 1.4k

0

The sequencer is a HiSeq 2000.

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by michealsmith ▴ 800

Ram · Answer 1 · 2012-11-28

1

12.0 years ago

JC 13k

All sequencing technologies have problems in high-GC regions, so calling variants there is hard. I don't know how you are verifying your variants, but the 1000G VCF reports the total number of reads used to call a variant (DP=N), so you can filter by a depth threshold. Also, if your variants are exonic, you can check them in ESP6500: http://evs.gs.washington.edu/EVS/
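Filtering on the DP field could look roughly like the sketch below, which reads the `DP=N` entry from the INFO column (the eighth tab-separated field of a VCF data line) and keeps only sites at or above a threshold. The function names, the example records, and the threshold of 1000 are assumptions for illustration; a sensible cutoff depends on the number of samples aggregated in the file.

```python
def info_depth(vcf_line):
    """Return the INFO DP value of a VCF data line, or None if DP is absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th column of a VCF data line
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def well_covered(vcf_lines, min_depth):
    """Yield data lines whose INFO DP is at least min_depth; skip header lines."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        depth = info_depth(line)
        if depth is not None and depth >= min_depth:
            yield line


# Made-up records for illustration; they are not real 1000G calls.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tG\tC\t100\tPASS\tAC=2;DP=3500\n",
    "1\t2000\t.\tC\tT\t100\tPASS\tAC=1;DP=40\n",
]
kept = list(well_covered(example, min_depth=1000))
print([line.split("\t")[1] for line in kept])  # positions that pass the depth filter
```

Note that a site with no variant call has no VCF record at all, so this only tells you about depth where a variant was called; depth at an uncalled position has to come from the alignments themselves.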

The simplest way to integrate these sources is with ANNOVAR.

ADD COMMENT • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by JC 13k

1

Thanks. I'm using ANNOVAR, and right now I'm filtering against both 1000G and ESP6500 with a MAF cutoff of 0.01. I agree with lh3 that we should use the unfiltered version, because I have come across many well-studied common SNPs that are absent from the 1000G database, probably due to various types of filtering.
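The filtering logic described here can be sketched as follows. This is not ANNOVAR's implementation; the function name, the dictionary layout, and the three labels are assumptions for illustration. The point of the sketch is to keep "absent from every database" separate from "observed at low frequency", since absence may reflect missing coverage or filtering rather than true rarity.

```python
def classify(freqs, cutoff=0.01):
    """Classify a variant from per-database allele frequencies.

    freqs maps a database name to a frequency, or to None when the
    variant is absent from that database.
    """
    observed = {db: f for db, f in freqs.items() if f is not None}
    if any(f >= cutoff for f in observed.values()):
        return "common"   # at or above the cutoff in at least one database
    if not observed:
        return "absent"   # never seen: could be rare, or simply not callable
    return "rare"         # seen, and below the cutoff everywhere it was seen


# Hypothetical frequencies for three variants.
print(classify({"1000G": 0.20, "ESP6500": None}))
print(classify({"1000G": 0.001, "ESP6500": 0.004}))
print(classify({"1000G": None, "ESP6500": None}))
```

Variants labelled "absent" are the ones worth rechecking against unfiltered calls or per-sample coverage before treating them as rare.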

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 12.0 years ago by michealsmith ▴ 800