Question

How to Retrieve Coding SNPs Typed Only in 1000G Data

2

13.7 years ago

Sarah Tyrell ▴ 20

Good afternoon,

I have a list of 100 genes. For each gene (more precisely, for one particular transcript of each), I would like to get the "synonymous coding" and "non-synonymous coding" SNPs that are observed in the 1000G data (n=629).

Moreover, it would be fantastic to somehow extract the heterozygosity status for those SNPs.

I tried the Ensembl 1000G browser; however, there are inconsistencies: some SNPs that appear in the VCF file do not show up in the browser view. In addition, I do not want to go through dbSNP, as I am only interested in the SNPs observed in 1000G.

Any help would be much appreciated.

snp genome non • 3.7k views

ADD COMMENT • link updated 5.0 years ago by Biostar 20 • written 13.7 years ago by Sarah Tyrell ▴ 20

0

Do you have the VCF file describing the 1000G variants that you want to use?

ADD REPLY • link 13.7 years ago by Sean Davis 27k

0

The inconsistencies you see are probably due to the fact that there are several 1000 Genomes releases. In particular, a new one was published in October 2011, including almost 2000 individuals (http://www.1000genomes.org/announcements/october-2011-integrated-variant-set-release-ichg2011-2011-10-12). Which release are you interested in?

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 13.5 years ago by Giovanni M Dall'Olio 28k

Giovanni M Dall'Olio · Answer 1 · 2011-10-20

1

13.6 years ago

Simon P ▴ 10

Sarah,

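A fairly direct pipeline should allow you to do this:

1. Get the chromosomal coordinates of your genes (or of the specific transcripts you care about).
2. Extract from the 1000G VCF the SNPs that fall in the regions found in step 1 (make sure that the coordinates and the VCF use the same genome assembly and annotation version).
3. Use variant annotation software (e.g. ANNOVAR) to annotate the SNPs as synonymous or non-synonymous.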
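
As a rough illustration of the extraction step, here is a minimal Python sketch that keeps only the SNPs falling inside a set of regions and counts heterozygous samples from the GT field. It is not part of the original answer: the function names (`parse_region`, `is_het`, `snps_in_regions`) and the example records are hypothetical, it assumes an uncompressed, tab-separated VCF with a GT subfield, and it does no functional annotation (that part is still left to a tool such as ANNOVAR). For the real 1000G files, a region query with tabix on the indexed VCF would likely be much faster than scanning every line.

```python
def parse_region(text):
    """Parse 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def is_het(gt):
    """True if a diploid GT such as '0|1' or '1/0' carries two different called alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def snps_in_regions(vcf_lines, regions):
    """Yield (chrom, pos, id, ref, alt, n_het, n_called) for SNPs inside any region."""
    bases = {"A", "C", "G", "T"}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        vid, ref, alt = fields[2], fields[3], fields[4]
        # Keep single-base substitutions only (drops indels and symbolic alleles).
        if ref not in bases or any(a not in bases for a in alt.split(",")):
            continue
        if not any(c == chrom and s <= pos <= e for c, s, e in regions):
            continue
        n_het = n_called = 0
        fmt = fields[8].split(":") if len(fields) > 8 else []
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            for sample in fields[9:]:
                gt = sample.split(":")[gt_idx]
                if "." in gt:
                    continue  # missing genotype, not counted as called
                n_called += 1
                if is_het(gt):
                    n_het += 1
        yield chrom, pos, vid, ref, alt, n_het, n_called


# Made-up records for illustration only; not real 1000G data.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\trs_a\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    "1\t150\tindel\tAT\tA\t.\tPASS\t.\tGT\t0|1\t0|0\t0|0",
    "1\t500\trs_b\tC\tT\t.\tPASS\t.\tGT\t0|1\t1|0\t.|.",
    "2\t120\trs_c\tG\tA\t.\tPASS\t.\tGT:DP\t0/1:5\t0/0:7\t0/0:3",
]

if __name__ == "__main__":
    gene_regions = [parse_region("1:90-200"), parse_region("2:100-130")]
    for record in snps_in_regions(EXAMPLE_VCF, gene_regions):
        print(record)
```

The per-SNP heterozygous count divided by the number of called samples gives an observed heterozygosity, which addresses the second part of the question. The region list would come from step 1, one entry per gene or transcript.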

ADD COMMENT • link updated 13.6 years ago by Giovanni M Dall'Olio 28k • written 13.6 years ago by Simon P ▴ 10