Get Coding SNPs for 1000 Genomes Data
13.4 years ago
Korban ▴ 10

How can I get the coding SNPs (synonymous and non-synonymous coding polymorphisms) for a particular gene (say, BRCA1) from 1000 Genomes data?

BioMart provides a nice interface to 1000 genomes data, but it just takes forever to output. Is there an alternative where I can simply look for the coding SNPs by providing the gene id or the corresponding genomic region?

Thanks

genome gene snp retrieval • 4.5k views

Why would you want to retrieve this from 1000 Genomes data rather than Ensembl? Do you want to retrieve reads mapping to a gene or the sequence itself?

Hi Daniel, I want to check for the polymorphisms in a gene of interest.

13.4 years ago
Karl ▴ 350

To find polymorphisms in a gene of interest, I would first get the gene's coordinates: the chromosome plus the start and stop positions, padded by a kilobase or ten on each end to cover regulatory regions. Then go to Ensembl, BioMart, or dbSNP and ask for the known variants in that region.
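The region-based lookup can also be scripted. Below is a minimal sketch, assuming the Ensembl REST `overlap/region` endpoint with `feature=variation` and the GRCh37 REST server (the BRCA2 coordinates quoted elsewhere in this thread look like GRCh37). The helper names and the set of consequence terms are my own choices for illustration, and the `consequence_type` field name is an assumption about the JSON the server returns, so check it against a real response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: GRCh37 coordinates, so the GRCh37 REST server is used here.
ENSEMBL_REST = "https://grch37.rest.ensembl.org"

# Sequence Ontology terms treated as "coding" in this sketch (an illustrative choice).
CODING_TERMS = {"synonymous_variant", "missense_variant", "stop_gained", "stop_lost"}


def padded_region(chrom, start, end, flank=10_000):
    """Return 'chrom:start-end' widened by `flank` bp per side, clamped at position 1."""
    return f"{chrom}:{max(1, start - flank)}-{end + flank}"


def variation_overlap_url(region, species="human", server=ENSEMBL_REST):
    """Build the overlap/region URL that asks for variation features in `region`."""
    return f"{server}/overlap/region/{species}/{quote(region, safe=':-')}?feature=variation"


def fetch_variants(url):
    """Download and decode the JSON list of variation features (needs network access)."""
    req = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def coding_only(records):
    """Keep records whose 'consequence_type' (assumed field name) is a coding term."""
    return [r for r in records if r.get("consequence_type") in CODING_TERMS]


if __name__ == "__main__":
    region = padded_region("13", 32889611, 32973805, flank=5_000)
    print(variation_overlap_url(region))
    # variants = coding_only(fetch_variants(variation_overlap_url(region)))
```

The download is left commented out so the script can be checked offline first. Note that this returns every known variant in the region, not only those called in 1000 Genomes samples.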

You can also search by gene name or symbol in Ensembl (e.g. 'BRCA2'). To get a list of the variants in a gene, go to the 'Variant Table' page (e.g. http://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/Table?g=ENSG00000139618;r=13:32889611-32973805). You can configure the page using [Configure this page] in the side menu. Note that by default we show only variants located in the exons and up to 100 bp away from them. To see all intronic variants, as well as variants up to 5 kb upstream and downstream of the gene, set 'Intron Context' on the configuration page to 'Full Introns'.

13.4 years ago
Prateek ★ 1.0k

You could use one of the SIFT tools to filter out non-coding variants. The tool is called "Restrict to coding variants"; here is the link: http://sift.bii.a-star.edu.sg/www/SIFT_intersect_coding_submit.html

Remember to choose the genome build and the input format. It accepts multiple formats, such as VCF4, pileup, MAQ, or SIFT's own simple comma-delimited formats.
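If you would rather do the same kind of restriction locally on a VCF file, the idea can be sketched in a few lines. This is not SIFT's implementation, only a minimal illustration that keeps VCF records whose position falls inside a list of coding-exon intervals. The function names, the interval coordinates, and the sample records are all hypothetical; real intervals would come from a gene annotation on the same genome build as the VCF.

```python
def in_any_interval(pos, intervals):
    """True if 1-based position `pos` lies inside any inclusive (start, end) interval."""
    return any(start <= pos <= end for start, end in intervals)


def restrict_to_intervals(vcf_lines, chrom, intervals):
    """Keep header lines plus records on `chrom` whose POS falls in `intervals`."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # pass header and meta lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and in_any_interval(int(fields[1]), intervals):
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical coding-exon intervals and records, for illustration only.
    exons = [(1000, 1200), (5000, 5150)]
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "13\t1100\t.\tA\tG\t.\tPASS\t.",
        "13\t3000\t.\tC\tT\t.\tPASS\t.",
        "13\t5150\t.\tG\tA\t.\tPASS\t.",
        "17\t1100\t.\tT\tC\t.\tPASS\t.",
    ]
    for record in restrict_to_intervals(vcf, "13", exons):
        print(record)
```

This checks only the POS column, which is enough for SNPs; indels that start outside an exon but extend into it would be missed. It also says nothing about whether a variant is synonymous or non-synonymous, which still needs an annotation step.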
