Downloading SNPs from 1000 Genomes for a given individual
13.0 years ago

Hi,

Can you suggest a way to download from 1000 Genomes a set of SNPs (for example, those located on chromosome 1) belonging to a certain individual?

Related to that, are all the individuals comparable when it comes to the reliability of the variant calling, or is there a subset that is more reliable than the others due to, say, better sequencing technology or library preparation?

I'd appreciate it if you could get me started on this.

genome snp samtools • 4.9k views
13.0 years ago

You can do it with tabix and vcftools.

Isolate SNPs for your region:

tabix -fh ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz 12:2345295-2345295 > genotypes.vcf

Replace 12:2345295-2345295 with your own region (chromosome:start-end).

Isolate Individuals:

vcftools_0.1.4a/perl/vcf-subset -c NA06984,NA06986 genotypes.vcf > temp.vcf

Replace NA06984,NA06986 with the IDs of your individuals.
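
Note that subsetting by column keeps every site in the region, including those where the chosen individual is homozygous for the reference allele. If only the sites where that individual actually carries a variant are wanted, a filter along these lines could follow. This is a rough sketch in plain awk, not part of vcftools; sample_snps is a hypothetical helper name, and it assumes a tab-delimited VCF in which GT is the first FORMAT field (as the VCF specification requires when GT is present).

```shell
# sample_snps SAMPLE VCF
# Hypothetical helper: print the VCF header plus only those SNP rows
# where SAMPLE carries at least one non-reference allele.
sample_snps() {
    awk -F'\t' -v OFS='\t' -v sample="$1" '
        /^##/ { print; next }                   # meta lines pass through unchanged
        /^#CHROM/ {                             # find the column holding the sample
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            if (!col) { print "sample not found: " sample > "/dev/stderr"; exit 1 }
            print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
            next
        }
        {
            # keep SNPs only: single-base REF, single-base ALT allele(s)
            if ($4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT](,[ACGT])*$/) next
            split($col, f, ":")                 # f[1] is the GT field, e.g. 0|1
            if (f[1] ~ /[1-9]/)                 # any non-reference allele index
                print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
        }
    ' "$2"
}
```

For example, sample_snps NA06984 genotypes.vcf > NA06984.snps.vcf would write a single-sample file containing only the SNPs that NA06984 carries.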

Convert to other formats:

vcftools_0.1.4a/cpp/vcftools --vcf temp.vcf --plink --out Output

For other output formats, see vcftools.sourceforge.net/options.html


If I want all the SNPs from chr12, what positions should I specify?


Probably 1 to the length of chr12, which you can get from the 1000 Genomes browser, but it will take time. Alternatively, download the whole chr12 file and use it as input.
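
As a side note to the reply above: tabix region strings have the form chromosome[:start-end], so a bare chromosome name such as 12 should select every record on that chromosome without looking up its length. A minimal sketch follows; the region helper and the VCF_URL variable are hypothetical names introduced here, and the tabix call is left commented out because it needs network access.

```shell
# region CHROM [START END]
# Hypothetical helper: build a tabix region string.
# With one argument it returns the bare chromosome name (whole chromosome).
region() {
    if [ "$#" -eq 1 ]; then
        printf '%s\n' "$1"
    else
        printf '%s:%s-%s\n' "$1" "$2" "$3"
    fi
}

# Usage (VCF_URL is a placeholder for the chr12 VCF on the FTP site):
# tabix -h "$VCF_URL" "$(region 12)" > chr12.vcf
# tabix -h "$VCF_URL" "$(region 12 2345295 2345295)" > genotypes.vcf
```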

13.0 years ago
Pascal ★ 1.5k

Also have a look at the 1KG project page explaining how to get a subsection of a VCF file (e.g. a given chromosome or range of loci).

Link: how-do-i-get-sub-section-vcf-file

Regards.


The data slicer also allows you to pick a particular individual or population: http://www.1000genomes.org/data-slicer

13.0 years ago

You can download the SNP files in VCF format from their FTP site.

You can also view the SNP data from their 2010 study using their genome browser.

You can read about how they did the SNP calling on the project site.

It was such a collaborative project, with sequencing data from various sources, that I don't think a strong case can be made that any one individual has much better data than another.


Data quality does vary between individuals. Some were sequenced with old, crappy 35 bp reads to 4X coverage, while others were sequenced with 100 bp HiSeq reads to over 10X.
