Question

How to process 1000GP data

10.1 years ago

kumbarov ▴ 10

I would like to bulk-process the 1000GP Y-SNP data for some projects of mine. I've downloaded the ALL.chrY.phase3_integrated.20130502.genotypes.vcf file and would like to either extract the data for only some samples or remove some samples. What is the easiest way to do this? Even better, is there a readily available script to import this sort of data into a database? I am new to this, so any advice on working with this sort of data is welcome.

next-gen SNP snp • 2.8k views

updated 2.9 years ago by Ram 45k • written 10.1 years ago by kumbarov ▴ 10
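On the database part of the question: one possible route, sketched here under assumptions, is to flatten the VCF into tab-separated text with `bcftools query` and then load that text with whatever bulk importer your database provides. The function names below are hypothetical, and the sketch assumes a working bcftools 1.x on the PATH.

```shell
# Hypothetical helpers around bcftools query; the function names are illustrative.

# List the sample names stored in the VCF header, one per line.
list_samples() {
    bcftools query -l "$1"
}

# Print one tab-separated row per site:
# CHROM, POS, ID, REF, ALT, followed by one genotype column per sample.
vcf_to_tsv() {
    bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%GT]\n' "$1"
}

# Example usage (the output file name is an assumption):
#   vcf_to_tsv ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz > chrY.tsv
```

The genotype columns come out in the same order as the names printed by `list_samples`, so the two outputs together are enough to define a table layout.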

Ram · Accepted Answer · 2015-03-18

10.1 years ago

Pierre Lindenbaum 166k

Using bcftools:

$ curl -s  "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view --samples NA19455,NA20291 -

(...)
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NA19455    NA20291
Y    2655180    rs11575897    G    A    100    PASS    AA=G;AC=0;AF=0.0178427;AN=2;DP=84761;NS=1233;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;EAS_AF=0.0451;VT=SNP;EX_TARGET    GT    0    0
Y    2655471    .    A    C    100    PASS    AA=A;AC=0;AF=0.00405515;AN=2;DP=72067;NS=1233;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;EAS_AF=0.0102;VT=SNP;EX_TARGET    GT    0    0

(...)

updated 2.9 years ago by Ram 45k • written 10.1 years ago by Pierre Lindenbaum 166k
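The question also asks about removing samples rather than keeping them. A minimal sketch, assuming a bcftools 1.x build in which `--samples` accepts a leading `^` to exclude the listed samples and `--samples-file` reads one sample name per line. The function names are hypothetical.

```shell
# Hypothetical helpers around bcftools view; the function names are illustrative.

# Keep only the comma-separated samples in $1 from the VCF in $2.
keep_samples() {
    bcftools view --samples "$1" "$2"
}

# Drop the comma-separated samples in $1; the leading ^ inverts the selection.
# The argument is quoted because some shells treat ^ specially.
drop_samples() {
    bcftools view --samples "^$1" "$2"
}

# Keep the samples listed one per line in the file $1.
keep_samples_from_file() {
    bcftools view --samples-file "$1" "$2"
}

# Example usage (the output file name is an assumption):
#   drop_samples NA19455,NA20291 ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz > subset.vcf
```

A sample file is usually the more convenient form once the list grows beyond a handful of IDs.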

Note that curl is not necessary here; bcftools can read the remote file directly:

bcftools view --samples NA19455,NA20291 ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz

updated 2.9 years ago by Ram 45k • written 10.1 years ago by Shane McCarthy ▴ 370

curl -s "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view --samples NA19455,NA20291
view: invalid option -- '-'
open: No such file or directory
Segmentation fault (core dumped)

curl -s "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view -samples NA19455,NA20291
open: No such file or directory
Segmentation fault (core dumped)

updated 2.9 years ago by Ram 45k • written 10.1 years ago by kumbarov ▴ 10

bcftools -v
bcftools 1.2
Using htslib 1.2.1
Copyright (C) 2015 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

updated 2.9 years ago by Ram 45k • written 10.1 years ago by Pierre Lindenbaum 166k

Oh, and I forgot to paste the hyphen '-' at the end of the command (it tells bcftools to read from stdin).

written 10.1 years ago by Pierre Lindenbaum 166k

bcftools -v
[main] Unrecognized command.

I am using the version that comes with Ubuntu 14.04. I've downloaded and compiled the htslib and samtools source code, but I don't get a bcftools binary.

written 10.1 years ago by kumbarov ▴ 10

The version of bcftools that comes with Ubuntu 14.04 is completely broken; I get segfaults all the time. I downloaded the sources for htslib, samtools and bcftools from GitHub and compiled them. The above command works perfectly with the upstream version of bcftools.

written 10.1 years ago by kumbarov ▴ 10
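For anyone hitting the same segfaults, building from the upstream sources might look roughly like the sketch below. It assumes git, make, a C compiler and the zlib headers are installed, and that the bcftools Makefile finds htslib in a sibling `htslib` directory, which is the layout the two clones produce. The function name is hypothetical.

```shell
# Hypothetical helper: clone htslib and bcftools side by side in the
# directory $1 (default: current directory) and build bcftools.
# The body runs in a subshell so the caller's working directory is unchanged.
build_bcftools() (
    cd "${1:-.}" || exit 1
    git clone --recurse-submodules https://github.com/samtools/htslib.git &&
    git clone https://github.com/samtools/bcftools.git &&
    make -C bcftools
)

# Example usage (the directory is an assumption):
#   build_bcftools "$HOME/src" && "$HOME/src/bcftools/bcftools" --version
```

The resulting binary can be run from the build directory without installing it system-wide, which avoids clashing with the distribution package.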