Using Tabix And VCFtools To Get CNV / SV Frequencies From 1000 Genomes Data
13.1 years ago
Ryan D ★ 3.4k

I read this excellent post by Stephen on getting data from 1000 Genomes with tabix, but it does not seem to be working for me. I use tabix to fetch the data as follows:

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-10000000 > ~/delete.vcf

It produces a VCF file, but the file seems to contain only headers and no variant records, like so:

##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096...

So when I run vcftools, I obviously get nothing, since there is no genotype data:

vcftools --vcf ~/delete.vcf --freq --out ~/delete.txt

VCFtools - v0.1.7
(C) Adam Auton 2009

Parameters as interpreted:
        --vcf /home/delahar/delete.vcf
        --freq
        --out /home/delahar/delete.txt

Reading Index file.
File contains 0 entries and 1092 individuals.
Applying Required Filters.
After filtering, kept 1092 out of 1092 Individuals
After filtering, kept 0 out of a possible 0 Sites
Error:No data left for analysis!

I'm guessing this is an issue with the way I'm using tabix. Ultimately I want to get the records that have VT=SV in their INFO column, so extra help with that would be greatly appreciated.
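For reference, the VT=SV selection described here can be sketched with awk once a VCF with actual records is in hand. This is only a minimal sketch under stated assumptions: the three records below are made up for illustration (the IDs, positions, and INFO values are hypothetical, not taken from the 1000 Genomes file), and the INFO column is assumed to be field 8 with semicolon-separated entries, as in standard VCF.

```shell
# Build a tiny, made-up VCF so the example runs on its own.
# These records are hypothetical and only mimic the layout of a real file.
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '22\t16050408\trs_example1\tT\tC\t100\tPASS\tAN=2184;VT=SNP\n'
  printf '22\t16051722\tsv_example1\tT\t<DEL>\t100\tPASS\tSVTYPE=DEL;VT=SV\n'
  printf '22\t16052167\tindel_example1\tA\tAAAAC\t100\tPASS\tAN=2184;VT=INDEL\n'
} > "$vcf"

# Keep every header line, plus records whose INFO column (field 8)
# contains VT=SV as a complete semicolon-delimited entry.
sv_records=$(awk -F'\t' '/^#/ { print; next } $8 ~ /(^|;)VT=SV(;|$)/' "$vcf")

# Count only the non-header lines that survived the filter.
sv_count=$(printf '%s\n' "$sv_records" | awk '!/^#/ { n++ } END { print n + 0 }')

echo "$sv_records"
echo "SV records kept: $sv_count"
rm -f "$vcf"
```

On a real file, the same awk filter could be applied to the tabix output and the result saved for vcftools; keeping the header lines matters because vcftools needs them to read the sample columns.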

Thanks,

Rx


I have heard from two people that they can't get tabix to retrieve data over the internet...

13.1 years ago
Adam ★ 1.0k

Your tabix command returns no data because there are no SNPs in that region of chr22. The first SNPs on chr22 are around the 16 Mb mark. Try:

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-16052250

And see if that returns some data.
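A quick way to tell an empty region apart from a tabix problem is to count the non-header lines in the output before handing it to vcftools. The sketch below is only an illustration: `count_vcf_records` is a hypothetical helper name, and the two small files are made-up stand-ins for real tabix output such as ~/delete.vcf.

```shell
# Hypothetical helper: count variant records in a VCF file,
# i.e. non-empty lines that do not start with '#'.
count_vcf_records() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Made-up stand-ins for tabix output: one header-only file
# and one file with a single illustrative record.
header_only=$(mktemp)
with_data=$(mktemp)
printf '##reference=GRCh37\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$header_only"
{
  cat "$header_only"
  printf '22\t16050408\texample_id\tT\tC\t100\tPASS\tVT=SNP\n'
} > "$with_data"

header_only_count=$(count_vcf_records "$header_only")
with_data_count=$(count_vcf_records "$with_data")

echo "header-only file: $header_only_count records"
echo "file with data:   $with_data_count records"
rm -f "$header_only" "$with_data"
```

A count of 0 on the real output would point to the query region holding no variants, which matches the "File contains 0 entries" message from vcftools.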


Thanks. I finally figured it out. You were exactly right.
