I read this excellent post by Stephen on getting data from 1000 Genomes with tabix, but it does not seem to be working for me. I use tabix to fetch the data as follows:
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-10000000 > ~/delete.vcf
It produces a VCF file, but the file seems to contain only headers and no variant records, like so:
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096...
So when I run vcftools, I obviously get nothing, since there is no genotype data:
vcftools --vcf ~/delete.vcf --freq --out ~/delete.txt
VCFtools - v0.1.7
(C) Adam Auton 2009
Parameters as interpreted:
--vcf /home/delahar/delete.vcf
--freq
--out /home/delahar/delete.txt
Reading Index file.
File contains 0 entries and 1092 individuals.
Applying Required Filters.
After filtering, kept 1092 out of 1092 Individuals
After filtering, kept 0 out of a possible 0 Sites
Error:No data left for analysis!
I'm guessing this is an issue with the way I'm using tabix. Ultimately I want to get the records that have VT=SV in their INFO column, so any extra help with that would be greatly appreciated.
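To make the VT=SV part concrete, here is a rough sketch of the kind of filter I have in mind once the file actually contains records. It assumes the VT tag sits in the INFO column (the 8th tab-separated field of a VCF record) and that the file is tab-delimited. The function name filter_sv is just a placeholder of mine, not something from tabix or vcftools.

```shell
# Keep all header lines, plus records whose INFO column (field 8) has VT=SV.
# filter_sv is a hypothetical helper name; it reads a VCF from the files
# given as arguments, or from stdin when no file is given.
filter_sv() {
  awk -F'\t' '
    /^#/ { print; next }              # pass header lines through unchanged
    $8 ~ /(^|;)VT=SV(;|$)/ { print }  # match VT=SV as a whole INFO entry
  ' "$@"
}

# Example use (paths are the ones from my commands above):
#   filter_sv ~/delete.vcf > ~/delete.sv.vcf
```

Matching on `(^|;)VT=SV(;|$)` rather than a bare `VT=SV` is meant to avoid picking up a longer value that merely starts with SV.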
Thanks,
Rx
I have heard from two people that they can't get tabix to retrieve data over the internet...