[Error] getting fasta file from 1000genomes
10.4 years ago
Mari ▴ 30

Hi,

I'd like to get a fasta file of haplotypes from 1000 genomes.

I ran the commands below:

http://www.1000genomes.org/faq/are-there-any-fasta-files-containing-1000-genomes-variants-or-haplotypes

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 | perl vcf-subset -c HG00098 | bgzip -c > HG00098.vcf.gz
tabix -p vcf HG00098.vcf.gz
cat ref.fa | vcf-consensus HG00098.vcf.gz > HG00098.fa

but got these error messages:

[tabix] the index file is older than the vcf file. Please use '-f' to overwrite or reindex.
Can't open perl script "vcf-subset": No such file or directory

And when I tried to install vcf-tools, I got

E: Unable to locate package vcf-tools

How can I run these commands successfully?

My machine runs 64-bit Ubuntu on Windows 7.

genome gene software-error • 4.0k views

Can you give some more information about how you tried to install vcf-tools?


Matt,

Thank you. I tried to run this:

marie@ubuntu:~/Downloads$ sudo apt-get install vcf-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package vcf-tools

vcftools is not in the distribution's software repository. You have to install it manually. See here.
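
A minimal sketch of what the manual setup might look like, assuming the archive was unpacked to ~/Downloads/vcftools_0.1.8a (the path that shows up later in this thread; adjust it to your own location). `perl vcf-subset` only finds the script when it sits in the current directory, so the perl directory is put on PATH, and it is also added to PERL5LIB so perl can locate Vcf.pm. `check_tool` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Assumed unpack location; change this to wherever vcftools actually lives.
VCFTOOLS_PERL="$HOME/Downloads/vcftools_0.1.8a/perl"

# Let the shell find vcf-subset / vcf-consensus, and let perl find Vcf.pm.
PATH="$VCFTOOLS_PERL:$PATH"
PERL5LIB="$VCFTOOLS_PERL${PERL5LIB:+:$PERL5LIB}"
export PATH PERL5LIB

# check_tool (hypothetical helper): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check every program the pipeline in the question relies on.
for tool in tabix bgzip vcf-subset vcf-consensus; do
    check_tool "$tool" || true
done
```

If any tool is reported as missing, the pipeline will fail at that step, so this is worth running before the tabix command itself.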


Thank you, Phil. I got a bit further, but now I get another error.

marie@ubuntu:~/Downloads/vcftools_0.1.8a/perl$ tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 | perl vcf-subset -c HG00098 | bgzip -c > HG00098.vcf.gz
[tabix] the index file is older than the vcf file. Please use '-f' to overwrite or reindex.
Broken VCF header, no column names?
 at /usr/share/perl5/Vcf.pm line 171
    Vcf::throw('Vcf4_1=HASH(0x160ee48)', 'Broken VCF header, no column names?') called at /usr/share/perl5/Vcf.pm line 845
    VcfReader::_read_column_names('Vcf4_1=HASH(0x160ee48)') called at /usr/share/perl5/Vcf.pm line 589
    VcfReader::parse_header('Vcf4_1=HASH(0x160ee48)') called at vcf-subset line 119
    main::vcf_subset('HASH(0x16051c8)') called at vcf-subset line 12

I think you can safely use the -f flag that tabix is warning you about. Currently it looks like tabix is not returning any data. You can check this by running only the tabix part of the command, that is, dropping the first pipe character and everything after it.
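
One way to script that check is sketched below. The "no column names" error suggests vcf-subset never saw the `#CHROM` line, so the sketch simply tests whether that line is present in whatever arrives on standard input. `has_vcf_columns` is a hypothetical helper name, and the demonstration uses a tiny hand-written header rather than real 1000 Genomes data.

```shell
#!/bin/sh
# has_vcf_columns (hypothetical helper): succeed only if stdin contains
# the "#CHROM" column-name line that vcf-subset looks for.
has_vcf_columns() {
    grep -q '^#CHROM'
}

# Offline demonstration with a minimal, made-up VCF header.
if printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n' | has_vcf_columns; then
    echo "header ok"
else
    echo "no column names"
fi
```

To use it on the real data, pipe the tabix part of the command from the question into `has_vcf_columns` instead of into vcf-subset. If it fails, the problem is upstream of vcf-subset.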


Thank you Matt, I added -f like this and the program ran:

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 -f

(Initially I tried the command below, and it didn't work.)

tabix -h -f ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000

but at the last line, I got many errors:
