Help on Merging VCF Files Using VCFtools
13.9 years ago
Jianfengmao

Dear BioStarers,

I am learning VCFtools by running its commands on the VCF files in the examples folder of the VCFtools installation directory.

Please help me fix the three problems below, and give me some tips on merging VCF files.

Thanks in advance.

(1) When I tried to merge the three example VCF files, it failed.

Command:
merge-vcf merge-test-a.vcf merge-test-b.vcf merge-test-c.vcf > merg.vcf

Result:
[main] fail to load the index file.
The command "tabix -l merge-test-a.vcf" exited with an error. Is the file tabix indexed?

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf4_0=HASH(0x10082df18)', 'The command "tabix -l merge-test-a.vcf" exited with an error....') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 1687
       VcfReader::get_chromosomes('Vcf4_0=HASH(0x10082df18)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 139
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12
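The traceback shows merge-vcf running "tabix -l" on the plain-text file, which cannot work: tabix only reads bgzip-compressed files that have a .tbi index. A quick way to see which inputs are still uncompressed is to look at their first two bytes. The sketch below is a POSIX shell function with a made-up name (looks_gzipped); note that bgzip and ordinary gzip write the same magic bytes, so it can only tell you a file is gzip-compatible, not that it was written by bgzip.

```shell
#!/bin/sh
# looks_gzipped FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
# bgzip output is gzip-compatible, so it passes, but so does plain gzip
# output, which tabix cannot index; treat this as a first sanity check only.
looks_gzipped() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Report on every file named on the command line.
for f in "$@"; do
    if looks_gzipped "$f"; then
        echo "$f: gzip-compressed"
    else
        echo "$f: not compressed; run bgzip and tabix -p vcf first"
    fi
done
```

Saved as, say, check.sh, it would be run as "sh check.sh merge-test-*.vcf*".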

(2) Then I compressed and indexed the VCF files, but merging them still failed.

bgzip merge-test-a.vcf
bgzip merge-test-b.vcf
bgzip merge-test-c.vcf

tabix -p vcf merge-test-a.vcf.gz
tabix -p vcf merge-test-b.vcf.gz
tabix -p vcf merge-test-c.vcf.gz
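With more than a few inputs, the two steps above are easy to get out of sync, so a small loop that does both for every file can help. This is a sketch that assumes bgzip and tabix are on your PATH; the function name compress_and_index is hypothetical.

```shell
#!/bin/sh
# compress_and_index FILE.vcf...: for each plain-text VCF, run bgzip (which
# replaces FILE.vcf with FILE.vcf.gz) and then build FILE.vcf.gz.tbi with tabix.
# Stops at the first failure so a broken file is not silently skipped.
compress_and_index() {
    for vcf in "$@"; do
        bgzip "$vcf" || return 1
        tabix -p vcf "$vcf.gz" || return 1
    done
}
```

Usage: "compress_and_index merge-test-a.vcf merge-test-b.vcf merge-test-c.vcf".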

###########################################################################
Merge command:
merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz merge-test-c.vcf.gz | bgzip -c > merg.vcf.gz

Result:
zcat: merge-test-a.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x1008f32a8)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-a.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 125
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12
###########################################################################
Merge command:
merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz merge-test-c.vcf.gz > merg.vcf.gz

Result:
zcat: merge-test-a.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x1008f32a8)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-a.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 125
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12

(3) vcf-stats and vcf-validator work on all three uncompressed VCF files (merge-test-a.vcf, merge-test-b.vcf, merge-test-c.vcf), but not on the compressed files.

Command:

vcf-validator merge-test-a.vcf.gz

Result:

zcat: merge-test-c.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x10082a0d0)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x10082a0d0)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x10082a0d0)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-c.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/vcf-validator line 53
       main::do_validation('HASH(0x100804ed0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/vcf-validator line 14

I had a similar issue.

Check that the .tbi and .gz files for your VCF files are in the same directory.

The Perl script for vcf-merge reads the .tbi index files from there.
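To run that check on several files at once, something like the sketch below could be used (check_tbi is a made-up name). It assumes the index has tabix's default name, FILE.vcf.gz.tbi, next to the compressed file, which is what "tabix -p vcf FILE.vcf.gz" creates.

```shell
#!/bin/sh
# check_tbi FILE.vcf.gz...: succeed only if every FILE.vcf.gz has a
# FILE.vcf.gz.tbi beside it; each missing index is reported on stderr.
check_tbi() {
    missing=0
    for gz in "$@"; do
        if [ ! -f "$gz.tbi" ]; then
            echo "missing index: $gz.tbi" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: "check_tbi merge-test-*.vcf.gz && merge-vcf merge-test-*.vcf.gz | bgzip -c > merged.vcf.gz".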

13.9 years ago

For part 1, you need to bgzip and tabix-index the files as you did in part 2; merge-vcf only works on compressed, indexed VCF files:

% merge-vcf
About: Merge the bgzipped and tabix indexed VCF files.

Parts 2 and 3 work for me with the example files and the latest release (0.1.3.2):

% bgzip merge-test-a.vcf
% bgzip merge-test-b.vcf
% tabix -p vcf merge-test-a.vcf.gz
% tabix -p vcf merge-test-b.vcf.gz
% merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz
Using column name 'A' for merge-test-a.vcf.gz:A
Using column name 'B' for merge-test-b.vcf.gz:B
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
[...]
% vcf-stats merge-test-a.vcf.gz
Rows with a call  .. 9
Genotypes total   .. 9
[...]

Which version are you using? Perhaps upgrading to the latest release will fix your problems.


We'll probably need more information to help further. What is the output of the following commands: ls -lh merge-test*, zcat -V, and zcat merge-test-a.vcf.gz? My guess is that you might want to change line 204 of Vcf.pm, replacing zcat with gunzip -c. Please post the output as an edit to your initial question and format it as code (highlight it and press the button with the 1s and 0s on it). Thanks.
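To collect that output in one go, a convenience function along these lines could be used (vcf_diagnostics is a hypothetical name). It runs ls -lh on the files you pass rather than on every merge-test* file, and it also tries gunzip -c on each file, so you can see directly whether gunzip -c succeeds where zcat fails.

```shell
#!/bin/sh
# vcf_diagnostics FILE.vcf.gz...: print the information requested above.
# stderr is merged into stdout (2>&1) because the error messages are the
# useful part; head keeps each decompressed file to its first few lines.
vcf_diagnostics() {
    ls -lh "$@"
    echo "== zcat -V"
    zcat -V 2>&1
    for gz in "$@"; do
        echo "== zcat $gz"
        zcat "$gz" 2>&1 | head -n 5
        echo "== gunzip -c $gz"
        gunzip -c "$gz" 2>&1 | head -n 5
    done
}
```

Usage: "vcf_diagnostics merge-test-*.vcf.gz > diagnostics.txt", then paste the contents of diagnostics.txt into the question.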


My VCFtools is the latest version, VCFtools_0.1.3.2, and I have checked for updates using the commands listed on the VCFtools website.

I am not very good at Unix and do not know much Perl. I tried many things to get VCFtools working, but failed every time. I suspect the real cause is a bad configuration of VCFtools, in "Vcf.pm" or somewhere else.

Whenever I merge the VCF files, I get the same errors pointing to Vcf.pm. Could you please help me find out what is wrong with my Vcf.pm?

13.9 years ago
Jianfengmao

I got help with this problem from Dr. Petr Danecek, the author of VCFtools. The problem occurs only on Mac OS X.

Dr. Petr Danecek said:

The problem you are observing is caused by a peculiar behaviour of zcat on Mac OS X, which adds .Z to file names. This has been fixed in the latest revision (r403) by calling "gunzip -c" instead.
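Until the fixed revision is installed, your own scripts can sidestep the problem the same way the fix does, by not calling zcat at all. The helpers below are a sketch with made-up names: vcf_cat uses gzip -dc (equivalent to gunzip -c), and zcat_reads_gz checks whether the local zcat can open a .gz file, which would be expected to fail on a zcat that appends .Z as described above.

```shell
#!/bin/sh
# vcf_cat FILE.gz: decompress to stdout. gzip -dc is equivalent to gunzip -c
# and, unlike the zcat described above, does not look for a FILE.gz.Z name.
vcf_cat() {
    gzip -dc "$1"
}

# zcat_reads_gz: succeed if the local zcat can read a .gz file directly.
# It compresses a one-line probe file and checks that zcat returns it intact.
zcat_reads_gz() {
    probe_dir=$(mktemp -d "${TMPDIR:-/tmp}/zcatprobe.XXXXXX") || return 1
    printf 'ok\n' | gzip -c > "$probe_dir/probe.gz"
    result=$(zcat "$probe_dir/probe.gz" 2>/dev/null)
    rm -rf "$probe_dir"
    [ "$result" = "ok" ]
}
```

For example: "zcat_reads_gz || echo 'zcat cannot read .gz files here; use gunzip -c'" and "vcf_cat merge-test-a.vcf.gz | head".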

Many thanks to him.

13.9 years ago
Jianfengmao

It works once I create ".vcf.gz.Z" files by copying the original ".vcf.gz" files, but I still do not know why.

Could you please help me understand it? I am using an up-to-date Mac OS.

% bgzip merge-test-a.vcf
% bgzip merge-test-b.vcf
% tabix -p vcf merge-test-a.vcf.gz
% tabix -p vcf merge-test-b.vcf.gz
% cp merge-test-a.vcf.gz merge-test-a.vcf.gz.Z
% cp merge-test-b.vcf.gz merge-test-b.vcf.gz.Z
% merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz
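Going by Dr. Danecek's explanation, the copy works because the Mac OS X zcat appends .Z to the name it is given: when VCFtools asks for merge-test-a.vcf.gz, zcat actually opens merge-test-a.vcf.gz.Z, and the copy makes that name exist. If you need this workaround before upgrading, symbolic links avoid duplicating the data. This is a sketch with a made-up function name, and it assumes that zcat follows symlinks when writing to stdout; if it does not, fall back to cp as above.

```shell
#!/bin/sh
# add_z_aliases FILE.vcf.gz...: create FILE.vcf.gz.Z as a symlink to each
# FILE.vcf.gz so that a zcat which appends .Z still finds the data.
# Assumption: the zcat in question follows symlinks. Remove the links once
# VCFtools is upgraded to a revision that calls gunzip -c instead.
add_z_aliases() {
    for gz in "$@"; do
        # The link target is relative to the link's own directory.
        [ -e "$gz.Z" ] || ln -s "$(basename "$gz")" "$gz.Z"
    done
}
```

Usage: "add_z_aliases merge-test-a.vcf.gz merge-test-b.vcf.gz" before running merge-vcf.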

This is an addition to your initial question, and should be posted as an edit to that question instead of an answer. This helps keep things organized for future users.

12.5 years ago
user56

I had a similar problem and had to use Windows :-(.

If you are working with smallish VCF files, you can use R to work with the data (e.g., to split it).

To load the file use:

file <- 'e:/d/genome/t300.txt'
# the file is UTF-16 encoded, so the encoding has to be given explicitly
v <- read.table(file, sep = '\t', header = TRUE, fileEncoding = "utf-16")
str(v)

The UTF-16 encoding was particularly hard to troubleshoot; Notepad++ eventually helped me detect this encoding problem.

read.table correctly ignores the header lines and detects the column headers as well.
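On a Unix-like system (or with a Windows port of iconv), another option is to convert the file to UTF-8 once, after which the fileEncoding argument should not be needed. The sketch below (to_utf8 is a hypothetical name) looks for a UTF-16 byte-order mark, FF FE or FE FF, and converts only when one is present; it assumes the file starts with such a mark, which is typical of UTF-16 files written by Windows tools but not guaranteed.

```shell
#!/bin/sh
# to_utf8 SRC DST: if SRC starts with a UTF-16 byte-order mark, convert it to
# UTF-8 with iconv (which uses the mark to pick the byte order and drops it);
# otherwise copy SRC unchanged.
to_utf8() {
    src=$1
    dst=$2
    bom=$(head -c 2 "$src" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe|feff) iconv -f UTF-16 -t UTF-8 "$src" > "$dst" ;;
        *) cp "$src" "$dst" ;;
    esac
}
```

Usage: "to_utf8 t300.txt t300-utf8.txt", then read t300-utf8.txt without fileEncoding.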

To make a VCF file smaller under Windows, you can use PowerShell (Microsoft's alternative shell). For example, to keep the first 3000 lines:

# -TotalCount stops reading after 3000 lines instead of loading the whole file
$a = Get-Content C:\largeVCF.txt -TotalCount 3000
$a > largeVCF-subset.txt
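On Linux or Mac OS X the same first-N-lines cut is simply head -n 3000. A plain line cut counts the header lines toward the limit, though, so a VCF-aware variant may be handier: the awk sketch below (vcf_head is a hypothetical name) keeps every line starting with #, i.e. the ## meta-information lines and the #CHROM column line, plus the first N data records, so the result should still be a well-formed VCF.

```shell
#!/bin/sh
# vcf_head N FILE: print all header lines (those starting with '#') and the
# first N data records of an uncompressed VCF. VCF headers always precede
# the records, so awk can stop reading once N records have been printed.
vcf_head() {
    awk -v n="$1" '
        /^#/       { print; next }
        kept < n   { print; kept++; next }
                   { exit }
    ' "$2"
}
```

Usage: "vcf_head 3000 large.vcf > large-subset.vcf"; for a .vcf.gz file, decompress it first with gunzip -c.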