I want to download the latest release of the phased, high-coverage 1000 Genomes data (hg38 build), but only for a subset of samples (203 samples, to be precise).
I used the following command:
tabix -h http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz chr1 | vcf-subset -c Sample_1kgp.txt | bgzip -c > CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased_out.vcf.gz
The download starts, but eventually I get the error below. The timing appears to be random: sometimes it fails right after the download starts, and sometimes only after it has been running for more than an hour.
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Broken VCF: empty columns (trailing TABs) starting at chr1:35966205.
Wrong number of fields; expected 3211, got 1926.
followed by this stack trace at the end:
at /usr/local/Cellar/vcftools/0.1.16/lib/perl5/site_perl/Vcf.pm line 172, <STDIN> line 968801.
Vcf::throw(Vcf4_1=HASH(0x7fdcbe8b2c40), "Wrong number of fields; expected 3211, got 1926. The offendin"...) called at /usr/local/Cellar/vcftools/0.1.16/lib/perl5/site_perl/Vcf.pm line 507
VcfReader::next_data_hash(Vcf4_1=HASH(0x7fdcbe8b2c40)) called at /usr/local/Cellar/vcftools/0.1.16/lib/perl5/site_perl/Vcf.pm line 3479
Vcf4_1::next_data_hash(Vcf4_1=HASH(0x7fdcbe8b2c40)) called at /usr/local/Cellar/vcftools/0.1.16/libexec/bin/vcf-subset line 146
main::vcf_subset(HASH(0x7fdcbd8243c0)) called at /usr/local/Cellar/vcftools/0.1.16/libexec/bin/vcf-subset line 12
Any suggestions on how to solve this?
Thanks
I've tried to do this before (using bcftools) and eventually gave up because I could find no way to keep the connection open.
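One possible workaround, sketched here rather than tested against the server, is to stop streaming altogether: download the whole per-chromosome file with a client that can resume after a dropped connection, then subset it locally. The sketch assumes that wget and bcftools are installed, and that Sample_1kgp.txt lists one sample ID per line, which is the format `bcftools view -S` expects. The helper names (`vcf_name`, `vcf_url`, `fetch_and_subset`) and the retry settings are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: download first (resumable), subset afterwards, instead of streaming.
# Directory taken from the tabix command in the question.
BASE_URL="http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased"

# Hypothetical helper: file name for one chromosome, e.g. "chr1".
vcf_name() {
  printf '%s\n' "CCDG_14151_B01_GRM_WGS_2020-08-05_${1}.filtered.shapeit2-duohmm-phased.vcf.gz"
}

# Hypothetical helper: full URL for one chromosome.
vcf_url() {
  printf '%s/%s\n' "$BASE_URL" "$(vcf_name "$1")"
}

# Hypothetical helper: fetch one chromosome, then keep only the listed samples.
fetch_and_subset() {
  local chrom=$1 samples=$2
  local vcf
  vcf=$(vcf_name "$chrom")

  # -c resumes a partial file instead of restarting from zero;
  # the retry values are arbitrary assumptions, not tuned settings.
  wget -c --tries=20 --waitretry=30 --retry-connrefused "$(vcf_url "$chrom")" || return 1

  # Subset locally, so a network drop can no longer truncate the VCF stream.
  bcftools view -S "$samples" -Oz -o "${vcf%.vcf.gz}_out.vcf.gz" "$vcf"
}

# Example call (not run automatically):
# fetch_and_subset chr1 Sample_1kgp.txt
```

This costs the disk space for the full file of each chromosome, but an interrupted transfer only needs the same call run again, since `wget -c` picks up from the partial file. The full download can be deleted once the subset has been written.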