I am trying to extract the variants falling within a given genomic window from a VCF file.
I have tried pretty much all the suggestions from a related post, but my terminal stays mute: the commands print nothing.
The file in question is:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr20.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz
The lead I was following was:
Unzipping the file.
Bgzip it with
bgzip chr20.vcf
Index it with
tabix -p vcf chr20.vcf.gz
Then I have tried
tabix chr20.vcf.gz 1,000-20,000,000
tabix -R window.txt chr20.vcf.gz
tabix -R window.bed chr20.vcf.gz
window.txt and window.bed are:
chr20 1000 20000000
(tab-separated)
I have also tried
vcftools --vcf chr20.vcf --bed oprwindow.bed --out trim --recode --keep-INFO-all
I get
VCFtools - 0.1.16 (C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted: --vcf chr20.vcf --recode-INFO-all --out hey --recode --bed window.bed
Error: VCF version must be v4.0, v4.1 or v4.2: You are using version VCFv4.3
What am I doing wrong? Any lead?
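vcftools is deprecated; use bcftools instead.
The region format for tabix is
tabix file "chrom:start-end"
and NOT
tabix file "start-end"
Furthermore, the chromosome may be named "chr20" rather than "20" in the file, and the region must use the name exactly as the file does.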
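A quick way to see which chromosome names the file actually uses is to list them from the index; the same window can also be pulled with bcftools. A minimal sketch, assuming the file name from the question, a hypothetical output name window.vcf.gz, and a contig called 20 (use whatever name the listing prints):

```shell
vcf="chr20.vcf.gz"            # file name from the question (bgzipped and indexed)
chrom="20"                    # assumption: replace with the name printed by tabix -l
region="${chrom}:1000-20000000"

# List the contig names stored in the index (skipped if tabix or the file is missing).
if command -v tabix >/dev/null 2>&1 && [ -f "$vcf" ]; then
    tabix -l "$vcf"
fi

# Extract the window with bcftools and write it as a bgzipped VCF.
if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    bcftools view -r "$region" -Oz -o window.vcf.gz "$vcf"
fi
```

If the listing prints chr20 instead, set chrom accordingly; a region whose contig name is not in the index simply returns nothing.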
Ah, you know what, I had tried
chr20:1,000-20,000,000
but not
20:1,000-20,000,000
which worked! Thank you. So for anyone reading, the correct commands are:
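Presumably something along these lines, assuming the file name from the question and the region string reported to work (contig 20, commas in the coordinates):

```shell
vcf="chr20.vcf.gz"            # file name from the question (bgzipped and tabix-indexed)
region="20:1,000-20,000,000"  # chrom:start-end; the contig name must match the file

# Print the records in the window (skipped if tabix or the file is missing).
if command -v tabix >/dev/null 2>&1 && [ -f "$vcf" ]; then
    tabix "$vcf" "$region"
fi
```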
Can write to a file as:
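A sketch of the redirect, assuming the same file and region as in the question and a hypothetical output name window.vcf; the -h flag keeps the VCF header so that the result is itself a valid VCF:

```shell
vcf="chr20.vcf.gz"            # file name from the question
region="20:1,000-20,000,000"  # region reported to work in this thread
out="window.vcf"              # hypothetical output file name

# Write header plus matching records to a file (skipped if tabix or the file is missing).
if command -v tabix >/dev/null 2>&1 && [ -f "$vcf" ]; then
    tabix -h "$vcf" "$region" > "$out"
fi
```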
Unzipping and re-compressing with bgzip is unnecessary: the file is already bgzipped.