How to extract variants falling in a genomic window from a VCF?
3.4 years ago
francois

I am trying to extract the variants falling within a given genomic window from a VCF file.

I have tried pretty much every suggestion from a related post (the link now shows "Post not found"), but my terminal stays silent: the commands print nothing!

The file in question is:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr20.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz

The lead I was following was:

Unzipping the file.

Bgzip it with

bgzip chr20.vcf

Index it with

tabix -p vcf chr20.vcf.gz

Then I tried

tabix chr20.vcf.gz 1,000-20,000,000
tabix -R window.txt chr20.vcf.gz
tabix -R window.bed chr20.vcf.gz

window.txt and window.bed are:

chr20 1000 20000000

(tab-separated)


I have also tried

vcftools --vcf chr20.vcf --bed oprwindow.bed --out trim --recode --keep-INFO-all

I get

VCFtools - 0.1.16 (C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted: --vcf chr20.vcf --recode-INFO-all --out hey --recode --bed window.bed

Error: VCF version must be v4.0, v4.1 or v4.2: You are using version VCFv4.3


What am I doing wrong? Any lead?


Regarding "I have also tried vcftools":

vcftools is deprecated (and, as the error above shows, version 0.1.16 rejects VCFv4.3 files); use bcftools instead.
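
As a rough sketch of what the bcftools route could look like for this window: the file names `chr20.vcf.gz` and `trim.vcf.gz` are placeholders, the contig is assumed to be named "20" in the file, and the block does nothing if bcftools or the input file is missing.

```shell
# Placeholder file names; adjust to your data.
in="chr20.vcf.gz"
out="trim.vcf.gz"
region="20:1000-20000000"   # contig name must match the one used in the VCF

if command -v bcftools >/dev/null 2>&1 && [ -f "$in" ]; then
    # Build a tabix (.tbi) index if one is not already present.
    [ -f "$in.tbi" ] || bcftools index -t "$in"
    # Extract the window (header included) and write bgzipped output.
    bcftools view -r "$region" -O z -o "$out" "$in"
    status="extracted"
else
    status="skipped"
fi
echo "$status"
```

Unlike plain `tabix`, `bcftools view` keeps the VCF header in its output by default.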


Regarding "tabix chr20.vcf.gz 1,000-20,000,000":

The region format for tabix is tabix file "chrom:start-end", NOT tabix file "start-end".

Furthermore, the contig may be named "chr20" rather than "20", so check which of the two the file uses.
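
One way to see which naming a file uses is to list the distinct values in its first column. The sketch below does this on a small made-up VCF (`demo.vcf` is hypothetical); on an indexed file, `tabix -l chr20.vcf.gz` should give the same information.

```shell
# Build a tiny, made-up VCF purely for illustration.
printf '##fileformat=VCFv4.3\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf '20\t1500\t.\tA\tG\t.\tPASS\t.\n' >> demo.vcf
printf '20\t25000000\t.\tC\tT\t.\tPASS\t.\n' >> demo.vcf

# Distinct contig names among the data lines (header lines start with '#').
contigs=$(grep -v '^#' demo.vcf | cut -f1 | sort -u)
echo "$contigs"

# For the real, indexed file the equivalent would be:
#   tabix -l chr20.vcf.gz
```

If the name printed here differs from the one in the region string, tabix finds no matching records and prints nothing.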


Ah you know what, I had tried chr20:1,000-20,000,000 but not 20:1,000-20,000,000, which worked! Thank you.

So for anyone reading, the correct commands are:

tabix -p vcf chr20.vcf.gz

tabix chr20.vcf.gz 20:1,000-20,000,000

The output can be written to a file with:

tabix chr20.vcf.gz 20:1,000-20,000,000 > trim.vcf
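
Note that tabix prints only the matching records by default, so a trim.vcf written this way has no header; adding `-h` (`tabix -h chr20.vcf.gz 20:1,000-20,000,000 > trim.vcf`) keeps it. To make the window logic explicit, here is an index-free awk sketch on a made-up file (`window_demo.vcf` is hypothetical). It is not how tabix works internally, and it only compares POS, so a record that starts before the window but overlaps it would be missed.

```shell
# Made-up VCF with one record inside the window and two outside it.
{
    printf '##fileformat=VCFv4.3\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '20\t1500\t.\tA\tG\t.\tPASS\t.\n'
    printf '20\t25000000\t.\tC\tT\t.\tPASS\t.\n'
    printf '21\t5000\t.\tG\tA\t.\tPASS\t.\n'
} > window_demo.vcf

# Keep header lines, plus records on contig 20 with 1000 <= POS <= 20000000.
awk -F'\t' -v chrom=20 -v lo=1000 -v hi=20000000 \
    '/^#/ || ($1 == chrom && $2 >= lo && $2 <= hi)' \
    window_demo.vcf > window_demo_trim.vcf

# Count the records that survived the filter.
kept=$(grep -vc '^#' window_demo_trim.vcf)
echo "$kept"
```

For a file the size of the 1000 Genomes chr20 VCF, the indexed tabix query is the practical choice; the awk version has to scan every line.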

Regarding "Unzipping the file. Bgzip it with":

This step is unnecessary: the file is already bgzipped.

$ wget -q -O - "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr20.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" | file -
/dev/stdin: gzip compressed data, extra field
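
`file` reports plain gzip here because BGZF is a valid gzip stream; the "extra field" is the hint. A slightly more direct check is to look at the header bytes: in the BGZF layout described in the SAM specification, a block starts with 1f 8b 08 04 and carries the subfield identifier "BC" in its extra field. A minimal sketch, assuming "BC" is the first subfield (byte offsets 12 and 13), as in files written by bgzip; the helper `is_bgzf` and the demo files are hypothetical, and only the first block header is inspected.

```shell
# Hypothetical helper: succeeds if the file starts like a BGZF block.
is_bgzf() {
    # Bytes 0-3: gzip magic (1f 8b), deflate (08), FEXTRA flag set (04).
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    # Bytes 12-13: the "BC" subfield identifier, assuming it comes first.
    tag=$(head -c 14 "$1" | tail -c 2)
    [ "$magic" = "1f8b0804" ] && [ "$tag" = "BC" ]
}

# Made-up headers for demonstration only (not valid compressed data).
printf '\037\213\010\004\000\000\000\000\000\377\006\000BC\002\000' > demo_bgzf.bin
printf '\037\213\010\000\000\000\000\000\000\003abcdef' > demo_gzip.bin

is_bgzf demo_bgzf.bin && echo "demo_bgzf.bin: looks like BGZF"
is_bgzf demo_gzip.bin || echo "demo_gzip.bin: plain gzip or something else"
```

On a downloaded copy of the real file, `is_bgzf` would be called with that file's path in the same way.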