POS Length of 1000 genomes VCF
7.7 years ago

I'm fairly new to population genetics and have what is probably a very simple question: how do I obtain the position (POS) range covered by each VCF file from the 1000 Genomes Project (e.g. ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf)? I'm getting familiar with vcftools, so a way to do this with vcftools would be great.

Thanks

vcftools vcf 1000genomes • 1.8k views

it's not clear to me.


The 1000 Genomes VCF files are split per chromosome, so each file covers a single genomic range. I was hoping to obtain that range (i.e. chr1:min-max).

7.7 years ago
curl -s  "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" |\
 gunzip -c |\
 awk -F '\t' 'BEGIN{m=250E6;M=0;}/^#/ {next;} {v=int($2);if(v<m) m=v;if(v>M) M=v;} END {printf("%d -> %d\n",m,M);}'


16050075 -> 51244237

Or, since the POS values are sorted, you can simply use:

$ curl -s  "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | \
 gunzip -c | grep -v "^#" |\
 cut -f2 | (head -n1;tail -1)
16050075
51244237
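
The question asked for the range in chr:min-max form. A minimal sketch of that, assuming one chromosome per file (as in the 1000 Genomes release) and an uncompressed VCF stream on stdin; `vcf_pos_range` is a hypothetical helper name, not part of vcftools or bcftools:

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and print CHROM:min-max.
# Assumes every record belongs to the same chromosome (taken from the first record).
vcf_pos_range() {
  awk -F '\t' '
    /^#/ { next }                                   # skip header lines
    !seen { chrom = $1; min = max = $2 + 0; seen = 1; next }   # first record initialises both bounds
    { v = $2 + 0; if (v < min) min = v; if (v > max) max = v } # widen the bounds as needed
    END { if (seen) printf("%s:%d-%d\n", chrom, min, max) }    # print nothing if no records were read
  '
}
```

It can be fed the same way as the pipelines above, e.g. `gunzip -c file.vcf.gz | vcf_pos_range` (with `file.vcf.gz` standing in for your own file). Unlike the head/tail variant, it does not rely on the records being sorted.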

Thanks for the info. Is there a way to do this without decompressing the VCF, e.g. by using tabix?

7.7 years ago
Mitch Bekritsky ★ 1.3k

Using bcftools, you should be able to do something like this without decompressing your VCF:

bcftools query -f "%POS\n" <your VCF> | awk 'NR == 1; END{print}'

A word of warning: I haven't tried this out yet, so it might contain an error or two.


Well, the VCF is still decompressed under the hood...
