So I downloaded a gigantic gzipped VCF from the Mota man genome (ftp://biodisk.org/Store/Genome/African/Mota_man/Bam_and_VCF/GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz).
I want to filter it down to just the sites where I have data from the Human Origins SNP array. I made a file called positions.txt, which has chromosome TAB basepair position for all the SNPs I have data on. A lot of the SNPs don't have rs numbers, so filtering by ID won't work. Luckily everything is on hg19. Here are the first few lines of positions.txt:
1 842013
1 891021
1 903426
1 949654
1 1018704
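One thing I can rule out is a malformed positions file. Below is a minimal sketch of a check that every line is exactly chromosome TAB position. The function name `check_positions` is my own, and the sample data is made up for illustration (the third sample line deliberately uses a space instead of a tab).

```python
import io

def check_positions(handle):
    """Return a list of (line_number, problem) for malformed lines."""
    problems = []
    for n, line in enumerate(handle, start=1):
        # Strip only the newline, so a stray carriage return from
        # Windows line endings is reported as a bad position.
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((n, "expected 2 tab-separated fields"))
        elif not fields[1].isdigit():
            problems.append((n, "position is not an integer"))
    return problems

# Illustrative data only: line 3 is separated by a space, not a tab.
sample = io.StringIO("1\t842013\n1\t891021\n1 903426\n")
print(check_positions(sample))
```

On the real file this would be called as `check_positions(open("positions.txt"))`; an empty list means every line has the expected two-column layout.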
I run
vcftools --gzvcf ../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz --positions positions.txt --recode --out Mota_HuOrg
and then, after more than three hours of run time, I get the following output, which ends in an error:
VCFtools - v0.1.12b
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf ../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz
--out Mota_HuOrg
--positions positions.txt
--recode
Using zlib version: 1.2.3
Versions of zlib >= 1.2.4 will be *much* faster when reading zipped VCF files.
After filtering, kept 1 out of 1 Individuals
Outputting VCF file...
After filtering, kept 0 out of a possible -1563604250 Sites
File does not contain any sites
Run Time = 11431.00 seconds
Does anyone know what I am doing wrong? Thanks!
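To make clear what I am after, here is a rough sketch of the same filter in plain Python, independent of vcftools. It is only an illustration under my own assumptions: the function names and the output file name are hypothetical, and it matches chromosome names as exact strings, so "1" in positions.txt would not match "chr1" in the VCF.

```python
import gzip

def load_positions(path):
    """Read (chromosome, position) pairs from a whitespace-delimited file."""
    wanted = set()
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                # Keep both values as strings so they compare directly
                # against the text columns of the VCF.
                wanted.add((fields[0], fields[1]))
    return wanted

def filter_vcf(vcf_gz_path, wanted, out_path):
    """Copy header lines and wanted sites to out_path; return sites kept."""
    kept = 0
    with gzip.open(vcf_gz_path, "rt") as vcf, open(out_path, "w") as out:
        for line in vcf:
            if line.startswith("#"):
                out.write(line)  # keep meta-information and column header
                continue
            # CHROM and POS are the first two tab-separated columns.
            chrom, pos, _rest = line.split("\t", 2)
            if (chrom, pos) in wanted:
                out.write(line)
                kept += 1
    return kept

# Hypothetical usage (the output name is made up for illustration):
# wanted = load_positions("positions.txt")
# n = filter_vcf("../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz",
#                wanted, "Mota_HuOrg.filtered.vcf")
# print(n, "sites kept")
```

This streams the file one line at a time, so memory use depends on the size of positions.txt rather than on the VCF, but it still has to decompress the entire file once.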
Still, thank you very much for the post. I already know how to filter by position.