Hi all,
I'm interested in super-population level allele frequencies of SNPs on a gene. I cropped my variants of interest and made a smaller vcf file. Now, I'd like to extract the allele frequencies for these variants.
I've been following the instructions here, even though it gives a "micro-populations" level solution. However, I receive a bunch of warnings and an empty vcf file when I run the following command:
vcf-subset -c CEU.samples.list gene_of_interest_variants.vcf.gz | fill-an-ac |
bgzip -c > CEU.myGene.vcf.gz
Here are the warnings:
No such column in the VCF file: "NA20526"
at /usr/local/bin/vcf-subset line 21.
main::error('No such column in the VCF file: "NA20526"\x{a}') called at /usr/local/bin/vcf-subset line 111
main::check_columns('HASH(0x1ee51a8)', 'Vcf4_1=HASH(0x1eed3f0)', 'ARRAY(0x2384a78)') called at /usr/local/bin/vcf-subset line 133
main::vcf_subset('HASH(0x1ee51a8)') called at /usr/local/bin/vcf-subset line 12
Broken VCF header, no column names?
at /usr/local/share/perl/5.18.2/Vcf.pm line 172.
Vcf::throw('Vcf4_2=HASH(0x21e5a90)', 'Broken VCF header, no column names?') called at /usr/local/share/perl/5.18.2/Vcf.pm line 867
VcfReader::_read_column_names('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/share/perl/5.18.2/Vcf.pm line 602
VcfReader::parse_header('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/bin/fill-an-ac line 45
main::fill_an_ac(undef) called at /usr/local/bin/fill-an-ac line 9
Is it basically because my down-sampled vcf is not well formatted? If so, is there a better way to crop the variants that I'm interested in? (I used vcftools with --from-bp & --to-bp options)
Any help is much appreciated. Cheers - Gökberk
As I knew, you don't need to do it by yourself. Gnomad have done it in his VCF files and you just need to extract this information.
https://gnomad.broadinstitute.org/downloads
Thanks Shicheng. I've just managed to do it using
bcftools query
. In case someone needs it, here is the command I used: