Extracting super-population allele frequencies from 1000 genomes (phase 3) data
0
2
Entering edit mode
5.4 years ago
gokberk ▴ 90

Hi all,

I'm interested in super-population level allele frequencies of SNPs on a gene. I cropped my variants of interest and made a smaller vcf file. Now, I'd like to extract the allele frequencies for these variants.

I've been following the instructions here, even though it gives a "micro-populations" level solution. However, I receive a bunch of warnings and an empty vcf file when I run the following command:

vcf-subset -c CEU.samples.list gene_of_interest_variants.vcf.gz | fill-an-ac |
    bgzip -c > CEU.myGene.vcf.gz

Here are the warnings:

No such column in the VCF file: "NA20526"
 at /usr/local/bin/vcf-subset line 21.
    main::error('No such column in the VCF file: "NA20526"\x{a}') called at /usr/local/bin/vcf-subset line 111
    main::check_columns('HASH(0x1ee51a8)', 'Vcf4_1=HASH(0x1eed3f0)', 'ARRAY(0x2384a78)') called at /usr/local/bin/vcf-subset line 133
    main::vcf_subset('HASH(0x1ee51a8)') called at /usr/local/bin/vcf-subset line 12
Broken VCF header, no column names?
 at /usr/local/share/perl/5.18.2/Vcf.pm line 172.
    Vcf::throw('Vcf4_2=HASH(0x21e5a90)', 'Broken VCF header, no column names?') called at /usr/local/share/perl/5.18.2/Vcf.pm line 867
    VcfReader::_read_column_names('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/share/perl/5.18.2/Vcf.pm line 602
    VcfReader::parse_header('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/bin/fill-an-ac line 45
    main::fill_an_ac(undef) called at /usr/local/bin/fill-an-ac line 9

Is it basically because my down-sampled vcf is not well formatted? If so, is there a better way to crop the variants that I'm interested in? (I used vcftools with --from-bp & --to-bp options)

Any help is much appreciated. Cheers - Gökberk

vcf 1000genomes • 1.5k views
ADD COMMENT
1
Entering edit mode

As I knew, you don't need to do it by yourself. Gnomad have done it in his VCF files and you just need to extract this information.

https://gnomad.broadinstitute.org/downloads

ADD REPLY
1
Entering edit mode

Thanks Shicheng. I've just managed to do it using bcftools query. In case someone needs it, here is the command I used:

bcftools query -f '%CHROM %POS %EAS_AF %AMR_AF %AFR_AF %EUR_AF %SAS_AF \n' myFile.bcf
ADD REPLY

Login before adding your answer.

Traffic: 1705 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6