Question

Why does gnomAD provide both a vcf and a tbi file for each chromosome?

0

2.9 years ago

arya.sagittarius ▴ 10

I wanted to extract a VCF for a given sample from gnomAD (https://gnomad.broadinstitute.org/downloads) and do downstream analysis on it. My question is: why do they provide both a vcf and a tbi file? If the tbi is a tabix index file, why do I get an error when I use it in the following command:

bcftools view -s HGDP00076 gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi

[E::hts_hopen] Failed to open file gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi
[E::hts_open_format] Failed to open file "gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi" : Exec format error
Failed to read from gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi: Exec format error

vcf tbi gnomad • 2.2k views

ADD COMMENT • link updated 2.7 years ago by cpad0112 21k • written 2.9 years ago by arya.sagittarius ▴ 10

score 3 · Accepted Answer · 2022-01-07

2.9 years ago

Pierre Lindenbaum 164k

bcftools view gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz

The tbi file is an index that bcftools uses for quick random access to a defined genomic region. It is not a VCF itself, so pass the .vcf.bgz file to bcftools, not the .tbi.
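To illustrate what the index is for, here is a minimal sketch of a region query. The helper name `region_subset` and the interval in the example call are illustrative assumptions, not something from this thread. `-r` is the bcftools region option; bcftools looks for the `.tbi` next to the data file on its own, so only the `.vcf.bgz` goes on the command line.

```shell
# Hypothetical helper: print only the records overlapping a region.
# bcftools uses the index (.tbi) sitting next to the file to jump to that region.
# $1 = bgzipped VCF, $2 = region such as chr1:1000000-1100000
region_subset() {
    bcftools view -r "$2" "$1"
}

# Example call (interval chosen arbitrarily for illustration):
# region_subset gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz chr1:1000000-1100000 | head
```

Without the `.tbi` in place, a region query like this should fail rather than fall back to scanning the whole file.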

There are no samples available in the public release of gnomAD.

The output of bcftools query -l gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz will be empty.

Don't use -s HGDP00076, as there are no genotypes or samples in this file.
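A quick way to check this before running a long extraction is sketched below. The helper names `count_samples` and `has_sample` are hypothetical; the only bcftools call used is `bcftools query -l`, which prints one sample name per line.

```shell
# Hypothetical helper: print the number of samples in a VCF/BCF.
# `bcftools query -l` lists one sample per line, so counting lines counts samples.
count_samples() {
    bcftools query -l "$1" | wc -l
}

# Hypothetical helper: succeed only if the sample name ($2) is listed exactly in the file ($1).
has_sample() {
    bcftools query -l "$1" | grep -qxF -- "$2"
}

# Example calls:
# count_samples gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz
# has_sample gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz HGDP00076 || echo "sample not in this file"
```

Both helpers only read the header, so they should return quickly even on very large files.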

ADD COMMENT • link 2.9 years ago by Pierre Lindenbaum 164k

0

I'm using the gnomADv3-variants-genomes data from the link (gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz). It has multiple samples in the VCF. This command takes a very long time to run. Is there any workaround for that?

ADD REPLY • link 2.9 years ago by arya.sagittarius ▴ 10

0

Well, the file for chr1 is 261G, so it should take a few minutes/hours to complete. You can always try the --threads option of bcftools to speed things up.
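A minimal sketch of how `--threads` might be used here. As far as I know from the bcftools manual, `--threads` needs an integer argument and mainly speeds up compression of the output stream, so it tends to matter only when writing compressed output (`-Oz` or `-Ob`). The helper name `extract_sample`, the output file name and the thread counts are assumptions made for illustration.

```shell
# Hypothetical helper: extract one sample into a bgzipped VCF with extra compression threads.
# $1 = input VCF, $2 = sample name, $3 = output path,
# $4 = thread count (defaults to 4, an arbitrary choice)
extract_sample() {
    bcftools view -s "$2" --threads "${4:-4}" -Oz -o "$3" "$1"
}

# Example call (output name and thread count are illustrative):
# extract_sample gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz HGDP00076 HGDP00076.chr1.vcf.gz 8
```

Reading and decompressing a 261G input is still likely to dominate the run time, so the gain from threads alone may be modest.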

ADD REPLY • link 2.9 years ago by Pierre Lindenbaum 164k

0

Yes, but is there any workaround for extracting a particular sample? It takes a really long time! That's why I wanted to know whether indexing helps, or whether threads are the only option.

ADD REPLY • link 2.9 years ago by arya.sagittarius ▴ 10

2

Indexing only helps if you need just a genomic interval. Extracting a sample across the whole chromosome still means reading every record.
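If the downstream analysis only needs certain intervals, the index can cut the work down even when extracting a sample. The sketch below assumes a regions file (BED or tab-delimited CHROM, POS) that you supply yourself; the helper name `sample_in_regions` and the file name `regions.bed` are hypothetical. `-R` is the bcftools regions-file option and relies on the index.

```shell
# Hypothetical helper: extract one sample, restricted to the intervals in a regions file.
# The index lets bcftools jump to each interval instead of reading the whole file.
# $1 = input VCF (with .tbi alongside), $2 = sample name, $3 = regions file
sample_in_regions() {
    bcftools view -R "$3" -s "$2" "$1"
}

# Example call (regions.bed is a file you would supply):
# sample_in_regions gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz HGDP00076 regions.bed
```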

ADD REPLY • link 2.9 years ago by Pierre Lindenbaum 164k

0

OK, got it! Thank you so much. So I'm concluding that this command:

bcftools view -s HGDP00076 gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz --threads

is the only option for getting this done quickly.

ADD REPLY • link 2.9 years ago by arya.sagittarius ▴ 10

0

don't use -s HGDP00076 as there is no genotype/sample in this file. (copy/pasted from Pierre's message)

ADD REPLY • link 2.7 years ago by cpad0112 21k