Why does gnomAD provide both a VCF and a TBI file for each chromosome?
2.9 years ago

I wanted to extract a VCF for a given sample from gnomAD (https://gnomad.broadinstitute.org/downloads) and do downstream analysis on it. My question is: why do they provide both a VCF and a TBI file? If the TBI is a tabix index file, why do I get an error when I use it in the following command?

bcftools view -s HGDP00076 gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi

[E::hts_hopen] Failed to open file gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi
[E::hts_open_format] Failed to open file "gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi" : Exec format error
Failed to read from gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz.tbi: Exec format error

vcf tbi gnomad • 2.2k views
2.9 years ago

bcftools view gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz

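The .tbi file is a tabix index that bcftools uses for fast random access to a given genomic region. It is not a VCF itself, so pass the .vcf.bgz file to bcftools, not the .tbi.

There are no samples in the gnomAD sites files; they contain site-level information only.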

The output of bcftools query -l gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz will be empty.

Don't use -s HGDP00076, as there are no genotypes or samples in this file.
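A minimal sketch of the two points above: bcftools is given the .vcf.bgz data file, and the .tbi next to it is only used behind the scenes when a region is requested. The interval and the index_for helper are hypothetical, chosen for illustration; the file name is the one from the question.

```shell
# Sketch only: the region below is a hypothetical interval, not from the thread.
vcf="gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz"   # data file named in the question
region="chr1:1000000-1100000"                     # hypothetical interval

# Hypothetical helper: path of the tabix index expected next to a VCF.
index_for() { printf '%s.tbi\n' "$1"; }

if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ] && [ -f "$(index_for "$vcf")" ]; then
    # List sample names; for a sites-only file this prints nothing.
    bcftools query -l "$vcf"
    # Region query: bcftools finds the .tbi itself, it is never passed as input.
    bcftools view -H -r "$region" "$vcf" | head
else
    echo "skipping: bcftools, $vcf, or its .tbi index not found" >&2
fi
```

If the bcftools query -l step prints nothing, there is nothing for -s to select in that file.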


I'm using the gnomADv3-variants-genomes data from the link (gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz). It has multiple samples in the VCF, and this command takes a very long time to run. Is there any workaround for that?


Well, the chr1 file is 261G, so it can take anywhere from a few minutes to hours to complete. You can always try the --threads option of bcftools to speed things up.
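For illustration, a sketch of how --threads might be passed. The thread count, the output name, and the build_cmd helper are assumptions, not from the thread. As far as I understand the bcftools manual, the extra threads mainly help with compressing the output (-Oz or -Ob), so the gain may be modest when reading the input is the bottleneck.

```shell
# Sketch only: thread count and output name are assumptions.
vcf="gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz"  # file named in the thread
sample="HGDP00076"                                  # sample ID from the thread
threads=4                                           # assumed thread count
out="${sample}.chr1.vcf.gz"                         # hypothetical output name

# Hypothetical helper: print the command so it can be inspected first.
build_cmd() {
    printf 'bcftools view -s %s --threads %s -Oz -o %s %s\n' "$1" "$2" "$3" "$4"
}

build_cmd "$sample" "$threads" "$out" "$vcf"

if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    # --threads takes an integer; -Oz writes bgzip-compressed VCF.
    bcftools view -s "$sample" --threads "$threads" -Oz -o "$out" "$vcf"
else
    echo "skipping: bcftools or $vcf not found" >&2
fi
```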


Yes, but is there any workaround for extracting a particular sample? It takes a really long time! That's why I wanted to know whether indexing helps, or whether threads are the only option.


Indexing only helps if you need a genomic interval rather than the whole file.
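To make that concrete, here is a sketch of combining a region with a sample selection, which is the case where the .tbi index pays off. The interval, the output name, and the make_region helper are hypothetical; the file and sample names are the ones mentioned in this thread.

```shell
# Sketch only: interval and output name are hypothetical.
vcf="gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz"  # file named in the thread
sample="HGDP00076"                                  # sample ID from the thread

# Hypothetical helper: format a region string as contig:start-end.
make_region() { printf '%s:%s-%s\n' "$1" "$2" "$3"; }

region=$(make_region chr1 1000000 2000000)          # hypothetical interval
out="${sample}.subset.vcf.gz"                       # hypothetical output name

if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ] && [ -f "$vcf.tbi" ]; then
    # -r uses the index to jump to the interval; -s keeps one sample's genotypes.
    bcftools view -r "$region" -s "$sample" -Oz -o "$out" "$vcf"
else
    echo "skipping: bcftools, $vcf, or $vcf.tbi not found" >&2
fi
```

Without -r, bcftools still has to read every record on the chromosome to pull out one sample, so the index makes no difference there.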


OK, got it! Thank you so much. So I'm concluding that the command

bcftools view -s HGDP00076 gnomad.genomes.v3.1.2.hgdp_tgp.chr1.vcf.bgz --threads

is the only option for getting this done quickly.


Don't use -s HGDP00076, as there are no genotypes or samples in this file. (Copied and pasted from Pierre's message.)
