Question

Filtering GnomAD VCF To a Subset of Samples

0

2.1 years ago

Arda • 0

I am trying to create a database using the variant data gathered from gnomAD. Their sample size is unnecessarily large for me, so I want to subset it down to about 1k samples. To do so, I thought I would use bcftools to write 1k randomly chosen sample IDs to a file with

bcftools query -l gnomad.genomes.r2.1.1.sites.21.vcf.bgz | shuf | head -n 1000 > myRandomIDs.txt

and then use these IDs to extract a subset of the variants from each chromosome's VCF file, one by one, with the following

bcftools view --samples-file myRandomIDs.txt gnomad.genomes.r2.1.1.sites.21.vcf.bgz -o myNewVCF.vcf

The problem is that, unlike with other variant databases (like 1000 Genomes), the first bcftools command, which should return the sample IDs, returns nothing. Does this mean gnomAD holds no sample ID data? How is this possible? How can I go about achieving my goal?

gnomAD VCF • 622 views

updated 2.1 years ago by Pierre Lindenbaum 164k • written 2.1 years ago by Arda • 0

score 1 · Answer 1 · 2022-10-26

1

2.1 years ago

Pierre Lindenbaum 164k

gnomAD holds no sample ID data?

Yes.

How is this possible?

They keep their secrets: patients don't want their health data to be shared, etc.
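A quick way to confirm this without bcftools is to look at the `#CHROM` header line: a VCF with genotypes has nine fixed columns (through `FORMAT`) followed by one column per sample, while a sites-only VCF stops at `INFO`. The sketch below is only an illustration; the function name `count_vcf_samples` is made up here, and it assumes the file is gzip-compatible (bgzip output is) and tab-delimited.

```shell
# Hypothetical helper: count the sample columns of a (b)gzip-compressed VCF
# by inspecting its #CHROM header line.
# Columns 1-9 are CHROM..FORMAT; anything beyond that is one sample each.
count_vcf_samples() {
    gzip -dc "$1" | awk -F'\t' '
        /^#CHROM/ {
            n = NF - 9            # sites-only files have 8 columns, so n < 0
            if (n < 0) n = 0
            print n
            found = 1
            exit                  # stop reading once the header line is seen
        }
        END { if (!found) print 0 }   # no #CHROM line at all: report 0
    '
}
```

Called as `count_vcf_samples gnomad.genomes.r2.1.1.sites.21.vcf.bgz`, this should print 0 for a sites-only file, which would match `bcftools query -l` returning nothing.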

How can I go about achieving my goal?

With gnomAD? You can't, unless you're a member of the Broad Institute (or you ask for access?).
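What the public sites files do carry is per-variant aggregate information in the `INFO` column, such as `AC`, `AN` and `AF`. If a database of variants with allele frequencies would be enough, something like the sketch below could pull those fields out. It is an assumption-laden illustration, not a recommended pipeline: the function name `site_frequencies` is invented, and it assumes a gzip-compatible, tab-delimited VCF whose `INFO` entries are `;`-separated `key=value` pairs.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT and the AC/AN/AF INFO values
# for every record of a (b)gzip-compressed VCF. Missing keys are shown as ".".
site_frequencies() {
    gzip -dc "$1" | awk -F'\t' -v OFS='\t' '
        /^#/ { next }                          # skip header lines
        {
            ac = an = af = "."
            n = split($8, kv, ";")             # column 8 is INFO
            for (i = 1; i <= n; i++) {
                eq = index(kv[i], "=")
                if (eq == 0) continue          # flag without a value
                key = substr(kv[i], 1, eq - 1)
                val = substr(kv[i], eq + 1)
                if (key == "AC") ac = val      # exact match, so AF_popmax etc. are ignored
                else if (key == "AN") an = val
                else if (key == "AF") af = val
            }
            print $1, $2, $4, $5, ac, an, af
        }
    '
}
```

With bcftools installed, `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/AC\t%INFO/AN\t%INFO/AF\n'` on the same file should give a comparable table. Neither approach recovers individual samples; they only expose the summary counts.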
