Filtering gnomAD VCF to a Subset of Samples
2.1 years ago
Arda • 0

I am trying to create a database using the variant data gathered from gnomAD. Their sample size is unnecessarily large for me, so I want to subset it down to about 1k samples. To do so, I thought I would use bcftools to write 1k randomly chosen sample IDs to a file with

bcftools query -l gnomad.genomes.r2.1.1.sites.21.vcf.bgz | shuf | head -1000 > myRandomIDs.txt

and then use these IDs to extract a subset of the variants from each chromosome's VCF file, one by one, with the following

bcftools view --samples-file myRandomIDs.txt gnomad.genomes.r2.1.1.sites.21.vcf.bgz -o myNewVCF.vcf

The problem is that, unlike with other variant databases (such as 1000 Genomes), the first bcftools command, which should list the sample IDs, returns nothing. Does this mean gnomAD holds no sample ID data? How is this possible? How can I go about achieving my goal?

gnomAD VCF • 622 views
2.1 years ago

gnomAD holds no sample ID data?

Yes.

How is this possible?

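They keep individual-level data private: participants don't want their health data to be shared, so the public release is a sites-only VCF with aggregate allele counts and frequencies but no per-sample genotype columns.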

How can I go about achieving my goal?

With gnomAD? You can't, unless you're a member of the Broad Institute (or perhaps you can request access?).
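
If you want to confirm that a file is sites-only without relying on bcftools, one option is to look at its #CHROM header line: a VCF with genotypes has a FORMAT column followed by one column per sample, while a sites-only VCF stops at INFO. The helper below is only a minimal sketch; the function name count_vcf_samples is made up for illustration, and it assumes a gzip whose -cdf mode passes uncompressed input through unchanged (as GNU gzip does).

```shell
# Hypothetical helper (not part of bcftools): print the number of sample
# columns in a VCF by inspecting its #CHROM header line.
count_vcf_samples() {
    # gzip -cdf decompresses gzip/bgzip input and copies plain text through as-is
    gzip -cdf "$1" | awk -F'\t' '
        /^#CHROM/ {
            # 8 fixed columns + FORMAT = 9; anything beyond that is a sample
            n = NF - 9
            if (n < 0) n = 0      # sites-only VCFs end at INFO, so clamp to 0
            print n
            found = 1
            exit
        }
        END { if (!found) exit 1 }  # no #CHROM line: not a valid VCF header
    '
}
```

Calling count_vcf_samples gnomad.genomes.r2.1.1.sites.21.vcf.bgz should report 0 for a sites-only file, which is consistent with bcftools query -l printing nothing.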
