I am trying to build a database from the variant data published by gnomAD. Their sample size is much larger than I need, so I want to subset it down to about 1,000 samples. To do so, I thought I would use bcftools to write 1,000 randomly chosen sample IDs to a file with
bcftools query -l gnomad.genomes.r2.1.1.sites.21.vcf.bgz | shuf | head -1000 > myRandomIDs.txt
and then use these IDs to extract a subset of the variants from each chromosome's VCF file, one by one, with the following
bcftools view --samples-file myRandomIDs.txt gnomad.genomes.r2.1.1.sites.21.vcf.bgz -o myNewVCF.vcf
The problem is that, unlike with other variant databases (such as 1000 Genomes), the first bcftools command, which should list the sample IDs, returns nothing. Does this mean the gnomAD file holds no sample ID data? How is this possible? How can I go about achieving my goal?
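For reference, here is a minimal sketch of how I would double-check this without bcftools, by counting the sample columns on the #CHROM header line directly. It assumes the standard VCF layout (8 fixed columns, then FORMAT, then one column per sample) and that the .bgz file can be read by plain gzip; the function name count_vcf_samples is just something I made up for illustration.

```shell
# Count the sample columns in a VCF header read from stdin.
# Assumes the standard VCF layout: 8 fixed columns, then FORMAT,
# then one column per sample. A header with no FORMAT column reports 0.
count_vcf_samples() {
  awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }'
}

# Usage (file name taken from my commands above):
# gzip -dc < gnomad.genomes.r2.1.1.sites.21.vcf.bgz | count_vcf_samples
```

If this prints 0, the header line stops at INFO (or FORMAT) and the file has no per-sample genotype columns at all, which would be consistent with the empty output from bcftools query -l.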