Where can I obtain VCF files of healthy exomes?
14 months ago

I want to run a simulation analysis for my project that requires benchmarking on a cohort of 200-300 exomes from healthy people. I tried to download such data from gnomAD and the 1000 Genomes Project, but each VCF contains thousands of samples and I do not need more than 200-300 exome VCFs. Any idea how I could access such files?

exome VCF • 1.1k views

How are you defining ‘healthy’?


By "healthy" I mean an individual who does not have a congenital rare disease. The only purpose of this part of my project is to spike specific ClinVar pathogenic mutations associated with rare diseases into the exome background of such an individual. For this I just need exome VCFs mapped to hg19 for a cohort of 200-300 individuals.

14 months ago

"I tried to download such data from gnomAD and the 1000 Genomes Project, but each VCF contains thousands of samples"

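VCF is tab-delimited, so you can use cut to keep 200-300 samples from those big VCFs. The first nine columns are the fixed fields (CHROM through FORMAT), so -f1-209 keeps the first 200 samples:

wget -O - "https://url/genotypes.vcf.gz" | gunzip -c | cut -f1-209 | bcftools view -O z -o out.vcf.gz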
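The column arithmetic behind the cut approach can be checked on a toy file before touching a multi-gigabyte download. The sketch below is only an illustration: the helper names `make_toy_vcf` and `keep_first_samples`, the sample names S1..S4, and the genotypes are all made up. It relies on two facts: a VCF has nine fixed columns before the first sample, and cut passes lines without a tab (the `##` meta lines) through unchanged.

```shell
# Toy VCF with 9 fixed columns and 4 samples (S1..S4); contents are made up.
make_toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
  printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\n'
}

# Keep the first N samples of a VCF read from stdin.
# Columns 1-9 are fixed fields, so the last column to keep is 9 + N.
keep_first_samples() {
  n="$1"
  cut -f "1-$((9 + n))"
}

# Keep S1 and S2 only; the ## line is passed through untouched.
make_toy_vcf | keep_first_samples 2
```

Note that cut only drops columns. INFO annotations such as AC, AN, and AF still describe the full cohort, and sites that are variant only in the dropped samples remain in the file, so a subset made this way may need those fields recomputed and such sites filtered afterwards.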

Thank you. Could you please suggest a link to VCFs of healthy exomes?

14 months ago

No publicly available resource will give you VCFs of individuals, because this would reveal confidential information about those individuals. Almost all publicly available resources give you variant frequencies within a population instead, which is sufficient for most purposes.

If you really do need VCFs of individuals, you will have to apply for access to protected information at one of the big resources, most of which have a procedure for such requests. They will need evidence that you are a genuine researcher, that your computer systems are sufficiently secure to handle protected data, and that you have a good reason for wanting access to the data.


gnomAD provides genotypes for the "HGDP + 1KG callset": https://gnomad.broadinstitute.org/downloads
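Since the question asks for per-individual VCFs, a multi-sample genotype file like this one would still need to be split by sample. As a rough sketch of the idea, using only awk on a toy file: the helper names `toy_vcf` and `extract_sample`, the sample names S1..S3, and the genotypes are hypothetical. On real data, `bcftools view -s` with a sample name is the more usual route; the awk version just shows what the operation amounts to.

```shell
# Toy two-record VCF; sample names and genotypes are made up.
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n'
  printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0\n'
}

# Print a single-sample VCF for the sample named in $1, reading a VCF on stdin.
extract_sample() {
  awk -F'\t' -v OFS='\t' -v want="$1" '
    /^##/ { print; next }                  # meta lines pass through unchanged
    /^#CHROM/ {                            # locate the column of the wanted sample
      for (i = 10; i <= NF; i++) if ($i == want) col = i
      if (!col) { print "sample not found: " want > "/dev/stderr"; exit 1 }
    }
    { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }   # fixed fields + one sample
  '
}

toy_vcf | extract_sample S2
```

The output keeps every site, including those where the chosen sample is homozygous reference (0/0), so a further filter on the genotype column would be needed to get only that individual's variants.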
