background exome gVCFs for haplotype calling


4.3 years ago

ccagg ▴ 60

I have a relatively small sample of human exomes (n=11) that I would like to call SNPs for, using the GATK pipeline. From reading the GATK documentation, it seems that the best way to do this is to use many exomes as "background" for the genotype calling and refinement steps. My lab, however, is very new to exome-seq, and we only have the 11 we generated on hand.

Is there a database somewhere of exome data that I could use as the background? I think gVCFs would be preferable, but I could potentially work from FASTQ or BAM files if necessary. We have access to the UK Biobank, but it seems there were some issues with their exome data that might dissuade me from using their gVCFs. If there aren't any exomes available, would there be a problem with using whole-genome data (like the gVCFs available from HGDP) for this step?
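For concreteness, the joint-calling step being asked about could be sketched roughly as follows. This is a minimal sketch, not a recommended pipeline: it only builds GATK4 `CombineGVCFs` and `GenotypeGVCFs` command lines without running them. The function name, all file names, and the interval list are hypothetical, and it assumes every gVCF (own and background) was called against the same reference build. Passing the exome target intervals with `-L` is one way to restrict whole-genome background gVCFs to the captured regions.

```python
import shlex


def joint_genotyping_commands(own_gvcfs, background_gvcfs, reference,
                              intervals, out_prefix):
    """Build (but do not run) GATK4 command lines for joint genotyping.

    All paths are placeholders chosen for illustration; they are assumed
    to point at gVCFs produced against the same reference build.
    """
    all_gvcfs = list(own_gvcfs) + list(background_gvcfs)
    combined = f"{out_prefix}.combined.g.vcf.gz"

    # Step 1: merge the per-sample gVCFs into one multi-sample gVCF,
    # restricted to the exome target intervals.
    combine = ["gatk", "CombineGVCFs", "-R", reference, "-L", intervals]
    for gvcf in all_gvcfs:
        combine += ["--variant", str(gvcf)]
    combine += ["-O", combined]

    # Step 2: joint-genotype the combined gVCF over the same intervals.
    genotype = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined,
        "-L", intervals,
        "-O", f"{out_prefix}.joint.vcf.gz",
    ]
    return combine, genotype


if __name__ == "__main__":
    # Hypothetical inputs: 11 in-house exomes plus two background gVCFs.
    own = [f"sample{i:02d}.g.vcf.gz" for i in range(1, 12)]
    background = ["background_A.g.vcf.gz", "background_B.g.vcf.gz"]
    for cmd in joint_genotyping_commands(
            own, background, "reference.fasta",
            "exome_targets.interval_list", "cohort"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to check the sample list before committing compute time. For larger cohorts, GATK's `GenomicsDBImport` is an alternative to `CombineGVCFs` for the merge step.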

exome GATK • 1.1k views



GATK should offer resources if they recommend something in their pipeline. Can you show us the page where they make this recommendation?

4.3 years ago by Ram 44k


Yeah, I guess they don't explicitly say it, but here they definitely seem to allude to having a large cohort. Several people at my institute have also recommended running it with other background exomes. I think this makes sense, since the refinement is a machine learning-based method.

4.3 years ago by ccagg ▴ 60


Take a look at this page: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

4.3 years ago by Ram 44k
