Dear community members,
I have a lot of variants to genotype (>6 million) and a lot of WGS samples (represented as BAM and VCF files).
My genotyping strategy so far has been to read the list of variants and then iterate through the VCF files with a custom Python script. However, I anticipate this will be very slow for such a huge number of samples.
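For reference, the lookup my script does is roughly like the simplified sketch below (assuming pysam and bgzipped, tabix-indexed VCFs; the variant list and file names are just placeholders):

# Simplified sketch of my current approach (assumes pysam and
# bgzipped + tabix-indexed VCFs; variants and paths are placeholders).
import pysam

variants = [("chr1", 123456, "A", "G"), ("chr2", 234567, "C", "T")]  # hg19 list
sample_vcfs = ["sample1.vcf.gz", "sample2.vcf.gz"]  # one VCF per WGS sample

genotypes = {}
for path in sample_vcfs:
    vcf = pysam.VariantFile(path)
    sample = list(vcf.header.samples)[0]
    for chrom, pos, ref, alt in variants:
        # fetch() takes a 0-based start and returns records overlapping the window
        for rec in vcf.fetch(chrom, pos - 1, pos):
            if rec.pos == pos and rec.ref == ref and alt in (rec.alts or ()):
                genotypes[(path, chrom, pos, ref, alt)] = rec.samples[sample]["GT"]
    vcf.close()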
Is there a way to quickly genotype a huge WGS cohort? Should I use BAM or VCF files for that?
Another issue is that the VCFs are called against GRCh38, while the variants for genotyping are in hg19 coordinates, so for variants where the reference allele changed in GRCh38 the VCFs alone may not be enough, but this is a minor problem...
Liftover your list of variants to GRCh38, split the lifted list into regions of XXX variants each, and call the BAMs in GVCF mode with GATK over those regions. Combine and genotype the GVCFs, then concatenate the per-region results. Use a workflow manager to run everything in parallel; a rough sketch is below.
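The per-region steps would look roughly like this sketch (the liftover itself can be done beforehand with e.g. Picard LiftoverVcf or CrossMap). File names, the region BED, and the chunk size are placeholders, and in practice each call would be a separate task submitted by your workflow manager (Snakemake, Nextflow, etc.) rather than one serial script:

# Sketch of the per-region GATK pipeline (placeholders throughout;
# each call would normally be its own parallel task).
import subprocess

region_bed = "region_0001.grch38.bed"   # one chunk of the lifted-over variant list
reference = "GRCh38.fa"
bams = ["sample1.bam", "sample2.bam"]
gvcfs = []

# 1. Call each BAM in GVCF mode, restricted to this region chunk
for bam in bams:
    gvcf = bam.replace(".bam", ".region_0001.g.vcf.gz")
    subprocess.run([
        "gatk", "HaplotypeCaller",
        "-R", reference, "-I", bam, "-L", region_bed,
        "-ERC", "GVCF", "-O", gvcf,
    ], check=True)
    gvcfs.append(gvcf)

# 2. Combine the per-sample GVCFs for this region
combine_cmd = ["gatk", "CombineGVCFs", "-R", reference,
               "-O", "region_0001.combined.g.vcf.gz"]
for g in gvcfs:
    combine_cmd += ["-V", g]
subprocess.run(combine_cmd, check=True)

# 3. Joint-genotype the combined GVCF for this region
subprocess.run([
    "gatk", "GenotypeGVCFs",
    "-R", reference,
    "-V", "region_0001.combined.g.vcf.gz",
    "-O", "region_0001.genotyped.vcf.gz",
], check=True)

# Once all regions have finished, concatenate the per-region VCFs,
# e.g. with "bcftools concat" on the ordered list of region VCFs.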
Thanks a lot! I am not very familiar with the GATK toolchain, but I guess it is time to learn =)