Hi guys,
I'm very new to all this, so please bear with me.
I just finished running through GATK's best practices and am finally at the genotype refinement step. I am confused about a few things. From their old videos it seems like I need to proceed with phasing and then imputation; they recommend a few of their own tools for phasing and then Beagle for imputation.
For background, I am looking at a set of control human genomes, a set of low-coverage cases, and a set of high-coverage cases (different individuals from the low-coverage set). I did joint genotyping with all three sets together and now I have a large VCF filtered by tranche level.
- The GATK website says that if I made my VCFs as GVCFs (I think the option was -ERC GVCF?) then the physical phasing would have been done already. Does this mean I still need to phase my data? Or was only each individual genome phased, so that now that I have a larger cohort after joint genotyping I need to phase it again? I ask because I don't see the "|" in my genotype calls.
- I did stumble upon their 2015 best practices video, and it seems they have two tools to perform the phasing afterwards. If I use their phasing tools, does that mean I can go straight to the imputation step with tools like IMPUTE2 and Beagle? I am assuming I would have to split the data back into the three sets for imputation?
- Beagle 4.0 doesn't really have a website with a getting-started section. Any recommendations on what software to use for imputation and how to get started?
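
On the "|" point in the first bullet: here is a minimal sketch of how that check could be automated, rather than eyeballing the file. It assumes a standard tab-delimited VCF with GT listed in the FORMAT column. The function name count_phasing and the toy records are made up for illustration and are not from the real data.

```python
def count_phasing(vcf_lines):
    """Count phased ('|') and unphased ('/') genotype calls in VCF text lines."""
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines ('##' meta lines and the '#CHROM' line).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
        if len(fields) < 10:
            continue
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_idx >= len(values):
                continue
            gt = values[gt_idx]
            # Ignore fully missing calls such as './.' or '.|.'.
            alleles = gt.replace("|", "/").split("/")
            if all(a == "." for a in alleles):
                continue
            if "|" in gt:
                counts["phased"] += 1
            elif "/" in gt:
                counts["unphased"] += 1
    return counts


# Toy records (hypothetical): three samples, two sites.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0|1:30\t./.:0",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t1|1\t0/0\t0/1",
]
print(count_phasing(example))  # {'phased': 2, 'unphased': 3}
```

For a real file the same function can be fed an open file handle, for example gzip.open(path, "rt") for a bgzipped VCF, where path is whatever the joint-genotyped file is called.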
Thanks for any help or direction.