I'm new to bioinformatics, and I'm trying to do phasing and imputation at the WGS level.
For imputation with Beagle, I would like to make a bref file from a vcf file.
And I have to phase the reference panel for that.
Is a BAM file required for phasing by GATK?
Is there a way to do phasing that doesn't require a BAM file?
BAMs, which are binary sequence alignment files (i.e., reads mapped to a reference genome), are typically used at some point in a variant discovery and phasing pipeline. So, to my knowledge, yes, BAMs will usually be required for phasing with GATK.
GATK's phasing is very different from Beagle's phasing. The former uses only a single sample and produces physical phasing information, whereas the latter generates statistical phase information using whole-population data.
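To check what a VCF currently contains: phased genotypes are written with `|` between alleles (e.g. `0|1`), unphased ones with `/` (e.g. `0/1`), and missing alleles with `.`. Below is a minimal sketch that tallies the three cases. The function name `count_gt` and the two-site input are made up for illustration, and it assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present.

```shell
#!/bin/sh
# count_gt (hypothetical helper): tally phased, unphased, and missing
# genotypes in an uncompressed VCF read from stdin.
count_gt() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      for (i = 10; i <= NF; i++) {       # sample columns start at column 10
        split($i, f, ":")                # GT is the first colon-separated subfield
        gt = f[1]
        if (index(gt, ".") > 0)      missing++   # any "." allele counts as missing
        else if (index(gt, "/") > 0) unphased++  # e.g. 0/1
        else                         phased++    # e.g. 0|1, or a haploid call
      }
    }
    END { printf "phased=%d unphased=%d missing=%d\n", phased, unphased, missing }
  '
}

# Made-up example: two sites, two samples.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\nchr1\t100\t.\tG\tA\t.\tPASS\t.\tGT\t0|1\t0/1\nchr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t1|1:12\n' | count_gt
# prints: phased=2 unphased=1 missing=1
```

For a compressed file, something like `gzip -dc ref_panel.vcf.gz | count_gt` should work. Any nonzero `unphased` or `missing` count means the file still needs phasing before it can serve as a phased reference panel.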
I understand that there are several types of phasing. Currently, I am getting the following error when creating a bref file.
java -jar bref3.22Jul22.46e.jar ref_panel.vcf.gz > ref_panel.bref
Exception in thread "main" java.lang.IllegalArgumentException: ERROR: unphased or missing genotype for reference sample ... at marker [chr1 100 . G *]
at vcf.VcfRecGTParser.throwIllegalArgException(VcfRecGTParser.java:415)
at vcf.VcfRecGTParser.phasedAlleles(VcfRecGTParser.java:359)
at vcf.VcfRecGTParser.nonMajRefIndices(VcfRecGTParser.java:490)...
...
at bref.Bref3.refIt(Bref3.java:101)
at bref.Bref3.writeBref(Bref3.java:82)
at bref.Bref3.main(Bref3.java:47)
In this case, which is more suitable for converting unphased genotypes to phased genotypes, GATK's physical phasing or Beagle's phasing?
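I am unfamiliar with the tool you are using, so I cannot say for certain, but it looks like the error may be due to the * allele that GATK writes in the ALT column (the placeholder for an allele removed by an upstream, spanning deletion). In any case, if you want longer haplotype blocks to be inferred, I would suggest using a statistical phasing method like Beagle.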
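As a rough sketch of that statistical route: run Beagle on the unphased panel first, then pass its phased output to bref3. This needs only the VCF, not the BAMs. The jar file names, the output prefix, and the helper `make_plan` are assumptions for illustration. The function only prints the two commands so they can be reviewed before running, and it assumes file names without spaces.

```shell
#!/bin/sh
# Jar names are placeholders; point them at the files you downloaded.
BEAGLE_JAR=${BEAGLE_JAR:-beagle.jar}
BREF3_JAR=${BREF3_JAR:-bref3.jar}

# make_plan (hypothetical helper): print the phasing and bref3 commands.
#   $1 = unphased input VCF, $2 = output prefix
make_plan() {
  in_vcf=$1
  prefix=$2
  # Step 1: Beagle reads genotypes via gt= and writes ${prefix}.vcf.gz
  printf '%s\n' "java -jar $BEAGLE_JAR gt=$in_vcf out=$prefix"
  # Step 2: bref3 reads the phased VCF and writes the binary file to stdout
  printf '%s\n' "java -jar $BREF3_JAR $prefix.vcf.gz > $prefix.bref3"
}

make_plan ref_panel.vcf.gz ref_panel.phased
# To actually run the two steps: make_plan ref_panel.vcf.gz ref_panel.phased | sh
```

Without a reference panel, Beagle should phase the unphased genotypes and impute sporadic missing ones, which are the two conditions named in the bref3 error. I have not verified how it treats records with the * allele, so those may still need separate handling.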