3.2 years ago
adam
I have a FASTQ file representing a WGS from Dante Labs. Is it possible to phase the sequence into haplotypes, and what software should I be looking at to do this? Any working command line examples would be greatly appreciated for learning. Thank you.
You can't phase a fastq file. You would need to call the variants first and produce a .vcf or similar to phase.
Thank you. Once I call the variants (is gatk a good tool for this?) from the FASTQ file, what tool(s) should I be looking at to phase?
The workflow goes FASTQ -> BAM through read alignment, and then BAM -> VCF through variant calling. I would suggest following these instructions: http://www.htslib.org/workflow/#fastq_to_bam. GATK is widely regarded as the gold standard, but it's slow and a bit of a pain to use, so unless you need it, I would use bcftools or Octopus instead. For phasing, I would suggest either Beagle 5.2 or SHAPEIT4.
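Since command line examples were asked for, here is a rough sketch of the FASTQ -> BAM -> VCF steps using bwa, samtools and bcftools. It is a small Python helper that only builds and prints the commands, so you can inspect them before running anything. The file names, sample name and thread count are placeholders I made up, and the sketch assumes paired-end reads; it also skips the duplicate-marking steps described in the htslib workflow linked above, so treat it as a starting point rather than a finished pipeline.

```python
import shlex


def fastq_to_vcf_commands(ref, fq1, fq2, sample, threads=4):
    """Build shell commands for alignment and variant calling.

    All paths and the sample name are hypothetical placeholders.
    Nothing is executed here; the commands are only returned as strings.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # One-off step: build the BWA index for the reference genome.
        f"bwa index {q(ref)}",
        # Align paired-end reads and sort the alignments by coordinate.
        f"bwa mem -t {threads} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -",
        # Index the sorted BAM so it can be accessed by region.
        f"samtools index {q(bam)}",
        # Call variants and write a bgzip-compressed VCF.
        f"bcftools mpileup -f {q(ref)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
        # Tabix-index the VCF for downstream tools.
        f"bcftools index -t {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in fastq_to_vcf_commands("ref.fa", "adam_1.fq.gz", "adam_2.fq.gz", "adam"):
        print(cmd)
```

Paste the printed commands into a shell one at a time. For a 30x WGS the alignment step will take many hours, so it is worth testing on a small subset of reads first.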
Thank you so much!
Any chance you would have a guide for phasing (beagle 5.2 or shapeit4) that is as clear and helpful as the above link for FastQ -> BAM?
Would that be helpful if you have just a single sample?
Yeah, it's fine as long as you have a reference panel such as the 1000 Genomes. In fact, you could just submit your data to the Michigan Imputation Server, the Sanger Imputation Service, or the TOPMed Imputation Server, and they will do the phasing computations for you.
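If you would rather phase locally, a Beagle run against a reference panel looks roughly like the sketch below. It is again a Python helper that only builds the command string. The jar name, the per-chromosome file names and the genetic map path are placeholders I made up; the `gt=`, `ref=`, `map=`, `out=` and `nthreads=` arguments are the ones Beagle 5.x documents. The panel, the map and your VCF need to be on the same genome build and use the same chromosome naming.

```python
import shlex


def beagle_phase_command(gt_vcf, ref_panel, genetic_map, out_prefix,
                         jar="beagle.jar", threads=4):
    """Build a Beagle 5.x command that phases gt_vcf against a reference panel.

    jar and every file path are hypothetical placeholders.
    The command is returned as a string and is not executed.
    """
    argv = [
        "java", "-jar", jar,
        f"gt={gt_vcf}",         # your called genotypes (VCF)
        f"ref={ref_panel}",     # phased reference panel (VCF or bref3)
        f"map={genetic_map}",   # PLINK-format genetic map
        f"out={out_prefix}",    # Beagle appends .vcf.gz to this prefix
        f"nthreads={threads}",
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    print(beagle_phase_command(
        "adam.chr20.vcf.gz",
        "panel.chr20.vcf.gz",
        "plink.chr20.map",
        "adam.chr20.phased",
    ))
```

This runs one chromosome at a time, which is how reference panels and genetic maps are usually distributed, so you would split your VCF by chromosome and loop over them.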
Hi adam,
Could you maybe elaborate on what things you are trying to determine the phase of? And what is the distance between the variants that you care about?
Wouter