Hi everyone,
I have Illumina reads for several hybrid samples (monacha mated with lucida) and one pure monacha sample, and I want to get the maternal (monacha) haplotype for each of the hybrids. I have a genome for occidentalis (not actually the whole genome, but 5276 genes), and I know that occidentalis is sister to lucida. So I'm now trying to use the occidentalis genome as the reference to call variants, phase them with whatshap, and then use bcftools to generate a consensus. The command I used is:
bcftools consensus -H 2 -f reference.fasta phased_vcf.gz > monacha_haplotype.fasta
The problem is that bcftools fills the gaps with sequence from the reference, but I want to fill them with "N". According to the bcftools manual, the -m option can mask regions with N, but it looks like I need to define the regions by hand, which is impractical because I have 5276 genes.
Does anyone know of a tool that can do this more easily? Or does anyone know a better method for haplotyping in my situation?
Thanks, Yunyang Wang
- I don't know how feasible this is, but can you assemble de novo and then use that assembly as the reference for phasing with whatshap?
- When you say you have 5276 genes, how is this data organised?
- What are the gaps that bcftools will fill?
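If the gaps turn out to be positions with little or no read coverage, the mask file for -m does not have to be written by hand: it can be derived from per-base depth for all 5276 genes at once. The sketch below is only one possible approach, under several assumptions. The function name depth_to_mask_bed, the file names hybrid.bam and low_cov.bed, and the depth threshold of 1 are all hypothetical. It also assumes that the mask file carries a .bed extension so that bcftools reads its coordinates as 0-based, half-open intervals.

```shell
# Convert per-base depth ("chrom<TAB>pos<TAB>depth", 1-based positions, as
# printed by `samtools depth -a`) into BED intervals covering every run of
# positions whose depth is below a threshold.
# depth_to_mask_bed is a hypothetical helper, not part of samtools or bcftools.
depth_to_mask_bed() {
    # $1 = minimum depth; positions with depth below it are masked (default 1)
    awk -v min="${1:-1}" 'BEGIN { OFS = "\t" }
    {
        low = ($3 + 0 < min + 0)
        # Close the current run if depth recovers, the contig changes,
        # or the positions are not contiguous.
        if (in_run && (!low || $1 != chrom || $2 != last + 1)) {
            print chrom, start, last
            in_run = 0
        }
        if (low) {
            # BED start is 0-based, so subtract 1 from the 1-based position.
            if (!in_run) { chrom = $1; start = $2 - 1; in_run = 1 }
            last = $2
        }
    }
    END { if (in_run) print chrom, start, last }'
}

# Possible usage (file names are assumptions; not run here):
#   samtools depth -a hybrid.bam | depth_to_mask_bed 1 > low_cov.bed
#   bcftools consensus -H 2 -m low_cov.bed -f reference.fasta phased_vcf.gz > monacha_haplotype.fasta
```

With -m, bcftools consensus should write "N" over the listed intervals instead of the reference bases. A higher threshold would also mask positions where coverage is too thin for the variant calls to be trusted, which may be worth considering given that the reference is a different species.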