Hi,
I'm working on a plant diploid genome (around 200 Mb), trying to obtain the best possible assembly with PacBio RSII P6-C4 reads (70X coverage).
I have used Falcon and Falcon Unzip (0.7+git.2059148090374ac08a494d842dc1def105aeee50). Then I ran fc_quiver from this Falcon 0.7.
I now have a "phased" assembly consisting of two files: cns_p_ctg.fasta (primary contigs) and cns_h_ctg.fasta (haplotigs). After this step, I think it would be a good idea to do an additional round of polishing with Arrow (from smrtlink/5.0.1.9585). Is this OK?
But then I'm a bit confused. Is the final assembly ONLY cns_p_ctg.fasta (primary contigs), or should I merge cns_p_ctg.fasta (primary contigs) and cns_h_ctg.fasta (haplotigs)?
I have read and heard so many different things about this strategy...
My opinion is that the final assembly should be only the primary contigs, and that the haplotigs (cns_h_ctg) are useful for determining the different alleles in the diploid genome.
Am I right or completely wrong?
Thanks
I would say you're right, given that you are trying to assemble a diploid heterozygous genome.
The final assembly would consist only of the primary contigs. For a homozygous (diploid) genome you would also end up with a single haplotype. The haplotigs are indeed there to capture the allelic variation.
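One quick sanity check, offered as a sketch under assumptions: the total length of the primary contigs should be roughly the expected haploid genome size (about 200 Mb here), while primary contigs plus haplotigs together will exceed it, because haplotigs only re-cover the heterozygous regions. The script below is a minimal, hypothetical helper (not part of Falcon, Falcon Unzip or SMRT Link) that reports the contig count, total length and N50 for each FASTA file given on the command line.

```python
import sys


def contig_lengths(path):
    """Return the length of every sequence in a FASTA file (multi-line records allowed)."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(path):
    """Contig count, total assembled bases and N50 for one FASTA file."""
    lengths = contig_lengths(path)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    for fasta in sys.argv[1:]:
        stats = summarize(fasta)
        print(f"{fasta}\t{stats['contigs']} contigs\t"
              f"{stats['total_bp']:,} bp\tN50 {stats['n50']:,} bp")
```

Usage, assuming the script is saved as `fasta_stats.py` (a hypothetical name): `python fasta_stats.py cns_p_ctg.fasta cns_h_ctg.fasta`. If the primary contigs alone already land near the expected haploid size, that is consistent with treating them as the final assembly and keeping the haplotigs as the allelic companion set.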
Thank you for confirming my opinion and clearing up my doubts ;)