I'm trying to assemble a small (20 Mb), diploid fungal genome from MiSeq reads (~400 bp after merging, 100x coverage).
The tricky thing: it's heterozygous. The divergence is ~4%, but there are hundreds (thousands?) of loss-of-heterozygosity (LOH) regions, accounting in total for almost half of the genome...
Do you know of any assembler (or methodology) capable of handling such data?
I have tried many assemblers/tricks:
- de Bruijn graph: Velvet, ABySS, SOAPdenovo
- overlap-based: MIRA, Newbler
- clustering reads and then assembling with CAP3
but with rather poor results. Every time, the assembly was very fragmented (N50 ~10 kb): homozygous (LOH) regions were collapsed into single contigs at ~200x, while heterozygous regions were assembled as separate contigs at ~100x.
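To make the two symptoms above concrete, here is a minimal Python sketch (not part of any of the assemblers listed) that computes N50 from contig lengths and labels a contig by its read depth. The function names, the 100x single-copy depth default, and the 1.5x cut-off are assumptions chosen for illustration; real depths are noisy and would need a more careful threshold.

```python
def n50(lengths):
    """Return the N50: the length of the contig at which the running sum
    of lengths (longest first) reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def label_by_depth(depth, single_copy_depth=100.0):
    """Label a contig from its mean read depth (hypothetical helper).

    Assumption: contigs holding one haplotype sit near single_copy_depth,
    while contigs where both haplotypes collapsed sit near twice that.
    The midpoint (1.5x) is used as a naive cut-off.
    """
    if depth >= 1.5 * single_copy_depth:
        return "collapsed"
    return "separate"
```

With depths like the ones I see, `label_by_depth(200)` gives `"collapsed"` and `label_by_depth(100)` gives `"separate"`.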
I will be happy to hear about any ideas:)
EDIT
As no satisfactory solution exists, I'm developing Redundans (check below for more info).
Hello Leszek,
I am attempting to run your program but am having issues. I first installed it using the pre-compiled binaries via your link and then tried to install all of the programs manually. After confirming that all the dependencies are installed, I am still getting some errors in the redundans.py script. Any suggestions are greatly appreciated! I am running Ubuntu 16.04 on an Intel Core i7.
This is not an answer but rather another question. It would be better to open a new thread for your question if you need people to respond quickly.
Also, the error seems to be with test/run1. Do you already have a sub-directory named test in your directory?
Thanks, I'll ask a new question. But no, the directory does not have a test sub-directory. Using my own data set, the error is now a failure to recognize GapCloser
despite GapCloser being in the directory:
You need to add the directory containing GapCloser to your system PATH. Have a look at how to do it: https://github.com/lpryszcz/redundans/issues/23
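If you want to check this from Python before launching the pipeline, a small sketch like the one below may help. It is not taken from Redundans; `ensure_on_path` is a hypothetical helper, and the directory you pass in is whatever location holds your GapCloser binary. It only uses the standard-library `shutil.which` and `os.environ`.

```python
import os
import shutil


def ensure_on_path(tool, tool_dir):
    """Return the resolved path of `tool`, prepending `tool_dir` to PATH
    for the current process if the tool is not already found.

    Returns None if the tool is still not found (for example, the file is
    missing from tool_dir or is not executable).
    """
    if shutil.which(tool) is None:
        os.environ["PATH"] = tool_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(tool)
```

Note that this changes PATH only for the running Python process and the programs it starts. For a permanent change, the usual approach is an `export PATH=...` line in your shell profile (e.g. `~/.bashrc`).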