Hello Everyone
I have a query regarding a tool for de novo assembly of PacBio data. I have plant genome data at 52X coverage, and the genome size is around 500 Mb.
I used the HGAP tools (RS_HGAP_assembly2, RS_HGAP_assembly3, RS_preassembly 2) through SMRT Portal, but they give me the error "ERROR! Reading fasta files greater than 4Gbytes is not supported". It does not seem to support a large genome size.
Then I used FALCON. It ran successfully, but in the assembly folder all the files are empty except preads.ovl and 2-asm-falcon/run_falcon_asm.sh.log.
Can you please suggest a tool for assembling data with 52X coverage and a predicted genome size of 500 Mb?
Thank you
Thank you for the kind reply.
I reran FALCON on the whole data set. It ran successfully, but the assembly output is only in the kb range.
preads.db log
Config file
How many bases are in the 1-preads_ovl/preads4falcon.fasta file? Two things stand out as needing to change. First, if the preads4falcon.fasta file does not contain >15x of the expected genome size, then the length_cutoff and length_cutoff_pr parameters should be decreased; how far depends on your library quality and subread size. Second, the --min_cov value in overlap_filtering_setting needs to be changed; I would set it to 2.
Hey, thanks, I got your point. Now I will rerun the process with the following parameters; just tell me whether they are good to go.
length_cutoff = 500
length_cutoff_pr = 2500
--min_cov 20
After completing the process with the above parameters, I will get back to you.
One more thing: if you have any good reading material on FALCON parameters for diploid genomes, please let me know.
Thank you
I tried with the new parameters. Now I get the errors below:
1)
2)
500 and 2500 are too low for the cutoffs. Each cutoff should be calculated as the sequence length at which the reads of that length or longer cover ~30x of your expected genome size.
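If it helps, here is a rough Python sketch of how that cutoff could be worked out from a FASTA file. It is not part of FALCON; the function names (`read_lengths`, `coverage`, `length_cutoff_for_coverage`) are made up for illustration. It sorts the reads from longest to shortest and returns the length at which the running total first reaches the target coverage, and it can also report how many x of the genome a file such as preads4falcon.fasta holds. The 500 Mb genome size is the estimate from this thread.

```python
def read_lengths(fasta_path):
    """Yield the length of each sequence in a plain-text FASTA file."""
    length = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # New record: emit the previous one, if any.
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def coverage(lengths, genome_size):
    """Total bases divided by the expected genome size (fold coverage)."""
    return sum(lengths) / genome_size


def length_cutoff_for_coverage(lengths, genome_size, target_cov):
    """Largest cutoff such that reads of at least that length still
    total target_cov x genome_size. Returns None if the data set holds
    less than the target coverage overall."""
    needed = target_cov * genome_size
    total = 0
    for n in sorted(lengths, reverse=True):
        total += n
        if total >= needed:
            return n
    return None
```

For example, with a hypothetical raw-read file `subreads.fasta`, `length_cutoff_for_coverage(list(read_lengths("subreads.fasta")), 500_000_000, 30)` gives a candidate value for length_cutoff, and `coverage(read_lengths("1-preads_ovl/preads4falcon.fasta"), 500_000_000)` shows whether the preads reach the >15x mentioned earlier. As far as I understand, length_cutoff applies to the raw subreads and length_cutoff_pr to the preads, so run the calculation on the matching file. With 52x of raw data, a 30x target should be reachable for the raw reads; a `None` result means the file holds less than the requested coverage, so a lower target is needed.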