Hi everyone, I was wondering what kind of tool you use to assemble your genomes. I already tried IDBA-UD, but to make it work I had to reduce the number of reads.
Then I tried ALLPATHS-LG, but it is not suitable for my data: I only have paired-end reads (R1 and R2) in FASTQ format.
I also tried MaSuRCA, but a lot of people hit a problem during the run and the developers do not seem to respond to this issue.
So, maybe you know a good program that would work with my data?
The 2 FASTQ files come from an Illumina HiSeq 3000 (150 bp reads) and the genome size of my species is around 1.5 Gb. Any clue would be appreciated.
Have a nice day
How did you reduce the number of reads? Did you just downsample, or did you perform digital normalization? And did you get an assembly or not? I will be honest with you: with a 1.5 Gb genome and only paired-end Illumina sequencing, you won't get a good assembly anyway, no matter what you do.
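To make the distinction concrete, here is a minimal sketch of plain random downsampling for paired-end FASTQ data. It keeps or drops R1 and R2 together so the two files stay in sync. The function names (read_fastq, downsample_pairs) are made up for this illustration and do not come from any assembler; digital normalization is a different technique (it discards reads based on k-mer coverage) and is not shown here.

```python
import io
import random
from itertools import islice

def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, plus, quality)."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return  # end of file
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record

def downsample_pairs(r1_handle, r2_handle, fraction, seed=0):
    """Keep each read pair with probability `fraction` (hypothetical helper).

    R1 and R2 are read in lockstep, so a pair is always kept or dropped
    as a unit and the mates never get out of sync.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rng.random() < fraction:
            yield rec1, rec2

# Tiny in-memory example; real data would be opened from the R1/R2 files.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nGGGG\n+\nIIII\n@b/2\nCCCC\n+\nIIII\n")
for rec1, rec2 in downsample_pairs(r1, r2, fraction=0.5):
    print(rec1[0], rec2[0])
```

Random downsampling lowers coverage uniformly across the genome, so it reduces memory use but does nothing about repeats or uneven coverage.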
You have been having problems with your assembly for some time, so let's take a step back and check for some potential problems:
1) How is the quality of your sequencing?
2) Are you trimming adapters?
3) What is the expected coverage?
4) Did you check for contaminants (bacterial / human / whatever other species) in your reads?
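For point 3, expected coverage is just total sequenced bases divided by genome size. A rough sketch, using the 150 bp read length and 1.5 Gb genome size from the question; the number of read pairs (300 million) is a made-up value for illustration, so plug in your own count:

```python
import math

def expected_coverage(read_pairs, read_length, genome_size):
    """Average depth = total sequenced bases / genome size (two reads per pair)."""
    return read_pairs * 2 * read_length / genome_size

def pairs_for_coverage(target_coverage, read_length, genome_size):
    """Number of read pairs needed to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))

GENOME_SIZE = 1_500_000_000   # ~1.5 Gb, from the question
READ_LENGTH = 150             # HiSeq 3000, 150 bp reads
READ_PAIRS = 300_000_000      # hypothetical; replace with your own read count

print(expected_coverage(READ_PAIRS, READ_LENGTH, GENOME_SIZE))  # 60.0
print(pairs_for_coverage(50, READ_LENGTH, GENOME_SIZE))         # 250000000
```

This is the raw figure before trimming and contaminant removal, so the usable coverage will be somewhat lower.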
I would add to this: have you checked the overall k-mer abundance distribution, to see whether the sample is heterozygous, whether coverage is as expected, etc.? Jellyfish + GenomeScope are useful here. Similarly, preQC worked well for us in identifying problematic assemblies; it does some of the above but also compares your results to other known assemblies of varying complexity.
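To show what a k-mer abundance distribution actually is, here is a toy sketch in pure Python. It is not how Jellyfish works internally and would be far too slow and memory-hungry for a 1.5 Gb genome; it only illustrates the idea of counting canonical k-mers and then tabulating how many distinct k-mers occur once, twice, and so on. The function names are made up for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def kmer_histogram(reads, k):
    """Map abundance -> number of distinct canonical k-mers with that abundance."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[canonical(kmer)] += 1
    return Counter(counts.values())

# Toy input: two identical reads, k = 3.
hist = kmer_histogram(["ACGTA", "ACGTA"], k=3)
for abundance in sorted(hist):
    print(abundance, hist[abundance])
```

On real data the histogram is what you would inspect: a main peak near the expected coverage, and for a heterozygous sample typically a second peak at roughly half that depth.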
But I completely agree w/ h.mon: a 1.5 Gb genome requires much more than paired-end data alone. You need contextual information to work around repetitive regions or problematic areas (high-quality mate pairs, long reads, 10x, Hi-C, etc.). Every large-scale assembly (and the strategy used) is different, but your original paired-end data can at least give you some hints as to how complex it may get and maybe the best options to improve it.
In case you appreciate a more pragmatic outlook on the expected results, compare the genome assemblies of the Chinese hamster ovary cell lines (~2.5 Gbp genome):
The >100k contig assembly used many different library sizes and was nevertheless difficult to work with.