I have 160 Gb of R1 reads and 160 Gb of R2 reads from small-insert and long-insert Illumina libraries (2 × 151 bp). I tried different assemblers to assemble the genome of a medicinal plant.
First I tried ABySS, then Platanus, and finally SOAP, but none of them gave the desired result.
Does anyone have an idea how to proceed with this data? I have not tried IDBA, since it takes a lot of time and is better suited to genomes of around 500 Mb.
I have about 1.2 TB of disk space and 750 GB of RAM.
Please help me in this regard.
What is your desired result? Did you try any gap-closing software after assembly? What is the average coverage?
My finished genome assembly should be about 1.5 Gb.
The assembly is not satisfactory, so there is no point in going further and doing gap closing.
With SOAP I get 650 Mb of scaffold sequence (656,762,102 bp) with an N50 of 191, and the total number of non-ATGC characters is 19,888,112. The average coverage is 100X.
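To check numbers like these independently of each assembler's own report, N50 and the non-ACGT count can be recomputed directly from the scaffold FASTA. Below is a minimal, dependency-free Python sketch; the function names are illustrative, and it assumes a plain (uncompressed) FASTA whose path is passed as the first command-line argument.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines, joining wrapped lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length at which the running total, longest first, reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def non_acgt(seq: str) -> int:
    """Count characters that are not A/C/G/T (any case), e.g. N-filled gaps."""
    upper = seq.upper()
    return len(upper) - sum(upper.count(base) for base in "ACGT")


# Only process a real file when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    lengths, ambiguous = [], 0
    with open(sys.argv[1]) as handle:
        for _, seq in read_fasta(handle):
            lengths.append(len(seq))
            ambiguous += non_acgt(seq)
    print("Scaffolds:", len(lengths))
    print("Total length (bp):", sum(lengths))
    print("N50 (bp):", n50(lengths))
    print("Non-ACGT characters:", ambiguous)
```

Usage would be something like `python assembly_stats.py scaffolds.fa`, where both file names are hypothetical. The file is streamed one record at a time, so only the list of lengths is kept in memory.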
Coverage? By long insert, do you mean jumping libraries (mate pairs)?
Coverage is ~100X. No, it is not mate pair.
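The coverage figure can be sanity-checked against the data volume, because average depth is total sequenced bases divided by genome size. Below is a small sketch that assumes the 1.5 Gb genome size and 2 × 151 bp reads given in this thread; the constant and function names are illustrative.

```python
GENOME_SIZE_BP = 1.5e9  # flow-cytometry estimate quoted in this thread
READ_LENGTH = 151       # 2 x 151 bp paired-end reads


def base_coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Average depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_for_coverage(target: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Sequenced bases needed to reach a target average depth."""
    return target * genome_size


def read_pairs(total_bases: float, read_length: int = READ_LENGTH) -> float:
    """Number of read pairs that together contain total_bases."""
    return total_bases / (2 * read_length)


if __name__ == "__main__":
    # If "160 Gb R1 + 160 Gb R2" means gigabases of sequence:
    print(f"{base_coverage(320e9):.1f}X")  # 213.3X
    # Bases needed for the ~100X reported in the thread:
    print(f"{bases_for_coverage(100) / 1e9:.0f} Gb")  # 150 Gb
    print(f"{read_pairs(bases_for_coverage(100)):.3g} read pairs")
```

If the 160 Gb figures are gigabases, the raw depth would be roughly 213X, about twice the ~100X quoted. If they are FASTQ file sizes in gigabytes, the base count is lower, because each base is stored alongside a quality character and every read also carries a header line. The thread does not say which reading applies.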
You have very good coverage, but as you can see from other answers, mate pairs play a crucial role in detecting structural variations and rearrangements, and also in linking contigs together.
Did you try any of the other tools suggested in the answers?
I am assuming (for obvious reasons) that by "desired results" you mean the number of scaffolds, the average scaffold size, and (maybe) the gaps. Do you know anything about the genome of this medicinal plant? Otherwise, how are you comparing/judging your assembly?
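The metrics mentioned here (number of scaffolds, average scaffold size, gaps) can be computed from the scaffold sequences. Below is a Python sketch that assumes gaps are written as runs of N characters. `min_length` is a hypothetical cut-off that makes it easy to see how much tiny fragments inflate the scaffold count.

```python
import re
from statistics import mean
from typing import Dict, Iterable

GAP_RUN = re.compile(r"[Nn]+")  # treat each run of N characters as one gap


def scaffold_summary(seqs: Iterable[str], min_length: int = 0) -> Dict[str, float]:
    """Count scaffolds, their mean and max length, and N-run gaps.

    Scaffolds shorter than min_length are ignored, so the effect of many
    tiny fragments on the scaffold count can be seen directly.
    """
    kept = [s for s in seqs if len(s) >= min_length]
    lengths = [len(s) for s in kept]
    gaps = [m.end() - m.start() for s in kept for m in GAP_RUN.finditer(s)]
    return {
        "scaffolds": len(lengths),
        "mean_length": mean(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
        "gaps": len(gaps),
        "gap_bases": sum(gaps),
    }


if __name__ == "__main__":
    demo = ["ACGTNNNACG", "NNAC", "ACGT"]  # toy sequences, not real data
    print(scaffold_summary(demo))
    print(scaffold_summary(demo, min_length=5))
```

The sequences could come from any FASTA reader. Comparing the output with and without a length cut-off shows whether a high scaffold count is driven mainly by very short fragments.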
You could try Minia or ALLPATHS-LG.
Yes, correct: the number of scaffolds is too high and the N50 is very low with every assembler I used. We expect a genome size of 1.5 Gb.
You realize that having plenty of sequence data (if that is what you are basing your question on) is no guarantee of a successful/useful assembly. Plant genomes are notoriously difficult to work with because of ploidy issues etc. (do you expect a simple diploid genome?), so this result may not be unexpected.
I am trying ALLPATHS-LG, but it gives an error.
I have set the ploidy to 1.
Is it necessary to have a mate-pair library?
The genome size was determined by flow cytometry.
At the read level, I ran KmerGenie. The results are below:
Predicted best k: 101
Predicted assembly size: 1625708994 bp
Hi, first you should find out what percentage of your genome is repetitive and whether your sample is an inbred line. Several parameters also affect the assembly result, so you should try different settings. Pre-processing, such as read trimming, contamination removal and error correction, also affects the result. You can try ALLPATHS-LG or MaSuRCA. They usually give good results, but they probably need more than 1.2 TB of space. You can also try Platanus with the parameter -u 1.0, scaffold with SSPACE, and then run a gap closer. The scaffolding and gap-closing steps can be run for multiple iterations.
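Of the pre-processing steps mentioned, quality trimming is the easiest to illustrate. Below is a minimal sketch of 3'-end trimming for a single read. It assumes Phred+33 quality strings and uses the running-sum algorithm described for BWA's read trimming and used by cutadapt. This technique is a swapped-in example, not a tool the answer names. In practice a dedicated trimmer working on whole FASTQ files would do this job.

```python
from typing import Tuple


def quality_trim_index(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Return the 3' cut position using BWA-style running-sum trimming.

    Walking from the 3' end, add (cutoff - q) for each base; stop when the
    sum goes negative and cut where the sum was largest.
    """
    running, best, cut = 0, 0, len(qual)
    for i in range(len(qual) - 1, -1, -1):
        q = ord(qual[i]) - offset
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_read(seq: str, qual: str, cutoff: int = 20) -> Tuple[str, str]:
    """Trim low-quality bases from the 3' end of one read (Phred+33 qualities)."""
    cut = quality_trim_index(qual, cutoff)
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' is Phred 40 and '#' is Phred 2 in Phred+33 encoding.
    print(trim_read("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```

A running sum is used instead of a fixed per-base threshold so that a single good base inside a poor-quality tail does not stop the trimming.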