8 weeks ago
Umer
Hello,
I have 12 fungal genomes sequenced with both Nanopore and Illumina PE 150 bp reads (genome size = 60 Mb).
For all 12 samples I generated genome assemblies using the Flye v2.9.4 assembler, for example:
- Nanopore data: NP1.fastQ -> flye v2.9.4 => NP1.fasta
Then I polished the genome assemblies with Racon v1.5.0 using the raw Nanopore fastQ files:
- Mapped NP1.fastQ + NP1.fastA -> minimap2 => NP1_1.bam and used this bam file to polish NP1.fastA via Racon v1.5.0 => NP1.racon1.fastA
- Mapped NP1.fastQ + NP1.racon1.fastA -> minimap2 => NP1_2.bam and used this bam file to polish NP1.racon1.fastA via Racon v1.5.0 => NP1.racon2.fastA
- Mapped NP1.fastQ + NP1.racon2.fastA -> minimap2 => NP1_3.bam and used this bam file to polish NP1.racon2.fastA via Racon v1.5.0 => NP1.racon3.fastA
This way I now have genome assemblies for all 12 samples after 3 rounds of racon polishing.
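The three rounds above can be sketched as a loop. This is only a sketch under assumptions, not the poster's actual script: the function names (run_to, racon_rounds), the DRY_RUN switch, the output prefix, and the thread count are all hypothetical. One detail worth checking against your own setup: Racon's README lists MHAP, PAF, and SAM as overlap formats, so the sketch keeps the minimap2 output as SAM instead of converting it to BAM.

```shell
#!/usr/bin/env bash
# Sketch: N rounds of Racon polishing with Nanopore reads.
# Set DRY_RUN=1 to print the commands instead of running them.

# run_to OUTFILE CMD [ARGS...]
# Runs CMD with stdout redirected to OUTFILE, or prints it under DRY_RUN=1.
run_to() {
    local out=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$* > $out"
    else
        "$@" > "$out"
    fi
}

# racon_rounds READS ASSEMBLY PREFIX [ROUNDS] [THREADS]
racon_rounds() {
    local reads=$1 draft=$2 prefix=$3 rounds=${4:-3} threads=${5:-8}
    local i=1
    while [ "$i" -le "$rounds" ]; do
        # Map the raw reads to the current draft; -a makes minimap2 emit SAM.
        run_to "${prefix}_${i}.sam" \
            minimap2 -ax map-ont -t "$threads" "$draft" "$reads" || return 1
        # Racon argument order: <sequences> <overlaps> <target sequences>.
        run_to "${prefix}.racon${i}.fasta" \
            racon -t "$threads" "$reads" "${prefix}_${i}.sam" "$draft" || return 1
        # The next round polishes the output of this round.
        draft="${prefix}.racon${i}.fasta"
        i=$((i + 1))
    done
}
```

Calling racon_rounds NP1.fastQ NP1.fasta NP1 3 16 would leave the final assembly in NP1.racon3.fasta. Running it first with DRY_RUN=1 prints the six commands so the file chaining can be checked before anything is executed.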
Next I am polishing with Illumina short-read data using Pilon v1.24:
- Illumina data: ILL1.fastQ + NP1.racon3.fastA -> bwa mem => ILL1_1.bam
- NP1.racon3.fastA + ILL1_1.bam -> pilon => NP1.pilon1.fastA (round 1)
Problem
-------------
Pilon command:
java -Xmx220G -jar /vast/$USER/tools/miniconda3/envs/pilon/share/pilon-1.24-0/pilon.jar \
--genome ./flye_assembly/ONT_rawdata_flye_racon_r3/NP01/NP01_racon_polished_r3.fasta \
--bam ./flye_assembly/temp_alignment/NP01/NP01_ILL02_alignment.bam \
--output NP01_pilon_polished_r1 --outdir ./flye_assembly/ONT_rawdata_flye_racon_r3_pilon_r1/NP01 \
--fix all --changes --threads 90 2>&1 | tee ./flye_assembly/ONT_rawdata_flye_racon_r3_pilon_r1/NP01/NP01_pilon_polished_r1.log
Pilon takes a lot of time. Some of the 12 samples completed within a day, but others are taking much longer.
Questions:
---------
- Is my Racon polishing setup correct?
- What can be done to speed up polishing with Illumina data and Pilon?
- Is there any alternative tool to Pilon for polishing an assembly? If yes, which one is better to use?
Racon and Pilon are old programs. I suggest using NextPolish instead. It polishes using Illumina and Nanopore reads at the same time.
Thank you for the suggestion. Which is the preferred way of polishing with NextPolish?
Also, the authors of NextPolish made a genome assembler, NextDenovo. In my experience, it usually makes better assemblies of eukaryotic genomes than Flye. Maybe you should give it a try.
I will give it a try. Thank you for the information.
You should give short and long reads to NextPolish at the same time. NextPolish performs several iterations of polishing. When you use the option "task = best", NextPolish does two iterations of polishing by long reads and then two iterations of polishing by short reads.
It's better to use reads after adapter trimming. I doubt that trimming by quality will be beneficial for genome polishing, because even low-quality bases carry useful information.
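For the "task = best" setup described above, a run.cfg along these lines could be a starting point. It is a sketch only: the key names follow the example config in the NextPolish documentation as I recall it, so please check them against your installed version. The helper name write_nextpolish_cfg, the job counts, and all file names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: write a NextPolish run.cfg that polishes with long and short reads
# in one run (task = best). All values below are illustrative placeholders.

# write_nextpolish_cfg GENOME WORKDIR CFG_PATH
write_nextpolish_cfg() {
    local genome=$1 workdir=$2 cfg=$3
    # {multithread_jobs} is a NextPolish placeholder, not a shell variable,
    # so it passes through the here-document unchanged.
    cat > "$cfg" <<EOF
[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 6
multithread_jobs = 5
genome = ${genome}
genome_size = auto
workdir = ${workdir}
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = ./sgs.fofn
sgs_options = -max_depth 100 -bwa

[lgs_option]
lgs_fofn = ./lgs.fofn
lgs_options = -min_read_len 1k -max_depth 100
lgs_minimap2_options = -x map-ont
EOF
}
```

sgs.fofn and lgs.fofn are plain lists of read files, one path per line (for example the adapter-trimmed Illumina R1/R2 files in sgs.fofn and NP1.fastQ in lgs.fofn). After write_nextpolish_cfg NP1.fasta ./NP1_rundir run.cfg, the run would be started with nextPolish run.cfg.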
Regarding the performance: I haven't used Pilon, but it seems that with 90 threads it should be running at a pretty good speed. Though it really depends on how the application is programmed, as in certain cases a higher number of threads can lead to a decrease in performance due to various overheads. If the program creates a lot of intermediate files, the main issue could also be I/O-related, as reading and writing to disk is relatively slow.
Actually pilon-v1.24 totally ignores the --threads option now. It was simply never removed from the code.
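If --threads really has no effect, one possible workaround is to parallelise outside Pilon by giving each contig its own Pilon process via the --targets option. The sketch below only prints the commands and runs nothing. The function name, the PILON_CMD variable, and the output layout are hypothetical, and I have not benchmarked this on a 60 Mb fungal genome.

```shell
#!/usr/bin/env bash
# Sketch: emit one Pilon command per contig so that contigs can be polished
# as separate processes. Nothing is executed here; the commands are printed.

# pilon_per_contig_cmds ASSEMBLY BAM OUTDIR
# PILON_CMD may be set to e.g. "java -Xmx16G -jar /path/to/pilon.jar";
# it defaults to a "pilon" wrapper being on PATH (an assumption).
pilon_per_contig_cmds() {
    local asm=$1 bam=$2 outdir=$3
    local pilon=${PILON_CMD:-pilon}
    # Contig name = FASTA header up to the first whitespace, without ">".
    grep '^>' "$asm" | sed 's/^>//; s/[[:space:]].*$//' |
    while IFS= read -r contig; do
        printf '%s --genome %s --bam %s --targets %s --output %s --outdir %s --fix all --changes\n' \
            "$pilon" "$asm" "$bam" "$contig" "$contig" "$outdir"
    done
}
```

The printed lines can be fed to a job runner, for instance pilon_per_contig_cmds asm.fasta reads.bam out | xargs -P 8 -I{} sh -c '{}', and the per-contig FASTA files concatenated afterwards. Each process starts its own JVM, so the heap size per job needs to be lowered so that all concurrent jobs fit in memory.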