Genome Assembly Workflow assessment
8 weeks ago
Umer ▴ 130

Hello,

I have 12 fungal genomes sequenced with both Nanopore and Illumina (paired-end, 150 bp).

Genome size = 60 Mb

For all 12 samples I generated genome assemblies using the Flye v2.9.4 assembler:

  • Nanopore data NP1.fastQ -> Flye v2.9.4 => NP1.fastA (example)

Then I polished the genome assemblies with Racon v1.5.0 using the raw Nanopore fastQ files:

  1. Mapped NP1.fastQ to NP1.fastA with minimap2 => NP1_1.bam, then used this BAM file to polish NP1.fastA with Racon v1.5.0 => NP1.racon1.fastA
  2. Mapped NP1.fastQ to NP1.racon1.fastA with minimap2 => NP1_2.bam, then used this BAM file to polish NP1.racon1.fastA with Racon v1.5.0 => NP1.racon2.fastA
  3. Mapped NP1.fastQ to NP1.racon2.fastA with minimap2 => NP1_3.bam, then used this BAM file to polish NP1.racon2.fastA with Racon v1.5.0 => NP1.racon3.fastA
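The three rounds above can be sketched as a small shell function. This is only an illustration, assuming minimap2 and racon are on PATH; the function name `racon_rounds`, the thread count, and the output file names are hypothetical. Note that Racon takes its overlaps as SAM, PAF, or MHAP, so the sketch writes SAM straight from minimap2 rather than BAM.

```shell
# Sketch only: assumes minimap2 and racon are installed and on PATH.
# Usage: racon_rounds READS ASSEMBLY PREFIX [ROUNDS] [THREADS]
# Plain (global) variables are used to stay portable across sh and bash.
racon_rounds() {
    reads="$1"; current="$2"; prefix="$3"
    rounds="${4:-3}"; threads="${5:-8}"
    i=1
    while [ "$i" -le "$rounds" ]; do
        sam="${prefix}_${i}.sam"
        polished="${prefix}.racon${i}.fasta"
        # Map the raw Nanopore reads to the current assembly (SAM output).
        minimap2 -ax map-ont -t "$threads" "$current" "$reads" > "$sam" || return 1
        # Racon argument order: reads, overlaps, target; polished FASTA goes to stdout.
        racon -t "$threads" "$reads" "$sam" "$current" > "$polished" || return 1
        # The polished assembly becomes the target of the next round.
        current="$polished"
        i=$((i + 1))
    done
    # Print the name of the final polished assembly.
    printf '%s\n' "$current"
}
```

For example, `racon_rounds NP1.fastq NP1.fasta NP1 3 16` would leave NP1.racon1.fasta through NP1.racon3.fasta in the working directory and print the last name.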

This way I now have genome assemblies for all 12 samples after 3 rounds of Racon polishing.

Next, I am polishing with the Illumina short-read data using Pilon v1.24:

  • Illumina data ILL1.fastQ + NP1.racon3.fastA -> bwa mem => ILL1_1.bam
  • NP1.racon3.fastA + ILL1_1.bam -> Pilon => NP1.pilon1.fastA (round 1)
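The alignment step that feeds Pilon could look roughly like the sketch below. It assumes bwa and samtools are on PATH and that the paired-end reads sit in two files (the R1/R2 names in the usage line are hypothetical). Pilon needs a coordinate-sorted, indexed BAM, which is why the sort and index steps are included.

```shell
# Sketch only: assumes bwa and samtools are installed and on PATH.
# Usage: align_illumina REF R1 R2 OUT_BAM [THREADS]
align_illumina() {
    ref="$1"; r1="$2"; r2="$3"; bam="$4"; threads="${5:-8}"
    # Build the BWA index for the Racon-polished assembly.
    bwa index "$ref" || return 1
    # Align the read pairs and sort by coordinate in one pipeline.
    bwa mem -t "$threads" "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o "$bam" - || return 1
    # Index the sorted BAM so Pilon can read it.
    samtools index "$bam"
}
```

Usage would be along the lines of `align_illumina NP1.racon3.fasta ILL1_R1.fastq ILL1_R2.fastq ILL1_1.bam 16`.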

Problem

-------------

Pilon command:

java -Xmx220G -jar /vast/$USER/tools/miniconda3/envs/pilon/share/pilon-1.24-0/pilon.jar \
--genome ./flye_assembly/ONT_rawdata_flye_racon_r3/NP01/NP01_racon_polished_r3.fasta \
--bam ./flye_assembly/temp_alignment/NP01/NP01_ILL02_alignment.bam \
--output NP01_pilon_polished_r1 --outdir ./flye_assembly/ONT_rawdata_flye_racon_r3_pilon_r1/NP01 \
--fix all --changes --threads 90 2>&1 | tee ./flye_assembly/ONT_rawdata_flye_racon_r3_pilon_r1/NP01/NP01_pilon_polished_r1.log

Pilon takes a lot of time. Some of the 12 samples completed within a day, but others are taking much longer.

Questions:

---------

  1. Is my Racon polishing setup correct?
  2. What can be done to speed up polishing with Illumina data and Pilon?
  3. Is there any alternative tool to Pilon for polishing the assembly? If yes, which one is better to use?
flye genome-assembly pilon racon • 558 views

Racon and Pilon are old programs. I suggest using NextPolish instead. It polishes using Illumina and Nanopore reads at the same time.


Thank you for the suggestion. Which is the preferred way of polishing with NextPolish?

  1. Should I use short reads and long reads together for polishing, or individually (short-read polishing, then long-read polishing)?
  2. Do I need to run multiple rounds?
  3. Should the fastQ files used for polishing be QC-trimmed, or can I use untrimmed raw fastQ files?

Also, the authors of NextPolish made a genome assembler, NextDenovo. In my experience, it usually makes better assemblies of eukaryotic genomes than Flye. Maybe you should give it a try.


I will give it a try. Thank you for the information.


You should give short and long reads to NextPolish at the same time. NextPolish performs several iterations of polishing. When you use the option "task = best", NextPolish does two iterations of polishing with long reads and then two iterations of polishing with short reads.
It's better to use reads after adapter trimming. I doubt that quality trimming will be beneficial for genome polishing, because even low-quality bases carry useful information.
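As a rough illustration of giving both read types to NextPolish at once, the sketch below writes the two read lists and a run.cfg with "task = best". The layout follows the example config shipped with NextPolish as far as I remember it, so check the keys against the documentation for your version; the function name, the job counts, the depth limits, and the workdir are assumptions, and which assembly to pass in (raw Flye output or something already polished) is left to you.

```shell
# Sketch only: writes NextPolish input files into the current directory.
# Usage: write_nextpolish_inputs GENOME ONT_READS ILLUMINA_R1 ILLUMINA_R2
write_nextpolish_inputs() {
    genome="$1"; ont="$2"; r1="$3"; r2="$4"
    # File-of-file-names lists: one read file path per line.
    printf '%s\n' "$r1" "$r2" > sgs.fofn
    printf '%s\n' "$ont" > lgs.fofn
    # Config values below (jobs, depth limits, workdir) are illustrative assumptions.
    cat > run.cfg <<EOF
[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 4
multithread_jobs = 8
genome = $genome
genome_size = auto
workdir = ./nextpolish_out
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = ./sgs.fofn
sgs_options = -max_depth 100 -bwa

[lgs_option]
lgs_fofn = ./lgs.fofn
lgs_options = -min_read_len 1k -max_depth 100
lgs_minimap2_options = -x map-ont
EOF
}
# After writing the files, the run itself would be started with: nextPolish run.cfg
```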


Regarding the performance: I haven't used Pilon, but with 90 threads it seems it should be running at a pretty good speed. It really depends on how the application is programmed, though, as in certain cases a higher number of threads can decrease performance due to various overheads. If the program creates a lot of intermediate files, the main issue could also be I/O related, as reading from and writing to disk is relatively slow.


Actually, Pilon v1.24 ignores the --threads option entirely now. The developers just forgot to remove it from the code.
