Hi everybody!
I'm having a problem in the consensus stage of the DBG2OLC pipeline. I'm using the script "split_and_run_sparc.sh" to obtain the "final_assembly.fasta" file from my backbone file (backbone_raw.fasta) and my reads (ctg_pb.fasta). I ran the script using the following command:
sh ./split_and_run_sparc.sh backbone_raw.fasta DBG2OLC_Consensus_info.txt ctg_pb.fasta /tmp/consensus_dir 2 >cns_log.txt
While the script was running, an error message appeared:
Traceback (most recent call last):
  File "./split_reads_by_backbone.py", line 131, in <module>
  File "./split_reads_by_backbone.py", line 122, in main
IOError: [Errno 24] Too many open files: '/tmp/consensus_dir/backbone-1627.reads.fasta'
After the analysis, I observed some inconsistencies between the "backbone_raw.fasta" file and the "final_assembly.fasta" file:
---------------- Information for assembly 'backbone_raw.fasta' ----------------
Number of contigs 1906
Number of contigs in scaffolds 0
Number of contigs not in scaffolds 1906
Total size of contigs 252974640
Longest contig 2502428
Shortest contig 4957
Number of contigs > 1K nt 1906 100.0%
Number of contigs > 10K nt 1872 98.2%
Number of contigs > 100K nt 512 26.9%
Number of contigs > 1M nt 31 1.6%
Number of contigs > 10M nt 0 0.0%
Mean contig size 132725
Median contig size 35400
N50 contig length 449759
L50 contig count 147
---------------- Information for assembly 'final_assembly.fasta' ----------------
Number of contigs 1020
Number of contigs in scaffolds 0
Number of contigs not in scaffolds 1020
Total size of contigs 223116219
Longest contig 2502428
Shortest contig 83
Number of contigs > 1K nt 1018 99.8%
Number of contigs > 10K nt 1009 98.9%
Number of contigs > 100K nt 470 46.1%
Number of contigs > 1M nt 31 3.0%
Number of contigs > 10M nt 0 0.0%
Mean contig size 218741
Median contig size 82745
N50 contig length 548456
L50 contig count 117
The main inconsistencies between the two files are:
- The number of contigs almost halved
- The total size of the assembled genome is reduced (since I have 886 fewer contigs)
- Some contigs became smaller (as seen in the "Shortest contig" value)
- N50, mean, and median contig sizes are inflated (as a by-product of losing contigs)
Does anyone know whether the inconsistencies observed between the two files are caused by the error that appeared while the script was running? Or is this the normal output one should expect after running the consensus stage of the pipeline?
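In case it helps to narrow this down, this is roughly how the two files could be compared (a sketch only: the header comparison assumes the consensus output keeps the backbone names, which I have not verified, and "raw_ids.txt"/"final_ids.txt" are just placeholder names):

grep -c '^>' backbone_raw.fasta        # number of backbones going in
grep -c '^>' final_assembly.fasta      # number of consensus sequences coming out
# list backbone names with no counterpart in the final assembly
grep '^>' backbone_raw.fasta | awk '{print $1}' | sort > raw_ids.txt
grep '^>' final_assembly.fasta | awk '{print $1}' | sort > final_ids.txt
comm -23 raw_ids.txt final_ids.txt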
P.S.: I could not run the command "ulimit -n unlimited" before running the script, since I don't have root privileges on the cluster I'm working on. Not sure if this explains the inconsistencies or the error message.
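From what I understand, the soft limit for open files can usually be raised up to the per-user hard limit without root, in the same shell that launches the script, so I might try something along these lines (just a sketch; the actual ceiling depends on how the cluster is configured):

ulimit -Sn                    # current soft limit for open files
ulimit -Hn                    # hard limit: the ceiling a non-root user can raise the soft limit to
ulimit -n "$(ulimit -Hn)"     # raise the soft limit to the hard limit for this shell
sh ./split_and_run_sparc.sh backbone_raw.fasta DBG2OLC_Consensus_info.txt ctg_pb.fasta /tmp/consensus_dir 2 >cns_log.txt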
I'll try it out.
Thank you very much!
I am also having issues with the consensus stage of dbg2olc, but in my case the "final_assembly.fasta" that is generated is empty, even though there is no error message.
So I would like to try your suggestion and run Racon on the "backbone_raw.fasta" assembly from dbg2olc. However, I don't know which file to use as the "overlap/alignment" input that Racon requires ("Racon takes as input only three files: contigs in FASTA/FASTQ format, reads in FASTA/FASTQ format and overlaps/alignments between the reads and the contigs in MHAP/PAF/SAM format"). The dbg2olc manual is not very clear, and I'm not sure such a file is actually generated during the assembly. Do you remember which file you used in your case, or did you have to generate an overlap/alignment file with different software?
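In case it is useful, this is roughly what I was considering, based on Racon's usage (reads, then overlaps, then target contigs): generate the overlaps myself with minimap2 and feed them to Racon. This is only a sketch; "-x map-pb" assumes PacBio reads (it would be "-x map-ont" for Nanopore), and the output file names are placeholders:

# map the long reads against the DBG2OLC backbones; minimap2 writes PAF by default
minimap2 -x map-pb backbone_raw.fasta ctg_pb.fasta > reads_vs_backbone.paf
# polish the backbones with Racon: reads, overlaps, target contigs
racon ctg_pb.fasta reads_vs_backbone.paf backbone_raw.fasta > racon_round1.fasta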