6.0 years ago
zorrilla
Hi,
I am attempting to assemble the dataset from ERP002469 using MEGAHIT. The dataset consists of ~140 paired-end FASTQ files, each between 2 and 10 GB in size, about 1 TB in total.
I am using the k list 27,37,47,57,67,77,87,97,107,117 and running the assembly on a 512 GB RAM node with 20 cores. It has been running for around 30 hours, and the last log entry is: Assembling contigs from SdBG for k = 37 ---
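For anyone wanting to reproduce the setup, here is a rough Python sketch of how the command line for a run like this could be put together. The file names and output directory are placeholders, not the real ones. The k-list check reflects my reading of the MEGAHIT help text (odd values, 15 to 255, steps of at most 28), so treat it as an assumption and check it against your installed version.

```python
import shlex


def build_megahit_command(r1_files, r2_files, k_list, threads, out_dir):
    """Build a MEGAHIT argument list for paired-end input (illustrative helper)."""
    if len(r1_files) != len(r2_files):
        raise ValueError("each forward file needs a matching reverse file")

    # Assumed constraints from the MEGAHIT help text: odd k, 15..255, step <= 28.
    for k in k_list:
        if k % 2 == 0 or not 15 <= k <= 255:
            raise ValueError(f"k must be odd and within 15..255, got {k}")
    for previous, current in zip(k_list, k_list[1:]):
        if not 0 < current - previous <= 28:
            raise ValueError(f"bad step between k={previous} and k={current}")

    return [
        "megahit",
        "-1", ",".join(r1_files),   # comma-separated forward reads
        "-2", ",".join(r2_files),   # comma-separated reverse reads
        "--k-list", ",".join(str(k) for k in k_list),
        "-t", str(threads),
        "-o", out_dir,
    ]


# The k list from the post: 27 to 117 in steps of 10.
k_list = list(range(27, 118, 10))

# Placeholder file names and output directory, for illustration only.
cmd = build_megahit_command(
    ["sample1_1.fastq.gz", "sample2_1.fastq.gz"],
    ["sample1_2.fastq.gz", "sample2_2.fastq.gz"],
    k_list,
    threads=20,
    out_dir="megahit_out",
)
print(shlex.join(cmd))
```

Building the argument list in a script rather than by hand mainly helps with ~140 file pairs, where a mismatched forward/reverse file is easy to miss.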
My questions:
- Do you have a rough idea of how long it will take for the entire assembly process to finish on a metagenomic dataset of such size?
- Do you have any additional assembly tips for my particular dataset, besides the ones presented here?
- Are there any pre-assembly steps that you would recommend, e.g. quality score filtering? Would this significantly reduce the computational time?
Thanks in advance!
I have no idea about runtimes, but that seems slow. Try different k-mer sizes; I would expect the larger k-mers to do better, i.e. to give longer contigs.
One thing first: you have trimmed the dataset, right? (Essential!)
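To make "trimming" concrete, here is a minimal Python sketch of 3' quality trimming on a single read. It assumes Phred+33 quality encoding, and the thresholds (`min_q=20`, `min_len=50`) are arbitrary example values, not recommendations. In practice you would use a dedicated trimmer, which also removes adapters and keeps the two files of a pair in sync. This sketch does neither.

```python
def trim_3prime(seq, qual, min_q=20, phred_offset=33):
    """Remove bases from the 3' end until one reaches min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(records, min_q=20, min_len=50):
    """Yield trimmed (name, seq, qual) records that are still at least min_len long."""
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield name, seq, qual


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
print(trimmed_seq, trimmed_qual)

# With min_len=4 the trimmed read above is kept; an all-'#' read is dropped.
kept = list(filter_reads(
    [("read1", "ACGTACGT", "IIIII###"), ("read2", "ACGT", "####")],
    min_len=4,
))
print([name for name, _, _ in kept])
```

Removing low-quality tails cuts down on erroneous k-mers, which is the usual reason trimming is expected to help both memory use and runtime in de Bruijn graph assemblers.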