Questions about assembly of a large metagenomic dataset
6.0 years ago
zorrilla • 0

Hi,

I am attempting to assemble the dataset from ERP002469 using megahit. The dataset consists of ~140 paired-end FASTQ files, each between 2 and 10 GB, about 1 TB in total.

Using the k-mer list 27,37,47,57,67,77,87,97,107,117, I am currently running the assembly on a 512 GB RAM node with 20 cores. It has been running for around 30 hours, and the last log entry is: "Assembling contigs from SdBG for k = 37 ---".
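For reference, the command I am running looks roughly like this (file names and the memory fraction are placeholders for my actual setup; -1 and -2 take comma-separated lists of the R1/R2 files):

    megahit -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz \
        --k-list 27,37,47,57,67,77,87,97,107,117 \
        --num-cpu-threads 20 --memory 0.9 \
        -o ERP002469_assembly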

My questions:

  • Do you have a rough idea of how long it will take for the entire assembly process to finish on a metagenomic dataset of such size?
  • Do you have any additional assembly tips for my particular dataset, besides the ones presented here?
  • Are there any pre-assembly steps that you would recommend, e.g. quality-score filtering? Would this significantly reduce the computational time?

Thanks in advance!

assembly metagenomics megahit • 1.2k views

No idea about runtimes, but it seems slow. Try different k-mer sizes; I would expect the larger k-mers to be better, i.e. to give longer contigs.

One thing first: you have trimmed the dataset, right? That is essential.
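As a sketch, a minimal per-sample adapter/quality trimming pass with fastp (just one common choice; file names and thread count are placeholders) could look like:

    fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
        -o sample_R1.trimmed.fastq.gz -O sample_R2.trimmed.fastq.gz \
        --thread 8

Dropping adapters and low-quality bases up front usually shrinks the input and reduces the number of erroneous k-mers the assembler has to carry through the graph.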
