Hi guys,
This is an open question.
While processing microbiome metagenomics data with HUMAnN3, I found it quite time-consuming. On a machine with 40 CPU cores and 240 GB of RAM, a single sample took 4 hours with nucleotide search only and 9 hours with translated search included, which is unacceptable on a shared institute server.
After some tests, I noticed that disk I/O speed is probably the bottleneck.
Part of the reason is that the HUMAnN3 pipeline is a chain of separate tools: each step reads its input, analyzes it, and writes its output to disk, and the next tool then reads that output as its own input. As a result, the processes spend a long time waiting on HDD reads and writes, which slows down the whole analysis.
My first thought was an M.2 SSD, but getting one would take even longer because of the administrative paperwork. So is there a better way, through software or disk management, to speed up the pipeline?
Thanks in advance.
Thanks, Raygozak.
I started running humann3 with GNU parallel, and it is much faster.
With parallel, the program can start on the second (or third, and so on, depending on the -j parameter, your memory size, and your number of cores) sample while HUMAnN3 is still writing the results of the first sample to disk.
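For anyone without GNU parallel, here is a rough Python sketch of the same idea as something like `parallel -j 4 humann --input {} --output out --threads 10 ::: *.fastq.gz`: run up to N samples at once so that one sample's disk writes overlap with another sample's computation. It assumes the `humann` command accepts `--input`, `--output` and `--threads`, and it splits the cores evenly across jobs. The helper names (`threads_per_job`, `build_humann_cmd`, `run_all`) are hypothetical. Memory limits are not checked, so pick `jobs` to fit your RAM.

```python
from __future__ import annotations

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def threads_per_job(total_cores: int, jobs: int) -> int:
    """Split the available cores evenly across concurrent jobs (at least 1 each)."""
    return max(1, total_cores // jobs)


def build_humann_cmd(sample: Path, out_dir: Path, threads: int) -> list[str]:
    """Hypothetical helper: build one humann command line for a single sample."""
    return ["humann", "--input", str(sample), "--output", str(out_dir),
            "--threads", str(threads)]


def run_all(cmds: list[list[str]], jobs: int) -> list[int]:
    """Run the commands with at most `jobs` running at the same time.

    Like `parallel -j`, a new command starts as soon as a running one finishes,
    so one sample's disk I/O overlaps with another sample's computation.
    Returns the exit codes in the same order as `cmds`.
    """
    def run_one(cmd: list[str]) -> int:
        # Each worker thread just waits on its child process; the real work
        # happens in the separate humann processes.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, cmds))
```

For example, on the 40-core machine above with 4 jobs, each humann run would get `threads_per_job(40, 4)` threads, that is 10. You would then call `run_all([build_humann_cmd(s, Path("out"), 10) for s in sorted(Path(".").glob("*.fastq.gz"))], jobs=4)`. The file pattern and output directory are placeholders.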