I have several CPU nodes that share the same network-mounted directory. I want to parallelize the abyss-map step that runs after ABySS generates unitigs.
When I run abyss-map on my reads, it outputs a message saying it is generating a suffix array for the mapping. Does it store that on disk? Or does it just store it in memory for the mapping?
If I run several abyss-map instances in parallel in the same directory, will the suffix array generated by one run overwrite the suffix arrays generated by the other runs?
abyss-map builds the indexes in memory by default, so there should be no conflict when running multiple instances on the same file in parallel.
However, you can also use abyss-index to pre-build the indexes (.fai and .fm files). Any subsequent abyss-map runs will re-use those index files (similar to how it works with BWA), which allows you to run abyss-map more quickly and with a smaller memory requirement. (The initial indexing step requires an amount of RAM that is about 10 times the size of the input FASTA file.)
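As a rough illustration of the workflow described above (index once, then map every library against the shared index), here is a minimal Python sketch. The command lines are assumptions: it supposes that abyss-index takes the target FASTA as its only argument and that abyss-map takes the read files followed by the target and writes SAM to stdout. Check `abyss-index --help` and `abyss-map --help` for your version before relying on it. The function names, the library dictionary, and the `.sam` output names are made up for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def estimate_index_ram_bytes(fasta_size_bytes, factor=10):
    """Rule of thumb from the answer: indexing needs about 10x the FASTA size."""
    return fasta_size_bytes * factor


def index_command(target):
    # Assumed invocation: abyss-index TARGET (writes index files next to TARGET).
    return ["abyss-index", target]


def map_command(reads, target):
    # Assumed invocation: abyss-map QUERY... TARGET, with SAM output on stdout.
    return ["abyss-map", *reads, target]


def run_pipeline(target, libraries, workers=4):
    """Build the index once, then map each library in its own process.

    `libraries` maps a (hypothetical) library name to its list of read files.
    """
    # Single, serial indexing step: nothing else writes the index files.
    subprocess.run(index_command(target), check=True)

    def map_one(item):
        name, reads = item
        with open(f"{name}.sam", "w") as out:
            subprocess.run(map_command(reads, target), stdout=out, check=True)
        return name

    # Threads only wait on the child processes; the mapping work itself
    # happens inside the separate abyss-map processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(map_one, libraries.items()))
```

On a cluster the same idea applies across nodes: run the indexing command once, wait for it to finish, then submit one mapping job per library. For sizing the indexing job, `estimate_index_ram_bytes(2 * 1024**3)` gives roughly 20 GiB for a 2 GiB FASTA file under the 10x rule of thumb.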
written 9.1 years ago by benv, updated 5.1 years ago by Ram
Hi Ben,
I had a setup similar to Damian's in mind, so your comment helped me greatly too. Indeed, pre-building the indexes reduces the runtime of the abyss-map step somewhat (I have to align 50+ libraries, so in the end that saved me quite some time).
Related: I'm wondering whether there is any advantage to rebuilding the index in memory for every library. I'm curious why "create the index once and store it on disk" is not the default behaviour, especially when you run an assembly with several different input files/libraries.
The only downside I can see is that it would cause abyss-pe runs with the -j option (GNU make's option for running jobs in parallel) to fail, because parallel processes would try to write the index file at the same time.
Fair enough. I've filed an issue here: https://github.com/bcgsc/abyss/issues/107
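The write conflict raised in this thread (several parallel processes trying to create the same index file) is a generic problem, and one common way around it is a lock file plus an atomic rename. The sketch below is not how ABySS or abyss-pe handles it; `ensure_index` and the `build` callback are hypothetical names, and exclusive file creation may not be reliable on every network file system, so treat it as an illustration of the idea only.

```python
import os
import time


def ensure_index(index_path, build, poll_seconds=1.0, timeout=3600.0):
    """Create index_path exactly once, even if several processes call this.

    `build(tmp_path)` is a hypothetical callback that writes the index
    to tmp_path (for example by running the indexing tool).
    """
    lock_path = index_path + ".lock"
    deadline = time.monotonic() + timeout
    while not os.path.exists(index_path):
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up waiting for {lock_path}")
            time.sleep(poll_seconds)
            continue
        try:
            os.close(fd)
            # Re-check: another process may have finished just before us.
            if not os.path.exists(index_path):
                tmp_path = f"{index_path}.tmp.{os.getpid()}"
                build(tmp_path)
                # The rename is atomic on POSIX file systems, so other
                # processes never observe a partially written index.
                os.replace(tmp_path, index_path)
        finally:
            os.remove(lock_path)
    return index_path
```

Every process calls `ensure_index` before mapping; the first one builds the file and the others wait until it appears. Building the index in a separate serial step before launching the parallel jobs avoids the need for locking altogether.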