On our new servers we have to request the amount of memory and time needed for a job. We are charged per thread, per unit of memory requested, for the time the job actually takes to complete (not the time requested). I'm trying to minimize costs for a large job.
I have a DIAMOND database that is 68 GB, and a query set of 48,170,345 protein sequences (11 GB gzipped, ~19 GB uncompressed).
I can either do the following:
- Run DIAMOND against all of the proteins at once (I suspect this would be the most expensive)
- Split the query into 100 files (each ~189 MB) and run them separately (see the sketch after this list)
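
For concreteness, here's a rough sketch of what the split-and-run approach might look like. The file names, the use of seqkit for splitting, and the `--block-size`/`--index-chunks` values are just assumptions about my setup, not a final pipeline:

```bash
# Split the gzipped query FASTA into 100 roughly equal parts.
seqkit split2 --by-part 100 --out-dir chunks/ proteins.fasta.gz

# Run DIAMOND on each chunk against the prebuilt database.
# --block-size and --index-chunks trade RAM for speed:
# a lower --block-size means lower peak memory per job.
for chunk in chunks/*.fasta.gz; do
    diamond blastp \
        --db database.dmnd \
        --query "$chunk" \
        --out "${chunk%.fasta.gz}.tsv" \
        --threads 8 \
        --block-size 2 \
        --index-chunks 4
done
```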
Which method would use fewer resources overall?
How can I estimate the resources required per job?
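
The best estimation approach I've come up with is to benchmark a single chunk and extrapolate to the full set. A minimal sketch, assuming GNU time is available at `/usr/bin/time` on the cluster and using the file names from the sketch above:

```bash
# Run one chunk and record peak memory and wall-clock time;
# multiplying the wall time by the number of chunks gives a
# rough total, and peak RAM sets the per-job memory request.
/usr/bin/time -v diamond blastp \
    --db database.dmnd \
    --query chunks/proteins.part_001.fasta.gz \
    --out part_001.tsv \
    --threads 8 \
    2> timing.log

# "Maximum resident set size" is peak RAM in KB; "Elapsed" is wall time.
grep -E "Maximum resident set size|Elapsed" timing.log
```

Is extrapolating from one chunk like this reliable for DIAMOND, or does memory use depend mostly on the database rather than the query size?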