debitboro · 8.5 years ago
Hi Everyone,
I have submitted a mapping script to a cluster with 896 cores (32 nodes x 16 cores + 32 nodes x 12 cores), core frequency = 2.66 GHz. For my job I requested 100 GB of RAM. I want to map paired-end RNA-seq reads (two files, R1 and R2, each containing 40M reads) to the hg_GRCh38 assembly from Ensembl using TopHat2. I launched the script five days ago and it is still running. Is this normal?
That's a rather long time for 40M reads. Check whether any of the output files are still being updated, and look in the log directory to see what it's actually doing now.
Hi Devon,
This is the content of the log file:
I suspect it's using a single thread. That'd explain why each step is taking forever.
I used the following command with 24 threads:
But I set #SBATCH --ntasks=1 (since I use SLURM to submit my jobs).
Right, so you told TopHat2 to use more threads than the cores you requested, and then told SLURM that the job only uses a single thread. You'd need #SBATCH -c 24 to match, though that won't work since you don't have nodes with that many cores. It's likely that either SLURM is only allowing a single thread to be used, or something else is also running on that node and consuming most of the resources.
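If you want to confirm what SLURM actually allocated and whether the node is shared, something like the following should help (the job ID and node name are placeholders, not taken from your post):

    # How many CPUs were actually allocated to the job
    scontrol show job 123456 | grep -i -E 'NumCPUs|CPUs/Task'
    # What else is currently running on that node
    squeue -w node032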
Did you add #SBATCH -N 1 (from what I can see on the web) to keep all threads on a single physical server for SLURM? Since your largest nodes have 16 cores, I would use at most 16 threads, assuming your cluster lets you reserve a whole node. Having these threads spread across physical nodes is going to lead to strangeness like this; in fact, having the threads split across nodes won't even work :) 2x40M reads should not take more than a day with up to 16 cores. A rough sketch of a matching submission script is below.
You should use STAR instead; you would be surprised, a job like this can finish in about 25 minutes.
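For what it's worth, a minimal STAR run looks roughly like this (the index directory, FASTA/GTF names, and output prefix are assumptions; the index only needs to be built once):

    # Build the genome index once (sjdbOverhang = read length - 1)
    STAR --runMode genomeGenerate --runThreadN 16 \
         --genomeDir GRCh38_STAR_index \
         --genomeFastaFiles GRCh38.primary_assembly.fa \
         --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100

    # Align the paired-end reads
    STAR --runThreadN 16 --genomeDir GRCh38_STAR_index \
         --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample_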