9.5 years ago
E Chen
Hi all,
I've been using STAR to align RNA-Seq reads to the C. elegans reference genome. I know STAR should be among the fastest aligners currently available, but with my current samples it frequently pauses for hours at a time. As a result, some samples (~40M reads) take days to align, whereas previous files of similar size took at most a few hours.
The command I use is:
STAR --genomeDir $GENOME --readFilesIn ../$FASTQ1 ../$FASTQ2 --runThreadN 28 --outFileNamePrefix $NAME. --outReadsUnmapped Fastx
An excerpt from my $NAME.Log.progress file shows the problem:
Time Speed Read Read Mapped Mapped Mapped Mapped Unmapped Unmapped Unmapped Unmapped
M/hr number length unique length MMrate multi multi+ MM short other
May 06 19:01:56 0.0 124845 191 0.0% 144.0 6.2% 0.4% 0.0% 0.0% 99.5% 0.1%
May 06 19:02:58 0.1 373374 192 0.0% 168.0 5.6% 0.4% 0.0% 0.0% 99.5% 0.1%
May 06 19:04:46 0.1 621247 191 0.0% 164.0 5.7% 0.4% 0.0% 0.0% 99.5% 0.1%
May 06 19:05:51 0.2 1241791 191 0.0% 163.4 5.8% 0.4% 0.0% 0.0% 99.6% 0.1%
May 06 19:07:36 0.3 1613995 191 0.0% 165.5 5.7% 0.4% 0.0% 0.0% 99.6% 0.1%
May 06 19:08:40 0.4 1985785 191 0.0% 166.6 5.6% 0.4% 0.0% 0.0% 99.5% 0.1%
May 06 19:09:45 0.4 2357722 191 0.0% 167.2 5.6% 0.4% 0.0% 0.0% 99.5% 0.1%
May 06 19:10:53 0.5 2978209 191 0.0% 166.6 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 06 19:11:55 0.6 3102639 191 0.0% 166.6 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 06 19:13:36 0.6 3350787 191 0.0% 166.7 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 06 19:15:22 0.6 3474752 191 0.0% 166.3 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:10:36 0.3 3598492 191 0.0% 166.8 5.5% 0.4% 0.0% 0.0% 99.6% 0.1% ## pause
May 07 00:13:33 0.4 3722708 191 0.0% 167.0 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:15:14 0.4 4094620 191 0.0% 165.0 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:16:17 0.4 4466826 191 0.0% 165.8 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:17:47 0.5 5335039 191 0.0% 167.2 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:19:17 0.5 5583020 191 0.0% 167.2 5.5% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:20:45 0.6 5954523 191 0.0% 167.5 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:21:59 0.6 6326494 191 0.0% 168.3 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:23:18 0.6 6574471 191 0.0% 168.5 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:24:34 0.6 6698384 191 0.0% 168.5 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 00:25:52 0.7 6946035 191 0.0% 168.8 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 04:51:23 0.5 7069937 191 0.0% 168.9 5.4% 0.4% 0.0% 0.0% 99.6% 0.1% ## pause
May 07 04:53:36 0.5 7317786 191 0.0% 168.9 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 04:55:20 0.5 7441781 191 0.0% 167.9 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 04:56:50 0.5 7813786 191 0.0% 168.1 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 04:58:02 0.5 8185726 191 0.0% 167.9 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 04:59:21 0.6 8557667 191 0.0% 167.4 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 05:01:04 0.6 8805745 191 0.0% 167.8 5.4% 0.4% 0.0% 0.0% 99.6% 0.1%
May 07 05:03:13 0.6 9301645 191 0.0% 168.3 5.3% 0.4% 0.0% 0.0% 99.6% 0.1%
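The stalls are easy to miss in a long progress log. Below is a minimal Python sketch (not part of STAR; `find_pauses` and the 30-minute threshold are hypothetical choices) that parses progress-log lines, computes the gap between consecutive timestamps, and reports every gap above the threshold together with the cumulative read count logged just before it. The log lines carry no year, so the sketch assumes a placeholder year and English month abbreviations.

```python
import re
import sys
from datetime import datetime

# Leading timestamp ("May 06 19:01:56"), then the Speed and Read number columns.
# Header lines ("Time Speed ...", "M/hr number ...") do not match and are skipped.
LINE_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2})\s+([\d.]+)\s+(\d+)")


def find_pauses(lines, threshold_minutes=30.0, year=2000):
    """Return (start, end, gap_minutes, reads_before) for each gap > threshold_minutes.

    `year` is a placeholder because the progress log omits it (2000 is a leap
    year, so "Feb 29" still parses). A gap spanning Dec 31 -> Jan 1 comes out
    negative and is therefore missed.
    """
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # %b is locale-dependent; this assumes English month abbreviations.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        entries.append((ts, int(m.group(3))))

    pauses = []
    for (t0, n0), (t1, _) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds() / 60.0
        if gap > threshold_minutes:
            pauses.append((t0, t1, gap, n0))
    return pauses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_pauses.py NAME.Log.progress.out
    with open(sys.argv[1]) as fh:
        for start, end, gap, reads in find_pauses(fh):
            print(f"{start:%b %d %H:%M:%S} -> {end:%b %d %H:%M:%S}  "
                  f"{gap:.0f} min, stalled after {reads} reads")
```

Run on the excerpt above, this reports two stalls: roughly 295 minutes after 3474752 reads and roughly 266 minutes after 6946035 reads.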
Any suggestions as to why these frequent pauses occur, and how I can fix them, would be much appreciated!
Thanks!
I've used STAR for some time and have never seen that, so I can't comment on that front. However, I've recently switched from STAR to HISAT, which is faster still and uses a fraction of the memory, so I can run more alignment processes concurrently. Might be worth a look?
Thanks for the suggestion - I'll give it a go, if STAR keeps on misbehaving!
We use OSA for processing, which has a lower memory footprint (http://www.omicsoft.com/osa) and is in some cases as fast, if not faster. It's free for academic use, but requires you to install the Mono framework to run it on Linux (it also runs on Windows).
Just guessing, but perhaps resource contention from other processes on the system is coming into play? Have you tracked memory use, I/O, and load during the STAR runs?
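One way to track memory, I/O, and load during a run is sketched below in Python. It assumes a Linux node with /proc, and the function names are hypothetical. It polls STAR's resident memory (`VmRSS` in `/proc/<pid>/status`), its cumulative disk I/O (`/proc/<pid>/io`), and the 1-minute load average. On a cluster, the monitor has to run on the same node as the STAR job. STAR runs its `--runThreadN` workers as threads of one process, so a single PID covers the whole alignment.

```python
import os
import time


def read_status_kb(pid, field="VmRSS"):
    """Return a memory field from /proc/<pid>/status, in kB as the kernel reports it."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return None


def read_io(pid):
    """Return cumulative read_bytes/write_bytes from /proc/<pid>/io, or {} if unavailable."""
    try:
        with open(f"/proc/{pid}/io") as fh:
            stats = {k.strip(): int(v) for k, v in (line.split(":", 1) for line in fh)}
    except OSError:  # e.g. PermissionError for another user's process
        return {}
    return {k: stats[k] for k in ("read_bytes", "write_bytes") if k in stats}


def sample(pid):
    """One snapshot: wall time (same format as the progress log), load, memory, disk I/O."""
    return {
        "time": time.strftime("%b %d %H:%M:%S"),
        "load1": os.getloadavg()[0],
        "rss_kb": read_status_kb(pid),
        **read_io(pid),
    }


def monitor(pid, interval=60.0, max_samples=None):
    """Print a snapshot every `interval` seconds until the process exits; return the count."""
    n = 0
    while max_samples is None or n < max_samples:
        try:
            print(sample(pid), flush=True)
        except FileNotFoundError:  # /proc/<pid> is gone: the process has exited
            break
        n += 1
        time.sleep(interval)
    return n
```

Call `monitor(pid)` with STAR's process ID (for example from `pgrep STAR`) and redirect the output to a file. Because the timestamps use the same format as the progress log, snapshots taken during a stall can be lined up directly against the log and compared with those taken while reads are flowing.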
Thanks for the suggestion! I don't think memory usage should be a problem, though, as I run my alignments on a local cluster, which should allocate sufficient memory to them.
E.g. my summary file for the run that I quoted above is:
Otherwise, I'm not really sure how to track memory usage during the run (sorry, I'm still very new to bioinformatics!)
EDIT: Another thing to add: when I re-run the alignment on the same file, the pauses occur at the same read, which suggests it isn't a memory-allocation problem. However, when I excise the problematic reads and put them in a separate file, the pauses don't reproduce, so it doesn't seem to be a problem with specific reads either.
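To repeat the excision test in a controlled way, here is a Python sketch that copies a window of read pairs from both mate files while keeping R1 and R2 in sync. It assumes plain or gzip-compressed 4-line FASTQ records with mates in the same order. `extract_pairs` and any window bounds you pick are hypothetical. STAR works through the input in chunks spread over the threads (28 here), so the read count logged just before a pause (e.g. 3474752 above) probably only approximates where the stall happened. A generous window around that count is therefore likely safer than a handful of reads.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def records(handle):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities) as tuples."""
    while True:
        rec = tuple(islice(handle, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield rec


def write_record(out, rec):
    # Guarantee a trailing newline even if the input's last line lacked one.
    for line in rec:
        out.write(line if line.endswith("\n") else line + "\n")


def extract_pairs(fq1, fq2, out1, out2, start, count):
    """Copy read pairs [start, start + count) (0-based) into out1/out2; return pairs written.

    Stops early, without error, if either input runs out of records first.
    """
    written = 0
    with open_fastq(fq1) as a, open_fastq(fq2) as b, \
            open(out1, "w") as oa, open(out2, "w") as ob:
        window = islice(zip(records(a), records(b)), start, start + count)
        for r1, r2 in window:
            write_record(oa, r1)
            write_record(ob, r2)
            written += 1
    return written
```

The two output files can then be passed to the same STAR command via `--readFilesIn`. Widening or narrowing the window (a hypothetical choice, e.g. a few hundred thousand pairs on either side of the logged count) shows whether the stall depends on the surrounding reads rather than on any single read.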
I'd suggest writing to the STAR google group for support, then.
Good suggestion - I've done that now!