STAR aligner pauses frequently during mapping
E Chen, 9.5 years ago

Hi all,

I've been using STAR to align RNA-Seq reads to the C. elegans reference genome. STAR should be among the fastest aligners currently available, but with my current samples it frequently pauses for hours at a time, so some samples (~40M reads) take days to align, whereas previous files of similar size took a few hours at most.

The command I use is:

STAR --genomeDir $GENOME --readFilesIn ../$FASTQ1 ../$FASTQ2 --runThreadN 28 --outFileNamePrefix $NAME. --outReadsUnmapped Fastx

An excerpt from my $NAME.Log.progress file shows the problem:

           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
May 06 19:01:56      0.0      124845      191     0.0%    144.0     6.2%     0.4%     0.0%     0.0%    99.5%     0.1%
May 06 19:02:58      0.1      373374      192     0.0%    168.0     5.6%     0.4%     0.0%     0.0%    99.5%     0.1%
May 06 19:04:46      0.1      621247      191     0.0%    164.0     5.7%     0.4%     0.0%     0.0%    99.5%     0.1%
May 06 19:05:51      0.2     1241791      191     0.0%    163.4     5.8%     0.4%     0.0%     0.0%    99.6%     0.1%
May 06 19:07:36      0.3     1613995      191     0.0%    165.5     5.7%     0.4%     0.0%     0.0%    99.6%     0.1%
May 06 19:08:40      0.4     1985785      191     0.0%    166.6     5.6%     0.4%     0.0%     0.0%    99.5%     0.1%
May 06 19:09:45      0.4     2357722      191     0.0%    167.2     5.6%     0.4%     0.0%     0.0%    99.5%     0.1%
May 06 19:10:53      0.5     2978209      191     0.0%    166.6     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 06 19:11:55      0.6     3102639      191     0.0%    166.6     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 06 19:13:36      0.6     3350787      191     0.0%    166.7     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 06 19:15:22      0.6     3474752      191     0.0%    166.3     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:10:36      0.3     3598492      191     0.0%    166.8     5.5%     0.4%     0.0%     0.0%    99.6%     0.1% ## pause
May 07 00:13:33      0.4     3722708      191     0.0%    167.0     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:15:14      0.4     4094620      191     0.0%    165.0     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:16:17      0.4     4466826      191     0.0%    165.8     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:17:47      0.5     5335039      191     0.0%    167.2     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:19:17      0.5     5583020      191     0.0%    167.2     5.5%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:20:45      0.6     5954523      191     0.0%    167.5     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:21:59      0.6     6326494      191     0.0%    168.3     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:23:18      0.6     6574471      191     0.0%    168.5     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:24:34      0.6     6698384      191     0.0%    168.5     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 00:25:52      0.7     6946035      191     0.0%    168.8     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 04:51:23      0.5     7069937      191     0.0%    168.9     5.4%     0.4%     0.0%     0.0%    99.6%     0.1% ## pause
May 07 04:53:36      0.5     7317786      191     0.0%    168.9     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 04:55:20      0.5     7441781      191     0.0%    167.9     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 04:56:50      0.5     7813786      191     0.0%    168.1     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 04:58:02      0.5     8185726      191     0.0%    167.9     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 04:59:21      0.6     8557667      191     0.0%    167.4     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 05:01:04      0.6     8805745      191     0.0%    167.8     5.4%     0.4%     0.0%     0.0%    99.6%     0.1%
May 07 05:03:13      0.6     9301645      191     0.0%    168.3     5.3%     0.4%     0.0%     0.0%    99.6%     0.1%
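In a long run the stalls are easy to miss by eye, so one option is to flag consecutive Log.progress rows whose timestamps are far apart. This is a minimal sketch, assuming the layout shown above (month, day and time in the first three columns, read number in the fifth) and a bash/awk environment. The function name `star_pauses` and the 30-minute default threshold are hypothetical choices, not part of STAR. Because the log carries no year, every month is treated as 31 days, so a gap spanning a month boundary is only approximate.

```bash
#!/usr/bin/env bash
# star_pauses LOG [MINUTES]
# Print every Log.progress row written more than MINUTES (default 30)
# after the previous row. Hypothetical helper, not part of STAR.
star_pauses() {
  local log="$1" minutes="${2:-30}"
  awk -v t="$minutes" '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) mon[names[i]] = i
    }
    # Data rows start with "Mon DD HH:MM:SS"; the two header rows are skipped.
    ($1 in mon) && $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      split($3, hms, ":")
      # Approximate seconds from an arbitrary origin (31-day months, no year).
      now = ((mon[$1] * 31 + $2) * 24 + hms[1]) * 3600 + hms[2] * 60 + hms[3]
      if (seen && now - prev > t * 60)
        printf "%s %s %s  gap=%.1f min  reads=%s\n", $1, $2, $3, (now - prev) / 60, $5
      prev = now
      seen = 1
    }
  ' "$log"
}
```

For example, `star_pauses "$NAME.Log.progress" 60` on the excerpt above would report the two rows marked `## pause`, with gaps of roughly five and four and a half hours, together with the read counter at each one.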

Any suggestions as to why these frequent pauses happen, and how I can fix them, would be much appreciated!

Thanks!

I've used STAR for some time and have never seen that, so I can't comment on that front. However, I've recently switched from STAR to HISAT, which is faster still and runs with a fraction of the memory footprint, so I can run more alignment processes concurrently. Might be worth a look?

Thanks for the suggestion - I'll give it a go if STAR keeps misbehaving!

We use OSA for processing, which has a lower memory footprint (http://www.omicsoft.com/osa) and in some cases is as fast or faster. It's free for academic use, but running it on Linux requires installing the Mono framework (it also runs on Windows).

Just guessing, but perhaps resource contention from other processes on the system is coming into play? Have you tracked memory use, I/O, and load during the STAR runs?

Thanks for the suggestion! I don't think memory should be the problem, though, as I run my alignments on a local cluster, which should allocate sufficient memory to them.

For example, the resource usage summary for the run quoted above is:

Resource usage summary:

    CPU time :               43185.00 sec.
    Max Memory :             6718 MB
    Average Memory :         6675.14 MB
    Total Requested Memory : 10000.00 MB
    Delta Memory :           3282.00 MB
    (Delta: the difference between total requested memory and actual max usage.)
    Max Swap :               10182 MB

    Max Processes :          4
    Max Threads :            32

Beyond that, I'm not really sure how to track memory usage during the run (sorry, I'm still very new to bioinformatics!)
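For tracking resources during a run, one low-tech approach is to sample the STAR process periodically and write the values to a file with timestamps that can be lined up against Log.progress. The summary above reports a Max Swap of 10182 MB, so a log of swap next to resident memory may show whether the stalls coincide with swapping. This is a minimal sketch, assuming Linux (it reads `/proc/PID/status` and `/proc/loadavg`) and that it runs on the same compute node as STAR, for example inside the same job script; `monitor_pid` and `is_running` are hypothetical helpers, not part of STAR or the batch system.

```bash
#!/usr/bin/env bash
# Succeed while PID exists and is not a zombie.
is_running() {
  local st
  st=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  st=${st// /}
  [[ -n $st && $st != Z* ]]
}

# monitor_pid PID [INTERVAL_SECONDS] [OUTFILE]
# Append one tab-separated sample every INTERVAL seconds until PID exits:
# resident memory (kB), swapped-out memory (kB), CPU % and 1-minute load.
monitor_pid() {
  local pid="$1" interval="${2:-60}" out="${3:-/dev/stdout}"
  local rss swap pcpu load1
  printf 'time\trss_kb\tswap_kb\tpcpu\tload1\n' >> "$out"
  while is_running "$pid"; do
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    # With procps, pcpu is CPU time divided by elapsed time (a lifetime average).
    pcpu=$(ps -o pcpu= -p "$pid" | tr -d ' ')
    swap=$(awk '$1 == "VmSwap:" { print $2 }' "/proc/$pid/status" 2>/dev/null)
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    printf '%s\t%s\t%s\t%s\t%s\n' "$(date '+%F %T')" \
      "${rss:-NA}" "${swap:-NA}" "${pcpu:-NA}" "$load1" >> "$out"
    sleep "$interval"
  done
}
```

In the job script, starting STAR in the background with `&` and then calling `monitor_pid $! 60 "$NAME.resources.tsv"` would record one sample per minute until STAR finishes. If GNU time is installed, running STAR under `/usr/bin/time -v` also reports the peak resident set size at the end, though not how it changes over time.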

EDIT: Another thing to add: when I re-run the alignment on the same file, the pauses occur at the same read, which suggests it isn't a memory allocation problem. However, when I excise the problematic reads and put them in a separate file, the pauses don't recur, so it doesn't seem to be a problem with specific reads either.
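To repeat the excision test with a wider window, a contiguous block of reads can be cut from both mate files by record number, which keeps R1 and R2 in step. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; `fastq_slice` is a hypothetical helper. Because STAR maps with many threads, the read number in a Log.progress row is probably only a rough pointer to what was being processed when the row was written, so a generous window around it is safer than a handful of reads.

```bash
#!/usr/bin/env bash
# fastq_slice FASTQ FIRST LAST
# Print reads FIRST..LAST (1-based, inclusive) from an uncompressed FASTQ,
# assuming every record is exactly four lines. Hypothetical helper.
fastq_slice() {
  local fq="$1" first="$2" last="$3"
  # Record n occupies lines 4n-3 .. 4n; stop reading once past the window.
  awk -v a="$(( (first - 1) * 4 + 1 ))" -v b="$(( last * 4 ))" \
    'NR > b { exit } NR >= a' "$fq"
}
```

Running it with the same FIRST and LAST on `../$FASTQ1` and `../$FASTQ2` (for example a range spanning the first pause in the log above) gives a small paired input that can be passed to the same STAR command.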

I'd suggest writing to the STAR Google group for support, then.

Good suggestion - I've done that now!
