Question

STAR-RSEM RNA-seq pipeline is deterministic?

8.1 years ago

a.e.c ▴ 40

Dear all, do you know if STAR/RSEM are deterministic, or do they have some random component to the mapping and/or quantification? Thanks,

RNA-Seq STAR RSEM • 6.4k views

updated 8.1 years ago by datascientist28 ▴ 570 • written 8.1 years ago by a.e.c ▴ 40

score 1 · Answer 1 · 2016-10-17

8.1 years ago

datascientist28 ▴ 570

I'm not sure about RSEM, but STAR is deterministic. Here is an answer from Alex (the developer of STAR) from the past. If I recall, RSEM is a transcript identifier. If you haven't upgraded STAR recently, I'd look into --quantMode in STAR; it'll save you time, and it is an in-program identifier.

RSEM does not identify transcripts; it quantifies transcript abundance and accounts for multimapping between transcript isoforms (and between genes sharing similar sequence). As far as I understand, --quantMode simply counts the number of reads hitting a genomic locus, and so will not quantify transcript abundance like RSEM (and it does not have a way of properly assigning reads that map to multiple genomic loci). The --quantMode flag in STAR does something much more akin to e.g. HTSeq or featureCounts; those programs aren't appropriate for transcript-level quantification.
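
To make the counting-vs-EM distinction concrete, here is a toy sketch (hypothetical function names, not RSEM's code). RSEM's actual model is richer (it also uses transcript lengths, fragment-length distributions and quality scores); this only shows the core idea of sharing multi-mapping reads among compatible transcripts in proportion to their estimated abundance. With 30 reads unique to transcript A, 10 unique to B and 40 compatible with both, unique counting keeps only 40 reads (30 vs 10), while the EM converges to fractions 0.75 and 0.25, i.e. about 60 and 20 of the 80 reads.

```python
"""Toy EM that splits multi-mapping reads among transcripts.

Simplified illustration only: RSEM's real model also uses transcript
lengths, fragment-length distributions and quality scores.
"""


def unique_counts(read_hits):
    """Counting-style quantification: keep only reads with a single hit."""
    counts = {}
    for hits in read_hits:
        if len(hits) == 1:
            counts[hits[0]] = counts.get(hits[0], 0) + 1
    return counts


def em_abundance(read_hits, n_iter=200):
    """Estimate transcript fractions by expectation-maximization.

    read_hits: list of lists; each inner list holds the transcripts
    a read is compatible with.
    """
    transcripts = sorted({t for hits in read_hits for t in hits})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(read_hits)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: share each read among its transcripts in proportion to theta
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: new fractions are the normalized expected counts
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta


if __name__ == "__main__":
    reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 40
    print("unique counts:", unique_counts(reads))
    theta = em_abundance(reads)
    print("EM expected counts:", {t: round(f * len(reads), 2) for t, f in theta.items()})
```

Given the same input, this loop is fully deterministic; any run-to-run variation in a real pipeline has to come from somewhere else (input order, threading, sampling).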

8.1 years ago by Rob 6.9k

Ah, you are correct, sir. I've never actually used RSEM. That being said, my "STAR is deterministic" statement still holds true.

8.1 years ago by datascientist28 ▴ 570

No worries. Indeed, this has no bearing on your statements about STAR ;P.

8.1 years ago by Rob 6.9k

But what about this from the release notes of STAR version 2.5.0a:

By default, the order of the multi-mapping alignments for each read is not truly random. The --outMultimapperOrder Random option outputs multiple alignments for each read in random order, and also randomizes the choice of the primary alignment from the highest-scoring alignments. Parameter --runRNGseed can be used to set the random generator seed. With this option, the ordering of multi-mapping alignments of each read, and the choice of the primary alignment, will vary from run to run, unless only one thread is used and the seed is kept constant.
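
Why would a fixed seed only guarantee reproducibility on one thread? A toy model (hypothetical, not STAR's implementation, which may seed per thread or per chunk) helps: if a generator's random draws are consumed in the order reads happen to be processed, the same seed gives the same result only when that order is the same. With several threads, which reads land in which chunk, and in what order, depends on scheduling.

```python
import random


def assign_primary(reads, seed):
    """Toy model: pick a 'primary' hit for each multi-mapping read.

    A single generator is consumed in the order reads are processed,
    so the outcome depends on that order as well as on the seed.
    Illustration only; not how STAR is actually implemented.
    """
    rng = random.Random(seed)
    primary = {}
    for name, hits in reads:
        order = list(hits)
        rng.shuffle(order)        # randomized output order for this read
        primary[name] = order[0]  # first alignment after shuffling = primary
    return primary


if __name__ == "__main__":
    reads = [("r1", ["chr1:100", "chr5:200"]),
             ("r2", ["chr2:50", "chr7:900", "chrX:10"])]
    # Same seed, same processing order: identical result every time.
    print(assign_primary(reads, 42))
    # Same seed, different processing order (as if another thread finished
    # first): draws are consumed differently, so primaries may change.
    print(assign_primary(list(reversed(reads)), 42))
```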

8.1 years ago by a.e.c ▴ 40

Oh, interesting. I'm curious now too. Alex is pretty good at responding quickly, so I posted this question on the RNA-STAR forum; hopefully he'll respond soon, if you want to follow it.

8.1 years ago by datascientist28 ▴ 570

RSEM has a setting "--sort-bam-by-read-name" which according to the authors will provide deterministic "maximum-likelihood" estimates from independent runs. It is disabled by default.

http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html
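
A minimal sketch of building such an RSEM call from Python, assuming RSEM is run on a transcriptome BAM (for example STAR's Aligned.toTranscriptome.out.bam from --quantMode TranscriptomeSAM). The reference name, sample name and the helper function itself are hypothetical; check the option list against the rsem-calculate-expression documentation for your RSEM version. Sorting by read name costs extra time and memory, which is presumably why it is off by default.

```python
import shlex


def rsem_command(bam, reference, sample, threads=8, paired_end=False):
    """Build an rsem-calculate-expression call on a transcriptome BAM.

    Hypothetical helper. --sort-bam-by-read-name asks RSEM to sort the
    alignments by read name first, which its docs describe as giving
    deterministic maximum-likelihood estimates across independent runs.
    """
    cmd = ["rsem-calculate-expression", "--alignments",
           "--sort-bam-by-read-name", "--num-threads", str(threads)]
    if paired_end:
        cmd.append("--paired-end")
    cmd += [bam, reference, sample]  # positional: input, reference, sample
    return cmd


if __name__ == "__main__":
    cmd = rsem_command("Aligned.toTranscriptome.out.bam", "rsem_ref/human",
                       "sample1", paired_end=True)
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute
```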

5.1 years ago by adam.faranda ▴ 110

score 0 · Answer 2 · 2016-10-19

Looks like the OP saw it, but for future readers of this thread, here is the STAR developer's answer:

"STAR alignments are always deterministic; however, the order in which they are output, as well as some flags/attributes (primary flag, HI), are not always deterministic.

With default parameters, and running on one thread, STAR is fully deterministic, down to the order of output alignments. With multiple threads, the order of the reads in the output is not deterministic, as STAR reads/maps/writes reads in chunks. However, the alignments for each read are completely deterministic.

With --outMultimapperOrder Random, the order of alignments for each read is no longer deterministic, which also affects the primary flag and HI attribute."
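
To check this on your own data, one option is to compare two runs after normalizing away what the developer says may vary: record order, the secondary-alignment flag bit (0x100, i.e. which alignment is primary) and the HI tag. A minimal sketch on plain SAM text (for example from `samtools view -h`), with hypothetical function names; it assumes every record carries its own SEQ/QUAL, so adjust it if your settings write `*` for secondary alignments or add other run-dependent tags.

```python
def normalize_sam(lines):
    """Reduce SAM text to an order-independent list of alignments.

    Skips header lines, clears the secondary-alignment bit (0x100) and
    drops the HI:i: tag, since those may differ between runs even when
    the alignments themselves are identical.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue                       # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1]) & ~0x100     # ignore primary/secondary status
        tags = sorted(f for f in fields[11:] if not f.startswith("HI:i:"))
        # QNAME, FLAG, the nine fixed columns RNAME..QUAL, remaining tags
        records.append((fields[0], flag, *fields[2:11], *tags))
    return sorted(records)                 # output order no longer matters


def same_alignments(sam_a, sam_b):
    """True if two SAM texts contain the same alignments for every read."""
    return normalize_sam(sam_a) == normalize_sam(sam_b)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        print("same alignments:", same_alignments(a, b))
```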