Question

Efficient way to map multiple samples with STAR?


3.3 years ago

rlatjsgns129 ▴ 20

Hello, everyone !

While studying the STAR manual, I learned that multiple samples can be mapped in a single run by listing their files in the parameters.

For paired-end reads,

--readFilesIn sample1read1.fq,sample2read1.fq sample1read2.fq,sample2read2.fq

But until now I have mapped multiple samples with a "for" loop. That is, the samples have been mapped one by one.

Mapping multiple samples in one run using the parameter

vs

Mapping multiple samples one by one using a "for" loop

If I use the same number of threads, which way is more efficient?



I think the STAR developer is best positioned to answer this, as he has most probably run tests. In any case, I think any gain in the former use case would come from the genome being loaded in shared memory, although that could be enforced in the latter case too.

The latter case, when modified to run one sample per node, allows for better parallelization. A loop is the least efficient way to do things IMO.
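To illustrate the shared-memory point for the loop case, here is a minimal dry-run sketch. It only prints the commands rather than running STAR. The index directory `star_index`, the thread count, and the sample file names are hypothetical placeholders; `--genomeLoad LoadAndKeep` and `--genomeLoad Remove` are the STAR options for keeping the genome in shared memory between runs and releasing it afterwards.

```shell
#!/bin/sh
# Dry run: prints the STAR commands instead of executing them.
# GENOME_DIR, THREADS and SAMPLES are hypothetical placeholders.
GENOME_DIR="star_index"
THREADS=8
SAMPLES="sample1 sample2"

build_loop_commands() {
    for s in $SAMPLES; do
        # LoadAndKeep loads the genome into shared memory and keeps it there,
        # so later runs can reuse it instead of loading it again
        echo "STAR --runThreadN $THREADS --genomeDir $GENOME_DIR --genomeLoad LoadAndKeep --readFilesIn ${s}read1.fq ${s}read2.fq --outFileNamePrefix ${s}_"
    done
    # Release the shared memory once every sample has been mapped
    echo "STAR --genomeDir $GENOME_DIR --genomeLoad Remove"
}

build_loop_commands
```

Piping the printed lines to `sh` would actually run them. Whether shared memory is usable depends on the system's shared-memory limits, so treat this as a starting point rather than a recipe.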

3.3 years ago by Ram 44k

score 3 · Accepted Answer · 2021-09-02

I would suggest avoiding calling this multiple mapping. Multi-mapping typically means something else entirely. I am making this point for future readers who get here via a Google search on multiple mapping :-)

This is a question about the advantages of listing files all at once versus separately. There is no "multiple mapping" here; every sample is mapped only once.

If you think about it, each run has two stages:

1. a fixed cost of starting the mapping
2. then you align N1 + N2 + N3 + ... reads with T threads

In both cases the work done in stage 2 is the same: by the end, you have mapped N1 + N2 + N3 + ... reads with T threads, so that time won't change.

What will change is that the fixed startup cost is paid once per iteration of the loop, rather than once in total. This may or may not be a substantial addition to the total runtime.
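The reasoning above can be put into numbers with a small sketch. All values are made up purely for illustration; they are not measurements of STAR.

```shell
#!/bin/sh
# Hypothetical timings in seconds, for illustration only.
STARTUP=120        # assumed fixed cost of starting one STAR run
ALIGN_TOTAL=3600   # assumed time to align N1 + N2 + N3 reads with T threads
N_SAMPLES=3

# Loop: the startup cost is paid once per sample
loop_total=$(( N_SAMPLES * STARTUP + ALIGN_TOTAL ))
# Single call: the startup cost is paid once
once_total=$(( STARTUP + ALIGN_TOTAL ))

echo "loop: ${loop_total}s, single call: ${once_total}s, difference: $(( loop_total - once_total ))s"
```

With these assumed numbers the loop costs (N_SAMPLES - 1) * STARTUP = 240 seconds more, which is small next to the alignment time. With many small samples and a large genome, the balance could look quite different.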

Long story short, listing all samples at once is probably more advantageous.
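For completeness, a minimal dry-run sketch of how the comma-separated lists for a single call could be built. The sample names, index directory, thread count, and output prefix are hypothetical. As far as I understand the STAR manual, a single call writes one combined output, so the sketch also builds an `--outSAMattrRGline` value (one read group per input file, separated by commas surrounded by spaces) so that each read can be traced back to its sample.

```shell
#!/bin/sh
# Dry run: builds and prints one STAR command covering all samples.
# SAMPLES, star_index and the output prefix are hypothetical placeholders.
SAMPLES="sample1 sample2"
R1=""; R2=""; RG=""

for s in $SAMPLES; do
    # Comma-separated file lists, with no spaces around the commas
    R1="${R1:+$R1,}${s}read1.fq"
    R2="${R2:+$R2,}${s}read2.fq"
    # Read-group entries, separated by a comma surrounded by spaces
    RG="${RG:+$RG , }ID:${s}"
done

echo "STAR --runThreadN 8 --genomeDir star_index --readFilesIn $R1 $R2 --outSAMattrRGline $RG --outFileNamePrefix all_samples_"
```

If you need one output file per sample, the single call means splitting the result by read group afterwards, which should be weighed against the saved startup cost.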