Question

STAR aligner - Is this still maintained?

1

3 months ago

Joshi ▴ 10

My lab and I have been using the STAR aligner for all RNA-seq alignment so far. It hasn't been updated since Jan 2024, and there appear to be a number of spammy issues logged against it on GitHub. At this point, it appears to be unmaintained.

For those who have been dealing with RNA-seq alignment longer than I have (5/6 years), would you recommend:

1) Locally forking STAR and trying to maintain it (although neither I nor my team has experience in C/C++, so this is a seriously uphill climb)

2) Hoping STAR continues to be maintained, and trying to contact Alex Dobin (the original author)

3) Switching to another open source aligner (but the same maintainability issue will arise later, if not now). Any recommendations?

4) Switching to a commercial RNA-seq aligner, e.g. Illumina's DRAGEN

Would be great to hear from the community.

STAR aligner rnaseq • 3.3k views

ADD COMMENT • link updated 6 weeks ago by GenoMax 151k • written 3 months ago by Joshi ▴ 10

1

Every software package reaches a state where it does what most (dare I say 90+%) people want it to do in a stable fashion. So not seeing new updates/releases for some time (e.g. also with bwa/minimap2, even @Rob's salmon) should not be your sole reason to stop using the software.

If you are seeing genuine bugs piling up in "issues" that are being left unaddressed, and they are affecting critical functionality/needs, then that can be a valid reason to consider switching. But that is where people like @Rob have our back.

It is unfortunate that Alex has not dealt with the spam in the "issues" section. I can see how the optics would lead one to believe that the software has been abandoned. Perhaps someone here who knows him personally could get him to clean that up.

Edit (Apr 2025): The issues section has been cleaned up in the 10 weeks since this comment was written.

ADD REPLY • link 6 weeks ago by GenoMax 151k

2

The spam only started about 2-3 weeks ago (during the holidays) by automated spam bots. I think STAR will be ok haha and, yeah, I don't think there are any urgent "fixes" that STAR needs right now anyway (still works like a charm on all my data). Personally, I'm unconcerned.

ADD REPLY • link 3 months ago by dsull ★ 7.5k

2

Exactly my thought: bwa went six and a half years without a release until April 2024, and that one was mostly to add ARM64 support and fix a compiler issue with GCC 10.

ADD REPLY • link 3 months ago by Matthias Zepper 5.1k

score 11 · Answer 1 · 2025-01-18

11

3 months ago

Rob 7.1k

I think the maintenance frequency has decreased as Alex has moved on to a new job. However, my team and I are working to make several contributions. Hopefully Alex will be available and open to upstreaming them; otherwise a fork is inevitable. Among the options you pose, the only one I would strongly discourage is option 4. For tools like aligners, commercial offerings lack methodological transparency and can change under your nose with no ability to trace the changes. I would argue these things have no real place in science in particular, where methodological transparency is paramount and where plenty of good open source tools exist.

ADD COMMENT • link 3 months ago by Rob 7.1k

2

I'm with Rob on that option 4 :)

Moreover, I would, to some extent, also vote against option 1, mainly given your lack of experience in C/C++ and consequently with the STAR code. I'm not saying it's a bad choice, but it will take you a lot of time and effort to get to the point where you can maintain the STAR code in house, and that investment may not be worth it.

ADD REPLY • link 3 months ago by lieven.sterck 15k

2

Definitely not worth the effort unless there is a critical bug that concerns your specific analysis AND Alex Dobin will not fix it after a personal email AND no alternative aligner/mapper (subread, hisat2, kallisto, salmon, ...) can serve as a replacement. Imagine the effort just to understand the extensive codebase...

ADD REPLY • link 3 months ago by ATpoint 88k

0

I also feel a fork is inevitable - looking forward to incorporating your changes in our pipeline. I am hoping that Alex incorporates the changes you're proposing.

ADD REPLY • link 3 months ago by Joshi ▴ 10

0

You have not even stated what the problem with the current software is. Is there any bug affecting your analysis?

ADD REPLY • link 3 months ago by ATpoint 88k

0

To be frank, in its current state it is working fine for the most part. There are a couple of instances where it tends to soft-clip reads instead of aligning them to the annotated splice junction(s), and it's not always clear why.
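
One way to start narrowing down cases like this is to list the reads that carry a sizeable soft clip but no spliced (N) operation in their CIGAR string, then inspect those loci against the annotation. Below is a minimal sketch in Python, assuming SAM text lines as input (for example, the output of `samtools view`). The function names and the 10-base threshold are illustrative choices of mine, not anything defined by STAR.

```python
import re

# One CIGAR operation: a length followed by an operation code (per the SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return (soft_clipped_bases, has_splice) for one CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    soft = sum(int(length) for length, op in ops if op == "S")
    spliced = any(op == "N" for _, op in ops)
    return soft, spliced


def soft_clipped_unspliced(sam_lines, min_clip=10):
    """Yield (read_name, chrom, pos, cigar) for mapped reads that have at
    least `min_clip` soft-clipped bases and no N (skipped region) operation.

    `min_clip` is an arbitrary, illustrative threshold.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a valid alignment record
            continue
        name, flag, chrom, pos, cigar = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
        )
        if flag & 0x4 or cigar == "*":  # unmapped read, nothing to inspect
            continue
        soft, spliced = summarize_cigar(cigar)
        if soft >= min_clip and not spliced:
            yield name, chrom, pos, cigar
```

Feeding it the alignments from a region of interest gives a short list of candidate reads to look at in a genome browser; it only flags them and does not explain why they were clipped.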

Another enhancement that would be useful is better detection of when a cryptic splice site is created (it does admittedly provide a way to inject novel splice junctions).

We're planning on using STAR long term, so the question arose more from a long-term support perspective.

ADD REPLY • link 3 months ago by Joshi ▴ 10

0

Thanks all for the feedback. I agree that option 1 is a no-no. Option 2 (or a fork) is probably the way to go.

It would have been incredible if Illumina had open-sourced DRAGEN (and not made it a source-code-upon-request matter).

ADD REPLY • link 3 months ago by Joshi ▴ 10

0

Illumina announced recently that they are porting DRAGEN to NVIDIA GPUs. This would likely make the DRAGEN suite more accessible. We don't know (but can hope) that they will make the software available at no cost (probably not open source though).

ADD REPLY • link 3 months ago by GenoMax 151k

1

While an interesting development, the accessibility of hardware acceleration is not my concern. My concern is the methodological opaqueness of the tool. In my opinion, we should avoid, wherever possible, using non-open methods and, as reviewers of scientific literature, should request the same of our colleagues. Methodological transparency is critical to science. It cannot be universally achieved, but in choosing software for common tasks like alignment, preprocessing, and analysis, it can be achieved trivially. Thus, we should make every effort possible to eschew closed and proprietary methods in favor of open and transparent alternatives for such tasks.

ADD REPLY • link 3 months ago by Rob 7.1k

0

Do you know for a fact that Illumina is actually making changes to the underlying logic/algorithms? If the changes are strictly at the implementation level, so that the code can now run on different/widely accessible hardware, shouldn't that be acceptable/commendable?

It was my understanding that for GATK they worked on accelerating some of the code and those improvements are openly available via the WARP repository from Broad. Not being a software developer, I can't judge if that is an adequate "public/open" release or not.

The majority of end users are going to be less concerned with software licensing/opacity in practical terms. If they can access the software (even if by request) and it produces results that help inform other downstream experiments for them, then they will use that option.

ADD REPLY • link 3 months ago by GenoMax 151k

1

Take, for example, the DRAGEN aligner. The repository hosting the code has not been updated in 3 years. It is almost certainly the case that the version of the tool currently shipping has deviated (perhaps substantially) from the code that can be evaluated publicly. Further, it is unclear if the (outdated) provided codebase even contains the modules relevant for the FPGA (or upcoming GPU) accelerated components. My main argument isn't about what people tend to do; there I am in complete agreement with you. Rather, my argument is about what we, as scientists, should do. Methodological transparency is very important to the regular practice of science. It might seem like one is nitpicking or being pedantic, but we often don't understand the importance of this practice until we encounter specific instances where it is too late to rectify. Especially in the case of software such as aligners / variant callers / quantification tools / assemblers etc., where the academic community has bent over backwards to provide fully open, transparent, state-of-the-art, and often reasonably well-maintained tools, I feel it is critically important that we urge those in the community to use these open alternatives wherever possible. A small number of companies practically control the sequencing market, and while new companies can break through, it is exceedingly difficult. We should not let these same companies corner the market on software and methods as well (especially when they are not fully committed to openness and scientific transparency in their methods).

ADD REPLY • link 3 months ago by Rob 7.1k

1

Agreed on the software that is owned by Illumina and which continues to remain opaque.

Illumina seems to have co-opted open-source packages so my comments were limited to that software. Are they making code/algorithm changes for the software that are not being made publicly available? That would be a valid concern, especially if people use that software (on DRAGEN/GPU) and simply say that software "X" was used for analysis.

Perhaps we should stop this discussion (and take it up elsewhere) to keep this thread on topic.

ADD REPLY • link 3 months ago by GenoMax 151k