Forum: On the usage of Golang
5
4
6.1 years ago
Lucas Peres ▴ 80

Hello everyone,

I would like to ask for opinions on a topic that I am sure has been widely discussed among people who build tools, but I am eager for answers I have not yet found. I will try to ask precise questions.

Bioinformatics tools like assemblers and aligners are typically written in C/C++ for high performance and precise memory management, among other features provided by such "low level" languages. In recent years, Golang has been used in a couple of applications in bioinformatics software development, such as workflows (where Python is dominant, I believe). I would like to know your opinions/experiences on the usage of Golang for building tools like those written in C/C++. Although it may be close, I am pretty sure it's not as fast as C or C++, but given its advantages like ease of use, good environment, tooling, readability, etc., does it fit this niche? Is it worth learning for someone willing to start a career in this field? Or should I focus on C/C++? I am also aware of Rust, which I think is much closer to C++; any observations on this one are very welcome. Since my questions are not concerned with data analysis, machine learning and high-level stuff, I am leaving Python, R and Julia out of the discussion.

I am a computer science major interested in further graduate studies on the other side of the spectrum (algorithm design, optimization, data structures, etc.). My main concern is which of those languages would be worth investing my time to learn first. I should say I have experience with Python, Java and the very basics of C, but not at a level to write production code.

I searched for similar posts here and on other sites regarding this topic, but the answers tend to be pretty vague. I hope I am not being repetitive.

Thank you. =)

assembly next-gen alignment • 5.5k views
ADD COMMENT
0

tagging: shenwei356
He may be the best person to answer this since his tools are developed in golang.

ADD REPLY
0

I used SeqKit some time ago and noticed immediately that it was implemented in Golang. That was the first time I saw a tool written in the language, which left me intrigued. I would appreciate his comments.

ADD REPLY
4
6.1 years ago
Samuel Lampa ★ 1.3k

I have mostly written workflow software in Go (SciPipe), not that much low-level algorithmic stuff. Based on my experiences though, if I'd go into algorithmic development, I'd:

  1. Choose Go if I'd aim to create a library of components or functions, where I want to use the same language both for the component implementation, as well as for the "glue code" to build the pipeline consisting of many such components.
  2. Consider Rust if I'd be implementing stand-alone tools that are supposed to be consumed via commandline, where the last ounce of performance might be critical.

The reason for this is that, in my experience, Rust seems way too complex to be used as a "glue language" in a similar manner to how Python can be used. Go fits much better into that category, while also being fast enough most of the time to implement algorithmic stuff in it as well.
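To make the "same language for components and glue" idea a bit more concrete, here is a minimal sketch using plain goroutines and channels. The names `upper`, `dropShort` and `runPipeline` are made up for this illustration; this is not SciPipe's API, just one way such wiring could look.

```go
package main

import (
	"fmt"
	"strings"
)

// upper is a hypothetical component: it upper-cases every sequence it receives.
func upper(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			out <- strings.ToUpper(s)
		}
	}()
	return out
}

// dropShort is a second hypothetical component: it discards sequences
// shorter than minLen and passes the rest through unchanged.
func dropShort(in <-chan string, minLen int) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			if len(s) >= minLen {
				out <- s
			}
		}
	}()
	return out
}

// runPipeline is the "glue code": it feeds the input into the first
// component, chains the components, and collects the results.
func runPipeline(seqs []string, minLen int) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, s := range seqs {
			src <- s
		}
	}()
	var result []string
	for s := range dropShort(upper(src), minLen) {
		result = append(result, s)
	}
	return result
}

func main() {
	fmt.Println(runPipeline([]string{"acgt", "ac", "ggcatt"}, 4))
}
```

The components and the code that connects them are both ordinary Go functions, which is the property being argued for here.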

Go actually has a number of things speaking for it in terms of performance too, although it probably won't ever compete with a language that does away with a garbage collector completely. One such thing is that data structures (structs etc.) in Go map pretty directly to how data is laid out in memory (struct fields occur sequentially in memory, etc.), which makes it easier to optimize performance than in some other languages of similar complexity.
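A small sketch of the memory-layout point, assuming the standard Go compiler (the language specification itself does not promise a particular field order). The `Hit` type and the helper names are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hit is a hypothetical record type used only for this illustration.
type Hit struct {
	Start int32
	End   int32
	Score float64
}

// fieldOffsets reports the byte offset of each field inside Hit
// and the total size of the struct.
func fieldOffsets() (start, end, score, size uintptr) {
	var h Hit
	return unsafe.Offsetof(h.Start), unsafe.Offsetof(h.End),
		unsafe.Offsetof(h.Score), unsafe.Sizeof(h)
}

// sliceStride reports the distance in bytes between two neighbouring
// elements of a []Hit. Slice elements are stored contiguously, so this
// should equal the size of one Hit.
func sliceStride() uintptr {
	hits := make([]Hit, 2)
	return uintptr(unsafe.Pointer(&hits[1])) - uintptr(unsafe.Pointer(&hits[0]))
}

func main() {
	s, e, sc, size := fieldOffsets()
	fmt.Println("offsets:", s, e, sc, "size:", size)
	fmt.Println("stride equals size:", sliceStride() == size)
}
```

Because a `[]Hit` is one contiguous block of plain values rather than an array of pointers to objects, scanning it tends to be cache-friendly, which is the kind of optimization this layout makes easier.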

For getting the last ounces of performance, I'd still at least consider Rust though. I'd personally definitely consider Rust over C++, mainly because I think we BADLY need more reliability and robustness in research code, and I think we should do what we can to achieve that. I think the complexity and ease of shooting yourself in the foot in C++ do not align well with the idea of robust, verifiable, understandable and reliable code, which we need to tackle the reproducibility crisis in science.

ADD COMMENT
1

That pretty much answers my question.

> Choose Go if I'd aim to create a library of components or functions, where I want to use the same language both for the component implementation, as well as for the "glue code" to build the pipeline consisting of many such components.

Go may stand alongside languages like Java for building huge workflows, especially when they are distributed systems. It was conceived for this kind of environment, right?

> Consider Rust if I'd be implementing stand-alone tools that are supposed to be consumed via commandline, where the last ounce of performance might be critical.

That is, I believe, the kind of tool that is built at the lab I intend to apply to. As far as I know, their research is focused on combinatorial algorithms applied to bioinformatics. They have a strong focus on the theoretical side, but implementing the algorithms to build tools is very desirable. I have looked at some of their work and the majority of the code is written in C/C++/Java.

> I'd personally definitely consider Rust over C++, mainly because I think we BADLY need more reliability and robustness in research code, and I think we should do what we can to achieve that. I think the complexity and ease of shooting yourself in the foot in C++ do not align well with the idea of robust, verifiable, understandable and reliable code, which we need to tackle the reproducibility crisis in science.

Thank you very much for this observation. I mentioned in another comment the problems I face when compiling tools and handling dependencies. I hope Rust will bring more safety, reliability and easier code distribution. Nevertheless, I think I can't run away from C/C++ given the huge amount of legacy code out there, but I will definitely give Rust a try.

ADD REPLY
3
6.1 years ago

My recent experience is that C and C++ are still king for performant one-person projects, but Golang is a better choice for new collaborative projects. Golang's computational cost is roughly 1.5x that of similarly optimized C or C++ code; that's an acceptable price to pay for mutually comprehensible, mostly-footgun-free code that can more easily utilize all the cores and machines you have access to.
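As a rough illustration of the "utilize all cores" part (not of the 1.5x figure, and not of multi-machine use), here is a minimal sketch that spreads a simple per-sequence computation over one worker goroutine per CPU. The function names `gcCount` and `gcCountAll` are made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// gcCount counts the G and C bases in one sequence.
func gcCount(seq string) int {
	n := 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return n
}

// gcCountAll computes gcCount for every sequence, using one worker
// goroutine per available CPU. Results keep the input order.
func gcCountAll(seqs []string) []int {
	counts := make([]int, len(seqs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so no two goroutines write the same element.
				counts[i] = gcCount(seqs[i])
			}
		}()
	}
	for i := range seqs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(gcCountAll([]string{"ACGT", "GGCC", "ATAT"}))
}
```

The whole parallel part is about twenty lines of standard-library code, with no thread library or build flags involved.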

ADD COMMENT
0

> My recent experience is that C and C++ are still king for performant one-person projects

Interesting. So do you think for one-person projects it would be more valuable to stay with C/C++? I ask because next year I will be applying for graduate school and (hopefully) doing research in algorithms/data structures for bioinformatics. I have seen some of the work from that lab and, indeed, they implement everything in C or C++, but today I wonder if that needs to be the case, since we have other systems programming languages that are safer and way more productive for development. See, I don't want to have headaches with deadlines because my programming language is giving me trouble! haha

ADD REPLY
1

Depends on how interested you are in low-level computational details like memory layout and SIMD instructions, vs. mid- and higher-level optimizations like use of Burrows-Wheeler. Golang will save you time when you stick to the latter. C and C++ are still the best places to start if you want to dive into the former.
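For a feel of what the "mid-level" side looks like in Go, here is a deliberately naive Burrows-Wheeler transform. It sorts all rotations of the input, which is far too slow and memory-hungry for real genomes (real tools build it from a suffix array), and it assumes the input does not contain the `$` end marker. The function name `bwt` is just a label for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// bwt returns the Burrows-Wheeler transform of s, using '$' as the end
// marker. Assumption: s itself does not contain '$'.
// This is the textbook "sort all rotations" version, for illustration only.
func bwt(s string) string {
	t := s + "$"
	n := len(t)

	// Build every rotation of t.
	rot := make([]string, n)
	for i := 0; i < n; i++ {
		rot[i] = t[i:] + t[:i]
	}

	// Sort rotations lexicographically (byte order, so '$' sorts first).
	sort.Strings(rot)

	// The transform is the last column of the sorted rotation matrix.
	out := make([]byte, n)
	for i, r := range rot {
		out[i] = r[n-1]
	}
	return string(out)
}

func main() {
	fmt.Println(bwt("banana")) // prints "annb$aa"
}
```

Nothing here needs manual memory management or SIMD, which is the kind of code where Go is likely to save time.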

ADD REPLY
3
6.1 years ago
Joe 21k

I've had similar discussions with colleagues of mine. My feeling from browsing around this forum, GitHub, etc. is that Go would be a good compromise: close to the speed of C, but with a little more user-friendliness. My colleagues, however, don't particularly rate Go as an up-and-coming bioinformatics language, and many of them are leaning toward Rust (and a couple are on Nim).

I've been weighing up which language to learn next, and I think it would be wise for it to be something more performant than Python. It seemed like Go might be an easier transition than C.

For something heavy-duty like an aligner, where every iota of performance could really count, I don't personally see Go replacing C variants any time soon.

Based on my colleagues' input, I'm leaning toward Rust, as a moderate step down in ease from Python, though I have also been jumping in the deep end with C (but with a long, long way to go).

Hope that's of use!

ADD COMMENT
0

> For something heavy-duty like an aligner, where every iota of performance could really count, I don't personally see Go replacing C variants any time soon.

Well, that is a pity. You mentioned Rust. So far as I know, this one is comparable to C++ and has a tooling environment similar to Golang, although it's much more difficult to learn. Do you think Rust can fit this niche?

ADD REPLY
0

I'm not sure about fitting the heavy performance niche, but I won't claim to know that much about Rust. When I say I don't think Go will replace C variants, I mainly mean that C skills are much more entrenched already, and there will be an inertia in replacing it ('if it ain't broke, don't fix it' etc.)

Personally, I waste more time trying to compile bioinformatics tools than probably anything else, so the less that's written in tricky-to-compile code, the better IMO, but obviously interpreted languages like Python are always going to be left in the dust speed-wise.

Thinking as a user rather than a developer, it's easy to forget that many, if not most, of the users of bioinformatics tools are not expert bioinformaticians. There is a lot to be said for ease of installation and usage over pure power (again, IMO). If an assembler takes an extra hour to run, that doesn't really bother me, since there's almost always something else I can go and do. Obviously if you're assembling 100,00 genomes, the story might be different, but then again, amateur bioinformaticians are unlikely to be the ones doing those sorts of projects. Plus, if the developers package things up into conda/docker etc., that can take a lot of the headache out of it.

Naively, to my mind I reasoned the following:

Difficult + Quick                                                   Easy + Slow
          |--------------------------------------------------------------|
          ^        ^                                            ^        ^
         C-like  Go(?)                                        Julia(?) Python

And everything else (Go, Rust, Nim, Julia etc.) is somewhere in the middle, either slightly more toward Python or C. I figure if I learned some C, I'd have both ends of the spectrum covered, and everything in between would be a little easier still.

ADD REPLY
1

Indeed a strong foundation in C/C++ and Python covers a lot of ground (if not everything one can do!). The middle ones may be adequate for specific situations, although I think Julia and Rust have so much potential. I've heard of Nim, but don't know anything about it.

> Personally, I waste more time trying to compile bioinformatics tools than probably anything else, so the less that's written in tricky-to-compile code, the better IMO

This is very true. I spend a lot of time compiling stuff written in C/C++, and errors due to missing dependencies are extremely common. That's one of the reasons I was motivated to open this discussion. I think the newer packaging systems from Go and Rust make code distribution a lot easier.

ADD REPLY
3
6.1 years ago
Medhat 9.8k

Did you also think about Julia? It is more productive than C++ and has speed comparable to C++ (I am not comparing it with C here at all).

Note: This is an open-ended discussion.

ADD COMMENT
0

Absolutely! But I think it's designed for the same niche as Python and R, and numerical computing in general. Do you think it is feasible to develop low-level stuff like aligners, assemblers, etc. in it? What passes through my mind now is that it may be nice for prototyping algorithms, but not for production code.

ADD REPLY
0

There is already an initiative for biology: https://github.com/BioJulia

But will it work for assembly and alignment? I have doubts (I would choose Rust over Julia if I did not go with C/C++). For a prototype (proof of concept), Python is always the answer from my point of view.

ADD REPLY
3
6.1 years ago
Botond Sipos ★ 1.7k

I have built some bioinformatics tools in Go, and my opinion is that it is very much worth learning. Probably the best thing about it is that it truly follows the Zen of Python principle that "There should be one-- and preferably only one --obvious way to do it." :) Hence it is a small language, but it offers everything you need for comfortable development. For somebody who already knows programming, the barrier to entry is really low. With resources like Effective Go, Go by Example and A Tour of Go, one can get up to speed in no time.

ADD COMMENT
