The Five Most Annoying Bioinformatics Problems You Face Every Week?
6
11.2 years ago
davidcassali ▴ 60

Hi,

I am a CS student taking a bioinformatics module at my university. As I will probably be coding some visualization tools soon, I would like to better understand how you bioinformaticists and bioinformaticians think and work, and what issues you face. I am also interested in the "gap" between the "computer guys" and the "bio guys", and in what we, the "computer guys", can do to narrow it.

So: What are the five most annoying bioinformatics problems you face every week or even every day?

Thanks. David

• 9.4k views
ADD COMMENT
8

For starters -- try not to exclude women off the bat by referring to "computer guys" and "bio guys".

ADD REPLY
2

"2. guys Informal Persons of either sex." guys

ADD REPLY
1

I, too, will say "Hey, you guys" to a mixed group. But "computer guys" and "bio guys" is different. Maybe it's because I would never say, "Hi, I'm a computer guy".

ADD REPLY
1

I don't think anyone says, "I am a computer guy." regardless of sex.

ADD REPLY
4

Bioinformatics is just one big annoyance. And yet, one that is strangely addictive and occasionally, quite satisfying.

ADD REPLY
14
11.2 years ago

Are you looking to create applications for non-computational biologists or bioinformaticians? Requirements for these two audiences don't really overlap that much.

For bioinformaticians:

  • For whatever function you want to implement, have a command line interface. Use POSIX standards for the CLI.
  • Test out your application on linux/osx/windows/whatever you want to support. Make sure it installs correctly. Make sure you list all the dependencies. Most of the time, if I face more than 3-4 installation errors due to dependencies that weren't listed in the installation instructions, I will just give up and move on to another tool that'll do a similar job.
  • Try not to come up with any new file formats. If you have to use an internal file format for data manipulation between steps, please keep it simple (tab delimited, csv) and document the format specs.
  • Have options to output as many intermediate files as possible. This is always helpful for debugging the run.
  • If it takes more than 2 hours to run and your users will most likely have to run it multiple times, think about a multi-thread/core implementation.
  • Always always have default values in place for your parameters so users can run everything once to check if the application finishes without having to screw around with settings.
  • Upload your source code to github.
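A few of the points above (a command line interface, defaults for every parameter, optional intermediate files, a thread count) could look roughly like this with Python's standard `argparse`. This is only a sketch: the tool name `mytool` and all option names are made up for illustration.

```python
import argparse
import os


def build_parser():
    """Build a CLI where every tuning parameter has a default value."""
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Hypothetical example: filter a tab-delimited file.",
    )
    # The input file is the only required argument; everything else has a default.
    parser.add_argument("input", help="tab-delimited input file")
    parser.add_argument("-o", "--output", default="-",
                        help="output file, '-' for stdout (default: %(default)s)")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="number of worker processes (default: %(default)s)")
    parser.add_argument("--min-score", type=float, default=0.0,
                        help="minimum score to keep a record (default: %(default)s)")
    parser.add_argument("--keep-intermediate", action="store_true",
                        help="keep intermediate files for debugging")
    parser.add_argument("--tmp-dir", default=os.curdir,
                        help="directory for intermediate files (default: %(default)s)")
    return parser


# A first test run needs nothing but the input file.
args = build_parser().parse_args(["reads.tsv"])
print(args.threads, args.min_score, args.keep_intermediate)
```

With this layout `mytool reads.tsv` runs end to end on defaults, and `mytool -t 8 --keep-intermediate reads.tsv` is there for the second run.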

For non-computational biologists: (This will be harder for you to implement because you actually need biological knowledge and experience to know how biologists think)

  • Follow the KISS principle (Keep It Simple Stupid). Don't flood the GUI with tons of options and choices. Think about how your audience will use the app. If 90% of your audience will probably use a specific function, make that function prominent and put the various other options in an "advanced" menu.
  • Don't put yourself in feature hell by coming up with a ton of features that only 5% of your audience will use. Think about the core 2-3 functions that people will use and make those extremely robust.
  • Output your file in formats that can be opened with common office software (yes that means excel, sorry Pierre)
  • Vector (pdf, svg) over raster (png, tiff) for visualization outputs, unless the visualization is extremely data intensive.
  • Try not to use too much technical jargon in your documentation. Your audience will probably not care about the technical details. Save that for your github.
  • Try to explain your algorithm as clearly as possible in your documentation. Give example cases on how to run your software.
  • I would spend a lot of time on input checking functions. Give accurate error messages if the user supplies the wrong format or has an error in their file.
  • Don't get too creative with the UI design. Try to follow conventions. It's about usability, not eye candy.
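To make the input checking point concrete, here is a minimal sketch of a validator for a tab-delimited table that reports the line number and a likely cause instead of a stack trace. The function name `check_table`, the exception class and the column names are all hypothetical.

```python
import io


class InputFormatError(ValueError):
    """Raised with a message a non-programmer can act on."""


def check_table(handle, expected_columns):
    """Validate a tab-delimited table; return its rows or raise InputFormatError."""
    rows = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != len(expected_columns):
            hint = ""
            if len(fields) == 1 and "," in line:
                # Common mistake: the file was saved as CSV, not tab-delimited.
                hint = " The line contains commas; was the file saved as CSV?"
            raise InputFormatError(
                f"Line {line_number}: expected {len(expected_columns)} columns "
                f"({', '.join(expected_columns)}) but found {len(fields)}.{hint}"
            )
        rows.append(fields)
    if not rows:
        raise InputFormatError("The file is empty.")
    return rows


# A comma-separated file is rejected with a message that says what to fix.
try:
    check_table(io.StringIO("geneA,12.5\n"), ["gene", "expression"])
except InputFormatError as err:
    print(err)
```

A GUI would show the message text in a dialog; the point is that it names the line and suggests a fix.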
ADD COMMENT
2

Somehow, the first bullet point under "For non-computational biologists:" reminds me of Just with one button from biocomicals :)

ADD REPLY
7
11.2 years ago
  • please don't use excel ( knime.org is a nice solution)
  • I use a 0-based half open interval, that is why you see a +1 shift on your web browser
  • Do you really want a web interface or a heavy client instead of a command line program?
  • I cannot install and test all the tools published in Bioinformatics, I don't know this new great database published in "Obscure Archives of North Finistere[In-French]" with no option to download or automatically access the data.
  • I don't know everything.
  • please, don't use excel
  • I hate the sentences starting with "you just have to..."
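For anyone puzzled by the "+1 shift" in the interval bullet above: a 0-based half-open interval [start, end) and a 1-based closed interval [start, end] describe the same bases, but the start coordinate differs by one. A small sketch of the conversion (function names are made up for illustration):

```python
def to_one_based_closed(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based closed."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid 0-based half-open interval: [{start}, {end})")
    # Only the start shifts; the end coordinate is numerically unchanged.
    return start + 1, end


def to_zero_based_half_open(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based closed interval: [{start}, {end}]")
    return start - 1, end


print(to_one_based_closed(9, 20))      # (10, 20)
print(to_zero_based_half_open(10, 20)) # (9, 20)
```

Both forms cover 11 positions: `end - start` in the half-open form and `end - start + 1` in the closed form.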

:-)

Update:

  • please, don't use excel
ADD COMMENT
4

you forgot to talk about excel.. ;)

ADD REPLY
1

Don't use excel and don't use cell colors to encode information in excel as this will get lost when you export the data to CSV files.

ADD REPLY
0

Thanks, fixed

ADD REPLY
0

What exactly is wrong with excel?

ADD REPLY
5
ADD REPLY
7
11.2 years ago
brentp 24k
  1. Poor or no experimental design
  2. Problems that result from 1.
ADD COMMENT
2

I wish I could up-vote this answer more than once.

ADD REPLY
2

Josh Herr +1 brentp +1. "So I have this NGS project..."

ADD REPLY
2
11.2 years ago
Mary 11k

First I would like to say: good on you for asking. Turning to the practitioners in this field was very wise, and I was delighted to see that. An excellent start to your career :) .

I come more from the end-user/biology side, and what irks me a lot is mystery versions. What version of a tool was used to generate this data? What release/version/date of X was grabbed for this display?

I actually had kind of an argument recently at a bioinformatics tool forum because their interface was giving me entirely untraceable data in the visualization. I had no way to check and see if that was valid at all. And they didn't seem to understand why this mattered to me. Harrumph.
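One way a tool could avoid "mystery versions" is to stamp every output file with a provenance header. The sketch below is only an illustration: the tool name, version string, header field names and the `annotation_db` source are all hypothetical.

```python
import datetime
import platform
import sys

TOOL_NAME = "mytool"    # hypothetical tool name
TOOL_VERSION = "0.1.0"  # hypothetical version string


def provenance_lines(data_sources, command=None):
    """Return '#'-prefixed header lines describing how an output was produced."""
    command = sys.argv if command is None else command
    now = datetime.datetime.now(datetime.timezone.utc)
    lines = [
        f"# generated_by: {TOOL_NAME} {TOOL_VERSION}",
        f"# python: {platform.python_version()}",
        f"# date: {now.isoformat(timespec='seconds')}",
        f"# command: {' '.join(command)}",
    ]
    # Record which release of each external data source was used.
    for name, release in sorted(data_sources.items()):
        lines.append(f"# source: {name} release={release}")
    return lines


header = provenance_lines({"annotation_db": "release-42"},
                          command=["mytool", "in.tsv"])
print("\n".join(header))
```

A visualization could show the same fields in an "about this data" panel, so the user can trace what they are looking at.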

ADD COMMENT
1
11.2 years ago
Irsan ★ 7.8k

Reordering chromosomes from chr1, chr10, chr11 to chr1, chr2, chr3
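For what it's worth, a "natural" sort key gets the second order without renaming anything. A minimal Python sketch (the helper name `natural_key` is made up): it splits each name into text and number chunks so that the numbers compare as integers.

```python
import re


def natural_key(name):
    """Split 'chr10' into ['chr', 10, ''] so digit runs compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    return [int(part) if part.isdigit() else part for part in parts]


names = ["chr10", "chr1", "chr11", "chr2", "chrX", "chr3"]
print(sorted(names))                   # plain string sort: chr1, chr10, chr11, ...
print(sorted(names, key=natural_key))  # chr1, chr2, chr3, chr10, chr11, chrX
```

Note that with this key chrM sorts before chrX and chrY, which may or may not be the order a downstream tool expects.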

ADD COMMENT
0

or the type of solution that aims to establish the "right" sort order by re-naming these chromosomes as chr01, chr02 ...

ADD REPLY
1

Do you know the new GNU sort "alphanumeric" option? :-) A: How to sort bed format file

ADD REPLY
0

Not until now ;-)

ADD REPLY
0
11.2 years ago

Converting between ENSEMBL chromosome names (e.g. 1, 2, 3, ..., X, Y, MT for human) and the other (but more often used) system with chr1, chr2, chr3, ..., chrX, chrY, chrM for human. Not difficult, but extremely annoying to have to do. There is a little "gotcha" with the mitochondrial genome, so you can't simply add "chr" in front of everything :-)
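A minimal sketch of that conversion, with the mitochondrial special case spelled out. The function names are made up, and it only covers the assembled chromosomes listed above; unplaced scaffolds and alternate contigs are named differently in the two systems and would need a proper lookup table.

```python
def ensembl_to_ucsc(name):
    """Map an Ensembl-style name (1, 2, X, MT) to the chr-prefixed style."""
    if name == "MT":
        return "chrM"  # the gotcha: MT becomes chrM, not chrMT
    return "chr" + name


def ucsc_to_ensembl(name):
    """Map a chr-prefixed name (chr1, chrX, chrM) to the Ensembl style."""
    if name == "chrM":
        return "MT"
    if not name.startswith("chr"):
        raise ValueError(f"not a chr-prefixed name: {name!r}")
    return name[len("chr"):]


print([ensembl_to_ucsc(n) for n in ["1", "X", "MT"]])        # ['chr1', 'chrX', 'chrM']
print([ucsc_to_ensembl(n) for n in ["chr1", "chrX", "chrM"]])  # ['1', 'X', 'MT']
```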

ADD COMMENT
1
ADD REPLY
0

True, if you want to really do it right. Looks like a useful script!

ADD REPLY
