Question

Forum: Best Way To Learn Bioinformatics For System-Level Programmers

9

10.8 years ago

amjadcsu ▴ 90

Hello,

I have a CS degree and experience in Linux system-level programming. In my current job at a medical research centre that does next-generation sequencing, I maintain the HPC infrastructure for researchers. To better understand the researchers' needs, I would like to learn bioinformatics.

Can someone point me to the best books and tools for this transition?

Thanks

NGS HPC • 6.4k views

ADD COMMENT • link updated 21 months ago by Ram 44k • written 10.8 years ago by amjadcsu ▴ 90

1

http://ds9a.nl/amazing-dna/ DNA seen through the eyes of a coder

ADD REPLY • link 10.8 years ago by Biomonika (Noolean) 3.2k

0

Probably not the best book, but good for an easy start: Bioinformatics For Dummies. Advice: talk to researchers regularly, and spend some time on this forum! ;)

ADD REPLY • link 10.8 years ago by zx8754 12k

Ram · Answer 1 · 2014-01-21

5

10.8 years ago

to.stephen.henderson ▴ 50

My experience is that there are relatively few good practical bioinformatics books (as opposed to some good bioinformatics algorithm books).

They date very very quickly.

That said, I am hopeful for the forthcoming Bioinformatics Data Skills (Vince Buffalo): http://shop.oreilly.com/product/0636920030157.do

4

Interesting. I hope the book turns out well and avoids the classic trap of being a Unix/Perl/Python book with a little bit of biology mixed in.

ADD REPLY • link 10.8 years ago by Istvan Albert 101k

4

I share this frustration too, which is why I am writing this book. It's the book I wish I had when learning bioinformatics. It's an intermediate book (it assumes readers know a bit of a scripting language), as this is what is lacking in current bioinformatics books. Many biologists learn a scripting language and a bit of Unix, and then begin doing bioinformatics. I think this can be dangerous and lead to non-reproducible or incorrect results. My book emphasizes working with data in a careful way using existing robust open-source tools and libraries. Fundamentally, a book on bioinformatics is a tricky thing, because everything goes out of date so quickly in this rapidly changing field. Bioinformaticians are able to keep ahead of changing technology because they have a core skillset: they can easily manipulate big datasets and actively check whether new software is working with their data. My book's goal is to share these data skills. I hope people enjoy it and find it useful! Folks can tweet or email me if they have inquiries.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.8 years ago by Vince Buffalo ▴ 470

2

Sounds great. A lot of what we do is problem solving in a data context; the details of which particular tool we run change all the time.

I am adding it to my pre-order list on Amazon, keep us posted on any developments:

http://www.amazon.com/Bioinformatics-Data-Skills-Reproducible-Research/dp/1449367372/ref=sr_1_1?ie=UTF8&qid=1390391311&sr=8-1&keywords=Bioinformatics+Data+Skills

ADD REPLY • link 10.8 years ago by Istvan Albert 101k

score 4 · Answer 2 · 2014-01-20

The best way to gauge the needs of the researchers would be to arrange a meeting with some of the most active groups and have them explain their needs/uses of the cluster. We did this about a year ago at my university and the Sys Admin and IT staff said it was immensely helpful. That will give the researchers a chance to learn some different approaches to using the cluster more efficiently, and it will help someone in your position to find out where money should be spent on infrastructure.

Bioinformatics is so vast that I don't think it's easy to give advice on what to learn without knowing the intended applications. The type of research and the computational skill level of the users will determine what is needed in terms of support. I can tell you that bioinformatics involves a lot of scripting, so being proficient in a scripting language is a benefit. The most helpful thing I can offer is to be aware of what tools are available (e.g., there is a bioinformatics software list by category at SEQanswers). A common pitfall I see among people coming from other fields is trying to tackle every problem with a custom approach when tools already exist for the job.

score 2 · Answer 3 · 2014-01-20

2

10.8 years ago

Istvan Albert 101k

I would also look for review papers that summarize a technology or those that have introduced popular tools. Those always have a lot of data that you can use to familiarize yourself with the process.

score 2 · Answer 4 · 2014-01-21

Nothing will be better than taking time to talk to potential users as different fields have different requirements. To highlight this point:

Cryo-electron microscopy benefits from fast GPU boxes.
De novo assembly of deep-sequencing data (e.g. using the Velvet program) requires a server with lots of RAM. (A group here uses a server with 1 TB of memory for some of the more difficult genomes.)
Population variant analysis can benefit from larger clusters (e.g. the GATK pipeline from the Broad Institute).
Deep sequencing in general can generate a lot of data, some of which can be compressed, and intermediate files can be removed. Managing these files cost-efficiently and sanely in an HPC environment is extremely useful.
Mass-spec analysis still uses a lot of commercial software that is pay-by-processor, which can be cost-crippling if not installed on a VM.
Balance between the latest tool and stability: some users need reproducible results, some need the latest tool or library installed, and all (should) require that provenance metadata from the programs and libraries is stored.
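To make the provenance point concrete, here is a minimal Python sketch of what storing a provenance record alongside a result could look like. It is only an illustration under stated assumptions: the function names (`sha256_of`, `provenance_record`, `write_provenance`), the record fields, and the idea that the caller supplies tool versions as a plain mapping are all hypothetical, not something any particular pipeline or cluster prescribes.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(command, tool_versions, input_paths):
    """Build a JSON-serialisable description of one analysis run.

    command       -- argv-style list of the command that was run
    tool_versions -- mapping of tool name to version string, supplied by
                     the caller (hypothetical convention for this sketch)
    input_paths   -- input files whose checksums should be recorded
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": list(command),
        "tool_versions": dict(tool_versions),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "inputs": {path: sha256_of(path) for path in input_paths},
    }


def write_provenance(record, out_path):
    """Write the record as sorted, indented JSON next to the results."""
    with open(out_path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A job wrapper could call `provenance_record(...)` just before launching a tool and write the JSON into the output directory, so a researcher can later see exactly which versions and inputs produced a given file.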

Once you know what is needed, you should then familiarise yourself with some of the key tools and repositories. Details of some I have not yet mentioned are in the OBF, and I would also look at Bioconductor, myExperiment and Galaxy.

Ram · Answer 5 · 2014-04-09

2

10.6 years ago

Drio ▴ 920

I found this book particularly useful when I started to work with genomic datasets. It helped me to get my head around the biology, which in the end I think is what matters the most.

http://cl.ly/image/0C3v2g093m0d/download/999703.jpg

http://www.larrygonick.com/wp-content/uploads/2011/12/Genetics2.jpg

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 10.6 years ago by Drio ▴ 920

score 1 · Answer 6 · 2014-01-20

For NGS, the Broad Institute has a lot of information on their GATK pipeline which is applicable to most NGS pipelines.

And the videos from the Workshop are great.

These resources are the first place I send people interested in NGS and will give you a good idea of what the researchers are trying to do.

score 1 · Answer 7 · 2014-01-21

It would certainly help if researchers could communicate to you their research hypothesis and the quantitative result they are looking for (a table, or some number). Knowing this goal not only helps to plan a sequence of steps that will get you there, but also greatly contributes to your motivation and understanding of the "bigger picture". In turn, knowing the sequence of steps and the tool chain, you and the researcher will be able to assess the risks together (e.g. potential biases that tools can introduce), which is also important.