Forum: Best Way To Learn Bioinformatics For System-Level Programmers
10.8 years ago
amjadcsu ▴ 90

Hello,

I have a CS degree and experience in Linux system-level programming. In my current job at a medical research centre doing next-generation sequencing, I maintain the HPC infrastructure for researchers. To better understand the researchers' needs, I would like to learn bioinformatics.

Can someone point me to the best books and tools for this transition?

Thanks

NGS HPC

DNA seen through the eyes of a coder: http://ds9a.nl/amazing-dna/

Probably not the best book, but good for an easy start: Bioinformatics For Dummies. Advice: talk to researchers regularly, and spend some time on this forum! ;)

10.8 years ago

My experience is that there are relatively few good practical bioinformatics books (as opposed to some good bioinformatics algorithm books).

They go out of date very quickly.

That said, I am hopeful about the forthcoming Bioinformatics Data Skills (Vince Buffalo): http://shop.oreilly.com/product/0636920030157.do

Interesting. I hope the book turns out well and avoids the classic trap of being a Unix/Perl/Python book with a little bit of biology mixed in.

I share this frustration too, which is why I am writing this book. It's the book I wish I had when learning bioinformatics. It's an intermediate book (it assumes readers know a bit of a scripting language), because that level is what current bioinformatics books lack. Many biologists learn a scripting language and a bit of Unix, and then begin doing bioinformatics. I think this can be dangerous and lead to non-reproducible or incorrect results. My book emphasizes working with data carefully, using existing robust open-source tools and libraries. Fundamentally, a book on bioinformatics is a tricky thing, because everything goes out of date so quickly in this rapidly changing field. Bioinformaticians keep ahead of changing technology because they have a core skill set: they can easily manipulate big datasets and actively check whether new software is working correctly on their data. My book's goal is to share these data skills. I hope people enjoy it and find it useful! Folks can tweet or email me if they have inquiries.

Sounds great. A lot of what we do is problem solving in a data context; the details of which particular tool we run will change all the time.

I am adding it to my pre-order list on Amazon, keep us posted on any developments:

http://www.amazon.com/Bioinformatics-Data-Skills-Reproducible-Research/dp/1449367372/ref=sr_1_1?ie=UTF8&qid=1390391311&sr=8-1&keywords=Bioinformatics+Data+Skills

10.8 years ago
SES 8.6k

The best way to gauge the researchers' needs would be to arrange a meeting with some of the most active groups and have them explain how they need and use the cluster. We did this about a year ago at my university, and the sysadmin and IT staff said it was immensely helpful. It gives the researchers a chance to learn different approaches to using the cluster more efficiently, and it helps someone in your position find out where money should be spent on infrastructure.

Bioinformatics is so vast that I don't think it's easy to give advice on what to learn without knowing the intended applications. The type of research and the computational skill level of the users will determine what is needed in terms of support. I can tell you that bioinformatics involves a lot of scripting, so being proficient in a scripting language is a benefit. The most helpful thing I can offer is to be aware of what tools are available (e.g., there is a bioinformatics software list by category at SEQanswers). A common pitfall I see among people from other fields is trying to tackle every problem with a custom approach when tools already exist for the job.

I would also add that this is one of those examples of why teams of people are so important in this field now. To be frank, it would take an immense amount of study and practice for a developer, systems engineer, etc., to become a bioinformatics expert. It is good to know something about it in your case, but to be a good bioinformatician you need the biology background as well. Ideally, the wet-lab researchers, bioinformaticians, and developers/IT/engineers should work together.

10.8 years ago

I would also look for review papers that summarize a technology, or the papers that introduced popular tools. These typically include a lot of data that you can use to familiarize yourself with the process.

10.8 years ago

Nothing will be better than taking time to talk to potential users, as different fields have different requirements. To highlight this point:

  1. Cryo-electron microscopy benefits from fast GPU boxes.
  2. De novo assembly of deep-sequencing data (e.g., using the Velvet program) requires a server with lots of RAM (a group here uses a server with 1 TB of memory for some of the more difficult genomes).
  3. Population variant analysis can benefit from larger clusters (e.g., the GATK pipeline from the Broad Institute).
  4. Deep sequencing in general can generate a lot of data, some of which can be compressed, and intermediate files can be removed. Managing these files cost-efficiently and sanely in an HPC environment is extremely useful.
  5. Mass-spec analysis still uses a lot of commercial software that is licensed per processor, which can be crippling costwise if it is not installed on a VM.
  6. Balance between the latest tool and stability: some users need reproducible results, some need the latest tool or library installed, and all (should) require that provenance metadata for the programs and libraries be stored.
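
Item 6 (provenance) is something an HPC admin can support directly. As a minimal sketch, assuming Python 3.7+ and a hypothetical convention of writing an `<output>.provenance.json` file next to each result, a pipeline step could record input checksums, the host, a UTC timestamp and the self-reported versions of the tools it called. The `samtools --version` entry is only a placeholder for whatever tools your users actually run; version flags differ between programs, and this is not tied to any particular workflow manager.

```python
"""Sketch: write a provenance sidecar recording input checksums and tool versions."""
from __future__ import annotations

import hashlib
import json
import platform
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in 1 MiB chunks so large FASTQ/BAM files never need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tool_version(cmd: list[str]) -> str:
    """Return the first line a tool prints when asked for its version.

    The version flag differs between programs, so the full command is passed in.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return "not installed"
    # Some tools report their version on stderr rather than stdout.
    text = result.stdout.strip() or result.stderr.strip()
    return text.splitlines()[0] if text else "unknown"


def provenance_record(inputs: list[Path], tools: dict[str, list[str]]) -> dict:
    """Collect where/when plus input checksums and tool versions for one pipeline step."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
        "python": platform.python_version(),
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "tools": {name: tool_version(cmd) for name, cmd in tools.items()},
    }


def write_sidecar(output: Path, record: dict) -> Path:
    """Store the record next to the result (hypothetical naming: out.bam -> out.bam.provenance.json)."""
    sidecar = output.with_name(output.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        reads = Path(workdir) / "sample.fastq"
        reads.write_text("@read1\nACGT\n+\nIIII\n")
        record = provenance_record(
            [reads],
            {
                "python": [sys.executable, "--version"],
                "samtools": ["samtools", "--version"],  # placeholder; may not be installed
            },
        )
        sidecar = write_sidecar(Path(workdir) / "sample.bam", record)
        print(sidecar.read_text())
```

Keeping the record next to the file it describes means it travels with the result when data is moved or archived.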

Once you know what is needed, familiarise yourself with some of the key tools and repositories. Details of some I have not yet mentioned can be found via the Open Bioinformatics Foundation (OBF), and I would also look at Bioconductor, myExperiment and Galaxy.

10.6 years ago
Drio ▴ 920

I found this book particularly useful when I started to work with genomic datasets. It helped me to get my head around the biology, which in the end I think is what matters the most.

http://cl.ly/image/0C3v2g093m0d/download/999703.jpg

http://www.larrygonick.com/wp-content/uploads/2011/12/Genetics2.jpgmd5-d85084c6f7fad3d5ed41a4cb7ddfa2e3

10.8 years ago
donfreed ★ 1.6k

For NGS, the Broad Institute has a lot of information on its GATK pipeline, much of which applies to most NGS pipelines.

The videos from the workshop are also great.

These resources are the first place I send people interested in NGS and will give you a good idea of what the researchers are trying to do.

10.8 years ago
Pavel Senin ★ 1.9k

It would certainly help if researchers could communicate their research hypothesis to you, along with the quantitative result they are looking for: a table, or some number. Knowing this goal not only helps you plan a sequence of steps that will get you there, but also greatly contributes to your motivation and your understanding of the bigger picture. In turn, once you know the sequence of steps and the tool chain, you and the researcher together will be able to assess the risks (e.g., potential biases the tools can introduce), which is also important.
