Update (October 26, 2016)
In the meantime, the book has moved from LeanPub to GitBook. It also turned out to be a lot more demanding than I thought. It is coming together, though:
http://read.biostarhandbook.com/
Looking for contributors for various chapters:
- dbSNP
- 1000 genomes project
and many others.
I have been teaching bioinformatics courses for a few years now and I have always felt that existing resources were inadequate.
Most are either too programming- and Unix-oriented or too focused on one particular "protocol", ignoring alternatives that may produce different results. In addition, most resources tend to focus on installing and running the tool rather than understanding the outputs. Disclaimer: I am guilty of this as well! I always felt that I had to start from zero each time I wrote a guide, and towards the end there was too much material already and I had to cut it short at the most interesting parts. But that is because there has never been an updated and reliable resource that I can refer people to. Until now.
I am starting a "bioinformatics handbook" resource called the Biostar Handbook. I would like it to be a repository of practical advice on bioinformatics methods, a resource that is useful to both beginners and advanced users, a collection of curated experiences of bioinformaticians around the world. The book will be comprehensive, with ebook and online components that will continue to grow and expand over the years. It will come at a very low cost of about $25 to ensure that the task of maintaining, correcting and supporting it won't solely require personal enthusiasm and could be contracted out if necessary.
I would like to invite everyone to contribute via GitHub: you will retain authorship, copyright and distribution rights on all content you create. And since we are creating the ultimate guide to Bioinformatics ;-) I think it will be a great adventure for everyone involved.
Contribution guide: http://biostarhandbook.com/contribute.html
Book website: https://leanpub.com/biostarhandbook
Help us create the best bioinformatics resource that was ever conceived!
Oh, this is interesting. But what happened between 13 months ago (when the thread was created) and 12 hours ago (when you posted an update)? Are people contributing or is this all 'just' your effort so far?
For me this worked out a bit like software development - where some code that I write is not published. The previous year was mostly exploring what works and what does not.
After the announcement I put a lot of work into it. There is almost a full book's worth of material that I wrote, and then I taught from it for a semester in the Spring of 2016. In the end I disliked that format and ditched it (though I have reused some chapters).
That book was primarily in pdf format, then there were slides based on the book and there was an associated website with code snippets - it ended up a bit disjointed and confusing - I myself got confused after a while and could not find what I was looking for.
The lessons that I learned from that experience made me rethink the book format. So this is the Book 2.0 with 1.0 staying in the drawer.
Ok, if you yourself are confused, then I don't feel bad that I am as well. I do understand that this is a work in progress. Anyway:
1 The release date of December 2016 is no longer relevant, I assume? (Since some chapters are not there at all and some others look like they could use some work)
2 So am I right to assume that what we see online on github is the book 2.0? And the pdf and additional material is ditched? Because it makes no sense to edit the github stuff if all we see is just 'additional' material and there is more somewhere else that we don't know about. Did you also ditch the idea of releasing it as a hard copy? (Just aiming for an open source online version of it? Or do you still think about doing a proper book, which, in my opinion, has quite a few consequences ... copyrights, quality, etc)
3 Do you want to keep it that an author is responsible for a chapter? What happens if somebody just changes little bits of a chapter - there could be a conflict with an author of a chapter. (Except if you wrote most of it and you say you don't care, that makes it easier)
4 If you want people to contribute, should it all be via github? Is that the plan? I think, especially after your experiences over the last year, a little bit of a plan is needed. So far, not too many people have contributed, is that correct?
I am going to make a new top post on this once I get more feedback. Here is what I learned over the past year and what my current approach is:
How would you feel if I replaced all the installation code with conda? Is that really less informative?
I never got to use conda myself hence I don't know what state it is in. I always considered it a python package manager but it may be more than that.
I do know from experience that homebrew is robust and stays out of the way - the downside is that it is OSX only.
If conda works comparably well I'd be more than happy to add that as either an alternative or as replacement if it turns out to be better.
For reference, conda is now largely the preferred packaging system in Galaxy...so that covers the popular stuff (bowtie2, bwa, salmon, samtools, etc.) at least. You can also install Bioconductor packages with it.
OK so I will slowly provide replacements to the installation routines throughout this guide. By providing a universal means of specifying project dependencies, IMHO conda/bioconda have tremendous momentum in the bioinformatics community well outside of python.
Send me your github account and I can add you as collaborator.
This applies to everyone else that wants to collaborate.
I also feel there should be links to actual Biostars questions and tags at the bottom of each page. This handbook seems a bit divorced from Biostars as it stands.
Sounds good to me.
Also, you'd be pleased to know that, based on your suggestion, conda has been integrated into the installation instructions. I am hoping to be able to provide a single link that installs everything in one shot, like so:
http://data.biostarhandbook.com/bash/README.html
Great to see it. Can we categorize for which areas contributors are needed?
We need contributors for everything that is missing ;-)
On a more serious note - everything can be improved or expanded. What I am trying to address, and I am hoping that this will come through, is to go beyond the "typing
velvetg hashsize=35 cov=auto
is how you assemble a genome" approach - because that is not really true. When we run these tools, we continuously assess and evaluate the results - I really want to demonstrate and teach the thought process rather than just the method. Any contribution that helps clarify and strengthen this aspect is welcome.
How in detail did you want to discuss the sequencing technologies (can't help but mention that your data about MinION is outdated)? In my humble opinion it's vital to have a good understanding of the technology you are working with before starting an analysis.
One of the challenges with the new technologies is that they evolve so fast. The chapter on MinION and other technologies should reflect the state of the art - and would need to be updated regularly. The level of detail should be whatever is relevant to a bioinformatics data analysis.
How the technique works can be written out in a few paragraphs - what the data looks like and how to deal with it - is probably more complicated and may need other sections.
One realization that I felt liberating was to not feel the pressure to provide complete and encyclopedic content. We'll let that come later if ever. We'll just put in important things that are of high relevance and see what happens.
At the same time data from these instruments is being deposited and stored in SRA hence we need to also know what it looked like last year - we may need to re-analyze that as well. So we have to address more outdated technologies as well. Here is the content of the SRA broken down by platforms (how many runs per platform) - I think these should be mentioned to some extent as well:
as per: http://read.biostarhandbook.com/data/automate-sra.html
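As a rough sketch of how a per-platform tally like this could be computed: the snippet below counts rows per platform in a run-info table. It assumes the table is CSV text with a `Platform` column; the function name `runs_per_platform` and the three sample rows are made up for illustration and are not real SRA content.

```python
import csv
import io
from collections import Counter


def runs_per_platform(runinfo_csv, column="Platform"):
    """Count rows per value of `column` in a run-info CSV given as text.

    The column name is an assumption about the input table; adjust it
    to whatever header the actual file uses.
    """
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    # Skip rows where the column is missing or empty.
    return Counter(row[column] for row in reader if row.get(column))


# Made-up three-row sample, for illustration only; not real SRA data.
sample = """Run,Platform
RUN0001,ILLUMINA
RUN0002,ILLUMINA
RUN0003,OXFORD_NANOPORE
"""

# Print platforms from most to least common.
for platform, count in runs_per_platform(sample).most_common():
    print(platform, count)
```

On a real table the same function would be fed the text of the downloaded file instead of the toy string.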
I was always thinking about this, but not as a book. I wished there could be something like a wiki page in biostars with summarized information (from biostars) on very common problems/pipelines, to avoid duplicate questions and to serve as reference material for beginners. But a book will really be a very good initiative.
The book sections will be a high level overview of the "what and why", with examples that may not be runnable without extra setup. The web sections will present every single command systematically.
I found that these two goals cannot be satisfied in a single resource. They either make the book way too long or the code too verbose. You can't really interrupt the commands with lengthy interpretation, as they require a very different mindset.
The NGS wikibook has those kinds of aims and contains a lot of migrated content from the seqanswers wiki. Earlier this year there were talks about reinvigorating the project but I'm unsure of its current status.
I think efforts like this need to be framed in the context of a bigger goal beyond just creating a comprehensive resource.
My goal is to use and reuse this resource in my courses/workshops and other training efforts. Up to this point I have always created a brand new site with partial overlap with the old one, and that is not a good way of approaching this. I have at least four sites like that. The same thing happened with the MSU Bioinformatics workshop: there are six or seven versions of it, all in various states of abandon now, as the old ones are not fully maintained and may contain outdated information. This just makes bioinformatics even more confusing, since you can easily find guides that don't actually work anymore.
I am hoping that this will align with other people's goals and that we can build a resource we can all use and keep up to date, not just for the greater good but because it actually saves time and makes us all more productive.
Neat initiative. It will be ebook/online only or what? I'm guessing this will be a great way to write guides for software you get published.
Would you even consider a chapter for dinky scripts or is it for published software only? As an example: I have one that allows you to access biomart from the command line, and while it is useful (and could probably be used for 95% of the "convert my genes" qs here) it is obscure. Would/could it get a short chapter in the biomart/conversion section or what?
I think scripting is what most people need from bioinformatics: putting together the pieces in a coherent and useful way that solves a real problem.
So yes, what you describe is what I hope the book will primarily do - show how to get stuff done in the real world.
What we will do to these scripts is bring a level of consistency and uniformity to the formatting across all, document them the same way etc.
Somehow I totally missed it. I would love to contribute. I am working on my dissertation right now but would love to participate once I am done.
You are most welcome, please do! I've been hacking away at this furiously.
I am now planning a whole other section that is modeled after Mission Impossible: "you have 1 minute to analyze the Ebola genome", where one would show how to solve a realistic problem in a very short amount of time.
Ok, perhaps the goal won't be to do it in just one minute but, say, in a way that is doable on a laptop, perhaps by solving the problem on a subset of the data (shortest chromosome). The goal is still to demonstrate a complete workflow.
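To illustrate the "subset of the data" idea, here is a minimal sketch of picking the shortest sequence out of a FASTA file so that a workflow can be tried on it first. The helper names `read_fasta` and `shortest_record` and the toy records are hypothetical, and real genomes would more likely be handled with an established FASTA library.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def shortest_record(lines):
    """Return the (name, sequence) pair with the shortest sequence."""
    return min(read_fasta(lines), key=lambda rec: len(rec[1]))


# Toy input, for illustration only.
toy = """>chrA
ACGTACGTAC
GT
>chrB
ACGT
>chrC
ACGTAC
""".splitlines()

name, seq = shortest_record(toy)
print(name, len(seq))  # prints: chrB 4
```

With an actual genome, `lines` would be an open file handle, and the selected record could then be written out and used as the small test input.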