What are some good resources to learn shell scripting for NGS pipeline development?
How much shell scripting should one know to develop an intermediate-level pipeline for NGS data analysis?
Can someone suggest some good resources or tutorials?
I use Ruffus (http://www.ruffus.org.uk/) to develop Python pipelines for NGS data, and it is a great library.
What is an "intermediate level pipeline"? What is your target audience? Release the pipeline into the wild? Internal lab use? Personal use? Anyway, to learn shell scripting for NGS pipeline development, you must learn shell scripting, so look at the "Bash Guide for Beginners" and "Advanced Bash-Scripting Guide".
With a very basic understanding of bash scripting you can easily put together a simple pipeline that will, for example, clean your reads, assemble a genome, map the reads (or additional reads) onto the assembled genome, and annotate the assembled genome. In fact, I wrote such a simple pipeline. It is really crude (no error checking, no optimizations, nothing else), but I feed it FASTQ files and a few hours later I get a draft genome and its annotation.
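A few bash habits make a pipeline like the one described above much less fragile. The following is a minimal sketch, not the poster's actual pipeline: it assumes plain FASTQ files in one directory, and a toy length filter written in awk stands in for real cleaning, assembly, mapping and annotation tools. The function names (`log`, `filter_reads`, `count_reads`, `run_pipeline`) are hypothetical. It shows strict mode (`set -euo pipefail`) as basic error checking, one function per step, and writing each output to a temporary file that is renamed only when the step succeeds.

```bash
#!/usr/bin/env bash
# Minimal pipeline skeleton (a sketch): strict mode plus one function per step.
set -euo pipefail   # stop on a failed command, an unset variable, or a failed pipe stage

# Timestamped progress messages go to stderr so they never mix with data.
log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }

# Toy "cleaning" step: keep reads whose sequence is at least min_len bases.
# A FASTQ record is 4 lines: header, sequence, '+', qualities.
filter_reads() {
    local input=$1 output=$2 min_len=$3
    # Write to a temporary file and rename on success, so a crash never
    # leaves a half-written output that looks finished.
    awk -v min="$min_len" '
        NR % 4 == 1 { h = $0 }
        NR % 4 == 2 { s = $0 }
        NR % 4 == 3 { p = $0 }
        NR % 4 == 0 { if (length(s) >= min) print h "\n" s "\n" p "\n" $0 }
    ' "$input" > "$output.tmp"
    mv "$output.tmp" "$output"
}

# Count records in a FASTQ file (number of lines divided by 4).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Run every step for every *.fastq file in a directory.
run_pipeline() {
    local dir=$1 min_len=${2:-50}
    shopt -s nullglob   # an empty directory gives an empty loop, not a literal '*.fastq'
    for fq in "$dir"/*.fastq; do
        # Skip this pipeline's own outputs when it is rerun.
        if [[ $fq == *.clean.fastq ]]; then
            continue
        fi
        local sample clean
        sample=$(basename "$fq" .fastq)
        clean="$dir/$sample.clean.fastq"
        log "filtering $sample (min length $min_len)"
        filter_reads "$fq" "$clean" "$min_len"
        log "$sample: $(count_reads "$fq") reads in, $(count_reads "$clean") reads kept"
    done
}
```

To run it as a script, append a line such as `run_pipeline "${1:-.}" 50` and call `bash pipeline.sh reads/`. In a real pipeline each step function would wrap a call to the corresponding tool instead of awk; the strict mode and the temp-file-then-rename pattern carry over unchanged.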
Here are some good links you can try:
1. http://lh3lh3.users.sourceforge.net/biounix.shtml
2. http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/linux.html
I suggest looking into Snakemake. It's an excellent tool for developing NGS pipelines, has good documentation, and is easy to use.
That said, it may require some very basic background in the Unix command line and bash, plus Python for the more advanced options.
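One thing workflow managers such as Snakemake (and Ruffus) handle for you is deciding which steps need to be rerun, broadly by checking whether each output exists and is newer than its inputs. The sketch below hand-rolls that check in bash to show what the tool automates. It is an illustration under simple assumptions, not how Snakemake is implemented: each step is a function taking one input file and one output file, and `run_if_stale` and `count_step` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Usage: run_if_stale FUNC INPUT OUTPUT
# Calls "FUNC INPUT OUTPUT" only when OUTPUT is missing or older than INPUT
# (a hand-rolled version of the make-style up-to-date check).
run_if_stale() {
    local func=$1 input=$2 output=$3
    if [[ ! -e $output || $input -nt $output ]]; then
        echo "running $func -> $output" >&2
        "$func" "$input" "$output"
    else
        echo "up to date, skipping: $output" >&2
    fi
}

# Hypothetical example step: count FASTQ records (4 lines per read).
count_step() {
    awk 'END { print NR / 4 }' "$1" > "$2.tmp"
    mv "$2.tmp" "$2"
}

# Example: run_if_stale count_step sample.fastq sample.count
```

Doing this by hand gets tedious once steps have several inputs, run over many samples, or could run in parallel. That bookkeeping is what declaring `input:` and `output:` in a Snakefile rule lets Snakemake do for you.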