Hello all,
Is there a good example of a ready-to-use genomics pipeline for mapping/alignment of NGS data (preferably whole genome), followed by variant calling/annotation along with generation/visualization of quality metrics? It would be even better if the suggested pipeline were Python-based.
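For the quality metrics part, here is a minimal pure-Python sketch (not taken from any of the pipelines discussed in this thread) that summarizes per-read mean base quality from a FASTQ file. It assumes Phred+33 encoding (Sanger / Illumina 1.8+) and four-line records without wrapped sequences. The function names are hypothetical.

```python
import gzip
from statistics import mean
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:  # end of file
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip()
            yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming ASCII offset 33 (Phred+33)."""
    return mean(ord(c) - offset for c in qual)


def summarize(path: str) -> dict:
    """Read count, average per-read mean quality, and fraction of reads with mean Q >= 30."""
    scores = [mean_phred(q) for _, _, q in read_fastq(path)]
    if not scores:
        return {"reads": 0, "mean_quality": 0.0, "fraction_q30": 0.0}
    return {
        "reads": len(scores),
        "mean_quality": mean(scores),
        "fraction_q30": sum(s >= 30 for s in scores) / len(scores),
    }
```

Usage would be something like summarize("sample_R1.fastq.gz") (hypothetical file name). Dedicated QC tools report much more; this only shows where such numbers come from.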
I would like to use publicly available Fastq and/or bam files to 'learn' and demonstrate the entire DNA analysis workflow.
Your help and suggestions will be greatly appreciated.
Thanks much.
Not a ready-to-use workflow, but if your goal is to learn, you might want to have a look at the tutorial about Creating workflows with snakemake and conda that I wrote some time ago.
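To make the mapping-to-variant-calling chain concrete, here is a sketch of the shell commands such a workflow typically runs for one paired-end sample. It assumes bwa, samtools and bcftools are installed and that the reference already has a bwa index and a .fai index (samtools faidx). The directory names mapped/ and calls/ and the function name pipeline_commands are hypothetical; the tutorial itself may use different tools and paths.

```python
import shlex
from typing import List


def pipeline_commands(sample: str, ref: str, r1: str, r2: str, threads: int = 4) -> List[str]:
    """Shell commands for one paired-end sample: align, sort, index, call variants."""
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    bam = f"mapped/{sample}.sorted.bam"
    vcf = f"calls/{sample}.vcf.gz"
    return [
        # Output directories (hypothetical layout).
        "mkdir -p mapped calls",
        # Align read pairs with bwa mem and pipe the SAM output straight into coordinate sorting.
        f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(bam)} -",
        # Index the sorted BAM so downstream tools can jump to genomic regions.
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and keep variant sites only (-v), bgzipped VCF (-Oz).
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
    ]
```

Each string can be executed with subprocess.run(cmd, shell=True, check=True). A Snakemake rule essentially wraps one such command with declared inputs and outputs, which is what lets Snakemake skip steps whose outputs already exist and are up to date.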
Thanks, finswimmer, for the workflow; it will certainly help me learn.
By any chance, do you have links to a .fa file and multiple FASTQ files so I can give this example a try?
Do I also have to provide an index file?
TIA
Hello caspase8mach,
you can search the European Nucleotide Archive for a suitable public dataset. This tutorial by ATpoint might be useful for you as well.
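One way to collect FASTQ links programmatically is the ENA Portal API filereport endpoint. The sketch below assumes that endpoint, its read_run result type, and the run_accession and fastq_ftp fields returned as TSV (for paired-end runs fastq_ftp holds ';'-separated paths). The helper names and the ftp:// prefix are my assumptions, so check the ENA documentation before relying on it.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint for per-run file listings.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a query listing FASTQ paths for a study or run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text: str, scheme: str = "ftp") -> Dict[str, List[str]]:
    """Map each run accession to its FASTQ download URLs."""
    runs: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp is ';'-separated (e.g. _1 and _2 files for paired-end runs).
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = [f"{scheme}://{p}" for p in paths]
    return runs
```

Fetching the URL (e.g. urllib.request.urlopen(url).read().decode()) for the accession of a study you found and passing the text to parse_filereport would give one list of FASTQ URLs per run.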
What index file do you mean?
fin swimmer
Awesome, thanks a lot for the link to the nice tutorial! It's great!
What index file do you mean?
To map the FASTQ files against a reference genome, do I need to create an index first?
Thanks a lot.
Yes, you need to create an index for the reference genome. How you create this index depends on the aligner you want to use. E.g. for bwa it's simply:
bwa index genome.fa
Thanks a lot. As suggested, I created the index using
bwa index hg19.fasta
and ended up with the following files: hg19.fasta, hg19.fasta.amb, hg19.fasta.ann, hg19.fasta.bwt, hg19.fasta.pac, hg19.fasta.sa
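Because bwa writes its index next to the reference, a workflow can check for these files before aligning and only build the index when something is missing. A small sketch, assuming classic bwa (bwa-mem2 writes differently named index files) and hypothetical helper names:

```python
import subprocess
from pathlib import Path
from typing import List

# Suffixes classic `bwa index` writes next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(reference: str) -> List[str]:
    """Return index files missing for `reference`; an empty list means the index is complete."""
    return [reference + s for s in BWA_INDEX_SUFFIXES if not Path(reference + s).exists()]


def ensure_bwa_index(reference: str) -> None:
    """Run `bwa index` only if at least one index file is missing (requires bwa on PATH)."""
    if missing_bwa_index_files(reference):
        subprocess.run(["bwa", "index", reference], check=True)
```

Indexing a whole human genome takes a while, so skipping it when the files already exist saves time on reruns; Snakemake achieves the same by listing the index files as a rule's outputs.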
I managed to align a pair of FASTQ files using your Snakemake tutorial. Hurray, my first NGS DNA analysis pipeline! Now my question is: how is the analysis done in production when there are several samples? Is it possible to run it in parallel, e.g. on a cluster or in the cloud? Any examples?
Thanks a ton for your help.
If you start snakemake with the --cores parameter, e.g. --cores 4, it uses up to 4 cores, so up to 4 single-threaded jobs run in parallel. snakemake can also be used with cluster and cloud execution; see the manual for details. Unfortunately I have no experience with this.
Certainly helpful, I will give it a try and let you know. Does anyone have experience with Apache Spark-based DNA NGS pipelines?
Thanks
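Outside Snakemake, the same "several samples at once" idea can be sketched with Python's standard library. This is not how Snakemake schedules jobs (its --cores counts cores, and a rule can request several threads); here the hypothetical max_parallel parameter simply caps how many samples are processed concurrently, and the sample names and commands are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_sample(sample: str, commands: List[str], dry_run: bool = False) -> str:
    """Run one sample's shell commands in order; check=True stops at the first failure."""
    for cmd in commands:
        if dry_run:
            print(f"[{sample}] {cmd}")
        else:
            subprocess.run(cmd, shell=True, check=True)
    return sample


def run_all(samples: Dict[str, List[str]], max_parallel: int = 4, dry_run: bool = False) -> List[str]:
    """Process samples concurrently, at most `max_parallel` at a time."""
    # Threads are enough here: the heavy work happens in the external processes.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_sample, name, cmds, dry_run) for name, cmds in samples.items()]
        return [f.result() for f in futures]  # re-raises a failed sample's CalledProcessError
```

Each sample's command list could come from a per-sample command builder; on a cluster or in the cloud, a workflow manager takes over this scheduling and submits each job to the executor instead.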
Nextflow pipelines: https://github.com/search?q=bwa+extension%3Anf+HaplotypeCaller (not Python)
Thanks for the info, but somehow I am not able to access the URL you suggested. Could you please give me the correct URL? Thanks