Hello,
I've seen Snakemake pipelines, for example for variant calling, that start with FASTQ files, run the quality check, trim the reads, and go all the way to producing a PCA plot.
What I have trouble understanding is how one then checks the outputs of the various steps that need to be assessed before deciding on the next step. For example, after the FASTQ quality check, one needs to visually assess the read quality, decide on filtering, check the quality again, and only then proceed to mapping. The situation is similar after mapping, and again after variant calling.
To me, it seems best to have multiple workflows (one for quality check, one for mapping, one for variant calling, etc.) so that the output of each step can be checked first.
Of course, it seems wonderful to have one file for a pipeline from start to finish, but I'm wondering if I'm missing something: for example, is there a way to do these checks while keeping it all in one workflow? How do you usually deal with this?
Thanks so much!
You can also make the QC reports of each step output files of your workflow, and from these choose to remove samples afterward.
But one of the points of using a workflow manager is to automate a process and ensure reproducibility. The more manual filtering steps you introduce, the less your analysis will fulfill these two criteria.
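One way to cut down on manual steps is to write the visual check down as an explicit rule. Here is a minimal sketch in plain Python. It assumes you have already collected per-sample QC metrics into a dict. The metric names (`mean_quality`, `adapter_fraction`), the `QcThresholds` class, and the threshold values are all hypothetical placeholders, not part of Snakemake or any QC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcThresholds:
    # Illustrative values only; pick thresholds that suit your data.
    min_mean_quality: float = 28.0
    max_adapter_fraction: float = 0.05


def failing_samples(metrics, thresholds=QcThresholds()):
    """Return the sorted names of samples that violate any threshold."""
    failed = []
    for sample, values in sorted(metrics.items()):
        too_low_quality = values["mean_quality"] < thresholds.min_mean_quality
        too_much_adapter = values["adapter_fraction"] > thresholds.max_adapter_fraction
        if too_low_quality or too_much_adapter:
            failed.append(sample)
    return failed


if __name__ == "__main__":
    # Toy numbers made up for this example, not real QC results.
    toy_metrics = {
        "sampleA": {"mean_quality": 33.1, "adapter_fraction": 0.01},
        "sampleB": {"mean_quality": 24.0, "adapter_fraction": 0.02},
        "sampleC": {"mean_quality": 31.0, "adapter_fraction": 0.12},
    }
    print(failing_samples(toy_metrics))
```

A function like this could be called from a workflow step so that the decision is recorded in code and repeated identically on every run, instead of depending on someone reading a report.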
I understand, but most of the time we don't remove samples. We need to filter them, re-run the quality check, and repeat this iteratively until we get clean data.
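For what it's worth, the filter, re-check, repeat loop can itself be written down, so that the parameters that finally gave clean data are recorded. Below is a rough sketch under strong simplifying assumptions: reads are plain lists of Phred scores, `trim_3prime` is a toy stand-in for a real trimming tool, and "clean" just means the mean base quality reaches a target. All function names and cutoffs are hypothetical.

```python
from statistics import mean


def trim_3prime(read, cutoff):
    """Drop trailing bases whose quality score is below the cutoff."""
    end = len(read)
    while end > 0 and read[end - 1] < cutoff:
        end -= 1
    return read[:end]


def passes_qc(reads, min_mean_quality):
    """Toy QC check: mean quality over all remaining bases meets the target."""
    bases = [score for read in reads for score in read]
    return bool(bases) and mean(bases) >= min_mean_quality


def trim_until_clean(reads, min_mean_quality=30, cutoffs=(0, 10, 20, 30)):
    """Try increasingly strict cutoffs; return the first one that passes QC."""
    for cutoff in cutoffs:
        trimmed = [trim_3prime(read, cutoff) for read in reads]
        trimmed = [read for read in trimmed if read]  # discard empty reads
        if passes_qc(trimmed, min_mean_quality):
            return cutoff, trimmed
    raise ValueError("no cutoff produced data that passes QC")


if __name__ == "__main__":
    # Made-up quality scores for two short reads.
    toy_reads = [[35, 35, 35, 5], [36, 34, 12, 8]]
    cutoff, cleaned = trim_until_clean(toy_reads)
    print(cutoff, cleaned)
```

The point of the sketch is only that each iteration uses a stated parameter and a stated pass criterion; a real pipeline would call the actual trimmer and QC tool in place of the toy functions.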