Hi everyone,
I am trying to figure out the most efficient method to perform variant analysis on a large dataset. I have 200 samples and forward and reverse reads for a total of 400 fastq files. I was able to load all of these files into Galaxy and create a workflow that looks like this:
FASTQ Groomer -> Trim -> BWA-MEM -> flagstat -> Generate pileup -> Filter pileup
I have now realized that there is no way to loop through or automate my workflow over all of my fastq files. Is there a better way to do this than running the workflow 200 times manually? Can I create a script on the command line and use my fastq files as the input? If anyone has suggestions, or knows of software that can handle this type of job, I would really appreciate the help.
Do you have to use Galaxy? If so, you might want to post on the Galaxy-specific version of this site. If not, you can certainly just write a script to do this for you (that's what most of us do).
I am completely open to writing a script and leaving Galaxy behind, I just don't know where to start. I have some programming experience (Java, Python), so any suggestions would be helpful.
Popular options would be shell scripts or a Makefile. You could also use Python, though I imagine that'd be a bit more work. There's also ngsxml, though I have to confess I'm not very familiar with it (the author, Pierre, is a regular here and writes great stuff, so I expect it's good).
Do you have a Linux system? I am a bioinformatician and we have MiSeq and HiSeq sequencers. I have written a lot of shell scripts for Illumina reads covering filtration, alignment, and variant calling. Would you like me to share them?