I am new to NGS data analysis and I'm working on a multiple-sample variant calling workflow. I have Illumina MiSeq FASTQ files (paired-end, raw reads) for a father, mother and child trio, one pair for each individual, totalling 6 files. I could trim, align, pre-process and call variants for each individual pair separately (I'm skipping indel realignment and quality recalibration for the sake of simplicity, as this workflow is intended for learning only), but I wish to merge the samples into a single file. I would like the alignment step (with BWA-MEM), the pre-processing steps (with Picard) and the variant calling step (with FreeBayes) to be done at once for all samples, if that is possible and correct, while taking into consideration the correct paired-end mates and the respective read groups (when applicable).
My final goal is to obtain a single VCF file from which I'll compute the total number of each kind of variant.
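To make that goal concrete, here is a minimal sketch of how variant kinds could be tallied from the lines of a VCF file. The classification rule (comparing REF and ALT lengths) is an assumption made for illustration; FreeBayes also reports its own type annotations, which may differ. The function names `classify` and `count_variant_types` are hypothetical.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length (an illustrative assumption)."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion alleles
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    return "insertion" if len(ref) < len(alt) else "deletion"


def count_variant_types(vcf_lines):
    """Count variant kinds over an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            if alt == ".":
                continue  # no alternate allele at this site
            counts[classify(ref, alt)] += 1
    return counts
```

With a real file, something like `count_variant_types(open("trio.vcf"))` would return the tallies, where `trio.vcf` is a placeholder name.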
At which step, in which file format and with which Galaxy tools can I merge the samples so that I get correct, meaningful results at the variant calling step?
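One common pattern for a setup like this, sketched here under assumptions rather than as a definitive recipe: align each FASTQ pair separately so mates stay together, tag each alignment with its own read group (the `SM` field identifies the individual), and then give all three BAM files to a single FreeBayes run, which produces one multi-sample VCF. The Python below only builds the command lines as strings. The sample names, file names and reference path are hypothetical, the Picard pre-processing and BAM indexing would run per sample between the two steps, and the equivalent Galaxy tool settings are not shown.

```python
import shlex

REFERENCE = "reference.fa"  # hypothetical path to a BWA-indexed reference

# Hypothetical sample names mapped to their paired-end FASTQ files.
SAMPLES = {
    "father": ("father_R1.fastq", "father_R2.fastq"),
    "mother": ("mother_R1.fastq", "mother_R2.fastq"),
    "child": ("child_R1.fastq", "child_R2.fastq"),
}


def read_group(sample):
    """Build a read group header; bwa mem -R expects literal backslash-t separators."""
    return "@RG\\tID:{s}\\tSM:{s}\\tLB:{s}\\tPL:ILLUMINA".format(s=sample)


def align_command(sample, fastq_r1, fastq_r2):
    """Align one pair with its own read group and write a sorted BAM."""
    return "bwa mem -R {rg} {ref} {r1} {r2} | samtools sort -o {bam} -".format(
        rg=shlex.quote(read_group(sample)),
        ref=shlex.quote(REFERENCE),
        r1=shlex.quote(fastq_r1),
        r2=shlex.quote(fastq_r2),
        bam=shlex.quote(sample + ".bam"),
    )


def call_command(samples, out_vcf="trio.vcf"):
    """Call variants jointly by passing every per-sample BAM to one FreeBayes run."""
    bams = " ".join(shlex.quote(name + ".bam") for name in samples)
    return "freebayes -f {ref} {bams} > {vcf}".format(
        ref=shlex.quote(REFERENCE), bams=bams, vcf=shlex.quote(out_vcf)
    )


if __name__ == "__main__":
    for name, (r1, r2) in SAMPLES.items():
        print(align_command(name, r1, r2))
    print(call_command(SAMPLES))
```

Keeping one BAM per individual and merging only at the calling step means the read groups, not the file layout, tell the caller which reads belong to which person.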
I would suggest following the GATK best practices.
Hello eurioste!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=208960
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry, I wasn't aware this was bad practice, but why could it annoy someone?
Because people here will spend time answering you when you no longer care, since the question has already been answered on another site.
Because we are a finite pool of volunteers who sacrifice time to help people, and it's not efficient if someone on SEQanswers and someone here both invest time in answering the same question.