6.0 years ago
vlrieg
I want to work with some data from the Sequence Read Archive (SRA), but some of the experiments consist of samples that seemingly underwent multiple sequencing runs (technical replicates). What is the correct way to combine these runs when using the GATK pipeline for mapping and variant calling? Or is it better to use only one run per sample (perhaps the run with the highest coverage)? My first instinct is to concatenate all of the FASTQ files for each sample, but I'm not sure whether this is best practice.
With aligners like BWA this is not an issue. You can either concatenate all the FASTQ files from the same experiment and run the alignment once, or align each run separately and merge the aligned files.
http://seqanswers.com/forums/showthread.php?t=23207
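For the concatenation route, here is a minimal Python sketch, not a definitive implementation. The function names `concat_runs` and `concat_paired_runs` and all file names are hypothetical. It assumes the runs of one sample are either all gzipped or all plain text, and that for paired-end data the runs are listed in the same order for both mates so the two merged files stay in sync. The alternative of merging aligned files is not shown.

```python
import shutil
from pathlib import Path


def concat_runs(run_files, merged_path):
    """Append each run's FASTQ to merged_path, byte for byte, in the given order."""
    run_files = [Path(p) for p in run_files]
    if not run_files:
        raise ValueError("no input files given")
    # Mixing gzipped and plain-text inputs would produce an unreadable file.
    if len({p.suffix == ".gz" for p in run_files}) != 1:
        raise ValueError("inputs must be all gzipped or all plain text")
    with open(merged_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged_path)


def concat_paired_runs(pairs, merged_r1, merged_r2):
    """pairs holds one (r1_path, r2_path) tuple per sequencing run.

    Both mates are written in the same run order, so read N in the
    merged R1 file still pairs with read N in the merged R2 file.
    """
    concat_runs([r1 for r1, _ in pairs], merged_r1)
    concat_runs([r2 for _, r2 in pairs], merged_r2)


# Hypothetical usage for one sample sequenced in two runs:
# concat_paired_runs(
#     [("runA_1.fastq.gz", "runA_2.fastq.gz"),
#      ("runB_1.fastq.gz", "runB_2.fastq.gz")],
#     "sample_1.fastq.gz",
#     "sample_2.fastq.gz",
# )
```

A plain byte copy is enough here because several gzip streams written back to back still form a valid gzip file, so the merged output needs no recompression.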