I have four sets of SOLiD NGS reads that are technical replicates of the same biological sample. My instinct is to map the reads separately for each replicate and then combine them for downstream analyses.
However, does anyone know whether it makes any difference combining the reads pre- or post-mapping?
The only problem I can think of with combining before mapping is that read IDs might be duplicated across replicates...
Normally we wouldn't have technical replicates, but these were generated during a trial run of the barcoding workflow on a SOLiD 4 machine.
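If duplicated read IDs do turn out to be a concern when pooling before mapping, one option is to prefix each ID with a replicate label first. A minimal Python sketch, assuming the IDs have already been parsed into plain lists of strings; the function names, the labels `rep1`/`rep2` and the example IDs are all made up for illustration:

```python
from collections import Counter


def prefix_read_ids(replicates):
    """Return pooled read IDs, each prefixed with its replicate label.

    `replicates` maps a replicate label to a list of read IDs.
    """
    pooled = []
    for label, read_ids in replicates.items():
        pooled.extend(f"{label}:{read_id}" for read_id in read_ids)
    return pooled


def duplicated_ids(read_ids):
    """Return the sorted IDs that occur more than once."""
    counts = Counter(read_ids)
    return sorted(rid for rid, n in counts.items() if n > 1)


# Made-up IDs: the same ID appears in both replicates.
replicates = {
    "rep1": ["1_24_150", "1_24_200"],
    "rep2": ["1_24_150", "1_30_75"],
}

# Pooling without prefixes keeps the clash; pooling with prefixes removes it.
naive_pool = [rid for ids in replicates.values() for rid in ids]
safe_pool = prefix_read_ids(replicates)
```

Running `duplicated_ids` on each pool is a quick way to check whether the clash exists at all before deciding that renaming is needed.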
If you're using BAM format to store your data, there's really no reason to lose provenance, even at the resolution of individual reads. Put each technical replicate in its own Read Group: the group is declared in the header (an @RG line) and each read carries an RG tag holding its group identifier. Then you can merge the replicates for storage and analysis, but still separate them again later.
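To make the Read Group idea concrete, here is a rough Python sketch that works on SAM text lines rather than real BAM files (an assumption made to keep it self-contained). The function names, the group IDs `rep1`/`rep2`, the sample name `sampleA` and the two records are hypothetical. It tags each replicate, merges them, and then recovers the replicates from the RG tag:

```python
def tag_read_group(sam_lines, rg_id, sample):
    """Add an @RG header line and an RG:Z: tag to every alignment record."""
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if not line.startswith("@")]
    header.append(f"@RG\tID:{rg_id}\tSM:{sample}")
    tagged = [f"{rec}\tRG:Z:{rg_id}" for rec in records]
    return header, tagged


def merge_tagged(tagged_sets):
    """Concatenate headers (dropping repeated lines) and records."""
    header, records = [], []
    for set_header, set_records in tagged_sets:
        for line in set_header:
            if line not in header:
                header.append(line)
        records.extend(set_records)
    return header, records


def split_by_read_group(records):
    """Group alignment records by the value of their RG:Z: tag."""
    groups = {}
    for rec in records:
        fields = rec.split("\t")
        # Optional tags follow the 11 mandatory SAM columns.
        rg = next(f[5:] for f in fields[11:] if f.startswith("RG:Z:"))
        groups.setdefault(rg, []).append(rec)
    return groups


# Two made-up replicates that even share a read name.
rep1 = ["@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"]
rep2 = ["@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t250\t60\t4M\t*\t0\t0\tTTGA\tIIII"]

merged_header, merged_records = merge_tagged([
    tag_read_group(rep1, "rep1", "sampleA"),
    tag_read_group(rep2, "rep2", "sampleA"),
])
by_group = split_by_read_group(merged_records)
```

With real BAM files you would not do this by hand. As far as I know, `samtools merge -r` attaches an RG tag derived from each input file name, and `samtools view -r` or `samtools split` can pull individual groups back out, but check the documentation for your samtools version.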
As these are technical replicates, I am not so much interested in common themes between biological samples as in the right way to combine them (as they are from the same biological sample).
It may be that, for what I want to do, it makes no difference when they are combined.
Sorry if I am not making sense!
I see. Some of the issues mentioned still apply, though: you may be able to assess variability with respect to the techniques that the replication covers. The moment you merge, you lose the fact that there was replication in your data. That may or may not matter in a particular study.
Keith, have you got any pointers for using 'samtools merge'? '-n' does not seem to sort, and I am not sure how '-r' works. How is a merged BAM file resolved back into separate files?