8 months ago
shpak.max
I have BAM files from several sequencing runs of the same samples, e.g. MyFile_run1.bam, MyFile_run2.bam, etc. If I merge them with
samtools merge -o MyFile_Merged.bam MyFile_run*.bam
would the merged file have structural problems (from differing headers, indexes, etc. across the input files) that cause trouble in downstream analyses? I wouldn't expect so, but I vaguely recall getting end-of-file errors when I did this with multiple runs of the same sample in the past, and I don't remember what I had to do to prevent them (perhaps specific samtools merge options).
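A minimal sketch of a more defensive version of that merge, assuming samtools is on PATH and accepts `merge -o` (as the command above does); `merge_runs` is a hypothetical helper name. Restricting the glob to `*.bam` keeps index files such as MyFile_run1.bam.bai out of the input list, and `samtools quickcheck` flags inputs with a bad header or a missing end-of-file marker (i.e. truncated files) before anything is merged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge all <prefix>_run*.bam files for one sample.
# Assumes samtools is on PATH and supports `merge -o`.
merge_runs() {
  local prefix="$1" out="$2"
  local inputs=( "${prefix}"_run*.bam )   # *.bam excludes .bai/.csi index files

  # Without nullglob an unmatched glob stays literal, so check it exists.
  if [ ! -e "${inputs[0]}" ]; then
    echo "no input BAMs for ${prefix}" >&2
    return 1
  fi

  # quickcheck reports files with a bad header or a missing EOF marker.
  if ! samtools quickcheck -v "${inputs[@]}"; then
    echo "one or more inputs look truncated or corrupt" >&2
    return 1
  fi

  samtools merge -o "$out" "${inputs[@]}" || return 1
  samtools index "$out"
}

# Usage: merge_runs MyFile MyFile_Merged.bam
```

Note that samtools merge expects its inputs to be sorted the same way (typically by coordinate).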
Are they all aligned to an identical reference?
As far as I know (I didn't run the alignments myself), they are aligned to the same reference genome.
Compare the headers before you merge. If they don't match, you can convert back to FASTQ and realign.
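The lines that matter most for merging are the @SQ lines (reference names and lengths, in order, since BAM records refer to references by their position in this list); @RG and @PG lines normally differ between runs. A minimal sketch, assuming samtools and bash are available; `sq_signature` and `same_reference` are hypothetical names. It pulls out the SN and LN tags regardless of column order, so optional tags such as M5 or UR do not trigger a mismatch on their own.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing reference dictionaries between BAMs.

# Read a SAM header on stdin and print "name<TAB>length" for each @SQ line,
# keeping the original order.
sq_signature() {
  awk -F'\t' '$1 == "@SQ" {
    sn = ""; ln = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) sn = substr($i, 4)
      else if ($i ~ /^LN:/) ln = substr($i, 4)
    }
    print sn "\t" ln
  }'
}

# Succeed only if every BAM has the same @SQ names, lengths and order
# as the first one. Assumes samtools is on PATH.
same_reference() {
  local first="$1" f
  shift
  for f in "$@"; do
    if ! diff <(samtools view -H "$first" | sq_signature) \
              <(samtools view -H "$f" | sq_signature) > /dev/null; then
      echo "reference dictionary differs: $first vs $f" >&2
      return 1
    fi
  done
}

# Usage: same_reference MyFile_run*.bam && echo "reference dictionaries match"
```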
If they were created using the same reference, the headers should match and there shouldn't be any issues with merging, correct?
However, I get header files of different lengths using
for the different runs x.
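Differing header lengths on their own are expected, since each run usually carries its own @RG and @PG lines. A minimal sketch for locating the difference: `header_profile` (a hypothetical name) counts header lines per record type from a header on stdin, so two runs can be compared with diff. If only the @RG/@PG counts differ, the reference dictionaries may still match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SAM header lines by record type
# (@HD, @SQ, @RG, @PG, @CO), reading the header on stdin.
header_profile() {
  cut -f1 | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Usage (assumes samtools on PATH and bash for process substitution):
#   diff <(samtools view -H MyFile_run1.bam | header_profile) \
#        <(samtools view -H MyFile_run2.bam | header_profile)
```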
Extracting FASTQ and realigning may be the safest bet.
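If realigning turns out to be necessary, here is a sketch of the FASTQ round trip for one run, assuming paired-end reads and BWA-MEM as the aligner. The thread does not say which aligner produced the original BAMs, so `bwa mem`, the reference path, the read-group values, and the helper name `realign_run` are all placeholders. `samtools collate` brings mates together before `samtools fastq` splits them into R1/R2 files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn one run back into paired FASTQ files and realign it.
# Assumptions: paired-end reads, samtools and bwa on PATH, and a reference
# FASTA already indexed with `bwa index`.
set -o pipefail  # a failure in any pipeline stage fails the whole pipeline

realign_run() {
  local bam="$1" ref="$2" run="$3" sample="$4"
  local base="${bam%.bam}"

  # collate groups mates together; fastq then writes R1/R2 files.
  # Reads with both/neither READ1/READ2 flags (-0) and singletons (-s)
  # are discarded; -n leaves read names without /1 and /2 suffixes.
  samtools collate -u -O "$bam" \
    | samtools fastq -1 "${base}_R1.fq" -2 "${base}_R2.fq" \
        -0 /dev/null -s /dev/null -n - || return 1

  # Realign with one read group per run (placeholder values), then
  # coordinate-sort so the per-run outputs can be merged afterwards.
  bwa mem -R "@RG\tID:${run}\tSM:${sample}" "$ref" \
      "${base}_R1.fq" "${base}_R2.fq" \
    | samtools sort -o "${base}.realigned.bam" - || return 1
}

# Usage: realign_run MyFile_run1.bam ref.fa run1 MyFile
```

Running this for each run against the same indexed reference gives BAMs with identical @SQ lines, which can then be merged.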