SO:coordinate in the first line of your BAM header says that your file is coordinate-sorted (SO = Sort Order). Merging two files can break this sorting, which is why samtools generates a warning. Don't worry, just re-sort the merged BAM and it should be fine.
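To check the sort order yourself, you can read the SO tag from the @HD line of the header. A minimal sketch, assuming you have the header as text (for example from `samtools view -H file.bam`); the function name `sort_order` and the example header are made up for illustration:

```python
from typing import Optional


def sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


# Hypothetical header, not taken from the files discussed in this thread.
example_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(example_header))
```

A file with no @HD line, or an @HD line without SO, gives None here, which you would treat as "sort order unknown".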
Hi Santosh, thanks for the clarification, I will re-sort the file. But does that mean that sorting before merging is not necessary? I always thought sorting before merging was for the better, but this suggests that there should be two merging steps for my files?
Actually, sorting is required for merging. The point of merging is not just to concatenate the two files, but also to preserve the sort order and create a well-formatted header. From the samtools manual: http://www.htslib.org/doc/samtools.html
Merge multiple sorted alignment files, producing a single sorted output file that contains all the input records and maintains the
existing sort order.
If -h is specified the @SQ headers of input files will be merged into
the specified header, otherwise they will be merged into a composite
header created from the input headers. If in the process of merging
@SQ lines for coordinate sorted input files, a conflict arises as to
the order (for example input1.bam has @SQ for a,b,c and input2.bam has
b,a,c) then the resulting output file will need to be re-sorted back
into coordinate order.
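To see why a conflicting @SQ order matters, here is a toy model of the merge, not samtools' actual implementation. It assumes each record is just a (reference name, position) pair and that each input is sorted according to its own header's reference order; `merge_sorted` and `is_coordinate_sorted` are hypothetical helper names:

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, int]  # (reference name, position), a simplification


def _rank(ref_order: Sequence[str]) -> Dict[str, int]:
    return {name: i for i, name in enumerate(ref_order)}


def merge_sorted(inputs: Sequence[List[Record]], ref_order: Sequence[str]) -> List[Record]:
    """K-way merge that trusts each input to be sorted under ref_order."""
    rank = _rank(ref_order)
    return list(heapq.merge(*inputs, key=lambda rec: (rank[rec[0]], rec[1])))


def is_coordinate_sorted(records: List[Record], ref_order: Sequence[str]) -> bool:
    rank = _rank(ref_order)
    keys = [(rank[name], pos) for name, pos in records]
    return keys == sorted(keys)


order = ["a", "b", "c"]                         # @SQ order of input1 and of the output
input1 = [("a", 5), ("b", 2), ("c", 9)]         # sorted under a,b,c
input2_same = [("a", 7), ("b", 1), ("c", 3)]    # also sorted under a,b,c
input2_conflict = [("b", 1), ("a", 7), ("c", 3)]  # sorted under b,a,c instead

print(is_coordinate_sorted(merge_sorted([input1, input2_same], order), order))
print(is_coordinate_sorted(merge_sorted([input1, input2_conflict], order), order))
```

The merge only ever compares the next record of each input, so when one input follows a different reference order the output comes out interleaved rather than sorted, and a full re-sort is the only fix.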
The issue is that your headers are in different orders or one file has chromosomes/contigs that the other doesn't. Consequently, while the input files might be nicely sorted, it's not immediately clear that the output will be properly sorted. As Santosh mentioned, you can just re-sort the merged file to fix this.
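One way to check which of the two cases applies is to compare the @SQ lines of both headers. A rough sketch, assuming you have each header as text (e.g. from `samtools view -H`); `sq_names` and `compare_sq` are hypothetical names, and the headers below are invented:

```python
from typing import Dict, List, Union


def sq_names(header_text: str) -> List[str]:
    """Return the SN values of all @SQ lines, in header order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def compare_sq(header_a: str, header_b: str) -> Dict[str, Union[List[str], bool]]:
    """Report contigs unique to each header and whether shared contigs share an order."""
    a, b = sq_names(header_a), sq_names(header_b)
    set_a, set_b = set(a), set(b)
    return {
        "only_in_a": [n for n in a if n not in set_b],
        "only_in_b": [n for n in b if n not in set_a],
        "same_order": [n for n in a if n in set_b] == [n for n in b if n in set_a],
    }


# Invented headers for illustration only.
hdr1 = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:a\tLN:10\n@SQ\tSN:b\tLN:10\n@SQ\tSN:c\tLN:10\n"
hdr2 = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:b\tLN:10\n@SQ\tSN:a\tLN:10\n@SQ\tSN:d\tLN:10\n"
print(compare_sq(hdr1, hdr2))
```

If `same_order` is false or either `only_in_*` list is non-empty, the two files were probably not produced against the same index in the same way.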
The bigger question is really how this happened to begin with. I presume you downloaded one of the files or that they in some way came from different sources. If you made both of these yourself, then either you used two different indices, or aligners that spit things out in different orders (that's not good) or something along those lines. If this is the case then it should be fixed because it'll cause you untold problems that you don't even know about yet.
Hi Devon, thanks for the input. I will re-sort the files as suggested. Now, as to your question about how this happened in the first place: these files are the output from an Ion Torrent sequencing run, and unlike other platforms, the alignment suggested is a 'two step alignment' where you first perform an alignment with TopHat2 (may also use STAR) and use the unaligned reads to align with only Bowtie2 (using soft clipping local mode).
The index used was the same, I just noticed that the Bowtie2 output was unsorted (TopHat2 already sorted by coordinate), so I sorted all the files before merging. So I am assuming using the two step alignment might be the reason, but I am not sure how it may be avoided for this particular NGS platform. If you know anything, I would appreciate any info! :) Thanks!
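For reference, the sort, merge, re-sort sequence discussed in this thread could be scripted roughly like this. It is only a sketch: the file names are placeholders, `build_pipeline` is a made-up helper, and it only builds the command lines (you could pass each one to `subprocess.run` if samtools is on your PATH):

```python
from typing import List


def _sorted_name(path: str) -> str:
    # "x.bam" -> "x.sorted.bam"; a naming convention assumed for this example.
    stem = path[:-4] if path.endswith(".bam") else path
    return stem + ".sorted.bam"


def build_pipeline(inputs: List[str], merged: str) -> List[List[str]]:
    """Build samtools commands: sort each input, merge, re-sort, index."""
    commands = []
    sorted_inputs = []
    for path in inputs:
        out = _sorted_name(path)
        commands.append(["samtools", "sort", "-o", out, path])
        sorted_inputs.append(out)
    commands.append(["samtools", "merge", merged] + sorted_inputs)
    final = _sorted_name(merged)
    # Re-sort in case the merged @SQ order conflicted between inputs.
    commands.append(["samtools", "sort", "-o", final, merged])
    commands.append(["samtools", "index", final])
    return commands


# Placeholder file names, not the actual files from this thread.
for cmd in build_pipeline(["tophat.bam", "bowtie2.bam"], "merged.bam"):
    print(" ".join(cmd))
```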
What is the output of:
Hi Pierre,
The output from your command is:
Please, let me know if you need anything else. Thanks!
The last sentence of that manual excerpt, the one about the output needing to be re-sorted when the @SQ order conflicts, is the important one here.
Absolutely! That explains it all.
Perfect, that is what I thought, but I wanted to confirm and make sure I hadn't misunderstood.