19 months ago
joe_genome
Hello all,
I checked several forums, and the general advice seems to be to use samtools or picard-tools for comparing alignment files. I want to compare the aligned output files produced by two different alignment algorithms, and I have some general questions:
- Is it fine to compare the files in CRAM format, or is it preferable to convert my CRAM files back to BAM?
- Is samtools preferable here? I was considering just using samtools flagstat.
- What other metrics should be taken into consideration?
Thanks for the input :)
What is the use case for comparing alignment files? What criteria do you want to "compare" the files on?
Many NGS aligners work non-deterministically (i.e., they may/will produce slightly different results for each run). So unless your aligner has a deterministic mode, the alignment files will not be identical and thus can't be compared byte for byte. (Note: this is my understanding as a non-computer scientist and may be incorrect.)
GenoMax, I was thinking of assessing "functional equivalence" using general alignment statistics for the alignment (BAM/CRAM) files, including the mapping rate, insert size distribution, read quality distribution, and GC bias.
If the comparisons are at an aggregate level like what you are describing, then it should be reasonable. flagstat can indeed be a simple/rapid test. CollectInsertSizeMetrics (Picard) or bamPEFragmentSize (deepTools) could be used for the insert size distribution.
The question is whether differences (if any) would be meaningful, since most aligners should produce reasonable results (unless one was not appropriate, e.g. using a non-splice-aware aligner with spliced data).
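In case it helps, here is a rough sketch of the flagstat comparison mentioned above. It assumes you have already saved the default text output of samtools flagstat for each aligner; the function names (parse_flagstat, compare_rates) and the numbers are made up for illustration and are not real results. Rates are computed against the "in total" count, which may differ from the percentages flagstat prints in parentheses.

```python
import re

# Matches default flagstat lines such as "950 + 0 mapped (95.00% : N/A)":
# QC-passed count, QC-failed count, then a free-text label.
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return {label: QC-passed count} from flagstat's default text output."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        # Drop the trailing parenthetical, e.g. "(95.00% : N/A)".
        label = re.sub(r"\s*\(.*\)$", "", match.group(3))
        # If two labels collapse to the same text, keep the first one seen.
        counts.setdefault(label, int(match.group(1)))
    return counts


def compare_rates(a, b, labels=("mapped", "properly paired")):
    """Return {label: (rate_a, rate_b, rate_b - rate_a)} relative to 'in total'."""
    result = {}
    for label in labels:
        rate_a = a[label] / a["in total"]
        rate_b = b[label] / b["in total"]
        result[label] = (rate_a, rate_b, rate_b - rate_a)
    return result


# Hypothetical flagstat excerpts, not output from any real run.
ALIGNER_A = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
950 + 0 mapped (95.00% : N/A)
900 + 0 properly paired (90.00% : N/A)
"""
ALIGNER_B = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
960 + 0 mapped (96.00% : N/A)
920 + 0 properly paired (92.00% : N/A)
"""

if __name__ == "__main__":
    rates = compare_rates(parse_flagstat(ALIGNER_A), parse_flagstat(ALIGNER_B))
    for label, (rate_a, rate_b, diff) in rates.items():
        print(f"{label}: A={rate_a:.2%} B={rate_b:.2%} diff={diff:+.2%}")
```

Whether a difference of a percentage point or so matters is a separate question, as noted above; this only puts the aggregate numbers side by side.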