I have an original BAM file. When I compress it to CRAM format, the base quality scores are lost and replaced with question marks (?) and other symbols. This raises the following questions:
- Why are the original base quality scores changed when compressing from BAM to CRAM?
- Is it possible to recover the original base quality scores from the CRAM file using a reference file?
Example:
Converted CRAM file:
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCTTACCATAAACCTAACCCTAACCCAAAACCTAACCCATAAACAAACCATAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
????????????????????????????????????????????+5?++5?55????+??++?5??'+??'+'+?+++++??'?5++++'+++&+?5+++'++++++'++'++++++++++?+++++????+???+?????+?????+???
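To make the symbols above easier to read, here is a minimal sketch that decodes the quality string into numeric Phred scores. It assumes the standard Phred+33 encoding used in SAM/BAM/CRAM text output (for example, `?` decodes to 30); the helper name `phred33_scores` is made up for illustration and is not part of any tool.

```python
from collections import Counter

def phred33_scores(quality):
    """Decode a Phred+33 quality string into a list of integer scores.

    Assumption: each character encodes (score + 33) as its ASCII code.
    """
    return [ord(ch) - 33 for ch in quality]

# Quality string copied from the converted CRAM record above.
qual = "????????????????????????????????????????????+5?++5?55????+??++?5??'+??'+'+?+++++??'?5++++'+++&+?5+++'++++++'++'++++++++++?+++++????+???+?????+?????+???"

scores = phred33_scores(qual)

# Tally how often each distinct score occurs, to see how many
# different quality values the record actually uses.
for score, count in sorted(Counter(scores).items()):
    print(score, count)
```

Running the same decoding on the matching record from the original BAM would show which scores were changed by the conversion.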
As an alternative, you might also want to check my Genozip tool, which usually compresses better than CRAM, and is 100% lossless. It can even compress cram files. Some benchmarks here: https://genozip.com/benchmarks.html
Please make a tool post rather than putting this in unrelated threads.
In addition, the compression tool is not fully open/free. It requires a license in clinical and commercial settings.