? in cram files
3.1 years ago
joe_genome ▴ 50

I have an original BAM file. After compressing it to CRAM format, the base quality scores appear to be lost and replaced by question marks (?) and other symbols. This raises two questions:

  1. Why do the original base quality scores change when compressing from BAM to CRAM?
  2. Is it possible to recover the original base quality scores from the CRAM file using a reference file?

Example:

Converted CRAM file:

CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCTTACCATAAACCTAACCCTAACCCAAAACCTAACCCATAAACAAACCATAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

????????????????????????????????????????????+5?++5?55????+??++?5??'+??'+'+?+++++??'?5++++'+++&+?5+++'++++++'++'++++++++++?+++++????+???+?????+?????+???

sequencing genomics • 2.0k views

As an alternative, you might also want to check my Genozip tool, which usually compresses better than CRAM and is 100% lossless. It can even compress CRAM files. Some benchmarks here: https://genozip.com/benchmarks.html

Please make a tool post rather than putting this in unrelated threads.

In addition, Genozip is not fully open/free: it requires a license in clinical and commercial settings.

3.1 years ago
  1. The base qualities don't have to be lost; discarding them is optional, and even then it should generally not produce what you posted.
  2. Unlikely. Someone probably made a mistake when creating the CRAM file. Note that Phred scores aren't actually that informative for SNP calling these days, which is why newer sequencers are starting to bin them.

Overall, review how the CRAM file was made; there was likely a mistake at that point.
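For reference, the symbols in the posted string can be decoded by hand. SAM-family formats store each quality as the ASCII character with code Phred score + 33, so a small sketch like the one below shows what the characters stand for. The function name `phred33_scores` is made up for illustration, and the input is just the five distinct symbols that occur in the posted string.

```python
def phred33_scores(qual):
    """Convert a QUAL string (Phred+33 ASCII) into integer Phred scores."""
    return [ord(ch) - 33 for ch in qual]


# The five distinct symbols that appear in the posted quality string.
symbols = "?5+'&"

for ch, score in zip(symbols, phred33_scores(symbols)):
    print(f"{ch!r} -> Q{score}")
```

Under that encoding `?` is simply Q30, and the whole string uses only five distinct values (30, 20, 10, 6, 5). That pattern would be consistent with binned qualities rather than corrupted ones, although this is only an inference from the characters and says nothing certain about how the file was produced.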

Thanks for the response Devon!

  1. Is this an optional feature in samtools, then? I tried to use RevertSam to put the CRAM back into its original state (BAM) and thought that supplying the reference would restore the scores, but that didn't happen.
  2. The Phred scores are needed in some pipelines, which is why I wanted to keep them.

Thanks

Can you try samtools view to convert the CRAM to BAM? A third-party tool like Picard may not follow the latest CRAM/BAM specs.
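A minimal sketch of what that conversion could look like when scripted. The file names (`sample.cram`, `roundtrip.bam`, `reference.fa`) and the helper name `cram_to_bam_command` are placeholders, not anything from this thread; the flags are the standard `samtools view` ones (`-b` for BAM output, `-T` for the reference FASTA, `-o` for the output file).

```python
import shlex


def cram_to_bam_command(cram_path, bam_path, reference_fasta):
    """Build the argument list for a CRAM to BAM conversion with samtools view."""
    return [
        "samtools", "view",
        "-b",                   # write BAM output
        "-T", reference_fasta,  # reference FASTA used to decode the CRAM
        "-o", bam_path,         # output file
        cram_path,              # input CRAM (placeholder path)
    ]


# Placeholder file names, shown only to print the resulting command line.
cmd = cram_to_bam_command("sample.cram", "roundtrip.bam", "reference.fa")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming samtools is installed and on the PATH.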

Samtools itself doesn't have an option to modify base qualities. It turns out that only read names and some auxiliary information (MD tags) can be stored in a lossy manner. So if you used samtools for the conversion, the error must have occurred upstream.
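One way to narrow down where the change happened is to compare the QUAL column between two files at each step. A rough sketch, assuming both files have been dumped to SAM text (for example with `samtools view`) and that read names were preserved; the function names here are hypothetical.

```python
def qual_by_read(sam_lines):
    """Map (read name, flag) to the QUAL column (field 11) of SAM text lines."""
    quals = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        # The flag is part of the key so that mates sharing a name stay distinct.
        quals[(fields[0], fields[1])] = fields[10]
    return quals


def changed_quals(original_lines, converted_lines):
    """Return the (name, flag) keys whose QUAL strings differ between the inputs."""
    original = qual_by_read(original_lines)
    converted = qual_by_read(converted_lines)
    return sorted(
        key for key in original
        if key in converted and original[key] != converted[key]
    )
```

If the original BAM and the CRAM produced from it give an empty list, the conversion kept the qualities intact and the change happened earlier in the pipeline.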

Thanks Devon, this clarifies quite a bit what I was looking for and the direction I need to take!
