I am working with data whose quality scores appear to be higher than expected.
According to picard-tools QualityScoreDistribution.jar, the highest scores are over 70:
http://s483.photobucket.com/user/Kez_Cleal/media/1_DB31.qualDist_zpsf2xjcckq.png.html
When I run GATK's BaseRecalibrator, it complains that the scores are too high:
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/kez/KC_15-05-20/inputs/DB31/QC_hg19/1_DB31/1_DB31.mkDup.rg.rAln.bam} is malformed: we encountered an extremely high quality score (72) with BAQ correction factor of 8; the BAM file appears to be using the wrong encoding for quality scores
I have no idea which quality score encoding the file uses. I have read that --fix_misencoded_quality_scores
is ill-advised unless you know what you are doing. How do I find out whether I can use this option?
Thank you. The sequencing of these samples was outsourced to BGI in China, although I do not know which platform was used.
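
One way to check which encoding the file uses is to look at the raw ASCII range of the QUAL column directly. The script below is only a minimal sketch, not a definitive test: it assumes the usual conventions (Phred+33 characters start at ASCII 33, Phred+64 characters never fall below ASCII 59), and the function name, thresholds and the quals.txt file name are my own illustrative choices. If the file really is Phred+64, a tool that assumes Phred+33 would report every score 31 higher (64 - 33), which would be consistent with scores above 70.

```python
import sys


def guess_phred_offset(qual_strings):
    """Return (min_ascii, max_ascii, guess) for an iterable of raw QUAL strings.

    Heuristic only (assumed thresholds):
      - any character below ASCII 59 cannot occur in a +64 encoding -> Phred+33
      - otherwise, characters above ASCII 74 exceed the usual +33 range -> Phred+64
      - anything else cannot be decided from the range alone
    """
    lo, hi = 255, 0
    for q in qual_strings:
        if not q or q == "*":  # '*' in SAM means no qualities were stored
            continue
        codes = [ord(c) for c in q]
        lo = min(lo, min(codes))
        hi = max(hi, max(codes))
    if hi < lo:
        return None, None, "no quality strings seen"
    if lo < 59:
        guess = "Phred+33"
    elif hi > 74:
        guess = "Phred+64"
    else:
        guess = "ambiguous"
    return lo, hi, guess


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a text file with one QUAL string per line (hypothetical quals.txt).
    with open(sys.argv[1]) as handle:
        lo, hi, guess = guess_phred_offset(line.rstrip("\n") for line in handle)
    print("min ASCII:", lo, "max ASCII:", hi, "guess:", guess)
```

The input can be produced from the BAM with something like `samtools view 1_DB31.mkDup.rg.rAln.bam | head -n 100000 | cut -f 11 > quals.txt` (QUAL is column 11 of a SAM record), then `python guess_offset.py quals.txt`. A sample of reads is usually enough to see the range, but an "ambiguous" result means more reads or another check is needed.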