I downloaded some CRAM files for variant calling from the 1000 Genomes FTP server. I also downloaded the reference genome and MD5 cache as per the instructions in this README doc.
However, running Picard tools' ValidateSamFile reported errors (listed below) for both the CRAM and the BAM converted from it. After running FixMateInformation, revalidation reported zero errors.
Is anyone else encountering such issues with the 1000 Genomes GRCh38 CRAM files? What could be the source of these errors?
Errors:
Mate negative strand flag does not match read negative strand of mate
Mate alignment does not match alignment start of mate
Mate CIGAR string does not match CIGAR string of mate
.
.
.
Can you show the exact commands you used? For example, FixMateInformation has an ADD_MATE_CIGAR=true option; was it used?
For what it's worth, even the Broad Institute doesn't seem totally confident that their tool works well with CRAM files. Can you also make sure you're using the latest version of Picard tools? Maybe it's better now. I know I've had bad luck feeding the output of samtools (which now has CRAM support built in) into Picard. Anecdotally, I found them pretty much incompatible in a specific project and never figured out why.
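To make the request for exact commands concrete, here is a rough sketch of the round trip described in the question: convert the CRAM to BAM, validate, run FixMateInformation with ADD_MATE_CIGAR=true, then validate again. The file names (picard.jar, reference.fa, sample.cram) and the DRY_RUN switch are placeholders of mine, not taken from the original post, and the legacy KEY=value Picard syntax is assumed; newer Picard releases may warn about it. By default the script only prints the commands so they can be compared with what was actually run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names (assumptions): point these at your own files.
PICARD_JAR="${PICARD_JAR:-picard.jar}"
REF="${REF:-reference.fa}"
CRAM="${CRAM:-sample.cram}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

pipeline() {
  local bam="${CRAM%.cram}.bam"
  local fixed="${CRAM%.cram}.fixmate.bam"

  # 1. Convert CRAM to BAM, decoding against the reference FASTA.
  run samtools view -b -T "$REF" -o "$bam" "$CRAM"

  # 2. Validate in summary mode to see which error types occur and how often.
  #    Picard may exit non-zero when it finds errors, so tolerate that here.
  run java -jar "$PICARD_JAR" ValidateSamFile I="$bam" R="$REF" MODE=SUMMARY || true

  # 3. Rewrite the mate fields (mate strand, mate start, mate CIGAR).
  run java -jar "$PICARD_JAR" FixMateInformation I="$bam" O="$fixed" ADD_MATE_CIGAR=true

  # 4. Revalidate the fixed file.
  run java -jar "$PICARD_JAR" ValidateSamFile I="$fixed" R="$REF" MODE=SUMMARY
}

pipeline
```

Run it with DRY_RUN=0 to execute the steps. Posting the printed commands alongside the Picard and samtools versions would make it easier to tell whether the mate errors come from the files themselves or from the conversion step.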
You could try looking at external tools for BAM/CRAM validation, e.g. https://genome.sph.umich.edu/wiki/BamUtil