I have downloaded CRAM files, but I don't know which exact version of hg38 was used to align the reads. Can you find the corresponding FASTA if you have the MD5 (M5) strings? For now, it seems I can only test candidate references and see whether one works. Of course, the obvious solution is to ask the person who generated the files, but that does not always work out. Part of the header looks like this:
@SQ SN:chr1 LN:248956422 M5:6aef897c3d6ff0c78aff06ac189178dd UR:/scratch/hg38.fa
@SQ SN:chr2 LN:242193529 M5:f98db672eb0993dcfdabafe2a882905c UR:/scratch/hg38.fa
@SQ SN:chr3 LN:198295559 M5:76635a41ea913a405ded820447d067b0 UR:/scratch/hg38.fa
@SQ SN:chr4 LN:190214555 M5:3210fecf1eb92d5489da4346b3fddc6e UR:/scratch/hg38.fa
@SQ SN:chr5 LN:181538259 M5:a811b3dc9fe66af729dc0dddf7fa4f13 UR:/scratch/hg38.fa
@SQ SN:chr6 LN:170805979 M5:5691468a67c7e7a7b5f2a3a683792c29 UR:/scratch/hg38.fa
@SQ SN:chr7 LN:159345973 M5:cc044cc2256a1141212660fb07b6171e UR:/scratch/hg38.fa
@SQ SN:chr8 LN:145138636 M5:c67955b5f7815a9a1edfaa15893d3616 UR:/scratch/hg38.fa
@SQ SN:chr9 LN:138394717 M5:1b79085d423b806957b7564497cac5e4 UR:/scratch/hg38.fa
@SQ SN:chr10 LN:133797422 M5:c0eeee7acfdaf31b770a509bdaa6e51a UR:/scratch/hg38.fa
@SQ SN:chr11 LN:135086622 M5:1511375dc2dd1b633af8cf439ae90cec UR:/scratch/hg38.fa
@SQ SN:chr12 LN:133275309 M5:96e414eace405d8c27a6d35ba19df56f UR:/scratch/hg38.fa
@SQ SN:chr13 LN:114364328 M5:787e7eb2d9187bbc20334062332569d4 UR:/scratch/hg38.fa
@SQ SN:chr14 LN:107043718 M5:e0f0eecc3bcab6178c62b6211565c807 UR:/scratch/hg38.fa
@SQ SN:chr15 LN:101991189 M5:f036bd11158407596ca6bf3581454706 UR:/scratch/hg38.fa
@SQ SN:chr16 LN:90338345 M5:9adbaf8ef0094c71470e87eb18e9b5d4 UR:/scratch/hg38.fa
@SQ SN:chr17 LN:83257441 M5:f9a0fb01553adb183568e3eb9d8626db UR:/scratch/hg38.fa
@SQ SN:chr18 LN:80373285 M5:11eeaa801f6b0e2e36a1138616b8ee9a UR:/scratch/hg38.fa
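The @SQ lines can also be collected programmatically, for example from the output of samtools view -H on the CRAM file. Below is a minimal Python sketch. It assumes whitespace-separated TAG:VALUE fields, as in the excerpt above. The function name parse_sq_lines is hypothetical and not part of any library.

```python
def parse_sq_lines(header_text: str) -> dict[str, dict]:
    """Collect SN, LN and M5 from the @SQ lines of a SAM/CRAM header.

    Fields are split on any whitespace, so both tab-separated headers
    (as printed by samtools view -H) and space-separated copies work.
    """
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = {}
        for field in line.split()[1:]:
            # Split on the first ':' only, so UR:/scratch/hg38.fa stays intact.
            key, sep, value = field.partition(":")
            if sep:
                tags[key] = value
        name = tags.get("SN")
        if name is None:
            continue
        contigs[name] = {
            "LN": int(tags["LN"]) if "LN" in tags else None,
            "M5": tags.get("M5", "").lower(),  # normalise hex digits to lowercase
        }
    return contigs
```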
Googling the checksum values leads to ENA Browser pages that list the MD5 sums of the relevant chromosomes, for example:
https://www.ebi.ac.uk/ena/browser/view/CM000664
https://www.ebi.ac.uk/ena/browser/view/CM000679
Great, that led me to GCA_000001405... However, it does not match for all chromosomes. For example, chromosome 13 in that assembly (https://www.ebi.ac.uk/ena/browser/view/CM000675.2) has an MD5 checksum of a5437debe2ef9c9ef8f3ea2874ae1d82, while the CRAM I have lists 787e7eb2d9187bbc20334062332569d4 :-(
I found someone on Twitter who pointed me to the right one (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/reference/1KG_ONT_VIENNA_hg38.fa.gz).
I'm not sure whether there is a better way :)