Hello,
I have a phased assembly (diploid), and I am looking for a way to call SNPs once this diploid assembly is mapped on a haploid reference. The difficulty is that coverage is max 2 (among other things, I guess). What strategy would you use? I tried bbtools callvariants.sh, with clearfilters, which, if I understand correctly, calls "everything" without applying any filter. Still, surprisingly I recovered only 650 K SNPs, whereas I "should" see 1.7 million of them, more or less. I am just fishing for ideas. I found minigraph cactus that could be useful. Do you have any other suggestions?
Thanks
Alex
Do you have separate files of "references" for the two "chromosome" sets?
I am sorry I don't understand the question, deleted my previous answer because I realised I was not replying. Do you mean my fasta reference genome is the same in both cases.?
EDIT: the bam of the diploid on haploid assembly doesn't countain a MD tag, because it was produced with minimap2, which doesn't output a MD tag. Could this cause the issue? Regarding your question, yes the reference fasta is the same. I have one bam with mapped long reads that gives 1.7 million SNPs, and the bam with the diploid assembly mapped on the ref assembly, and the callvariants that gives 650 K SNPs only.
I was curious since you said you had a phased diploid assembly I thought you had two sets of references representing the two sets of chromosomes.
You can get
minimap2
to output the MD tag using following option.