Hello, I have previously posted about finding and testing benchmarking datasets for different RNA-Seq variant calling pipelines. When I used the GIAB HG002 and HG005 datasets, I ran into a problem: the hap.py F1 scores showed almost no agreement with the DNA variant truth sets, even though many people benchmark RNA variants against DNA truth sets. I have been searching for the cause ever since, and I came across a benchmarking strategy used by several studies in which scoring is restricted to regions above a certain coverage or to a certain feature type (e.g. CDS).
My question is: should I rely on this benchmarking strategy? It has certainly improved my hap.py scores, but I am afraid it might introduce biases that I am not aware of. If it is a valid approach, on what basis should I choose the coverage threshold for filtering my regions? I am going to be focusing on CDS/exon variants.
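To make the coverage restriction concrete, here is a minimal sketch of how I understand it, assuming per-base depth in the three-column, tab-separated format that `samtools depth` prints (chromosome, 1-based position, depth). The function name `callable_regions` and the default threshold of 10 are illustrative assumptions of mine, not values taken from any of the studies.

```python
from typing import Iterable, Iterator, Tuple


def callable_regions(
    depth_lines: Iterable[str], min_depth: int = 10
) -> Iterator[Tuple[str, int, int]]:
    """Merge adjacent positions with depth >= min_depth into BED intervals.

    Input lines are assumed to be tab-separated: chrom, 1-based pos, depth.
    Output intervals are (chrom, start, end), 0-based and half-open, as in BED.
    min_depth=10 is an arbitrary placeholder, not a recommended value.
    """
    cur_chrom = None
    start = end = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos_s, depth_s = line.split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        if depth < min_depth:
            # Low-coverage base: skip it. The gap it leaves ends the open interval.
            continue
        if chrom == cur_chrom and pos - 1 == end:
            # Directly adjacent to the open interval: extend it.
            end = pos
        else:
            # Gap or new chromosome: emit the open interval, then start a new one.
            if cur_chrom is not None:
                yield (cur_chrom, start, end)
            cur_chrom, start, end = chrom, pos - 1, pos
    if cur_chrom is not None:
        yield (cur_chrom, start, end)
```

The resulting intervals could be written out as a BED file and intersected with the CDS annotation and the GIAB high-confidence regions before scoring, so that only positions the RNA-Seq data could plausibly call are counted.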
Thank you all so much in advance.