Question

Huge difference when comparing 2 VCF files from the same sample

0


2.6 years ago

Sara ▴ 260

I have 2 VCF files from 2 different pipelines and I am trying to compare them. To do so, I tried 2 things:

I used vcfeval (for SNVs and indels separately) to get sensitivity and precision, which are quite high for both VCF files.
I counted the events common to both VCF files (almost 12000) and the events unique to each file: 2300 for one file and 851 for the other.

Since the same input file was used for both pipelines, and sensitivity and precision are quite high for both, how should I interpret the high number of unique events in these files? Given the high sensitivity and precision, I do not think those unique events are artifacts. How would you interpret such results?
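For concreteness, here is a minimal sketch of how such shared and unique counts could be obtained by exact matching on (CHROM, POS, REF, ALT). This is an assumption about the counting method, which the post does not state; the function names `variant_keys` and `compare` and the toy records are made up for illustration.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Header lines are skipped and multi-allelic records are split,
    so each ALT allele becomes its own key.
    """
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare(lines_a, lines_b):
    """Count events shared by both call sets and unique to each one."""
    a, b = variant_keys(lines_a), variant_keys(lines_b)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Toy records (hypothetical): one identical SNV, plus the same 2-bp
# deletion written at two different positions by the two pipelines.
pipeline_a = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t8\t.\tCAC\tC\t.\t.\t.",
]
pipeline_b = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t3\t.\tGCA\tG\t.\t.\t.",
]

counts = compare(pipeline_a, pipeline_b)
print(counts)
```

With exact matching, an event written differently by the two pipelines is counted as unique to each file, even if both records describe the same change.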

VCF • 1.0k views

updated 2.6 years ago by Jeremy Leipzig 22k • written 2.6 years ago by Sara ▴ 260

1


It is not unexpected to get different results from different pipelines. To interpret the differences (i.e., the number of unique events), you should probably try to understand how the two pipelines differ. Without knowing which pipelines and thresholds were used in the analysis, I doubt anyone here can provide a more specific interpretation.

2.6 years ago by Carlo Yague 8.9k

score 1 · Answer 1 · 2022-04-19

1


2.6 years ago

Jeremy Leipzig 22k

I would not expect many SNV idiosyncrasies, but there are often differences in how indels and complex or composite events are called; see https://genome.sph.umich.edu/wiki/Variant_Normalization
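To illustrate the point about indels, here is a rough sketch of the left-align-and-trim procedure described on the linked page, restricted to biallelic records and an in-memory reference string. The `normalize` function and the toy reference are illustrative assumptions, not code from either pipeline; in practice a dedicated tool such as `bcftools norm` or `vt normalize` would be run against the real reference FASTA.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim one biallelic variant.

    pos is 1-based; reference is the chromosome sequence as a string.
    Returns the normalized (pos, ref, alt).
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; the three records below all delete
# one CA unit but are written at different positions.
REFERENCE = "GGGCACACACAGGG"
records = [(8, "CAC", "C"), (10, "CAG", "G"), (2, "GGCA", "GG")]

normalized = {normalize(pos, ref, alt, REFERENCE) for pos, ref, alt in records}
print(normalized)
```

All three records collapse to one normalized form, so events that look unique to each pipeline under exact matching can turn out to be the same call. Normalizing both VCF files before counting shared and unique events is one way to check how much of the discrepancy is representational.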
