Classifying Query Set Vs Truth Set Overlapping Genomic Ranges

11.4 years ago

14134125465346445 ★ 3.6k

I would like to classify a set of query genomic ranges against a set of truth genomic ranges, using a minimum-overlap rule: the length of the intersection divided by the length of the union of the two ranges must be greater than one half. I call an overlap that passes this threshold a successful overlap.
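The overlap rule described above is the Jaccard index of two intervals. The sketch below shows one way to compute it. The names `jaccard` and `successful_overlap` are hypothetical, and it assumes ranges are stored as `(chromosome, start, end)` tuples in half-open `[start, end)` coordinates. One thing it makes visible: a query that covers exactly half of an equal-length truth range scores only 1/3, so "more than half the intersection/union" is stricter than "covers more than half of the truth range".

```python
from typing import Tuple

# A genomic range as (chromosome, start, end), using half-open [start, end) coordinates.
Interval = Tuple[str, int, int]


def jaccard(a: Interval, b: Interval) -> float:
    """Intersection length divided by union length of two ranges (0.0 if they do not overlap)."""
    if a[0] != b[0]:  # ranges on different chromosomes never overlap
        return 0.0
    intersection = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    # |A union B| = |A| + |B| - |A intersect B|
    union = (a[2] - a[1]) + (b[2] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def successful_overlap(query: Interval, truth: Interval, threshold: float = 0.5) -> bool:
    """True when intersection/union is strictly greater than the threshold."""
    return jaccard(query, truth) > threshold


if __name__ == "__main__":
    # Two 100 bp ranges sharing 50 bp: 50 / 150
    print(jaccard(("chr1", 100, 200), ("chr1", 150, 250)))  # → 0.3333333333333333
```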

If a query range (e.g. Q1) successfully overlaps one of the truth ranges (e.g. T12), I classify it as a True Positive. If it doesn't, I classify it as a False Positive.
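The rule above can be sketched as a loop over the queries that asks whether any truth range passes the threshold. This is a minimal brute-force version under the same assumptions as before: half-open `(chromosome, start, end)` tuples, and ranges keyed by name in dictionaries. `classify_queries`, the query Q5, and all coordinates are hypothetical. It compares every query with every truth range, which is fine for small sets. For genome-scale inputs you would normally index the truth ranges, for example by sorting them per chromosome, instead of scanning them all.

```python
from typing import Dict, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def jaccard(a: Interval, b: Interval) -> float:
    """Intersection over union of two ranges; 0.0 on different chromosomes or no overlap."""
    if a[0] != b[0]:
        return 0.0
    intersection = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def classify_queries(
    queries: Dict[str, Interval],
    truths: Dict[str, Interval],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Label each query "TP" if it successfully overlaps any truth range, else "FP"."""
    labels = {}
    for name, query in queries.items():
        # Brute-force scan of every truth range.
        hit = any(jaccard(query, truth) > threshold for truth in truths.values())
        labels[name] = "TP" if hit else "FP"
    return labels


if __name__ == "__main__":
    truths = {"T12": ("chr1", 100, 200)}
    queries = {"Q1": ("chr1", 110, 210), "Q5": ("chr1", 500, 600)}
    print(classify_queries(queries, truths))  # → {'Q1': 'TP', 'Q5': 'FP'}
```

Note that this rule alone labels every query that passes the threshold as a True Positive, even when several queries hit the same truth range. That is exactly the ambiguity raised next.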

But I am unsure how to classify the case where two query ranges, e.g. Q1 and Q2, both successfully overlap the same truth range, e.g. T3:

Example:

T3   |----------------------------|
Q1       |-------------------------|
Q2    |--------------------------|

How would people classify Q1 and Q2? Both as True Positives? One as True Positive and the other as False Positive?

classification • 2.1k views

written 11.4 years ago by 14134125465346445 ★ 3.6k • updated 11.2 years ago by Biostar 20

That entirely depends on your question. You can define any number of rules with or without biological backing, but even with a biological basis, there will be gray areas.

I think that even if you gave more information, including the exact problem you're trying to address, there would be no single, clear answer.

Reply by brentp 24k, 11.4 years ago

It depends on what question you're trying to answer. Perhaps Q2 would be a True Positive and Q1 a False Positive, because Q2's overlap with T3 is longer than Q1's. Or perhaps you are simply labelling every overlap with T3 above some threshold, in which case both Q1 and Q2 would be True Positives.

Reply by Alex Reynolds 35k, 11.2 years ago
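The two rules in the reply above can be compared on the Q1/Q2/T3 example. The coordinates below are hypothetical and read roughly off the ASCII diagram, shifted so that T3 starts at 0. `classify_many_to_one` labels every query above the threshold as a TP. `classify_one_to_one` lets each truth range validate at most one query, and any extra queries become FPs. The one-to-one version uses a greedy best-Jaccard-first assignment, which is a swapped-in technique and not something either reply specifies. In general, greedy matching does not maximise the number of matched pairs; an optimal bipartite assignment would be needed for that. In this example, ranking by raw overlap length, as the reply suggests, picks the same winner. Q2 overlaps T3 by 28 units and Q1 by 26.

```python
from typing import Dict, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def jaccard(a: Interval, b: Interval) -> float:
    """Intersection over union of two ranges; 0.0 on different chromosomes or no overlap."""
    if a[0] != b[0]:
        return 0.0
    intersection = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def classify_many_to_one(
    queries: Dict[str, Interval], truths: Dict[str, Interval], threshold: float = 0.5
) -> Dict[str, str]:
    """Every query that successfully overlaps any truth range is a TP."""
    return {
        name: "TP" if any(jaccard(q, t) > threshold for t in truths.values()) else "FP"
        for name, q in queries.items()
    }


def classify_one_to_one(
    queries: Dict[str, Interval], truths: Dict[str, Interval], threshold: float = 0.5
) -> Dict[str, str]:
    """Each truth range validates at most one query (greedy, best Jaccard first)."""
    # Score every (query, truth) pair and visit the highest scores first.
    pairs = sorted(
        (
            (jaccard(q, t), qname, tname)
            for qname, q in queries.items()
            for tname, t in truths.items()
        ),
        reverse=True,
    )
    labels = {qname: "FP" for qname in queries}
    used_truths = set()
    for score, qname, tname in pairs:
        if score <= threshold:
            break  # remaining pairs score no higher, so none is a successful overlap
        if labels[qname] == "TP" or tname in used_truths:
            continue  # query already matched, or truth range already claimed
        labels[qname] = "TP"
        used_truths.add(tname)
    return labels


if __name__ == "__main__":
    truths = {"T3": ("chr1", 0, 30)}
    queries = {"Q1": ("chr1", 4, 31), "Q2": ("chr1", 1, 29)}
    print(classify_many_to_one(queries, truths))  # → {'Q1': 'TP', 'Q2': 'TP'}
    print(classify_one_to_one(queries, truths))   # → {'Q1': 'FP', 'Q2': 'TP'}
```

With the one-to-one rule, any truth range left in `used_truths`' complement after matching could also be counted as a False Negative. This is one common way to keep precision and recall consistent when benchmarking, though, as both replies note, the right choice depends on the question being asked.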