Support ordinal types

The code for determining the similarity of two condition thresholds is shown below:

```
PERMISSIBLE_DELTA = 0.1
…
def condition_similarity(condition1: Condition, condition2: Condition):
    # Different attributes
    if condition1.attribute != condition2.attribute:
        return 0

    # Different operators
    # TODO: Extend???
    if condition1.operator != condition2.operator:
        return 0

    # Handle <= as a special case as per paper
    if condition1.operator == Operator.LE and condition2.operator == Operator.LE:
        t = abs(PERMISSIBLE_DELTA * condition1.threshold)
        x = abs(condition1.threshold - condition2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1
```
(The original code also contained a bug in the calculation of the tolerance, `t`, which was fixed in PR #6.)

This threshold logic is not appropriate for ordinal attributes. For example, the [UCI Poker Hand](https://archive.ics.uci.edu/dataset/158/poker+hand) dataset represents the rank of cards as numbers from 1 to 13. As `PERMISSIBLE_DELTA` = 0.1, a Queen (12) has a tolerance, `t`, of 12 * 0.1 = 1.2, which means it would be considered similar to a Jack (11) or King (13), but an Ace (1) would have a tolerance, `t`, of 1 * 0.1 = 0.1, so it wouldn't be considered similar to any other card.
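To make the arithmetic concrete, here is a minimal sketch that reproduces the `<=` branch on raw thresholds. The helper name `le_similarity` is hypothetical and not part of the module; it only mirrors the logic quoted above.

```python
PERMISSIBLE_DELTA = 0.1


def le_similarity(threshold1: float, threshold2: float) -> float:
    """Hypothetical helper mirroring the `<=` branch of condition_similarity."""
    t = abs(PERMISSIBLE_DELTA * threshold1)  # tolerance scales with the first threshold
    x = abs(threshold1 - threshold2)         # absolute gap between the two thresholds
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


# Queen (12) vs King (13): t is about 1.2 and x = 1, so the result is non-zero
queen_vs_king = le_similarity(12, 13)

# Ace (1) vs Two (2): t is about 0.1 and x = 1, so the result is zero
ace_vs_two = le_similarity(1, 2)

print(queen_vs_king, ace_vs_two)
```

Adjacent high ranks get a small positive similarity while adjacent low ranks get none, even though the gap between them is one rank in both cases.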

The `similar_tree` module needs to be modified to allow a list of attributes to be treated as ordinal, with the tolerance logic adjusted accordingly. For these attributes, the condition similarity should be 1 if the thresholds represent the same partitioning (e.g. `<= 2.0` is the same as `<= 2.9`, as they both split {1, 2} vs {3, 4, ...}), and 0 otherwise.
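One possible way to express the "same partitioning" check is sketched below. It assumes that ordinal values are integers, so that `x <= t` holds exactly when `x <= floor(t)`. The function name `ordinal_threshold_similarity` is hypothetical and not part of `similar_tree`.

```python
import math


def ordinal_threshold_similarity(threshold1: float, threshold2: float) -> int:
    """Return 1 if both thresholds split integer ordinal values identically, else 0.

    Assumption: the attribute only takes integer values, so two thresholds
    induce the same partition exactly when they share the same floor.
    """
    return 1 if math.floor(threshold1) == math.floor(threshold2) else 0


same_split = ordinal_threshold_similarity(2.0, 2.9)       # both split {1, 2} vs {3, 4, ...}
different_split = ordinal_threshold_similarity(2.9, 3.0)  # {1, 2} vs {1, 2, 3}
print(same_split, different_split)
```

If the ordinal encoding is not integer-valued, the check would instead need the set of values actually observed for the attribute.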

Secondly, the code only deals with the case of two `<=` operators, not two `>` operators. In the case of two `>` operators it will return 1 (perfect similarity) even if the thresholds differ.
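A sketch of how the tolerance branch might be extended to cover `>` as well is shown below. The `Operator` and `Condition` definitions are minimal stand-ins so the example runs on its own, and the `Operator.GT` member and the function name `condition_similarity_sketch` are assumptions; the real types in the module may differ.

```python
from dataclasses import dataclass
from enum import Enum

PERMISSIBLE_DELTA = 0.1


class Operator(Enum):  # stand-in; member names are assumed
    LE = "<="
    GT = ">"


@dataclass
class Condition:  # stand-in for the module's Condition type
    attribute: str
    operator: Operator
    threshold: float


def condition_similarity_sketch(c1: Condition, c2: Condition) -> float:
    # Different attributes or different operators are never similar
    if c1.attribute != c2.attribute or c1.operator != c2.operator:
        return 0

    # Apply the same tolerance logic to both `<=` and `>`
    if c1.operator in (Operator.LE, Operator.GT):
        t = abs(PERMISSIBLE_DELTA * c1.threshold)
        x = abs(c1.threshold - c2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1


far_apart = condition_similarity_sketch(
    Condition("rank", Operator.GT, 10), Condition("rank", Operator.GT, 20)
)
print(far_apart)
```

With this change, two `>` conditions whose thresholds differ by more than the tolerance are no longer reported as perfectly similar.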

