classification integration docs and example

dyumanaditya · dyumanaditya · commit b3c0d8574d45 · 2025-03-22T20:01:59.000+01:00
diff --git a/docs/source/user_guide/9_machine_learning_integration.rst b/docs/source/user_guide/9_machine_learning_integration.rst
@@ -0,0 +1,133 @@
+Integrating PyReason with Machine Learning
+===========================
+
+PyReason can be integrated with machine learning models by incorporating the predictions from the machine learning model
+as facts in the graph and reasoning over them with logical rules. This allows users to combine the strengths of machine learning
+models with the interpretability and reasoning capabilities of PyReason.
+
+Classifier Integration Example
+-----------------------------
+.. note::
+   Find the full, executable code `here <https://github.com/lab-v2/pyreason/blob/main/examples/classifier_integration_ex.py>`_
+
+In this section, we will outline how to perform ML integration using a simple classification example. We assume
+that we have a fraud detection model that predicts whether a transaction is fraudulent or not. We will use the predictions
+to reason over a knowledge base of account information to identify potential fraudulent activities. For this example, we
+use an untrained linear model just to demonstrate, but in practice, this can be replaced by any PyTorch classification model.
+
+We start by defining our classifier model.
+
+.. code-block:: python
+
+    import torch
+    import torch.nn as nn
+
+
+    model = nn.Linear(5, 2)
+    class_names = ["fraud", "legitimate"]
+
+Next, we define how we want to incorporate the predictions from the model into the graph. Since the model outputs a probability
+over the classes, we can specify how we integrate this probability into the graph. There are a few options the user can define
+using a ``ModelInterfaceOptions`` object:
+
+1. ``threshold`` **(float)**: The threshold beyond which the prediction is incorporated as a fact in the graph. If the probability
+   of the class is lower than the threshold, no information is added to the graph. Defaults to 0.5.
+2. ``set_lower_bound`` **(bool)**: If True, the lower bound of the probability is set as the fact in the graph.
+   if False the lower bound will be 0. Defaults to True.
+3. ``set_upper_bound`` **(bool)**: If True, the upper bound of the probability is set as the fact in the graph.
+   if False, the upper bound will be 1. Defaults to True.
+4. ``snap_value`` **(float)**: If set, all the probabilities that crossed the threshold are snapped to this value. Defaults to 1.0.
+   The upper/lower bounds are set to this value according to the ``set_lower_bound`` and ``set_upper_bound`` options.
+
+In our binary classification model, we want predictions that cross the threshold of ``0.5`` to be added to the graph.
+For this example we will use a ``snap_value`` of ``1.0`` and set the lower bound of the probability as the fact in the graph.
+Therefore any prediction with a probability greater than ``0.5`` will be added to the graph as a fact with bounds of ``[1.0, 1.0]``.
+
+.. code-block:: python
+
+    interface_options = pr.ModelInterfaceOptions(
+        threshold=0.5,         # Only process probabilities above 0.5
+        set_lower_bound=True,  # Modify the lower bound.
+        set_upper_bound=False, # Keep the upper bound unchanged at 1.0.
+        snap_value=1.0         # Use 1.0 as the snap value.
+    )
+
+
+Next, we create a ``LogicIntegratedClassifier`` object that helps us integrate the predictions from the model into the graph.
+
+.. code-block:: python
+
+    fraud_detector = pr.LogicIntegratedClassifier(
+        model,
+        class_names,
+        model_name="fraud_detector",
+        interface_options=interface_options
+    )
+
+
+To run the model, we perform the same steps as we would with a regular PyTorch model. In this example we use a dummy input.
+This gives us a list of facts that can then be added to PyReason.
+
+.. code-block:: python
+
+    transaction_features = torch.rand(1, 5)
+
+    # Get the prediction from the model
+    logits, probabilities, classifier_facts = fraud_detector(transaction_features)
+
+We now add the facts to PyReason as normal.
+
+.. code-block:: python
+
+    # Add the classifier-generated facts.
+    for fact in classifier_facts:
+        pr.add_fact(fact)
+
+
+
+Next, we define a knowledge graph that contains information about accounts and its relationships. we also define some context
+about the transaction and rules that we want to reason over with the classifier predictions.
+
+.. code-block:: python
+    # Create a networkx graph representing a network of accounts.
+    G = nx.DiGraph()
+    # Add account nodes.
+    G.add_node("AccountA", account=1)
+    G.add_node("AccountB", account=1)
+    G.add_node("AccountC", account=1)
+
+    # Add edges with an attribute "associated".
+    G.add_edge("AccountA", "AccountB", associated=1)
+    G.add_edge("AccountB", "AccountC", associated=1)
+    pr.load_graph(G)
+
+    # Add additional contextual information:
+    # 1. A fact indicating the transaction comes from a suspicious location. This could come from a separate fraud detection system.
+    pr.add_fact(pr.Fact("suspicious_location(AccountA)", "transaction_fact"))
+
+    # Define a rule: if the fraud detector flags a transaction as fraud and the transaction info is suspicious,
+    # then mark the associated account (AccountA) as requiring investigation.
+    pr.add_rule(pr.Rule("requires_investigation(acc) <- account(acc), fraud_detector(fraud), suspicious_location(acc)", "investigation_rule"))
+
+    # Define a propagation rule:
+    # If an account requires investigation and is connected (via the "associated" relationship) to another account,
+    # then the connected account is also flagged for investigation.
+    pr.add_rule(pr.Rule("requires_investigation(y) <- requires_investigation(x), associated(x,y)", "propagation_rule"))
+
+
+Finally, we run the reasoning process and print the output.
+
+.. code-block:: python
+
+    # Run the reasoning engine to allow the investigation flag to propagate through the network.
+    pr.settings.allow_ground_rules = True   # The ground rules allow us to use the classifier prediction facts
+    pr.settings.atom_trace = True
+    interpretation = pr.reason()
+
+    trace = pr.get_rule_trace(interpretation)
+    print(f"RULE TRACE: \n\n{trace[0]}\n")
+
+
+
+This simple example demonstrates the integration of a machine learning model with PyReason. In practice more complex models
+can be used, along with larger and more complex knowledge graphs.
diff --git a/examples/classifier_integration_ex.py b/examples/classifier_integration_ex.py
@@ -0,0 +1,83 @@
+import pyreason as pr
+import torch
+import torch.nn as nn
+import networkx as nx
+
+# --- Part 1: Fraud Detector Model Integration ---
+
+# Create a dummy PyTorch model for transaction fraud detection.
+# Each transaction is represented by 5 features and is classified into "fraud" or "legitimate".
+model = nn.Linear(5, 2)
+class_names = ["fraud", "legitimate"]
+
+# Create a dummy transaction feature vector.
+transaction_features = torch.rand(1, 5)
+
+# Define integration options.
+# Only probabilities above 0.4 are considered for adjustment.
+interface_options = pr.ModelInterfaceOptions(
+    threshold=0.5,       # Only process probabilities above 0.6
+    set_lower_bound=True,  # For high confidence, adjust the lower bound.
+    set_upper_bound=False, # Keep the upper bound unchanged.
+    snap_value=1.0      # Use 1.0 as the snap value.
+)
+
+# Wrap the model using LogicIntegratedClassifier
+fraud_detector = pr.LogicIntegratedClassifier(
+    model,
+    class_names,
+    model_name="fraud_detector",
+    interface_options=interface_options
+)
+
+# Run the model to obtain logits, probabilities, and generated PyReason facts.
+logits, probabilities, classifier_facts = fraud_detector(transaction_features)
+
+print("=== Fraud Detector Output ===")
+print("Logits:", logits)
+print("Probabilities:", probabilities)
+print("\nGenerated Classifier Facts:")
+for fact in classifier_facts:
+    print(fact)
+
+# Add the classifier-generated facts.
+for fact in classifier_facts:
+    pr.add_fact(fact)
+
+# --- Part 2: Create and Load a Networkx Graph representing an account knowledge base ---
+
+# Create a networkx graph representing a network of accounts.
+G = nx.DiGraph()
+# Add account nodes.
+G.add_node("AccountA", account=1)
+G.add_node("AccountB", account=1)
+G.add_node("AccountC", account=1)
+# Add edges with an attribute "relationship" set to "associated".
+G.add_edge("AccountA", "AccountB", associated=1)
+G.add_edge("AccountB", "AccountC", associated=1)
+pr.load_graph(G)
+
+# --- Part 3: Set Up Context and Reasoning Environment ---
+
+# Add additional contextual information:
+# 1. A fact indicating the transaction comes from a suspicious location. This could come from a separate fraud detection system.
+pr.add_fact(pr.Fact("suspicious_location(AccountA)", "transaction_fact"))
+
+# Define a rule: if the fraud detector flags a transaction as fraud and the transaction info is suspicious,
+# then mark the associated account (AccountA) as requiring investigation.
+pr.add_rule(pr.Rule("requires_investigation(acc) <- account(acc), fraud_detector(fraud), suspicious_location(acc)", "investigation_rule"))
+
+# Define a propagation rule:
+# If an account requires investigation and is connected (via the "associated" relationship) to another account,
+# then the connected account is also flagged for investigation.
+pr.add_rule(pr.Rule("requires_investigation(y) <- requires_investigation(x), associated(x,y)", "propagation_rule"))
+
+# --- Part 4: Run the Reasoning Engine ---
+
+# Run the reasoning engine to allow the investigation flag to propagate through the network.
+pr.settings.allow_ground_rules = True
+pr.settings.atom_trace = True
+interpretation = pr.reason()
+
+trace = pr.get_rule_trace(interpretation)
+print(f"RULE TRACE: \n\n{trace[0]}\n")