Modules containing reusable functions for plotting machine learning visualizations
- Python 3.13 or later
python3 -m pip install opengood.py-ml-plot
Note: See Release version badge above for the latest version.
Set up a 2-D classification model plot, then display the resulting visualization.
Notes:
- The example below uses a dataset to train a logistic regression model, then displays the plot for the training set
- For feature scaling, if required, implement the feature scaling logic in the `feature_scale` lambda
- For predictions, implement the prediction logic in the `predict` lambda
import os

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from src.opengood.py_ml_plot import setup_classification_plot

# Load the dataset: every column but the last is a feature, the last is the label
resource_path = os.path.join(os.path.dirname(__file__), "../resources", "data.csv")
dataset = pd.read_csv(resource_path)
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Keep only the training split for this example
x_train, _, y_train, _ = train_test_split(x, y, test_size=0.2, random_state=0)

# Scale the training features
sc = StandardScaler()
x_train = sc.fit_transform(x_train)

# Train the classifier on the scaled features
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

setup_classification_plot(
    x=x_train,
    y=y_train,
    cmap=ListedColormap(("salmon", "dodgerblue")),
    title="Logistic Regression",
    x_label="Age",
    y_label="Estimated Salary",
    # Undo the scaling so the plot shows the original feature values
    feature_scale=lambda x_set, y_set: (sc.inverse_transform(x_set), y_set),
    # Scale the grid points, predict them, and restore the grid shape
    predict=lambda x1, x2: (
        classifier.predict(
            sc.transform(np.array([x1.ravel(), x2.ravel()]).T)
        ).reshape(x1.shape)
    ),
)

plt.show()
The `feature_scale` lambda implementation logic for function `setup_classification_plot` is as follows:
- Inverse feature scaling is invoked via a feature scaling object, such as the `StandardScaler` object `sc` created earlier for feature scaling
- `x_set` and `y_set` are assigned the non-feature-scaled values of the matrix of features and the dependent variable
  - `x_set` values are inverted from their feature-scaled values in `x`
  - `y_set` values are not inverted and are taken directly from `y`
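The round trip described above can be illustrated with a minimal NumPy-only sketch. It is a stand-in for what `StandardScaler.fit_transform` and `inverse_transform` compute (column-wise standardization and its inverse), not the scikit-learn implementation itself. The sample values and the `feature_scale` function defined here are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sample: two features on very different scales (age, salary).
x_raw = np.array([[25.0, 30000.0], [35.0, 60000.0], [45.0, 90000.0]])
y_raw = np.array([0, 0, 1])

# Standardize each column: subtract the column mean and divide by the
# column standard deviation (population form, ddof=0).
mean = x_raw.mean(axis=0)
std = x_raw.std(axis=0)
x_scaled = (x_raw - mean) / std


def feature_scale(x_set, y_set):
    """Undo the scaling on the features; pass the labels through unchanged."""
    return x_set * std + mean, y_set


# The scaled features come back in their original units; the labels are untouched.
x_set, y_set = feature_scale(x_scaled, y_raw)
print(x_set)
```

The function takes and returns the same `(x_set, y_set)` pair as the `feature_scale` lambda in the example, so a model that needs no scaling could simply return its inputs unchanged.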
The `predict` lambda implementation logic for function `setup_classification_plot` is as follows:
- The `predict` method of the classifier object `classifier` is invoked
- The `ravel` function from the NumPy library is used to flatten a multidimensional array into a one-dimensional array
  - `x1` and `x2` are flattened into 1D arrays via the `ravel` function
  - They are then combined via the `array` function from the NumPy library into a 2D array, which is transposed so that each row holds one (`x1`, `x2`) point
- Since the values of the combined 2D array are not feature scaled, the values are feature scaled via the `transform` method on the `sc` object
  - This method call is not required for models that do not require feature scaling
- The result is then reshaped via the `reshape` function to match the shape of `x1`
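The flatten, combine, predict, and reshape sequence above can be traced on a tiny grid. This is a sketch only: `toy_predict` is a hypothetical stand-in for a fitted classifier's `predict` method, and the scaling step is omitted because the toy rule needs none.

```python
import numpy as np


def toy_predict(points):
    """Hypothetical classifier: class 1 when the two features sum to more than 1."""
    return (points.sum(axis=1) > 1).astype(int)


# A small 2 x 3 grid of coordinates, as meshgrid would produce.
x1, x2 = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0]))

# Flatten each grid, stack the two 1D arrays, and transpose:
# the result has one (x1, x2) point per row.
points = np.array([x1.ravel(), x2.ravel()]).T

# Predict every point, then restore the grid shape.
y_pred = toy_predict(points).reshape(x1.shape)
print(y_pred)
# [[0 0 1]
#  [0 1 1]]
```

Reshaping back to `x1.shape` matters because the plotting step expects the predictions laid out on the same grid as the coordinates.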
The visualization implementation logic for function `setup_classification_plot` is as follows:
- The `ListedColormap` class from the Matplotlib library creates an object that generates a colormap from a list of colors
- If the `feature_scale` lambda is defined, `x_set` and `y_set` are assigned the non-feature-scaled values of the matrix of features and the dependent variable using a feature scaling object, such as the `StandardScaler` object created earlier for feature scaling
  - `x_set` values are inverted from their feature-scaled values in `x`
  - `y_set` values are not inverted and are taken directly from `y`
- If the `feature_scale` lambda is not defined, `x_set` and `y_set` are assigned the values of `x` and `y`, respectively
- The `meshgrid` function from the NumPy library returns a tuple of coordinate matrices from coordinate vectors
  - Two sets of matrices (`x1` and `x2`) are returned with coordinate vectors
  - `x1`
    - The `arange` function is called with a defined start and stop interval
    - `x_set[:, 0]` returns all the rows for feature `x1`
    - `start` parameter
      - Start of the interval
      - `x_set[:, 0].min()` returns the minimum value for feature `x1`
      - Value of `10` is subtracted for padding
    - `stop` parameter
      - End of the interval
      - `x_set[:, 0].max()` returns the maximum value for feature `x1`
      - Value of `10` is added for padding
    - `step` parameter
      - Spacing between values
      - Value of `0.25` is used for spacing
  - `x2`
    - The `arange` function is called with a defined start and stop interval
    - `x_set[:, 1]` returns all the rows for feature `x2`
    - `start` parameter
      - Start of the interval
      - `x_set[:, 1].min()` returns the minimum value for feature `x2`
      - Value of `1000` is subtracted for padding
      - Value of `1000` is used instead of `10` due to the difference in scaling for feature `x2` vs. feature `x1`
    - `stop` parameter
      - End of the interval
      - `x_set[:, 1].max()` returns the maximum value for feature `x2`
      - Value of `1000` is added for padding
    - `step` parameter
      - Spacing between values
      - Value of `0.25` is used for spacing
- The prediction logic implemented in the `predict` lambda is executed, and the result is assigned to `y_pred`, containing the predictions
- The `contourf` function from the Matplotlib library is used for creating filled contour plots
  - It visualizes 3D data in 2D by drawing filled contours representing constant z-values (heights) on an x-y plane
  - These plots are useful for displaying data like temperature distributions, terrain elevations, or any scalar field where the magnitude varies over 2 dimensions
  - The most basic use case of `contourf` involves providing a 2D array representing the z-values
    - Matplotlib automatically determines the x and y coordinates based on the array's indices
  - `X` and `Y` parameters
    - The coordinates of the values in `Z`
    - `X` and `Y` must both be 2D arrays with the same shape as `Z`
    - `x1` is used for `X` containing `x1` values
    - `x2` is used for `Y` containing `x2` values
  - `Z` parameter
    - The height values over which the contour is drawn
    - The `ravel` function from the NumPy library is used to flatten a multidimensional array into a one-dimensional array
      - `x1` and `x2` are flattened into 1D arrays via the `ravel` function
      - They are then combined via the `array` function from the NumPy library into a 2D array
      - Since the values of the combined 2D array are not feature scaled, the values are feature scaled via the `transform` method on the `sc` object
      - The result is then reshaped via the `reshape` function to match the shape of `x1`
  - `alpha` parameter
    - The alpha blending value, between `0` (transparent) and `1` (opaque)
    - Value of `0.75` is used to make the blending mostly opaque
  - `cmap` parameter
    - The `Colormap` object instance or registered colormap name used to map scalar data to colors
    - `salmon` and `dodgerblue` are used for a `ListedColormap` object
      - `salmon` = 0 or negative class
      - `dodgerblue` = 1 or positive class
- The `xlim` function from the Matplotlib library is used to get or set the x-axis limits of the current axes
  - `min()` and `max()` for `x1` are used for the limits
- The `ylim` function from the Matplotlib library is used to get or set the y-axis limits of the current axes
  - `min()` and `max()` for `x2` are used for the limits
- The values from `y_set` are iterated over in a for-in loop
  - The `unique` function from the NumPy library returns the sorted, unique elements of an array
    - Values of `y_set` are made unique and sorted
  - Iterator variable `i` represents the index of the current iteration
  - Iterator variable `j` represents the classification value for the dependent variable
    - `0` negative class
    - `1` positive class
- The `scatter` method from the Matplotlib library creates a scatter plot of data points, with the shaded contour showing the classification for the dependent variable
  - x-axis uses the first-feature (`x1`) values from `x_set` for the rows where the `y_set` value equals the current class `j`
  - y-axis uses the second-feature (`x2`) values from `x_set` for the same rows
  - `c` parameter
    - The marker colors
    - Uses the `ListedColormap` with the classification color at index `i` for the current iteration
  - `label` parameter
    - Sets the label
    - Values
      - `0` negative class
      - `1` positive class
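The steps listed above can be sketched end to end as follows. This is a hypothetical re-creation written from the description, not the source of `setup_classification_plot`. The function name `sketch_classification_plot`, the `pad` and `step` parameters, the toy data, and the headless `Agg` backend are all assumptions made so the sketch runs on its own.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap


def sketch_classification_plot(x_set, y_set, predict, cmap, pad=(1.0, 1.0), step=0.25):
    """Hypothetical re-creation of the described steps; not the library's code."""
    # Coordinate matrices covering each feature's range plus padding.
    x1, x2 = np.meshgrid(
        np.arange(x_set[:, 0].min() - pad[0], x_set[:, 0].max() + pad[0], step),
        np.arange(x_set[:, 1].min() - pad[1], x_set[:, 1].max() + pad[1], step),
    )
    # Predicted class for every grid point, shaped like the grid.
    y_pred = predict(x1, x2)
    # Shade each predicted region, mostly opaque.
    plt.contourf(x1, x2, y_pred, alpha=0.75, cmap=cmap)
    plt.xlim(x1.min(), x1.max())
    plt.ylim(x2.min(), x2.max())
    # One scatter call per class, so each class gets its own color and label.
    for i, j in enumerate(np.unique(y_set)):
        rows = y_set == j
        plt.scatter(x_set[rows, 0], x_set[rows, 1], c=[cmap(i)], label=str(j))
    plt.legend()
    return x1, x2, y_pred


# Toy data and a toy decision rule standing in for a trained model.
x_demo = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.0, 2.5]])
y_demo = np.array([0, 0, 1, 1])
x1, x2, y_pred = sketch_classification_plot(
    x_demo,
    y_demo,
    predict=lambda a, b: ((a + b) > 2.5).astype(int),
    cmap=ListedColormap(("salmon", "dodgerblue")),
)
```

Padding is exposed as a parameter here because, as noted above, a sensible value depends on each feature's scale (`10` for one feature and `1000` for the other in the described implementation).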
Create Python virtual environment:
cd ~/workspace/opengood-aio/py-ml-plot
python3 -m venv ~/workspace/opengood-aio/py-ml-plot/.venv
source .venv/bin/activate
python3 -m pip install matplotlib
python3 -m pip install numpy
python3 -m pip install pandas
python3 -m pip install scikit-learn
pip freeze > requirements.txt
python -m pytest tests/