pyMAISE.PostProcessor

class pyMAISE.PostProcessor(data, model_configs, new_model_settings=None, yscaler=None)[source]

Bases: object

Assess the performance of the top-performing models.

Parameters:

data (tuple of xarray.DataArray) – The training and testing data given as (xtrain, xtest, ytrain, ytest).
model_configs (single or list of dict of tuple(pandas.DataFrame, model object)) – The model configurations produced by pyMAISE.Tuner.
new_model_settings (dict of dict of int, float, str, or None, default=None) – Updated model settings given as a dictionary under the model’s key.
yscaler (callable or None, default=None) – An object with an inverse_transform method, such as the min-max scaler from sklearn [PVG+11]. This should have been fit using pyMAISE.preprocessing.scale_data() before hyperparameter tuning. If None, then scaling is not undone.
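
The yscaler requirement can be illustrated with a minimal sketch. MinMaxSketch below is a hypothetical stand-in written only for this example (it is not part of pyMAISE or sklearn); it shows what "an object with an inverse_transform method" means and how scaled values map back to their original units.

```python
class MinMaxSketch:
    """Hypothetical stand-in for a fitted min-max scaler (illustration only)."""

    def fit(self, values):
        # Remember the range of the training outputs.
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, scaled):
        # Undo the scaling so results are back in original units.
        span = self.hi - self.lo
        return [s * span + self.lo for s in scaled]


ytrain = [10.0, 20.0, 30.0]
yscaler = MinMaxSketch().fit(ytrain)
scaled = yscaler.transform(ytrain)            # values in [0, 1]
restored = yscaler.inverse_transform(scaled)  # back to original units
```

In practice the object passed as yscaler would be the scaler fit by pyMAISE.preprocessing.scale_data(), not a hand-written class like this one.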

__init__(data, model_configs, new_model_settings=None, yscaler=None)[source]

Methods

`confusion_matrix`([axs, idx, model_type, ...])	Create training and testing confusion matrix.
`diagonal_validation_plot`([ax, y, idx, ...])	Create a diagonal validation plot for a given model.
`get_model`([idx, model_type, sort_by, direction])	Get a model.
`get_params`([idx, model_type, sort_by, direction])	Returns the hyperparameters for a given model.
`get_predictions`([idx, model_type, sort_by, ...])	Get a model's training and testing predictions.
`metrics`([y, model_type, metrics, sort_by, ...])	Calculate how well the models predict the training and testing outputs.
`nn_learning_plot`([ax, idx, model_type, ...])	Create a learning plot for a given neural network.
`nn_network_plot`([idx, model_type, sort_by, ...])	Plot the neural network architecture.
`print_model`([idx, model_type, sort_by, ...])	Print a model's tuned hyperparameters.
`save_models`([num_models, idxs, model_types, ...])	Saves the top models.
`validation_plot`([ax, y, idx, model_type, ...])	Create a validation plot for a given model.

confusion_matrix(axs=None, idx=None, model_type=None, sort_by=None, direction=None, colorbar=False, annotate=True, round=2)[source]

Create training and testing confusion matrix.

Parameters:

axs (list of 2 matplotlib.pyplot.axis or None, default=None) – If not given, then the axes are created.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.
colorbar (Boolean, default=False) – Whether to include a colorbar.
annotate (Boolean, default=True) – Whether to include annotations (number and percentage).
round (int, default=2) – Number of digits to round percentage in annotation.

Returns:

axs – The two confusion matrix axes: (cm_train, cm_test)

Return type:

tuple of matplotlib.pyplot.axis
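
The annotate and round parameters describe count and percentage annotations. As a rough sketch of what such annotations contain, the pure-Python helpers below (confusion_counts and annotate_cells are hypothetical names, not pyMAISE functions) count a confusion matrix and format each cell. Two assumptions are made here that this page does not confirm: rows are true labels and columns are predicted labels, and percentages are taken over all samples.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels (assumed)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def annotate_cells(matrix, digits=2):
    """Format each cell as 'count' plus its share of all samples (assumed)."""
    total = sum(sum(row) for row in matrix)
    return [
        [f"{count}\n{round(100 * count / total, digits)}%" for count in row]
        for row in matrix
    ]


y_true = ["a", "b", "a", "b", "a", "a"]
y_pred = ["a", "b", "b", "b", "a", "a"]
cm = confusion_counts(y_true, y_pred, labels=["a", "b"])
cell_text = annotate_cells(cm, digits=2)
```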

diagonal_validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a diagonal validation plot for a given model.

Parameters:

ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.
y (single or list of int or str or None, default=None) – The output to plot. If None then all outputs are plotted.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis

get_model(idx=None, model_type=None, sort_by=None, direction=None)[source]

Get a model. The model with the chosen hyperparameters is refit and returned.

Parameters:

idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

model – The model refit based on the parameters from the arguments.

Return type:

sklearn or keras model
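
The idx, model_type, sort_by, and direction arguments recur across most methods on this page. The sketch below is one reading of how they interact, not pyMAISE's implementation: select_row is a hypothetical helper, the table is a plain list standing in for the metrics pandas.DataFrame, and the model names and scores are made up.

```python
def select_row(table, idx=None, model_type=None,
               sort_by="test r2_score", direction="max"):
    """Pick one row: an explicit idx wins, otherwise the best by sort_by."""
    if idx is not None:
        return table[idx]
    candidates = [
        row for row in table
        if model_type is None or row["Model"] == model_type
    ]
    pick = max if direction == "max" else min
    return pick(candidates, key=lambda row: row[sort_by])


# Made-up metrics table with hypothetical model names.
table = [
    {"Model": "rf", "test r2_score": 0.91},
    {"Model": "knn", "test r2_score": 0.87},
    {"Model": "rf", "test r2_score": 0.95},
]

best_overall = select_row(table)                 # highest test r2_score
best_knn = select_row(table, model_type="knn")   # best within one model type
by_index = select_row(table, idx=0)              # explicit row
```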

get_params(idx=None, model_type=None, sort_by=None, direction=None)[source]

Returns the hyperparameters for a given model.

Parameters:

idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

params – The hyperparameters of the model.

Return type:

pandas.DataFrame

get_predictions(idx=None, model_type=None, sort_by=None, direction=None)[source]

Get a model’s training and testing predictions.

Parameters:

idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

yhat – The predicted training and testing data given as (train_yhat, test_yhat).

Return type:

tuple of numpy.array

metrics(y=None, model_type=None, metrics=None, sort_by=None, direction=None)[source]

Calculate how well the models predict the training and testing outputs. Default metrics are always evaluated, and which ones are used depends on pyMAISE.Settings.problem_type. For pyMAISE.ProblemType.REGRESSION problems, the default metrics are from [PVG+11] and include:

R2: r-squared,
MAE: mean absolute error,
MAPE: mean absolute percentage error,
RMSE: root mean squared error, the square root of MSE,
RMSPE: root mean squared percentage error.
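
For reference, the regression metrics listed above can be written out from their standard definitions. This is a plain-Python sketch, not pyMAISE's code: regression_metrics is a hypothetical helper, and MAPE and RMSPE are returned here as fractions (not multiplied by 100), which is an assumption about units.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard definitions of R2, MAE, MAPE, RMSE, and RMSPE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rel = [e / t for e, t in zip(errors, y_true)]       # relative errors
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(r) for r in rel) / n,
        "RMSE": math.sqrt(ss_res / n),                  # square root of MSE
        "RMSPE": math.sqrt(sum(r * r for r in rel) / n),
    }


reg_scores = regression_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

The relative-error metrics divide by the true values, so they are undefined when any true value is zero.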

For pyMAISE.ProblemType.CLASSIFICATION problems, the default metrics are

Accuracy: accuracy,
Recall: recall,
Precision: precision,
F1: f1.

These metrics are evaluated for both the training and testing data sets.
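
As a small illustration of the classification metrics, the sketch below computes them for a binary problem from their standard definitions. binary_classification_metrics is a hypothetical helper; how pyMAISE averages these scores for multi-class problems is not stated on this page, so only the binary case is shown.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "Accuracy": correct / len(pairs),
        "Recall": recall,
        "Precision": precision,
        "F1": 2 * precision * recall / denom if denom else 0.0,
    }


cls_scores = binary_classification_metrics(
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
)
```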

Parameters:

y (int, str, or None, default=None) – The output to determine performance. If None then all outputs are used.
model_type (str or None, default=None) – Determine the performance of this model. If None then all models are evaluated.
metrics (dict of callable or None, default=None) – Dictionary of callable metrics, such as those from sklearn, to evaluate in addition to this method’s defaults. Each callable must take two arguments: (y_true, y_pred). The key is used as the name in performance_data.
sort_by (str or None, default=None) – The metric to sort the return by. This should differentiate training and testing. For example, we can sort by testing mean_squared_error. If None then the default is test r2_score for pyMAISE.ProblemType.REGRESSION and test accuracy_score for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – Direction to sort_by. Only required if you want to sort the return by a metric defined in metrics.
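
A custom metric is any callable with the (y_true, y_pred) signature, and direction says whether smaller or larger values are better. The sketch below shows the shape of the metrics dictionary and what sorting with direction='min' amounts to. The metric max_abs_error, the model names, the predictions, and the column name "test max_abs_error" are all made up for this example.

```python
def max_abs_error(y_true, y_pred):
    """Custom metric with the required (y_true, y_pred) signature."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# Shape expected by the `metrics` argument: name -> callable.
extra_metrics = {"max_abs_error": max_abs_error}

y_test = [1.0, 2.0, 3.0]
# Hypothetical test predictions from two models.
predictions = {
    "model_a": [1.1, 2.5, 2.9],
    "model_b": [0.8, 2.1, 3.2],
}

rows = [
    {"Model": name,
     "test max_abs_error": extra_metrics["max_abs_error"](y_test, yhat)}
    for name, yhat in predictions.items()
]

# For an error metric, smaller is better, so the direction is 'min'.
direction = "min"
rows.sort(key=lambda r: r["test max_abs_error"], reverse=(direction == "max"))
best_model = rows[0]["Model"]
```

With a PostProcessor instance, the equivalent call would look something like postprocessor.metrics(metrics=extra_metrics, sort_by="test max_abs_error", direction="min"), where the exact sort_by string is an assumption based on the "test r2_score" naming used above.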

Returns:

performance_data – The performance statistics for the models for both the training and testing data.

Return type:

pandas.DataFrame

nn_learning_plot(ax=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a learning plot for a given neural network.

Parameters:

ax (matplotlib.pyplot.axis or None, default=None) – If not given then an axis is created.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis

nn_network_plot(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)[source]

Plot the neural network architecture.

Note

This method requires graphviz, which can be installed through your package manager.

Parameters:

idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.
kwargs – Any arguments related to tensorflow.keras.utils.plot_model() except model.

print_model(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)[source]

Print a model’s tuned hyperparameters.

Parameters:

idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.
kwargs – Any arguments used by tensorflow.keras.Sequential.summary().

save_models(num_models=10, idxs=None, model_types=None, sort_by=None, direction=None, directory='.')[source]

Saves the top models. Models are named <Model Type>_<Index into metrics table>.

Parameters:

num_models (int, default=10) – Number of models to save.
idxs (int, list of int, or None, default=None) – The indices in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_types (str, list of str, or None, default=None) – The model name(s) to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.
directory (str, default=".") – Directory to save the models to. All sklearn models will be saved as pickles and the keras models will be in TensorFlow’s SavedModel format.
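
The naming scheme and pickle storage described for sklearn models can be sketched with the standard library. This is only an illustration: save_pickled is a hypothetical helper, the ".pkl" extension and the "KN" model type are assumptions, and a plain dictionary stands in for a fitted model.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickled(model, model_type, idx, directory="."):
    """Write a model to <directory>/<model_type>_<idx>.pkl (extension assumed)."""
    path = Path(directory) / f"{model_type}_{idx}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"n_neighbors": 5}      # stand-in for a fitted model
    saved = save_pickled(stand_in, "KN", 3, directory=tmp)
    saved_name = saved.name
    with saved.open("rb") as handle:   # reload to confirm the round trip
        reloaded = pickle.load(handle)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.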

validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a validation plot for a given model.

Parameters:

ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.
y (single or list of int or str or None, default=None) – The output to plot. If None then all outputs are plotted.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis