pyMAISE.PostProcessor

class pyMAISE.PostProcessor(data, model_configs, new_model_settings=None, yscaler=None)[source]

Bases: object

Assess the performance of the top-performing models.

Parameters:
  • data (tuple of xarray.DataArray) – The training and testing data given as (xtrain, xtest, ytrain, ytest).

  • model_configs (single or list of dict of tuple(pandas.DataFrame, model object)) – The model configurations produced by pyMAISE.Tuner.

  • new_model_settings (dict of dict of int, float, str, or None, default=None) – Updated model settings given as a dictionary under the model’s key.

  • yscaler (callable or None, default=None) – An object with an inverse_transform method, such as the min-max scaler from sklearn [PVG+11]. This should have been fit using pyMAISE.preprocessing.scale_data() before hyperparameter tuning. If None, then scaling is not undone.
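As a rough illustration of what the yscaler argument is expected to provide, the sketch below defines a tiny one-dimensional min-max scaler with an inverse_transform method. The class MinMaxScaler1D is hypothetical and exists only for this example; it is not part of pyMAISE or sklearn, and real sklearn scalers operate on 2D arrays.

```python
class MinMaxScaler1D:
    """Hypothetical stand-in for a scaler that exposes inverse_transform."""

    def fit(self, values):
        # Remember the range of the data the scaler was fit on.
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        # Map values from [lo, hi] to [0, 1].
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, scaled):
        # Map values from [0, 1] back to the original units.
        span = self.hi - self.lo
        return [s * span + self.lo for s in scaled]


ytrain = [10.0, 20.0, 30.0]
scaler = MinMaxScaler1D().fit(ytrain)
scaled = scaler.transform(ytrain)
restored = scaler.inverse_transform(scaled)
print(scaled, restored)
```

Passing such an object lets predictions be reported in the original output units; with yscaler=None the scaled values are left as they are.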

__init__(data, model_configs, new_model_settings=None, yscaler=None)[source]

Methods

confusion_matrix([axs, idx, model_type, ...])

Create training and testing confusion matrix.

diagonal_validation_plot([ax, y, idx, ...])

Create a diagonal validation plot for a given model.

get_model([idx, model_type, sort_by, direction])

Get a model.

get_params([idx, model_type, sort_by, direction])

Returns the hyperparameters for a given model.

get_predictions([idx, model_type, sort_by, ...])

Get a model's training and testing predictions.

metrics([y, model_type, metrics, sort_by, ...])

Calculate model performance of predicting output training and testing data.

nn_learning_plot([ax, idx, model_type, ...])

Create a learning plot for a given neural network.

nn_network_plot([idx, model_type, sort_by, ...])

Plot NN network.

print_model([idx, model_type, sort_by, ...])

Print a model's tuned hyperparameters.

save_models([num_models, idxs, model_types, ...])

Saves the top models.

validation_plot([ax, y, idx, model_type, ...])

Create a validation plot for a given model.

confusion_matrix(axs=None, idx=None, model_type=None, sort_by=None, direction=None, colorbar=False, annotate=True, round=2)[source]

Create training and testing confusion matrix.

Parameters:
  • axs (list of 2 matplotlib.pyplot.axis or None, default=None) – If not given, then the two axes are created.

  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

  • colorbar (Boolean, default=False) – Whether to include a colorbar.

  • annotate (Boolean, default=True) – Whether to include annotations (number and percentage).

  • round (int, default=2) – Number of digits to round percentage in annotation.

Returns:

axs – The two confusion matrix axes: (cm_train, cm_test)

Return type:

tuple of matplotlib.pyplot.axis
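To make the annotate and round options concrete, here is a minimal pure-Python sketch of a confusion matrix whose cells carry both a count and a rounded percentage. The helper names are hypothetical, and two details are assumptions rather than documented pyMAISE behavior: rows are taken as true labels and columns as predicted labels, and percentages are computed relative to all samples.

```python
def confusion_counts(y_true, y_pred, labels):
    # Rows are true labels, columns are predicted labels (assumed convention).
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def annotate_cells(matrix, ndigits=2):
    # Percentage of all samples (assumption), rounded to ndigits.
    total = sum(sum(row) for row in matrix)
    return [
        [f"{count} ({round(100 * count / total, ndigits)}%)" for count in row]
        for row in matrix
    ]


y_true = ["a", "a", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "a"]
counts = confusion_counts(y_true, y_pred, labels=["a", "b"])
cells = annotate_cells(counts, ndigits=2)
print(counts)
print(cells)
```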

diagonal_validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a diagonal validation plot for a given model.

Parameters:
  • ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.

  • y (single or list of int or str or None, default=None) – The output to plot. If None then all outputs are plotted.

  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis

get_model(idx=None, model_type=None, sort_by=None, direction=None)[source]

Get a model. The model with the chosen hyperparameters is refit and returned.

Parameters:
  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

model – The model refit based on the parameters from the arguments.

Return type:

sklearn or keras model
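The way idx, model_type, sort_by, and direction interact can be sketched with a plain list of records standing in for the metrics table. This is only an illustration of the selection logic described above: the function select_row, the column keys, and the numbers are all hypothetical and do not reflect pyMAISE's internal implementation or its actual column names.

```python
# Hypothetical metrics table; keys and values are made up for illustration.
rows = [
    {"model_type": "linear", "test r2_score": 0.81, "test max_error": 4.0},
    {"model_type": "forest", "test r2_score": 0.93, "test max_error": 2.5},
    {"model_type": "forest", "test r2_score": 0.90, "test max_error": 1.5},
]


def select_row(rows, idx=None, model_type=None,
               sort_by="test r2_score", direction="max"):
    # An explicit index takes priority; otherwise the best row is chosen.
    if idx is not None:
        return rows[idx]
    candidates = [
        r for r in rows if model_type is None or r["model_type"] == model_type
    ]
    pick = max if direction == "max" else min
    return pick(candidates, key=lambda r: r[sort_by])


best_overall = select_row(rows)
best_by_error = select_row(rows, sort_by="test max_error", direction="min")
by_index = select_row(rows, idx=0)
print(best_overall, best_by_error, by_index)
```

A score-like metric is sorted with 'max' and an error-like metric with 'min', which is why direction has to be given when sort_by is not one of the default metrics.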

get_params(idx=None, model_type=None, sort_by=None, direction=None)[source]

Returns the hyperparameters for a given model.

Parameters:
  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

params – The hyperparameters of the model.

Return type:

pandas.DataFrame

get_predictions(idx=None, model_type=None, sort_by=None, direction=None)[source]

Get a model’s training and testing predictions.

Parameters:
  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

yhat – The predicted training and testing data given as (train_yhat, test_yhat).

Return type:

tuple of numpy.array

metrics(y=None, model_type=None, metrics=None, sort_by=None, direction=None)[source]

Calculate model performance of predicting output training and testing data. Default metrics are always evaluated depending on the pyMAISE.Settings.problem_type. For pyMAISE.ProblemType.REGRESSION problems, the default metrics are from [PVG+11] and include:

For pyMAISE.ProblemType.CLASSIFICATION problems, the default metrics are

These metrics are evaluated for both the training and testing data sets.

Parameters:
  • y (int, str, or None, default=None) – The output to determine performance. If None then all outputs are used.

  • model_type (str or None, default=None) – Determine the performance of this model. If None then all models are evaluated.

  • metrics (dict of callable or None, default=None) – Dictionary of callable metrics such as sklearn’s metrics other than those already default to this method. Must take two arguments: (y_true, y_pred). The key is used as the name in performance_data.

  • sort_by (str or None, default=None) – The metric to sort the return by. This should differentiate training and testing. For example, we can sort by testing mean_squared_error. If None then the default is test r2_score for pyMAISE.ProblemType.REGRESSION and test accuracy_score for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – Direction to sort_by. Only required if you want to sort the return by a metric defined in metrics.

Returns:

performance_data – The performance statistics for the models for both the training and testing data.

Return type:

pandas.DataFrame
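The metrics argument expects a dictionary that maps a name to a callable taking (y_true, y_pred). A minimal sketch of such a dictionary, using a hand-written metric so that it runs without any dependencies, might look like the following. The function max_abs_error and the sample values are assumptions made for this example.

```python
def max_abs_error(y_true, y_pred):
    # Largest absolute deviation between the truth and the prediction.
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# The key becomes the metric's name; the value is called as fn(y_true, y_pred).
custom_metrics = {"max_abs_error": max_abs_error}

y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
results = {name: fn(y_true, y_pred) for name, fn in custom_metrics.items()}
print(results)
```

Because a metric supplied this way is not a default metric, sorting the returned table by it also requires a direction; for an error measure like this one that would be 'min'.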

nn_learning_plot(ax=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a learning plot for a given neural network.

Parameters:
  • ax (matplotlib.pyplot.axis or None, default=None) – If not given then an axis is created.

  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis

nn_network_plot(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)[source]

Plot NN network.

Note

For this to work, you must have graphviz installed, which can be done through your package manager.

Parameters:
print_model(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)[source]

Print a model's tuned hyperparameters.

Parameters:
save_models(num_models=10, idxs=None, model_types=None, sort_by=None, direction=None, directory='.')[source]

Saves the top models. Models are named as <Model Type>_<Index in the metrics table>.

Parameters:
  • num_models (int, default=10) – Number of models to save.

  • idxs (int, list of int, or None, default=None) – The indices in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_types (str, list of str, or None, default=None) – The model name(s) to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

  • directory (str, default=".") – Directory to save the models to. All sklearn models will be saved as pickles and the keras models will be in TensorFlow’s SavedModel format.
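The naming scheme and the pickle-based storage for sklearn models can be illustrated with the standard library alone. In the sketch below, save_one is a hypothetical helper, the dictionary is a stand-in for a fitted model, and the ".pkl" extension is an assumption; none of these are taken from pyMAISE itself.

```python
import pickle
import tempfile
from pathlib import Path


def save_one(model, model_type, idx, directory="."):
    # File name follows <Model Type>_<Index>; the extension is an assumption.
    path = Path(directory) / f"{model_type}_{idx}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


# Stand-in object; a fitted sklearn estimator would be pickled the same way.
stand_in_model = {"alpha": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_one(stand_in_model, "Lasso", 3, directory=tmp)
    saved_name = saved.name
    with saved.open("rb") as handle:
        loaded = pickle.load(handle)

print(saved_name, loaded)
```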

validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)[source]

Create a validation plot for a given model.

Parameters:
  • ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.

  • y (single or list of int or str or None, default=None) – The output to plot. If None then all outputs are plotted.

  • idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.

  • model_type (str or None, default=None) – The model name to get. Will get the best model predictions based on sort_by.

  • sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.

  • direction ('min', 'max', or None, default=None) – The direction to sort_by. It is only required if sort_by is not a default metric.

Returns:

ax – The plot.

Return type:

matplotlib.pyplot.axis