pyMAISE.PostProcessor
- class pyMAISE.PostProcessor(data, model_configs, new_model_settings=None, yscaler=None)
Bases: object
Assess the performance of the top-performing models.
- Parameters:
data (tuple of xarray.DataArray) – The training and testing data given as (xtrain, xtest, ytrain, ytest).
model_configs (single or list of dict of tuple(pandas.DataFrame, model object)) – The model configurations produced by pyMAISE.Tuner.
new_model_settings (dict of dict of int, float, str, or None, default=None) – Updated model settings given as a dictionary under the model's key.
yscaler (callable or None, default=None) – An object with an inverse_transform method, such as the min-max scaler from sklearn [PVG+11]. This should have been fit using pyMAISE.preprocessing.scale_data() before hyperparameter tuning. If None, then scaling is not undone.
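Example: the only requirement on yscaler stated above is an inverse_transform method. The sketch below illustrates that contract with a hypothetical one-dimensional min-max scaler written in plain Python. The class MinMaxScaler1D is an assumption made for illustration; it is not part of pyMAISE or sklearn, and in practice the scaler fit by pyMAISE.preprocessing.scale_data() would be passed instead.

```python
class MinMaxScaler1D:
    """Hypothetical stand-in for a fitted scaler (not pyMAISE or sklearn code)."""

    def fit(self, values):
        # Remember the range of the data seen during fitting.
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        # Map values onto [0, 1]; assumes the fitted data is not constant.
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, scaled):
        # Undo the scaling so results are back in the original units.
        span = self.hi - self.lo
        return [s * span + self.lo for s in scaled]


ytrain = [10.0, 20.0, 40.0]
yscaler = MinMaxScaler1D().fit(ytrain)
scaled = yscaler.transform(ytrain)            # values used during tuning
restored = yscaler.inverse_transform(scaled)  # values in original units
```

Any object exposing inverse_transform in this way should satisfy the description of yscaler; when no scaler is given, predictions stay in the scaled units.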
Methods
confusion_matrix([axs, idx, model_type, ...]) – Create training and testing confusion matrices.
diagonal_validation_plot([ax, y, idx, ...]) – Create a diagonal validation plot for a given model.
get_model([idx, model_type, sort_by, direction]) – Get a model.
get_params([idx, model_type, sort_by, direction]) – Return the hyperparameters for a given model.
get_predictions([idx, model_type, sort_by, ...]) – Get a model's training and testing predictions.
metrics([y, model_type, metrics, sort_by, ...]) – Calculate how well each model predicts the training and testing outputs.
nn_learning_plot([ax, idx, model_type, ...]) – Create a learning plot for a given neural network.
nn_network_plot([idx, model_type, sort_by, ...]) – Plot a neural network.
print_model([idx, model_type, sort_by, ...]) – Print a model's tuned hyperparameters.
save_models([num_models, idxs, model_types, ...]) – Save the top models.
validation_plot([ax, y, idx, model_type, ...]) – Create a validation plot for a given model.
- confusion_matrix(axs=None, idx=None, model_type=None, sort_by=None, direction=None, colorbar=False, annotate=True, round=2)
Create training and testing confusion matrices.
- Parameters:
axs (list of 2 matplotlib.pyplot.axis or None, default=None) – If not given, then the axes are created.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
colorbar (Boolean, default=False) – Whether to include a colorbar.
annotate (Boolean, default=True) – Whether to include annotations (number and percentage).
round (int, default=2) – Number of digits to round the percentage to in the annotation.
- Returns:
axs – The two confusion matrix axes: (cm_train, cm_test).
- Return type:
tuple of matplotlib.pyplot.axis
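Example: the annotations described above pair a count with a percentage, rounded to round digits. The sketch below computes both for a small label set in plain Python. The helper names confusion_counts and annotations are hypothetical, and normalizing each count by the total number of samples is an assumption; the source does not state which normalization pyMAISE uses.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count samples for each (true label, predicted label) pair."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1  # rows: true label, columns: prediction
    return matrix


def annotations(matrix, digits=2):
    """Pair each count with its share of all samples (assumed normalization)."""
    total = sum(sum(row) for row in matrix)
    return [[(n, round(100.0 * n / total, digits)) for n in row] for row in matrix]


y_true = ["a", "a", "b", "b", "b", "a"]
y_pred = ["a", "b", "b", "b", "a", "a"]
cm = confusion_counts(y_true, y_pred, labels=["a", "b"])
notes = annotations(cm, digits=2)  # (count, percentage) for every cell
```

The same computation would be repeated for the training and testing predictions to obtain the two matrices.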
- diagonal_validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)
Create a diagonal validation plot for a given model.
- Parameters:
ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.
y (single or list of int or str or None, default=None) – The output to plot. If None, then all outputs are plotted.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
ax – The plot.
- Return type:
matplotlib.pyplot.axis
- get_model(idx=None, model_type=None, sort_by=None, direction=None)
Get a model. The model with the chosen hyperparameters is refit and returned.
- Parameters:
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
model – The model refit based on the parameters from the arguments.
- Return type:
sklearn or keras model
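Example: most methods of this class share the idx, model_type, sort_by, and direction parameters. The sketch below shows one plausible reading of how they interact, using a list of dictionaries in place of the pandas.DataFrame returned by pyMAISE.PostProcessor.metrics(). The helper select_row, the column name model_type, the metric test max_error, and the rule that idx takes precedence are all assumptions made for illustration, not pyMAISE internals.

```python
# Default metrics have a known direction (larger is better for both).
DEFAULT_DIRECTION = {"test r2_score": "max", "test accuracy_score": "max"}


def select_row(rows, idx=None, model_type=None, sort_by="test r2_score",
               direction=None):
    """Pick one row of a metrics table (hypothetical helper)."""
    if idx is not None:
        return rows[idx]  # an explicit index bypasses sorting
    if direction is None:
        if sort_by not in DEFAULT_DIRECTION:
            raise ValueError("direction is required for a non-default metric")
        direction = DEFAULT_DIRECTION[sort_by]
    # Restrict to one model type when requested, then take the best row.
    candidates = [r for r in rows
                  if model_type is None or r["model_type"] == model_type]
    pick = max if direction == "max" else min
    return pick(candidates, key=lambda r: r[sort_by])


rows = [
    {"model_type": "Lasso", "test r2_score": 0.91, "test max_error": 0.30},
    {"model_type": "RF", "test r2_score": 0.97, "test max_error": 0.12},
    {"model_type": "Lasso", "test r2_score": 0.94, "test max_error": 0.25},
]
best_overall = select_row(rows)
best_lasso = select_row(rows, model_type="Lasso")
lowest_error = select_row(rows, sort_by="test max_error", direction="min")
first_row = select_row(rows, idx=0)
```

Sorting by a metric outside the defaults needs an explicit direction because the method cannot know whether larger or smaller values are better.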
- get_params(idx=None, model_type=None, sort_by=None, direction=None)
Return the hyperparameters for a given model.
- Parameters:
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
params – The hyperparameters of the model.
- Return type:
pandas.DataFrame
- get_predictions(idx=None, model_type=None, sort_by=None, direction=None)
Get a model's training and testing predictions.
- Parameters:
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
yhat – The predicted training and testing data given as (train_yhat, test_yhat).
- Return type:
tuple of numpy.array
- metrics(y=None, model_type=None, metrics=None, sort_by=None, direction=None)
Calculate how well each model predicts the training and testing outputs. Default metrics are always evaluated depending on the pyMAISE.Settings.problem_type. For pyMAISE.ProblemType.REGRESSION problems, the default metrics are from [PVG+11] and include:
R2: r-squared,
MAE: mean absolute error,
RMSE: root mean squared error, the square root of MSE,
RMSPE: root mean squared percentage error.
For pyMAISE.ProblemType.CLASSIFICATION problems, the default metrics include accuracy_score.
These metrics are evaluated for both the training and testing data sets.
- Parameters:
y (int, str, or None, default=None) – The output to determine performance. If None, then all outputs are used.
model_type (str or None, default=None) – Determine the performance of this model. If None, then all models are evaluated.
metrics (dict of callable or None, default=None) – Dictionary of callable metrics, such as sklearn's metrics, in addition to the defaults of this method. Each callable must take two arguments: (y_true, y_pred). The key is used as the name in performance_data.
sort_by (str or None, default=None) – The metric to sort the return by. This should differentiate training and testing. For example, we can sort by testing mean_squared_error. If None, then the default is test r2_score for pyMAISE.ProblemType.REGRESSION and test accuracy_score for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. Only required if you want to sort the return by a metric defined in metrics.
- Returns:
performance_data – The performance statistics for the models for both the training and testing data.
- Return type:
pandas.DataFrame
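Example: the sketch below writes out the regression metrics named above using their common textbook definitions in plain Python, and shows a user-defined metric with the required (y_true, y_pred) signature. The function names, the test prefix on the result keys, and expressing RMSPE as a fraction rather than a percentage are assumptions; pyMAISE's own implementation may differ in these details.

```python
import math


def r2_score(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)  # square root of MSE


def rmspe(y_true, y_pred):
    # Relative errors, so y_true must not contain zeros (returned as a fraction).
    rel = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(rel) / len(rel))


# A custom metric only needs the (y_true, y_pred) signature.
extra_metrics = {
    "max_error": lambda y_true, y_pred: max(
        abs(t - p) for t, p in zip(y_true, y_pred)
    ),
}

y_test = [2.0, 4.0, 5.0, 8.0]
yhat_test = [2.5, 3.5, 5.0, 7.0]

all_metrics = {"r2_score": r2_score, "mae": mae, "rmse": rmse, "rmspe": rmspe}
all_metrics.update(extra_metrics)
row = {f"test {name}": fn(y_test, yhat_test) for name, fn in all_metrics.items()}
```

A dictionary shaped like extra_metrics is what the metrics parameter describes; sorting the result by one of its entries would additionally need direction.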
- nn_learning_plot(ax=None, idx=None, model_type=None, sort_by=None, direction=None)
Create a learning plot for a given neural network.
- Parameters:
ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
ax – The plot.
- Return type:
matplotlib.pyplot.axis
- nn_network_plot(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)
Plot a neural network.
Note
For this to work, you must have graphviz installed, which can be done through your package manager.
- Parameters:
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
kwargs – Any arguments accepted by tensorflow.keras.utils.plot_model() except model.
- print_model(idx=None, model_type=None, sort_by=None, direction=None, **kwargs)
Print a model's tuned hyperparameters.
- Parameters:
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
kwargs – Any arguments used by tensorflow.keras.Sequential.summary().
- save_models(num_models=10, idxs=None, model_types=None, sort_by=None, direction=None, directory='.')
Save the top models. Models are named <Model Type>_<Index into metrics table>.
- Parameters:
num_models (int, default=10) – Number of models to save.
idxs (int, list of ints, or None, default=None) – The indices in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_types (str, list of str, or None, default=None) – The model type name(s) to use. The best models of those types are selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
directory (str, default=".") – Directory to save the models to. All sklearn models are saved as pickles, and keras models are saved in TensorFlow's SavedModel format.
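Example: the sketch below illustrates the naming scheme and the pickle route described for sklearn models, using only the standard library. The helper save_pickled is hypothetical, the .pkl suffix is an assumption (the source does not state a file extension), and plain dictionaries stand in for fitted models.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickled(models, directory="."):
    """Pickle each (model_type, idx, model) entry as <model_type>_<idx>.pkl."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for model_type, idx, model in models:
        # Name follows <Model Type>_<Index>; the ".pkl" suffix is assumed.
        path = out_dir / f"{model_type}_{idx}.pkl"
        with path.open("wb") as fh:
            pickle.dump(model, fh)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # Dictionaries stand in for fitted sklearn models in this sketch.
    models = [("Lasso", 0, {"alpha": 0.01}), ("RF", 3, {"n_estimators": 200})]
    saved = save_pickled(models, directory=tmp)
    names = [p.name for p in saved]
    with saved[0].open("rb") as fh:
        reloaded = pickle.load(fh)  # round trip back to the original object
```

Because the index is part of the file name, a saved file can be traced back to its row in the metrics table.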
- validation_plot(ax=None, y=None, idx=None, model_type=None, sort_by=None, direction=None)
Create a validation plot for a given model.
- Parameters:
ax (matplotlib.pyplot.axis or None, default=None) – If not given, then an axis is created.
y (single or list of int or str or None, default=None) – The output to plot. If None, then all outputs are plotted.
idx (int or None, default=None) – The index in the pyMAISE.PostProcessor.metrics() pandas.DataFrame. If None, then sort_by is used.
model_type (str or None, default=None) – The name of the model type to use. The best model of that type is selected according to sort_by.
sort_by (str or None, default=None) – The metric to sort the pandas.DataFrame from pyMAISE.PostProcessor.metrics() by. If None, then test r2_score is used for pyMAISE.ProblemType.REGRESSION and test accuracy_score is used for pyMAISE.ProblemType.CLASSIFICATION.
direction ('min', 'max', or None, default=None) – The sort direction for sort_by. It is only required if sort_by is not a default metric.
- Returns:
ax – The plot.
- Return type:
matplotlib.pyplot.axis