Fuel Performance

Inputs

  • fuel_dens: Fuel density (\(\frac{kg}{m^3}\))

  • porosity: Porosity

  • clad_thick: Cladding thickness (\(m\))

  • pellet_OD: Pellet outer diameter (\(m\))

  • pellet_h: Pellet height (\(m\))

  • gap_thick: Gap thickness (\(m\))

  • inlet_T: Inlet temperature (\(K\))

  • enrich: U-235 enrichment

  • rough_fuel: Fuel roughness (\(m\))

  • rough_clad: Clad rouchness (\(m\))

  • ax_pow: Axial power

  • clad_T: Cladding surface temperature (\(K\))

  • pressure: Pressure (\(Pa\))

Outputs

  • fis_gas_produced: Fission gas production (\(mol\))

  • max_fuel_centerline_temp: Max fuel centerline temperature (\(K\))

  • max_fuel_surface_temp: Max fuel surface temperature (\(K\))

  • radial_clad_dia: Radial cladding diameter displacement after irradiation (\(m\))

This data set consists of 13 inputs and 4 outputs with 400 data points. This data originates from [1], and a graphical representation is provided in the figure below. Case 1 from the pellet-cladding mechanical interaction (PCMI) benchmark [2] was selected for the data set. This benchmark simulates a beginning of life (BOL) ramp of a 10-pellet pressurized water reactor (PWR) fuel rod to an average linear heat rate of \(40~kW/m\). The inner and outer cladding diameters are reduced, so the fuel-clad interaction occurs during the ramp time. Axial power and rod surface temperature profiles were assumed to be uniform at \(330^\circ C\). The 13 input parameters were uniformly randomly sampled independently within their uncertainty bounds and simulated in BISON. The rod response was recorded in 4 outputs.

fp.png

[11]:
import pyMAISE as mai
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from scipy.stats import uniform, randint
from sklearn.model_selection import ShuffleSplit

# Plot settings
matplotlib_settings = {
    "font.size": 14,
    "legend.fontsize": 12,
}
plt.rcParams.update(**matplotlib_settings)

pyMAISE Initialization

First we initialize pyMAISE with the following 4 parameters:

  • verbosity: 0 \(\rightarrow\) pyMAISE prints no outputs,

  • random_state: None \(\rightarrow\) No random seed is set,

  • test_size: 0.3 \(\rightarrow\) 30% of the data is used for testing,

  • num_configs_saved: 5 \(\rightarrow\) The top 5 hyper-parameter configurations are saved for each model.

With pyMAISE initialized we can load the preprocessor for this data set using load_fp().

[12]:
global_settings = mai.settings.init()
preprocessor = mai.load_fp()

As stated the data set consists of 13 inputs:

[13]:
preprocessor.inputs.head()
[13]:
fuel_dens porosity clad_thick pellet_OD pellet_h gap_thick inlet_T enrich rough_fuel rough_clad ax_pow clad_T pressure
0 10466 0.040527 0.000571 0.004104 0.013077 0.000019 292.36 0.044852 0.000002 3.890000e-07 0.99967 602.72 15504000
1 10488 0.041780 0.000570 0.004096 0.014227 0.000019 291.33 0.044942 0.000002 4.170000e-07 0.98741 602.81 15591000
2 10434 0.058323 0.000568 0.004094 0.013923 0.000019 293.35 0.044889 0.000002 3.670000e-07 0.99225 620.33 15510000
3 10449 0.038379 0.000572 0.004093 0.013976 0.000019 293.29 0.044937 0.000002 3.250000e-07 1.02650 610.91 15382000
4 10429 0.038438 0.000568 0.004096 0.013787 0.000019 291.81 0.044849 0.000002 4.360000e-07 0.99289 597.05 15583000

and 4 outputs with 400 total data points:

[14]:
preprocessor.outputs.head()
[14]:
fis_gas_produced max_fuel_centerline_temp max_fuel_surface_temp radial_clad_dia
0 0.000029 1569.699310 699.613033 0.000019
1 0.000032 1559.465162 699.976191 0.000019
2 0.000031 1632.394103 712.771506 0.000020
3 0.000032 1613.399315 710.114032 0.000020
4 0.000031 1542.540705 694.956854 0.000018

Prior to constructing any models we can get a surface understanding of the data set with a correlation matrix.

[15]:
fig, ax = plt.subplots(figsize=(15,10))
fig, ax = preprocessor.correlation_matrix(fig=fig, ax=ax)
../_images/examples_fuel_performance_9_0.png

There is a positive correlation between axial power and cladding temperature with max fuel centerline temperature, max fuel surface temperature, and radial cladding diameter. Additionally, the fission gas production correlates with pellet height.

The final step of the pyMAISE initialization process is data scaling. For this data set we will use min-max scaling.

[17]:
data = preprocessor.min_max_scale()

Model Initialization

We will examine the performance of 6 models in this data set:

  • Linear regression: linear,

  • Lasso regression: lasso,

  • Decision tree regression: dtree,

  • Random forest regression: rforest,

  • K-nearest neighbors regression: knn,

  • Sequential dense neural networks: nn.

For hyper-parameter tuning each model must be initialized. We will use the Scikit-learn defaults for the classical ML models (linear, lasso, dtree, rforest, and knn); therefore, they are only specified in the models parameter of the model_settings dictionary. However, we must specify nn model parameters that define the layers, optimizer, and training.

[18]:
model_settings = {
    "models": ["linear", "lasso", "dtree", "knn", "rforest", "nn"],
    "nn": {
        # Sequential
        "num_layers": 4,
        "dropout": True,
        "rate": 0.5,
        "validation_split": 0.15,
        "loss": "mean_absolute_error",
        "metrics": ["mean_absolute_error"],
        "batch_size": 8,
        "epochs": 50,
        "warm_start": True,
        "jit_compile": False,
        # Starting Layer
        "start_num_nodes": 100,
        "start_kernel_initializer": "normal",
        "start_activation": "relu",
        "input_dim": preprocessor.inputs.shape[1], # Number of inputs
        # Middle Layers
        "mid_num_node_strategy": "linear", # Middle layer nodes vary linearly from 'start_num_nodes' to 'end_num_nodes'
        "mid_kernel_initializer": "normal",
        "mid_activation": "relu",
        # Ending Layer
        "end_num_nodes": preprocessor.outputs.shape[1], # Number of outputs
        "end_activation": "linear",
        "end_kernel_initializer": "normal",
        # Optimizer
        "optimizer": "adam",
        "learning_rate": 5e-4,
    },
}
tuning = mai.Tuning(data=data, model_settings=model_settings)

Hyper-parameter Tuning

We will use random search for the hyper-parameter tuning of the classical models (lasso, dtree, rforest, and knn) through the random_search function. linear will be manually fit with the Scikit-learn defaults. For each classical model 300 models will be produced with randomly sampled parameter configurations. For nn, bayesian search is used to optimize the hyper-parameters in 50 iterations through the bayesian_search function. Bayesian search is appealing for nn as their training can be computationally expensive. To further reduce the computational cost of nn we specify only 10 epochs which will produce less than performant models but show the optimal parameters. For both search methods we use cross-validation to reduce bias in the models from the data set. The hyper-parameter search spaces are defined in the random_search_spaces and bayesian_search_spaces dictionaries.

[19]:
random_search_spaces = {
    "lasso": {
        "alpha": uniform(loc=0.0001, scale=0.0099), # 0.0001 - 0.01
    },
    "dtree": {
        "max_depth": randint(low=5, high=50), # 5 - 50
        "max_features": [None, "sqrt", "log2", 2, 4, 6],
        "min_samples_split": randint(low=2, high=20), # 2 - 20
        "min_samples_leaf": randint(low=1, high=20), # 1 - 20
    },
    "rforest": {
        "n_estimators": randint(low=50, high=200), # 50 - 200
        "criterion": ["squared_error", "absolute_error", "poisson"],
        "min_samples_split": randint(low=2, high=20), # 2 - 20
        "min_samples_leaf": randint(low=1, high=20), # 1 - 20
        "max_features": [None, "sqrt", "log2", 2, 4, 6],
    },
    "knn": {
        "n_neighbors": randint(low=1, high=20), # 1 - 20
        "weights": ["uniform", "distance"],
        "leaf_size": randint(low=1, high=30), # 1 - 30
        "p": randint(low=1, high=10), # 1 - 10
    },
}
bayesian_search_spaces = {
    "nn": {
        "mid_num_node_strategy": ["constant", "linear"],
        "batch_size": [8, 64],
        "learning_rate": [1e-5, 0.001],
        "num_layers": [2, 6],
        "start_num_nodes": [25, 300],
    },
}

start = time.time()
random_search_configs = tuning.random_search(
    param_spaces=random_search_spaces,
    models=["linear"] + list(random_search_spaces.keys()),
    n_iter=300,
    cv=ShuffleSplit(n_splits=5, test_size=0.25, random_state=global_settings.random_state),
)
bayesian_search_configs = tuning.bayesian_search(
    param_spaces=bayesian_search_spaces,
    models=bayesian_search_spaces.keys(),
    n_iter=50,
    cv=5,
)
stop = time.time()
print("Hyper-parameter tuning took " + str((stop - start) / 60) + " minutes to process.")
Hyper-parameter tuning search space was not provided for linear, doing manual fit
Hyper-parameter tuning took 25.628323105971017 minutes to process.

We can understand the hyper-parameter tuning of Bayesian search from the convergence plot.

[22]:
fig, ax = plt.subplots(figsize=(8,8))
ax = tuning.convergence_plot(model_types="nn")
../_images/examples_fuel_performance_17_0.png

Fewer than 30 iterations were required to converge to the optimal parameter configurations.

Model Post-processing

Now that the top num_configs_saved saved, we can pass these models to the PostProcessor for model comparison and analysis. To improve the nn performance we can pass an updated epochs parameter. Using 200 epochs should improve fitting at higher computational cost.

[21]:
new_model_settings = {
    "nn": {"epochs": 200}
}
postprocessor = mai.PostProcessor(
    data=data,
    models_list=[random_search_configs, bayesian_search_configs],
    new_model_settings=new_model_settings,
    yscaler=preprocessor.yscaler,
)

To compare the performance of these models we will compute 4 metrics for both the training and testing data:

  • mean squared error MSE \(=\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2\),

  • root mean squared error RMSE \(=\sqrt{\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2}\),

  • mean absolute error MAE = \(=\frac{1}{n}\sum^n_{i = 1}|y_i - \hat{y_i}|\),

  • and r-squared R2 \(=1 - \frac{\sum^n_{i = 1}(y_i - \hat{y_i})^2}{\sum^n_{i = 1}(y_i - \bar{y_i})^2}\),

where \(y\) is the actual outcome, \(\bar{y}\) is the average outcome, \(\hat{y}\) is the model predicted outcome, and \(n\) is the number of observations. The averaged performance metrics are shown below.

[23]:
postprocessor.metrics()
[23]:
Model Types Parameter Configurations Train R2 Train MAE Train MSE Train RMSE Test R2 Test MAE Test MSE Test RMSE
5 lasso {'alpha': 0.00026923161864008733} 0.978827 0.935999 2.537076 1.592820 0.978097 0.903101 2.273044 1.507662
4 lasso {'alpha': 0.00023744920034234952} 0.978956 0.928801 2.488477 1.577491 0.978074 0.900014 2.247918 1.499306
3 lasso {'alpha': 0.00020503774055468597} 0.979077 0.920792 2.438582 1.561596 0.978016 0.898150 2.234312 1.494762
2 lasso {'alpha': 0.00015465795335091117} 0.979234 0.909078 2.372053 1.540147 0.977884 0.898684 2.229068 1.493006
1 lasso {'alpha': 0.00013061058032902984} 0.979294 0.904416 2.344675 1.531233 0.977802 0.899640 2.234351 1.494775
0 linear {'copy_X': True, 'fit_intercept': True, 'n_job... 0.979471 0.893394 2.268822 1.506261 0.977089 0.924379 2.329088 1.526135
22 nn {'batch_size': 8, 'learning_rate': 0.000786843... 0.978336 1.219678 5.441800 2.332767 0.969183 1.342735 6.227949 2.495586
21 nn {'batch_size': 11, 'learning_rate': 0.00089686... 0.977618 1.173200 5.570638 2.360220 0.966752 1.313837 5.953898 2.440061
24 nn {'batch_size': 8, 'learning_rate': 0.000696190... 0.976610 1.032113 4.556381 2.134568 0.965068 1.205700 4.959931 2.227090
23 nn {'batch_size': 8, 'learning_rate': 0.000885509... 0.972479 1.402569 7.658852 2.767463 0.962669 1.442403 7.205904 2.684381
25 nn {'batch_size': 8, 'learning_rate': 0.000888957... 0.968117 1.550723 8.360630 2.891475 0.957415 1.673796 8.948252 2.991363
13 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.940745 2.100256 23.137852 4.810182 0.809732 3.414570 56.801522 7.536678
12 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.942109 1.947629 22.431744 4.736216 0.803872 3.424181 57.412456 7.577101
15 rforest {'criterion': 'squared_error', 'max_features':... 0.919059 2.578299 33.902547 5.822589 0.793365 3.693747 66.615058 8.161805
11 rforest {'criterion': 'squared_error', 'max_features':... 0.924903 2.367251 31.422827 5.605607 0.791679 3.710470 66.453039 8.151873
14 rforest {'criterion': 'poisson', 'max_features': None,... 0.910993 2.526743 35.025825 5.918262 0.789048 3.716417 66.922642 8.180626
18 knn {'leaf_size': 3, 'n_neighbors': 9, 'p': 2, 'we... 1.000000 0.000000 0.000000 0.000000 0.681593 4.727056 95.507025 9.772770
16 knn {'leaf_size': 18, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.647915 5.037443 106.382621 10.314195
17 knn {'leaf_size': 12, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.647915 5.037443 106.382621 10.314195
19 knn {'leaf_size': 4, 'n_neighbors': 6, 'p': 2, 'we... 0.766819 4.528183 93.627271 9.676119 0.643723 5.059660 109.627240 10.470303
20 knn {'leaf_size': 3, 'n_neighbors': 5, 'p': 3, 'we... 1.000000 0.000000 0.000000 0.000000 0.636662 5.070612 113.431055 10.650402
10 dtree {'max_depth': 16, 'max_features': None, 'min_s... 0.805646 4.130526 80.616378 8.978662 0.595758 5.515306 141.989607 11.915939
9 dtree {'max_depth': 17, 'max_features': None, 'min_s... 0.780961 4.311674 86.430893 9.296822 0.591724 5.572924 141.626549 11.900695
8 dtree {'max_depth': 38, 'max_features': None, 'min_s... 0.785009 4.284712 86.018943 9.274640 0.590158 5.603306 142.553336 11.939570
6 dtree {'max_depth': 22, 'max_features': None, 'min_s... 0.829430 3.895268 74.367367 8.623652 0.586285 5.807412 146.484444 12.103076
7 dtree {'max_depth': 27, 'max_features': None, 'min_s... 0.772476 4.512748 94.374841 9.714671 0.567891 5.845143 151.663289 12.315165

Given the top performing models are linear and lasso this data set’s outputs are linear with their inputs. nn also performs well with all models greater than 0.95. Performance quickly drops off with rforest, knn, and dtree. We can look specifically at the performance for each output:

[25]:
postprocessor.metrics(y="fis_gas_produced")
[25]:
Model Types Parameter Configurations Train R2 Train MAE Train MSE Train RMSE Test R2 Test MAE Test MSE Test RMSE
0 linear {'copy_X': True, 'fit_intercept': True, 'n_job... 0.999304 3.286269e-08 1.613853e-15 4.017279e-08 0.999095 3.303975e-08 1.800865e-15 4.243660e-08
1 lasso {'alpha': 0.00013061058032902984} 0.999239 3.402448e-08 1.765100e-15 4.201309e-08 0.999061 3.345253e-08 1.867430e-15 4.321377e-08
2 lasso {'alpha': 0.00015465795335091117} 0.999226 3.421009e-08 1.795060e-15 4.236815e-08 0.999037 3.379421e-08 1.915802e-15 4.376987e-08
3 lasso {'alpha': 0.00020503774055468597} 0.999192 3.468531e-08 1.874203e-15 4.329207e-08 0.998978 3.455206e-08 2.032505e-15 4.508331e-08
4 lasso {'alpha': 0.00023744920034234952} 0.999165 3.514227e-08 1.936839e-15 4.400953e-08 0.998935 3.512777e-08 2.118579e-15 4.602803e-08
5 lasso {'alpha': 0.00026923161864008733} 0.999135 3.561100e-08 2.007169e-15 4.480144e-08 0.998888 3.573908e-08 2.211343e-15 4.702492e-08
21 nn {'batch_size': 11, 'learning_rate': 0.00089686... 0.994642 8.169126e-08 1.243119e-14 1.114952e-07 0.993048 9.249743e-08 1.382836e-14 1.175940e-07
24 nn {'batch_size': 8, 'learning_rate': 0.000696190... 0.990528 1.115254e-07 2.197835e-14 1.482510e-07 0.989709 1.119127e-07 2.046832e-14 1.430675e-07
22 nn {'batch_size': 8, 'learning_rate': 0.000786843... 0.991317 1.125558e-07 2.014702e-14 1.419402e-07 0.988133 1.247508e-07 2.360333e-14 1.536338e-07
23 nn {'batch_size': 8, 'learning_rate': 0.000885509... 0.988808 1.288184e-07 2.596850e-14 1.611474e-07 0.986707 1.334664e-07 2.643891e-14 1.626005e-07
25 nn {'batch_size': 8, 'learning_rate': 0.000888957... 0.964312 2.556519e-07 8.280572e-14 2.877598e-07 0.961364 2.451923e-07 7.684695e-14 2.772128e-07
15 rforest {'criterion': 'squared_error', 'max_features':... 0.945207 2.181329e-07 1.271362e-13 3.565617e-07 0.920970 2.608805e-07 1.571900e-13 3.964719e-07
11 rforest {'criterion': 'squared_error', 'max_features':... 0.948627 2.051300e-07 1.192002e-13 3.452538e-07 0.920302 2.521910e-07 1.585197e-13 3.981453e-07
14 rforest {'criterion': 'poisson', 'max_features': None,... 0.931893 2.385483e-07 1.580278e-13 3.975271e-07 0.903839 2.740886e-07 1.912645e-13 4.373379e-07
13 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.944420 2.252765e-07 1.289617e-13 3.591123e-07 0.873799 3.370844e-07 2.510135e-13 5.010125e-07
12 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.946976 2.171107e-07 1.230303e-13 3.507567e-07 0.868743 3.438968e-07 2.610694e-13 5.109495e-07
6 dtree {'max_depth': 22, 'max_features': None, 'min_s... 0.847844 4.333155e-07 3.530456e-13 5.941764e-07 0.750425 5.537847e-07 4.964038e-13 7.045593e-07
10 dtree {'max_depth': 16, 'max_features': None, 'min_s... 0.820813 4.774076e-07 4.157650e-13 6.447984e-07 0.746392 5.695370e-07 5.044256e-13 7.102293e-07
8 dtree {'max_depth': 38, 'max_features': None, 'min_s... 0.814371 4.911534e-07 4.307116e-13 6.562863e-07 0.726987 5.942721e-07 5.430224e-13 7.369005e-07
9 dtree {'max_depth': 17, 'max_features': None, 'min_s... 0.807016 4.982410e-07 4.477764e-13 6.691610e-07 0.718997 6.034520e-07 5.589148e-13 7.476061e-07
7 dtree {'max_depth': 27, 'max_features': None, 'min_s... 0.806430 5.097794e-07 4.491367e-13 6.701766e-07 0.717898 5.956039e-07 5.610997e-13 7.490659e-07
18 knn {'leaf_size': 3, 'n_neighbors': 9, 'p': 2, 'we... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.655778 6.778442e-07 6.846581e-13 8.274407e-07
19 knn {'leaf_size': 4, 'n_neighbors': 6, 'p': 2, 'we... 0.736241 6.162500e-07 6.119950e-13 7.823011e-07 0.614494 7.143056e-07 7.667708e-13 8.756545e-07
16 knn {'leaf_size': 18, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.602686 7.263933e-07 7.902572e-13 8.889641e-07
17 knn {'leaf_size': 12, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.602686 7.263933e-07 7.902572e-13 8.889641e-07
20 knn {'leaf_size': 3, 'n_neighbors': 5, 'p': 3, 'we... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.594329 7.231460e-07 8.068791e-13 8.982645e-07

For fission gas production, decision tree out performs k-nearest neighbors. Additionally, fission gas production is well modeled by linear ML models (greater than 0.99 r-squared).

[26]:
postprocessor.metrics(y="max_fuel_centerline_temp")
[26]:
Model Types Parameter Configurations Train R2 Train MAE Train MSE Train RMSE Test R2 Test MAE Test MSE Test RMSE
1 lasso {'alpha': 0.00013061058032902984} 0.996902 1.745836 4.740269 2.177216 0.996174 1.724052 4.264319 2.065023
2 lasso {'alpha': 0.00015465795335091117} 0.996834 1.761842 4.844015 2.200912 0.996172 1.723755 4.266002 2.065430
3 lasso {'alpha': 0.00020503774055468597} 0.996670 1.803053 5.095021 2.257215 0.996114 1.728691 4.331339 2.081187
4 lasso {'alpha': 0.00023744920034234952} 0.996548 1.831316 5.282642 2.298400 0.996041 1.740692 4.412045 2.100487
0 linear {'copy_X': True, 'fit_intercept': True, 'n_job... 0.997090 1.712765 4.452931 2.110197 0.995970 1.801571 4.491111 2.119224
5 lasso {'alpha': 0.00026923161864008733} 0.996429 1.856584 5.464506 2.337628 0.995932 1.756835 4.534321 2.129395
24 nn {'batch_size': 8, 'learning_rate': 0.000696190... 0.990804 2.544354 14.070440 3.751058 0.987714 2.844687 13.692524 3.700341
21 nn {'batch_size': 11, 'learning_rate': 0.00089686... 0.987799 3.265046 18.668726 4.320732 0.983395 3.467136 18.506363 4.301902
22 nn {'batch_size': 8, 'learning_rate': 0.000786843... 0.988149 3.334418 18.133583 4.258354 0.981942 3.569044 20.125629 4.486160
23 nn {'batch_size': 8, 'learning_rate': 0.000885509... 0.982659 4.100383 26.533966 5.151113 0.979201 3.939549 23.181205 4.814686
25 nn {'batch_size': 8, 'learning_rate': 0.000888957... 0.980498 4.663089 29.839972 5.462598 0.972238 4.852364 30.941825 5.562538
13 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.942460 6.812932 88.042788 9.383112 0.809634 10.681128 212.167814 14.565981
12 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.944317 6.231305 85.201548 9.230468 0.808022 10.691132 213.964384 14.627521
11 rforest {'criterion': 'squared_error', 'max_features':... 0.922063 7.572027 119.252980 10.920301 0.777842 11.554297 247.601017 15.735343
15 rforest {'criterion': 'squared_error', 'max_features':... 0.915917 8.303870 128.657282 11.342719 0.777076 11.512136 248.454838 15.762450
14 rforest {'criterion': 'poisson', 'max_features': None,... 0.913513 8.048460 132.335135 11.503701 0.775774 11.610214 249.905850 15.808411
18 knn {'leaf_size': 3, 'n_neighbors': 9, 'p': 2, 'we... 1.000000 0.000000 0.000000 0.000000 0.672986 15.534981 364.465568 19.090981
16 knn {'leaf_size': 18, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.634850 16.749371 406.969929 20.173496
17 knn {'leaf_size': 12, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.634850 16.749371 406.969929 20.173496
19 knn {'leaf_size': 4, 'n_neighbors': 6, 'p': 2, 'we... 0.763829 15.286292 361.369810 19.009729 0.623624 16.737117 419.481637 20.481251
20 knn {'leaf_size': 3, 'n_neighbors': 5, 'p': 3, 'we... 1.000000 0.000000 0.000000 0.000000 0.609258 16.932070 435.493098 20.868471
9 dtree {'max_depth': 17, 'max_features': None, 'min_s... 0.785072 13.964987 328.865193 18.134641 0.514896 18.234117 540.661778 23.252135
10 dtree {'max_depth': 16, 'max_features': None, 'min_s... 0.798162 13.557073 308.836967 17.573758 0.514448 17.845325 541.160450 23.262856
8 dtree {'max_depth': 38, 'max_features': None, 'min_s... 0.785923 13.914824 327.563924 18.098727 0.511920 18.314280 543.978647 23.323350
6 dtree {'max_depth': 22, 'max_features': None, 'min_s... 0.812903 12.844460 286.280891 16.919837 0.498947 18.948043 558.436893 23.631269
7 dtree {'max_depth': 27, 'max_features': None, 'min_s... 0.764374 14.766666 360.536324 18.987794 0.479625 19.254532 579.972335 24.082615

The max fuel centerline temperature results closely follow the averaged results with decision tree back at the bottom.

[27]:
postprocessor.metrics(y="max_fuel_surface_temp")
[27]:
Model Types Parameter Configurations Train R2 Train MAE Train MSE Train RMSE Test R2 Test MAE Test MSE Test RMSE
5 lasso {'alpha': 0.00026923161864008733} 0.923426 1.887410 4.683800 2.164209 0.921733 1.855567 4.557854 2.134913
4 lasso {'alpha': 0.00023744920034234952} 0.923631 1.883890 4.671267 2.161311 0.921359 1.859363 4.579627 2.140006
3 lasso {'alpha': 0.00020503774055468597} 0.923826 1.880116 4.659308 2.158543 0.920907 1.863910 4.605909 2.146138
2 lasso {'alpha': 0.00015465795335091117} 0.924073 1.874468 4.644197 2.155040 0.920146 1.870981 4.650269 2.156448
1 lasso {'alpha': 0.00013061058032902984} 0.924168 1.871829 4.638431 2.153702 0.919754 1.874508 4.673086 2.161732
22 nn {'batch_size': 8, 'learning_rate': 0.000786843... 0.940595 1.544292 3.633616 1.906205 0.917812 1.801895 4.786168 2.187731
0 linear {'copy_X': True, 'fit_intercept': True, 'n_job... 0.924431 1.860812 4.622355 2.149966 0.917141 1.895945 4.825240 2.196643
25 nn {'batch_size': 8, 'learning_rate': 0.000888957... 0.941103 1.539803 3.602550 1.898038 0.916696 1.842820 4.851183 2.202540
21 nn {'batch_size': 11, 'learning_rate': 0.00089686... 0.940919 1.427752 3.613826 1.901007 0.908830 1.788210 5.309227 2.304176
23 nn {'batch_size': 8, 'learning_rate': 0.000885509... 0.932947 1.509893 4.101442 2.025202 0.903109 1.830064 5.642411 2.375376
24 nn {'batch_size': 8, 'learning_rate': 0.000696190... 0.932070 1.584097 4.155082 2.038402 0.894440 1.978113 6.147198 2.479354
13 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.926290 1.588091 4.508621 2.123351 0.741763 2.977153 15.038274 3.877921
12 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.926015 1.559212 4.525427 2.127305 0.730650 3.005590 15.685442 3.960485
18 knn {'leaf_size': 3, 'n_neighbors': 9, 'p': 2, 'we... 1.000000 0.000000 0.000000 0.000000 0.698417 3.373240 17.562533 4.190768
14 rforest {'criterion': 'poisson', 'max_features': None,... 0.873001 2.058511 7.768167 2.787143 0.694601 3.255453 17.784719 4.217193
15 rforest {'criterion': 'squared_error', 'max_features':... 0.886329 2.009326 6.952905 2.636836 0.690812 3.262850 18.005393 4.243276
11 rforest {'criterion': 'squared_error', 'max_features':... 0.894742 1.896978 6.438327 2.537386 0.687279 3.287584 18.211139 4.267451
20 knn {'leaf_size': 3, 'n_neighbors': 5, 'p': 3, 'we... 1.000000 0.000000 0.000000 0.000000 0.686936 3.350377 18.231122 4.269792
17 knn {'leaf_size': 12, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.681279 3.400398 18.560555 4.308196
16 knn {'leaf_size': 18, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000 0.000000 0.000000 0.681279 3.400398 18.560555 4.308196
19 knn {'leaf_size': 4, 'n_neighbors': 6, 'p': 2, 'we... 0.785190 2.826438 13.139274 3.624814 0.673263 3.501521 19.027322 4.362032
9 dtree {'max_depth': 17, 'max_features': None, 'min_s... 0.724388 3.281710 16.858379 4.105896 0.556200 4.057577 25.844417 5.083740
8 dtree {'max_depth': 38, 'max_features': None, 'min_s... 0.730053 3.224022 16.511850 4.063478 0.549499 4.098945 26.234695 5.121982
7 dtree {'max_depth': 27, 'max_features': None, 'min_s... 0.722676 3.284324 16.963042 4.118621 0.541838 4.126040 26.680820 5.165348
10 dtree {'max_depth': 16, 'max_features': None, 'min_s... 0.777191 2.965028 13.628546 3.691686 0.539826 4.215899 26.797979 5.176676
6 dtree {'max_depth': 22, 'max_features': None, 'min_s... 0.817081 2.736612 11.188577 3.344933 0.527756 4.281605 27.500883 5.244128

The max fuel surface temperature is the output with the worst results for all models. All models struggle to predict this output with none predicting greater than 0.95 r-squared.

[28]:
postprocessor.metrics(y="radial_clad_dia")
[28]:
Model Types Parameter Configurations Train R2 Train MAE Train MSE Train RMSE Test R2 Test MAE Test MSE Test RMSE
1 lasso {'alpha': 0.00013061058032902984} 0.996869 3.613964e-08 2.168691e-15 4.656921e-08 0.996217 3.525466e-08 2.047072e-15 4.524458e-08
2 lasso {'alpha': 0.00015465795335091117} 0.996801 3.657936e-08 2.215827e-15 4.707257e-08 0.996182 3.549197e-08 2.066157e-15 4.545500e-08
0 linear {'copy_X': True, 'fit_intercept': True, 'n_job... 0.997059 3.504616e-08 2.036892e-15 4.513194e-08 0.996148 3.599969e-08 2.084395e-15 4.565517e-08
3 lasso {'alpha': 0.00020503774055468597} 0.996621 3.766758e-08 2.340340e-15 4.837706e-08 0.996066 3.602923e-08 2.128717e-15 4.613802e-08
4 lasso {'alpha': 0.00023744920034234952} 0.996478 3.848066e-08 2.438882e-15 4.938504e-08 0.995962 3.650793e-08 2.185121e-15 4.674528e-08
5 lasso {'alpha': 0.00026923161864008733} 0.996319 3.934030e-08 2.549569e-15 5.049325e-08 0.995837 3.716851e-08 2.252809e-15 4.746377e-08
22 nn {'batch_size': 8, 'learning_rate': 0.000786843... 0.993284 5.072550e-08 4.651096e-15 6.819895e-08 0.988844 5.897759e-08 6.037071e-15 7.769859e-08
24 nn {'batch_size': 8, 'learning_rate': 0.000696190... 0.993038 4.950792e-08 4.821267e-15 6.943534e-08 0.988407 5.466055e-08 6.273828e-15 7.920750e-08
21 nn {'batch_size': 11, 'learning_rate': 0.00089686... 0.987111 7.948478e-08 8.926073e-15 9.447790e-08 0.981736 8.137518e-08 9.883658e-15 9.941659e-08
23 nn {'batch_size': 8, 'learning_rate': 0.000885509... 0.985501 7.985379e-08 1.004140e-14 1.002068e-07 0.981658 7.673337e-08 9.925957e-15 9.962910e-08
25 nn {'batch_size': 8, 'learning_rate': 0.000888957... 0.986553 8.093902e-08 9.313114e-15 9.650448e-08 0.979363 8.795358e-08 1.116792e-14 1.056784e-07
13 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.949808 1.370162e-07 3.476067e-14 1.864421e-07 0.813730 2.393689e-07 1.008033e-13 3.174954e-07
12 rforest {'criterion': 'poisson', 'max_features': 6, 'm... 0.951128 1.287713e-07 3.384638e-14 1.839739e-07 0.808072 2.414137e-07 1.038653e-13 3.222813e-07
15 rforest {'criterion': 'squared_error', 'max_features':... 0.928782 1.659666e-07 4.932228e-14 2.220862e-07 0.784600 2.596526e-07 1.165671e-13 3.414192e-07
14 rforest {'criterion': 'poisson', 'max_features': None,... 0.925567 1.634129e-07 5.154932e-14 2.270447e-07 0.781978 2.584975e-07 1.179862e-13 3.434913e-07
11 rforest {'criterion': 'squared_error', 'max_features':... 0.934179 1.519128e-07 4.558462e-14 2.135055e-07 0.781294 2.617171e-07 1.183566e-13 3.440300e-07
18 knn {'leaf_size': 3, 'n_neighbors': 9, 'p': 2, 'we... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.699191 3.321722e-07 1.627880e-13 4.034700e-07
16 knn {'leaf_size': 18, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.672844 3.527152e-07 1.770461e-13 4.207684e-07
17 knn {'leaf_size': 12, 'n_neighbors': 5, 'p': 2, 'w... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.672844 3.527152e-07 1.770461e-13 4.207684e-07
19 knn {'leaf_size': 4, 'n_neighbors': 6, 'p': 2, 'we... 0.782016 3.116667e-07 1.509663e-13 3.885438e-07 0.663510 3.536111e-07 1.820972e-13 4.267285e-07
20 knn {'leaf_size': 3, 'n_neighbors': 5, 'p': 3, 'we... 1.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.656126 3.482672e-07 1.860932e-13 4.313852e-07
10 dtree {'max_depth': 16, 'max_features': None, 'min_s... 0.826420 2.706021e-07 1.202142e-13 3.467193e-07 0.582367 3.703769e-07 2.260091e-13 4.754041e-07
9 dtree {'max_depth': 17, 'max_features': None, 'min_s... 0.807368 2.815260e-07 1.334086e-13 3.652514e-07 0.576804 3.748928e-07 2.290195e-13 4.785599e-07
8 dtree {'max_depth': 38, 'max_features': None, 'min_s... 0.809689 2.790239e-07 1.318012e-13 3.630443e-07 0.572225 3.777103e-07 2.314975e-13 4.811419e-07
6 dtree {'max_depth': 22, 'max_features': None, 'min_s... 0.839893 2.540108e-07 1.108834e-13 3.329915e-07 0.568013 3.858896e-07 2.337769e-13 4.835048e-07
7 dtree {'max_depth': 27, 'max_features': None, 'min_s... 0.796424 2.860791e-07 1.409882e-13 3.754840e-07 0.532204 3.998662e-07 2.531555e-13 5.031456e-07

The performance metrics for radial cladding diameter follow the averaged results.

We can see the parameters of each model with the best Test R2 with get_params.

[29]:
for model in model_settings["models"]:
    print(postprocessor.get_params(model_type=model), "\n")
  Model Types  copy_X  fit_intercept n_jobs  positive
0      linear    True           True   None     False

  Model Types     alpha
0       lasso  0.000131

  Model Types  max_depth max_features  min_samples_leaf  min_samples_split
0       dtree         16         None                 7                 13

  Model Types  leaf_size  n_neighbors  p   weights
0         knn          3            9  2  distance

  Model Types criterion  max_features  min_samples_leaf  min_samples_split  \
0     rforest   poisson             6                 1                  5

   n_estimators
0            89

  Model Types  batch_size  learning_rate mid_num_node_strategy  num_layers  \
0          nn           8       0.000787                linear           2

   start_num_nodes
0              300

We can visualize the performance of each model with diagonal validation plots. These plots show the predicted output to the actual output. For the plots below we will do max fuel surface temperature.

[31]:
models = np.array([["linear", "lasso"], ["dtree", "knn"], ["rforest", "nn"]])

output = ["max_fuel_surface_temp"]

fig = plt.figure(constrained_layout=fig, figsize=(10,15))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0]):
    for j in range(models.shape[1]):
        if models[i, j] != None:
            ax = fig.add_subplot(gs[i, j])
            ax = postprocessor.diagonal_validation_plot(
                model_type=models[i, j],
                y=output,
            )
            ax.set_title(models[i, j])
../_images/examples_fuel_performance_33_0.png

With these plots we can see the narrow spread of linear and lasso to \(y = x\), the best possible performance of a model. Additionally, knn appears to be overfit to the training data set and the preditions of nn under 700 K under-approximate the max fuel surface temperature.

Similarly, the validation_plot function produces validation plots that show the absolute relative error for each max fuel surface temperature prediction.

[32]:
fig = plt.figure(constrained_layout=fig, figsize=(10,20))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0]):
    for j in range(models.shape[1]):
        if models[i, j] != None:
            ax = fig.add_subplot(gs[i, j])
            ax = postprocessor.validation_plot(
                model_type=models[i, j],
                y=output,
            )
            ax.set_title(models[i, j])
../_images/examples_fuel_performance_35_0.png

The performance gap of the linear model to the others is evident in the magnitude of the relative error. There is also one common outlier between linear, lasso, dtree, and rforest between 80 and 100.

Finally, the learning curve of the most performant nn is shown by nn_learning_plot.

[33]:
fig, ax = plt.subplots(figsize=(8,8))
ax = postprocessor.nn_learning_plot()
../_images/examples_fuel_performance_37_0.png

The validation curve is below the training curve; therefore, the nn is not overfit.

References

      1. RADAIDEH and T. KOZLOWSKI, “Surrogate modeling of advanced computer simulations using deep Gaussian processes,” Reliability Engineering System Safety, 195, 106731 (2020).

  1. Rossiter, G., Massara, S., & Amaya, M. (2016). OECD/NEA benchmark on pellet-clad mechanical interaction modelling with fuel performance codes (AECL-CW–124600-CONF-004). Canada

pyMAISElogo.png