Heat Conduction
Intputs
qprime: Linear heat generation rate (\(\frac{W}{m}\))mdot: Mass flow rate (\(\frac{g}{s}\))Tin: Temperature of the fuel boundary (\(K\))R: Fuel radius (\(m\))L: Fuel length (\(m\))Cp: Heat capacity (\(\frac{J}{g \cdot K}\))k: Thermal conductivity (\(\frac{W}{m \cdot K}\))
Output
T: Fuel centerline temperature (\(K\))
This data set consists of 1000 data points with 7 inputs and 1 output. The data set was constructed through Latin hypercube sampling of the 7 input parameters for heat conduction through a fuel rod. These samples were then used to solve for the fuel centerline temperature analytically. The geometry of the problem is illustrated in the figure below, and it is assumed volumetric heat generation is uniform radially. The problem is defined by
with two boundary conditions: \(\frac{dT}{dr}|_{r = 0} = 0\) and \(T(R) = T_{in}\). Therefore, the temperature profile in the fuel is
[6]:
import pyMAISE as mai
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from scipy.stats import uniform, randint
from sklearn.model_selection import ShuffleSplit
# Plot settings
matplotlib_settings = {
"font.size": 14,
"legend.fontsize": 12,
}
plt.rcParams.update(**matplotlib_settings)
pyMAISE Initialization
Starting any pyMAISE job requires initialization, this includes the definition of global settings used throughout pyMAISE. For this problem the pyMAISE defaults are used:
verbosity: 0 \(\rightarrow\) pyMAISE prints no outputs,random_state: None \(\rightarrow\) No random seed is set,test_size: 0.3 \(\rightarrow\) 30% of the data is used for testing,num_configs_saved: 5 \(\rightarrow\) The top 5 hyper-parameter configurations are saved for each model.
The settings are initialized and the heat conduction preprocessor specific to this data set is retrieved.
[7]:
global_settings = mai.settings.init()
preprocessor = mai.load_heat()
The heat conduction data set has 7 inputs
[8]:
preprocessor.inputs.head()
[8]:
| qprime | mdot | Tin | R | L | Cp | k | |
|---|---|---|---|---|---|---|---|
| 0 | 35987.992759 | 206.185816 | 573.151869 | 0.004901 | 3.448155 | 4.096140 | 0.960945 |
| 1 | 38481.055798 | 192.378974 | 573.150960 | 0.004966 | 3.436833 | 4.249182 | 1.011272 |
| 2 | 39143.292108 | 205.076928 | 573.153975 | 0.005210 | 3.681457 | 4.237540 | 0.994646 |
| 3 | 38687.579644 | 199.594924 | 573.150777 | 0.004864 | 3.624594 | 4.158921 | 1.028158 |
| 4 | 40469.561181 | 194.142084 | 573.150977 | 0.004914 | 3.731222 | 4.244278 | 1.018738 |
and 1 output with 1000 data points.
[9]:
preprocessor.outputs.head()
[9]:
| T | |
|---|---|
| 0 | 1034.133784 |
| 1 | 1170.316042 |
| 2 | 1164.893565 |
| 3 | 1205.250040 |
| 4 | 1444.718666 |
To better understand the data here is a correlation matrix of the data.
[10]:
fig, ax = plt.subplots(figsize=(15,10))
fig, ax = preprocessor.correlation_matrix(fig=fig, ax=ax, colorbar=False, annotations=True)
As expected there is a strong correlation between linear heat generation rate, qprime, and the centerline fuel temperature, T.
Prior to model training the data is min-max scaled to make each feature’s effect size comperable. Additionaly, this can improve the performance of some models.
[12]:
data = preprocessor.min_max_scale()
Model Initialization
Given this data set has a 1-dimensional output we will compare the performance of 7 machine learning (ML) models:
Linear regression:
linear,Lasso regression:
lasso,Support vector regression:
svr,Decision tree regression:
dtree,Random forest regression:
rforest,K-nearest neighbors regression:
knn,Sequential dense neural networks:
nn.
For hyper-parameter tuning each model must be initialized. We will use the Scikit-learn defaults for the classical ML models (linear, lasso, svr, dtree, rforest, and knn); therefore, they are only specified in the models parameter of the model_settings dictionary. However, we must specify many parameters for the nn model that define the layers, optimizer, and training.
[13]:
model_settings = {
"models": ["linear", "lasso", "dtree", "svr", "knn", "rforest", "nn"],
"nn": {
# Sequential
"num_layers": 4,
"dropout": True,
"rate": 0.5,
"validation_split": 0.15,
"loss": "mean_absolute_error",
"metrics": ["mean_absolute_error"],
"batch_size": 8,
"epochs": 50,
"warm_start": True,
"jit_compile": False,
# Starting Layer
"start_num_nodes": 100,
"start_kernel_initializer": "normal",
"start_activation": "relu",
"input_dim": preprocessor.inputs.shape[1], # Number of inputs
# Middle Layers
"mid_num_node_strategy": "linear", # Middle layer nodes vary linearly from 'start_num_nodes' to 'end_num_nodes'
"mid_kernel_initializer": "normal",
"mid_activation": "relu",
# Ending Layer
"end_num_nodes": preprocessor.outputs.shape[1], # Number of outputs
"end_activation": "linear",
"end_kernel_initializer": "normal",
# Optimizer
"optimizer": "adam",
"learning_rate": 5e-4,
},
}
tuning = mai.Tuning(data=data, model_settings=model_settings)
Hyper-parameter Tuning
While three hyper-parameter tuning functions are supported (grid_search, random_search, and bayesian_search), random_search and bayesian_search are used for the classical ML and nn models, respectively. random_search randomly samples a defined parameter space and the number of models generated is easily defined. A large number of classical models are generated through random_search as they are relatively quick to train. However, the prohibative time required to
train neural networks makes bayesian_search more appealing as the search converges to the optimal hyper-parameters in relatively few iterations. The hyper-parameter search spaces are outlined below, many with Scipy uniform distributions. Cross validation is used in both search methods to eliminate bias from the data set.
[14]:
random_search_spaces = {
"lasso": {
"alpha": uniform(loc=0.0001, scale=0.0099), # 0.0001 - 0.01
},
"dtree": {
"max_depth": randint(low=5, high=50), # 5 - 50
"max_features": [None, "sqrt", "log2", 2, 4, 6],
"min_samples_split": randint(low=2, high=20), # 2 - 20
"min_samples_leaf": randint(low=1, high=20), # 1 - 20
},
"svr": {
"kernel": ["linear", "poly", "rbf", "sigmoid"],
"degree": randint(low=1, high=5),
"gamma": ["scale", "auto"],
},
"rforest": {
"n_estimators": randint(low=50, high=200), # 50 - 200
"criterion": ["squared_error", "absolute_error", "poisson"],
"min_samples_split": randint(low=2, high=20), # 2 - 20
"min_samples_leaf": randint(low=1, high=20), # 1 - 20
"max_features": [None, "sqrt", "log2", 2, 4, 6],
},
"knn": {
"n_neighbors": randint(low=1, high=20), # 1 - 20
"weights": ["uniform", "distance"],
"leaf_size": randint(low=1, high=30), # 1 - 30
"p": randint(low=1, high=10), # 1 - 10
},
}
bayesian_search_spaces = {
"nn": {
"mid_num_node_strategy": ["constant", "linear"],
"batch_size": [8, 64],
"learning_rate": [1e-5, 0.001],
"num_layers": [2, 6],
"start_num_nodes": [25, 300],
},
}
start = time.time()
random_search_configs = tuning.random_search(
param_spaces=random_search_spaces,
models=["linear"] + list(random_search_spaces.keys()),
n_iter=300,
cv=ShuffleSplit(n_splits=5, test_size=0.25, random_state=global_settings.random_state),
)
bayesian_search_configs = tuning.bayesian_search(
param_spaces=bayesian_search_spaces,
models=bayesian_search_spaces.keys(),
n_iter=50,
cv=5,
)
stop = time.time()
print("Hyper-parameter tuning took " + str((stop - start) / 60) + " minutes to process.")
Hyper-parameter tuning search space was not provided for linear, doing manual fit
Hyper-parameter tuning took 64.99195301135381 minutes to process.
With the conclusion of hyper-parameter tuning we can see the training results of each iteration of the bayesian search.
[15]:
fig, ax = plt.subplots(figsize=(8,8))
ax = tuning.convergence_plot(model_types="nn")
After about 35 iterations the bayesian search converges on the optimal solution for this parameter space.
Model Post-processing
With the models tuned and the top num_configs_saved saved, we can now pass these models to the PostProcessor for model comparison and analysis. We will increase the epochs of the nn models for better performance.
[16]:
new_model_settings = {
"nn": {"epochs": 200}
}
postprocessor = mai.PostProcessor(
data=data,
models_list=[random_search_configs, bayesian_search_configs],
new_model_settings=new_model_settings,
yscaler=preprocessor.yscaler,
)
To compare the performance of these models we will compute 4 metrics for both the training and testing data:
mean squared error
MSE\(=\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2\),root mean squared error
RMSE\(=\sqrt{\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2}\),mean absolute error
MAE= \(=\frac{1}{n}\sum^n_{i = 1}|y_i - \hat{y_i}|\),and r-squared
R2\(=1 - \frac{\sum^n_{i = 1}(y_i - \hat{y_i})^2}{\sum^n_{i = 1}(y_i - \bar{y_i})^2}\),
where \(y\) is the actual outcome, \(\bar{y}\) is the average outcome, \(\bar{y}\) is the model predicted outcome, and \(n\) is the number of observations.
[17]:
postprocessor.metrics()
[17]:
| Model Types | Parameter Configurations | Train R2 | Train MAE | Train MSE | Train RMSE | Test R2 | Test MAE | Test MSE | Test RMSE | |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | rforest | {'criterion': 'poisson', 'max_features': 6, 'm... | 0.998453 | 3.591531 | 37.184198 | 6.097885 | 0.990350 | 9.288734 | 207.756209 | 14.413751 |
| 20 | rforest | {'criterion': 'poisson', 'max_features': 6, 'm... | 0.996445 | 6.080878 | 85.427674 | 9.242709 | 0.989620 | 10.205636 | 223.480141 | 14.949252 |
| 17 | rforest | {'criterion': 'poisson', 'max_features': None,... | 0.997094 | 5.037790 | 69.824015 | 8.356077 | 0.988891 | 9.804428 | 239.182928 | 15.465540 |
| 19 | rforest | {'criterion': 'squared_error', 'max_features':... | 0.995783 | 5.840237 | 101.341391 | 10.066846 | 0.988845 | 9.861571 | 240.177509 | 15.497661 |
| 18 | rforest | {'criterion': 'absolute_error', 'max_features'... | 0.995097 | 5.859402 | 117.824900 | 10.854718 | 0.987887 | 10.476799 | 260.805013 | 16.149459 |
| 8 | dtree | {'max_depth': 9, 'max_features': None, 'min_sa... | 0.999305 | 2.733475 | 16.705728 | 4.087264 | 0.978193 | 14.604683 | 469.513283 | 21.668255 |
| 7 | dtree | {'max_depth': 33, 'max_features': 6, 'min_samp... | 0.997990 | 4.899830 | 48.294779 | 6.949445 | 0.977932 | 15.095519 | 475.123305 | 21.797323 |
| 6 | dtree | {'max_depth': 24, 'max_features': None, 'min_s... | 0.996641 | 5.159630 | 80.723680 | 8.984636 | 0.977861 | 14.439304 | 476.653943 | 21.832406 |
| 9 | dtree | {'max_depth': 23, 'max_features': None, 'min_s... | 0.997745 | 4.412194 | 54.189393 | 7.361345 | 0.976071 | 15.054146 | 515.199978 | 22.698017 |
| 10 | dtree | {'max_depth': 25, 'max_features': 6, 'min_samp... | 0.995978 | 5.808797 | 96.664687 | 9.831820 | 0.974757 | 15.482882 | 543.477880 | 23.312612 |
| 26 | nn | {'batch_size': 8, 'learning_rate': 0.000721275... | 0.900750 | 35.952520 | 2385.133483 | 48.837828 | 0.873662 | 39.635764 | 2720.085656 | 52.154440 |
| 28 | nn | {'batch_size': 14, 'learning_rate': 0.00089049... | 0.866768 | 43.658858 | 3201.772436 | 56.584207 | 0.838317 | 45.660018 | 3481.078535 | 59.000666 |
| 30 | nn | {'batch_size': 14, 'learning_rate': 0.00089754... | 0.859826 | 46.918987 | 3368.615820 | 58.039778 | 0.831708 | 48.253050 | 3623.369092 | 60.194427 |
| 5 | lasso | {'alpha': 0.0001824989270416765} | 0.849727 | 52.206878 | 3611.305127 | 60.094136 | 0.829573 | 52.166466 | 3669.332827 | 60.575018 |
| 4 | lasso | {'alpha': 0.00016869826770270902} | 0.849751 | 52.194496 | 3610.727957 | 60.089333 | 0.829515 | 52.168366 | 3670.576526 | 60.585283 |
| 3 | lasso | {'alpha': 0.00013742897968849972} | 0.849798 | 52.166681 | 3609.588039 | 60.079847 | 0.829376 | 52.173664 | 3673.583780 | 60.610096 |
| 2 | lasso | {'alpha': 0.00012692803708103} | 0.849812 | 52.157526 | 3609.257463 | 60.077096 | 0.829326 | 52.175443 | 3674.652602 | 60.618913 |
| 1 | lasso | {'alpha': 0.00011717208282240732} | 0.849824 | 52.149020 | 3608.973873 | 60.074736 | 0.829279 | 52.178128 | 3675.672142 | 60.627322 |
| 15 | svr | {'degree': 1, 'gamma': 'scale', 'kernel': 'poly'} | 0.848972 | 52.588788 | 3629.439049 | 60.244826 | 0.828916 | 52.476532 | 3683.489056 | 60.691754 |
| 14 | svr | {'degree': 1, 'gamma': 'scale', 'kernel': 'poly'} | 0.848972 | 52.588788 | 3629.439049 | 60.244826 | 0.828916 | 52.476532 | 3683.489056 | 60.691754 |
| 13 | svr | {'degree': 1, 'gamma': 'scale', 'kernel': 'poly'} | 0.848972 | 52.588788 | 3629.439049 | 60.244826 | 0.828916 | 52.476532 | 3683.489056 | 60.691754 |
| 12 | svr | {'degree': 1, 'gamma': 'scale', 'kernel': 'poly'} | 0.848972 | 52.588788 | 3629.439049 | 60.244826 | 0.828916 | 52.476532 | 3683.489056 | 60.691754 |
| 11 | svr | {'degree': 1, 'gamma': 'scale', 'kernel': 'poly'} | 0.848972 | 52.588788 | 3629.439049 | 60.244826 | 0.828916 | 52.476532 | 3683.489056 | 60.691754 |
| 0 | linear | {'copy_X': True, 'fit_intercept': True, 'n_job... | 0.849893 | 52.047689 | 3607.316904 | 60.060943 | 0.828604 | 52.215410 | 3690.200299 | 60.747019 |
| 29 | nn | {'batch_size': 8, 'learning_rate': 0.000709612... | 0.862323 | 43.002099 | 3308.608881 | 57.520508 | 0.827847 | 45.971629 | 3706.503882 | 60.881063 |
| 27 | nn | {'batch_size': 8, 'learning_rate': 0.000710736... | 0.809784 | 52.969568 | 4571.188994 | 67.610569 | 0.763386 | 55.993587 | 5094.357125 | 71.374765 |
| 21 | knn | {'leaf_size': 21, 'n_neighbors': 10, 'p': 1, '... | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.737858 | 64.180097 | 5643.967965 | 75.126347 |
| 24 | knn | {'leaf_size': 7, 'n_neighbors': 8, 'p': 1, 'we... | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.728624 | 62.456548 | 5842.789462 | 76.438141 |
| 25 | knn | {'leaf_size': 20, 'n_neighbors': 8, 'p': 1, 'w... | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.728624 | 62.456548 | 5842.789462 | 76.438141 |
| 23 | knn | {'leaf_size': 26, 'n_neighbors': 12, 'p': 1, '... | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.725792 | 67.179263 | 5903.767667 | 76.835979 |
| 22 | knn | {'leaf_size': 17, 'n_neighbors': 7, 'p': 1, 'w... | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.720752 | 62.599642 | 6012.271535 | 77.538839 |
As this data set represents radial heat conduction, we do not expect linear models to perform well. This is shown by the relatively poor performance of linear and lasso. The best perfroming models were rforest and dtree with test and train r-squared above 0.99.
Using the get_params function we can see the optimal hyper-parameter configurations for each model.
[18]:
for model in ["linear", "lasso", "svr", "dtree", "knn", "rforest", "nn"]:
print(postprocessor.get_params(model_type=model), "\n")
Model Types copy_X fit_intercept n_jobs positive
0 linear True True None False
Model Types alpha
0 lasso 0.000182
Model Types degree gamma kernel
0 svr 1 scale poly
Model Types max_depth max_features min_samples_leaf min_samples_split
0 dtree 9 None 1 3
Model Types leaf_size n_neighbors p weights
0 knn 21 10 1 distance
Model Types criterion max_features min_samples_leaf min_samples_split \
0 rforest poisson 6 1 4
n_estimators
0 136
Model Types batch_size learning_rate mid_num_node_strategy num_layers \
0 nn 8 0.000721 constant 2
start_num_nodes
0 300
To visualize the performance of these models we can use the diagonal_validation_plot functions to produce diagonal validation plots.
[19]:
models = np.array([["linear", "lasso"], ["dtree", "svr"], ["knn", "rforest"], ["nn", None]])
fig = plt.figure(constrained_layout=fig, figsize=(10,20))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0] - 1):
for j in range(models.shape[1]):
if models[i, j] != None:
ax = fig.add_subplot(gs[i, j])
ax = postprocessor.diagonal_validation_plot(model_type=models[i, j])
ax.set_title(models[i, j])
ax = fig.add_subplot(gs[-1, :])
ax = postprocessor.diagonal_validation_plot(model_type=models[-1, 0])
_ = ax.set_title(models[-1, 0])
The performance differences between rforest/dtree with the other models is apparent along with the overfitting of knn. The predictions of rforest and dtree are closely spread along \(y=x\) while the knn test predictions are over-predicted at lower temperatures and under-predicted at higher temperatures.
Similarly, the validation_plot function produces validation plots that show the absolute relative error for each prediction.
[20]:
fig = plt.figure(constrained_layout=fig, figsize=(10,20))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0] - 1):
for j in range(models.shape[1]):
if models[i, j] != None:
ax = fig.add_subplot(gs[i, j])
ax = postprocessor.validation_plot(model_type=models[i, j])
ax.set_title(models[i, j])
ax = fig.add_subplot(gs[-1, :])
ax = postprocessor.validation_plot(model_type=models[-1, 0])
_ = ax.set_title(models[-1, 0])
The performance of the models is best represented by the magnitudes observed on the y-axis; however, even rforest gets as high as \(>9.0\%\) error.
Finally, the learning curve of the most performant nn is shown by nn_learning_plot.
[21]:
fig, ax = plt.subplots(figsize=(8,8))
ax = postprocessor.nn_learning_plot()
The validation curve is below the training curve; therefore, the nn is not overfit.