Heat Conduction

Intputs

qprime: Linear heat generation rate (\(\frac{W}{m}\))
mdot: Mass flow rate (\(\frac{g}{s}\))
Tin: Temperature of the fuel boundary (\(K\))
R: Fuel radius (\(m\))
L: Fuel length (\(m\))
Cp: Heat capacity (\(\frac{J}{g \cdot K}\))
k: Thermal conductivity (\(\frac{W}{m \cdot K}\))

Output

T: Fuel centerline temperature (\(K\))

This data set consists of 1000 data points with 7 inputs and 1 output. The data set was constructed through Latin hypercube sampling of the 7 input parameters for heat conduction through a fuel rod. These samples were then used to solve for the fuel centerline temperature analytically. The geometry of the problem is illustrated in the figure below, and it is assumed volumetric heat generation is uniform radially. The problem is defined by

\begin{equation} \frac{1}{r}\frac{d}{dr}(kr\frac{dT}{dr}) + q''' = 0, \end{equation}

with two boundary conditions: \(\frac{dT}{dr}|_{r = 0} = 0\) and \(T(R) = T_{in}\). Therefore, the temperature profile in the fuel is

\begin{equation} T(r) = \frac{q'}{4\pi k}(1 - (r / R)^2) + T_{in}. \end{equation}

[6]:

import pyMAISE as mai
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from scipy.stats import uniform, randint
from sklearn.model_selection import ShuffleSplit

# Plot settings
matplotlib_settings = {
    "font.size": 14,
    "legend.fontsize": 12,
}
plt.rcParams.update(**matplotlib_settings)

pyMAISE Initialization

Starting any pyMAISE job requires initialization, this includes the definition of global settings used throughout pyMAISE. For this problem the pyMAISE defaults are used:

verbosity: 0 \(\rightarrow\) pyMAISE prints no outputs,
random_state: None \(\rightarrow\) No random seed is set,
test_size: 0.3 \(\rightarrow\) 30% of the data is used for testing,
num_configs_saved: 5 \(\rightarrow\) The top 5 hyper-parameter configurations are saved for each model.

The settings are initialized and the heat conduction preprocessor specific to this data set is retrieved.

[7]:

global_settings = mai.settings.init()
preprocessor = mai.load_heat()

The heat conduction data set has 7 inputs

[8]:

preprocessor.inputs.head()

[8]:

	qprime	mdot	Tin	R	L	Cp	k
0	35987.992759	206.185816	573.151869	0.004901	3.448155	4.096140	0.960945
1	38481.055798	192.378974	573.150960	0.004966	3.436833	4.249182	1.011272
2	39143.292108	205.076928	573.153975	0.005210	3.681457	4.237540	0.994646
3	38687.579644	199.594924	573.150777	0.004864	3.624594	4.158921	1.028158
4	40469.561181	194.142084	573.150977	0.004914	3.731222	4.244278	1.018738

and 1 output with 1000 data points.

[9]:

preprocessor.outputs.head()

[9]:

	T
0	1034.133784
1	1170.316042
2	1164.893565
3	1205.250040
4	1444.718666

To better understand the data here is a correlation matrix of the data.

[10]:

fig, ax = plt.subplots(figsize=(15,10))
fig, ax = preprocessor.correlation_matrix(fig=fig, ax=ax, colorbar=False, annotations=True)

../_images/examples_heat_conduction_9_0.png

As expected there is a strong correlation between linear heat generation rate, qprime, and the centerline fuel temperature, T.

Prior to model training the data is min-max scaled to make each feature’s effect size comperable. Additionaly, this can improve the performance of some models.

[12]:

data = preprocessor.min_max_scale()

Model Initialization

Given this data set has a 1-dimensional output we will compare the performance of 7 machine learning (ML) models:

Linear regression: linear,
Lasso regression: lasso,
Support vector regression: svr,
Decision tree regression: dtree,
Random forest regression: rforest,
K-nearest neighbors regression: knn,
Sequential dense neural networks: nn.

For hyper-parameter tuning each model must be initialized. We will use the Scikit-learn defaults for the classical ML models (linear, lasso, svr, dtree, rforest, and knn); therefore, they are only specified in the models parameter of the model_settings dictionary. However, we must specify many parameters for the nn model that define the layers, optimizer, and training.

[13]:

model_settings = {
    "models": ["linear", "lasso", "dtree", "svr", "knn", "rforest", "nn"],
    "nn": {
        # Sequential
        "num_layers": 4,
        "dropout": True,
        "rate": 0.5,
        "validation_split": 0.15,
        "loss": "mean_absolute_error",
        "metrics": ["mean_absolute_error"],
        "batch_size": 8,
        "epochs": 50,
        "warm_start": True,
        "jit_compile": False,
        # Starting Layer
        "start_num_nodes": 100,
        "start_kernel_initializer": "normal",
        "start_activation": "relu",
        "input_dim": preprocessor.inputs.shape[1], # Number of inputs
        # Middle Layers
        "mid_num_node_strategy": "linear", # Middle layer nodes vary linearly from 'start_num_nodes' to 'end_num_nodes'
        "mid_kernel_initializer": "normal",
        "mid_activation": "relu",
        # Ending Layer
        "end_num_nodes": preprocessor.outputs.shape[1], # Number of outputs
        "end_activation": "linear",
        "end_kernel_initializer": "normal",
        # Optimizer
        "optimizer": "adam",
        "learning_rate": 5e-4,
    },
}
tuning = mai.Tuning(data=data, model_settings=model_settings)

Hyper-parameter Tuning

While three hyper-parameter tuning functions are supported (grid_search, random_search, and bayesian_search), random_search and bayesian_search are used for the classical ML and nn models, respectively. random_search randomly samples a defined parameter space and the number of models generated is easily defined. A large number of classical models are generated through random_search as they are relatively quick to train. However, the prohibative time required to train neural networks makes bayesian_search more appealing as the search converges to the optimal hyper-parameters in relatively few iterations. The hyper-parameter search spaces are outlined below, many with Scipy uniform distributions. Cross validation is used in both search methods to eliminate bias from the data set.

[14]:

random_search_spaces = {
    "lasso": {
        "alpha": uniform(loc=0.0001, scale=0.0099), # 0.0001 - 0.01
    },
    "dtree": {
        "max_depth": randint(low=5, high=50), # 5 - 50
        "max_features": [None, "sqrt", "log2", 2, 4, 6],
        "min_samples_split": randint(low=2, high=20), # 2 - 20
        "min_samples_leaf": randint(low=1, high=20), # 1 - 20
    },
    "svr": {
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "degree": randint(low=1, high=5),
        "gamma": ["scale", "auto"],
    },
    "rforest": {
        "n_estimators": randint(low=50, high=200), # 50 - 200
        "criterion": ["squared_error", "absolute_error", "poisson"],
        "min_samples_split": randint(low=2, high=20), # 2 - 20
        "min_samples_leaf": randint(low=1, high=20), # 1 - 20
        "max_features": [None, "sqrt", "log2", 2, 4, 6],
    },
    "knn": {
        "n_neighbors": randint(low=1, high=20), # 1 - 20
        "weights": ["uniform", "distance"],
        "leaf_size": randint(low=1, high=30), # 1 - 30
        "p": randint(low=1, high=10), # 1 - 10
    },
}
bayesian_search_spaces = {
    "nn": {
        "mid_num_node_strategy": ["constant", "linear"],
        "batch_size": [8, 64],
        "learning_rate": [1e-5, 0.001],
        "num_layers": [2, 6],
        "start_num_nodes": [25, 300],
    },
}

start = time.time()
random_search_configs = tuning.random_search(
    param_spaces=random_search_spaces,
    models=["linear"] + list(random_search_spaces.keys()),
    n_iter=300,
    cv=ShuffleSplit(n_splits=5, test_size=0.25, random_state=global_settings.random_state),
)
bayesian_search_configs = tuning.bayesian_search(
    param_spaces=bayesian_search_spaces,
    models=bayesian_search_spaces.keys(),
    n_iter=50,
    cv=5,
)
stop = time.time()
print("Hyper-parameter tuning took " + str((stop - start) / 60) + " minutes to process.")

Hyper-parameter tuning search space was not provided for linear, doing manual fit
Hyper-parameter tuning took 64.99195301135381 minutes to process.

With the conclusion of hyper-parameter tuning we can see the training results of each iteration of the bayesian search.

[15]:

fig, ax = plt.subplots(figsize=(8,8))
ax = tuning.convergence_plot(model_types="nn")

../_images/examples_heat_conduction_17_0.png

After about 35 iterations the bayesian search converges on the optimal solution for this parameter space.

Model Post-processing

With the models tuned and the top num_configs_saved saved, we can now pass these models to the PostProcessor for model comparison and analysis. We will increase the epochs of the nn models for better performance.

[16]:

new_model_settings = {
    "nn": {"epochs": 200}
}
postprocessor = mai.PostProcessor(
    data=data,
    models_list=[random_search_configs, bayesian_search_configs],
    new_model_settings=new_model_settings,
    yscaler=preprocessor.yscaler,
)

To compare the performance of these models we will compute 4 metrics for both the training and testing data:

mean squared error MSE \(=\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2\),
root mean squared error RMSE \(=\sqrt{\frac{1}{n}\sum^n_{i = 1}(y_i - \hat{y_i})^2}\),
mean absolute error MAE = \(=\frac{1}{n}\sum^n_{i = 1}|y_i - \hat{y_i}|\),
and r-squared R2 \(=1 - \frac{\sum^n_{i = 1}(y_i - \hat{y_i})^2}{\sum^n_{i = 1}(y_i - \bar{y_i})^2}\),

where \(y\) is the actual outcome, \(\bar{y}\) is the average outcome, \(\bar{y}\) is the model predicted outcome, and \(n\) is the number of observations.

[17]:

postprocessor.metrics()

[17]:

	Model Types	Parameter Configurations	Train R2	Train MAE	Train MSE	Train RMSE	Test R2	Test MAE	Test MSE	Test RMSE
16	rforest	{'criterion': 'poisson', 'max_features': 6, 'm...	0.998453	3.591531	37.184198	6.097885	0.990350	9.288734	207.756209	14.413751
20	rforest	{'criterion': 'poisson', 'max_features': 6, 'm...	0.996445	6.080878	85.427674	9.242709	0.989620	10.205636	223.480141	14.949252
17	rforest	{'criterion': 'poisson', 'max_features': None,...	0.997094	5.037790	69.824015	8.356077	0.988891	9.804428	239.182928	15.465540
19	rforest	{'criterion': 'squared_error', 'max_features':...	0.995783	5.840237	101.341391	10.066846	0.988845	9.861571	240.177509	15.497661
18	rforest	{'criterion': 'absolute_error', 'max_features'...	0.995097	5.859402	117.824900	10.854718	0.987887	10.476799	260.805013	16.149459
8	dtree	{'max_depth': 9, 'max_features': None, 'min_sa...	0.999305	2.733475	16.705728	4.087264	0.978193	14.604683	469.513283	21.668255
7	dtree	{'max_depth': 33, 'max_features': 6, 'min_samp...	0.997990	4.899830	48.294779	6.949445	0.977932	15.095519	475.123305	21.797323
6	dtree	{'max_depth': 24, 'max_features': None, 'min_s...	0.996641	5.159630	80.723680	8.984636	0.977861	14.439304	476.653943	21.832406
9	dtree	{'max_depth': 23, 'max_features': None, 'min_s...	0.997745	4.412194	54.189393	7.361345	0.976071	15.054146	515.199978	22.698017
10	dtree	{'max_depth': 25, 'max_features': 6, 'min_samp...	0.995978	5.808797	96.664687	9.831820	0.974757	15.482882	543.477880	23.312612
26	nn	{'batch_size': 8, 'learning_rate': 0.000721275...	0.900750	35.952520	2385.133483	48.837828	0.873662	39.635764	2720.085656	52.154440
28	nn	{'batch_size': 14, 'learning_rate': 0.00089049...	0.866768	43.658858	3201.772436	56.584207	0.838317	45.660018	3481.078535	59.000666
30	nn	{'batch_size': 14, 'learning_rate': 0.00089754...	0.859826	46.918987	3368.615820	58.039778	0.831708	48.253050	3623.369092	60.194427
5	lasso	{'alpha': 0.0001824989270416765}	0.849727	52.206878	3611.305127	60.094136	0.829573	52.166466	3669.332827	60.575018
4	lasso	{'alpha': 0.00016869826770270902}	0.849751	52.194496	3610.727957	60.089333	0.829515	52.168366	3670.576526	60.585283
3	lasso	{'alpha': 0.00013742897968849972}	0.849798	52.166681	3609.588039	60.079847	0.829376	52.173664	3673.583780	60.610096
2	lasso	{'alpha': 0.00012692803708103}	0.849812	52.157526	3609.257463	60.077096	0.829326	52.175443	3674.652602	60.618913
1	lasso	{'alpha': 0.00011717208282240732}	0.849824	52.149020	3608.973873	60.074736	0.829279	52.178128	3675.672142	60.627322
15	svr	{'degree': 1, 'gamma': 'scale', 'kernel': 'poly'}	0.848972	52.588788	3629.439049	60.244826	0.828916	52.476532	3683.489056	60.691754
14	svr	{'degree': 1, 'gamma': 'scale', 'kernel': 'poly'}	0.848972	52.588788	3629.439049	60.244826	0.828916	52.476532	3683.489056	60.691754
13	svr	{'degree': 1, 'gamma': 'scale', 'kernel': 'poly'}	0.848972	52.588788	3629.439049	60.244826	0.828916	52.476532	3683.489056	60.691754
12	svr	{'degree': 1, 'gamma': 'scale', 'kernel': 'poly'}	0.848972	52.588788	3629.439049	60.244826	0.828916	52.476532	3683.489056	60.691754
11	svr	{'degree': 1, 'gamma': 'scale', 'kernel': 'poly'}	0.848972	52.588788	3629.439049	60.244826	0.828916	52.476532	3683.489056	60.691754
0	linear	{'copy_X': True, 'fit_intercept': True, 'n_job...	0.849893	52.047689	3607.316904	60.060943	0.828604	52.215410	3690.200299	60.747019
29	nn	{'batch_size': 8, 'learning_rate': 0.000709612...	0.862323	43.002099	3308.608881	57.520508	0.827847	45.971629	3706.503882	60.881063
27	nn	{'batch_size': 8, 'learning_rate': 0.000710736...	0.809784	52.969568	4571.188994	67.610569	0.763386	55.993587	5094.357125	71.374765
21	knn	{'leaf_size': 21, 'n_neighbors': 10, 'p': 1, '...	1.000000	0.000000	0.000000	0.000000	0.737858	64.180097	5643.967965	75.126347
24	knn	{'leaf_size': 7, 'n_neighbors': 8, 'p': 1, 'we...	1.000000	0.000000	0.000000	0.000000	0.728624	62.456548	5842.789462	76.438141
25	knn	{'leaf_size': 20, 'n_neighbors': 8, 'p': 1, 'w...	1.000000	0.000000	0.000000	0.000000	0.728624	62.456548	5842.789462	76.438141
23	knn	{'leaf_size': 26, 'n_neighbors': 12, 'p': 1, '...	1.000000	0.000000	0.000000	0.000000	0.725792	67.179263	5903.767667	76.835979
22	knn	{'leaf_size': 17, 'n_neighbors': 7, 'p': 1, 'w...	1.000000	0.000000	0.000000	0.000000	0.720752	62.599642	6012.271535	77.538839

As this data set represents radial heat conduction, we do not expect linear models to perform well. This is shown by the relatively poor performance of linear and lasso. The best perfroming models were rforest and dtree with test and train r-squared above 0.99.

Using the get_params function we can see the optimal hyper-parameter configurations for each model.

[18]:

for model in ["linear", "lasso", "svr", "dtree", "knn", "rforest", "nn"]:
    print(postprocessor.get_params(model_type=model), "\n")

  Model Types  copy_X  fit_intercept n_jobs  positive
0      linear    True           True   None     False

  Model Types     alpha
0       lasso  0.000182

  Model Types  degree  gamma kernel
0         svr       1  scale   poly

  Model Types  max_depth max_features  min_samples_leaf  min_samples_split
0       dtree          9         None                 1                  3

  Model Types  leaf_size  n_neighbors  p   weights
0         knn         21           10  1  distance

  Model Types criterion  max_features  min_samples_leaf  min_samples_split  \
0     rforest   poisson             6                 1                  4

   n_estimators
0           136

  Model Types  batch_size  learning_rate mid_num_node_strategy  num_layers  \
0          nn           8       0.000721              constant           2

   start_num_nodes
0              300

To visualize the performance of these models we can use the diagonal_validation_plot functions to produce diagonal validation plots.

[19]:

models = np.array([["linear", "lasso"], ["dtree", "svr"], ["knn", "rforest"], ["nn", None]])

fig = plt.figure(constrained_layout=fig, figsize=(10,20))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0] - 1):
    for j in range(models.shape[1]):
        if models[i, j] != None:
            ax = fig.add_subplot(gs[i, j])
            ax = postprocessor.diagonal_validation_plot(model_type=models[i, j])
            ax.set_title(models[i, j])
ax = fig.add_subplot(gs[-1, :])
ax = postprocessor.diagonal_validation_plot(model_type=models[-1, 0])
_ = ax.set_title(models[-1, 0])

../_images/examples_heat_conduction_26_0.png

The performance differences between rforest/dtree with the other models is apparent along with the overfitting of knn. The predictions of rforest and dtree are closely spread along \(y=x\) while the knn test predictions are over-predicted at lower temperatures and under-predicted at higher temperatures.

Similarly, the validation_plot function produces validation plots that show the absolute relative error for each prediction.

[20]:

fig = plt.figure(constrained_layout=fig, figsize=(10,20))
gs = GridSpec(models.shape[0], models.shape[1], figure=fig)
for i in range(models.shape[0] - 1):
    for j in range(models.shape[1]):
        if models[i, j] != None:
            ax = fig.add_subplot(gs[i, j])
            ax = postprocessor.validation_plot(model_type=models[i, j])
            ax.set_title(models[i, j])
ax = fig.add_subplot(gs[-1, :])
ax = postprocessor.validation_plot(model_type=models[-1, 0])
_ = ax.set_title(models[-1, 0])

../_images/examples_heat_conduction_28_0.png

The performance of the models is best represented by the magnitudes observed on the y-axis; however, even rforest gets as high as \(>9.0\%\) error.

Finally, the learning curve of the most performant nn is shown by nn_learning_plot.

[21]:

fig, ax = plt.subplots(figsize=(8,8))
ax = postprocessor.nn_learning_plot()

../_images/examples_heat_conduction_30_0.png

The validation curve is below the training curve; therefore, the nn is not overfit.