10.3 Neural networks for heat transfer
machine learning texts such as Goodfellow et al. (2016) or optimization texts such as Balaji (2019) for further details on optimization algorithms.
Example 10.1: A long mild steel slab of thickness 10 cm, having a thermal conductivity of k = 44.5 W/m K, can be considered to be insulated on the top and bottom sides. There is no heat generation in the slab, and the thermophysical properties can be assumed to be constant. The steady-state temperatures recorded by thermocouples at five selected locations inside the slab are given below. Determine the steady-state heat flux in the slab using appropriate assumptions. Use the direct method for finding the optimal parameters. The slab has an area of 1 m² in the direction normal to the heat transfer (Table 10.1, Fig. 10.3).
Table 10.1 Steady-state temperature distribution recorded by thermocouples for Example 10.1.
S. no. | x (in cm) from left end | T (in °C)
-------|-------------------------|----------
1      | 1                       | 34.51
2      | 3                       | 34.16
3      | 5                       | 32.50
4      | 7                       | 32.24
5      | 9                       | 30.41
FIGURE 10.3
Schematic diagram for Example 10.1.
Solution:
At steady state, the flux in this problem is constant across the slab. In particular, q = −k dT/dx. The question, therefore, boils down to finding dT/dx.
The chief difficulty here is that we have only discrete values of the temperature.
Further, as we will see below, the measured temperatures do not fall exactly on a straight line, as theoretically expected for the temperature distribution in a slab. This is because of "experimental errors," including noise in the thermocouple measurements. Therefore, we resort to finding the "best" line that fits this data. This now falls within the framework of the learning process we listed above.
Let us now follow the template given in the learning process in order to solve this problem.
Step 1: Formulation—The input and output variables for this problem are obvious. We choose x as the input and y = T as the output variable.
Step 2: Hypothesis—Before imposing a hypothesis function, we plot the given data points to see if we can observe a trend. Fig. 10.4 shows the plot of how our “output,” the temperature, varies with the “input” x, the location. We can immediately observe that the temperature variation has a decreasing trend that is roughly linear. Of course, we also know this from our knowledge of physics of the problem. A good hypothesis function, therefore, would be the linear function
ŷ = h(x; w) = w0 + w1x    (10.9)
Step 3: Data collection—The data is already collected in this problem and given the size of the data (just 5 points), it does not make sense to split it further into training and testing sets.
FIGURE 10.4
Location versus temperature in Example 10.1.
FIGURE 10.5
Location versus temperature in Example 10.1.
Step 4: Optimal parameters—The hypothesis function is linear and hence the parameters can be computed using the expressions given in Eq. (10.8). Calculating these (see Exercise 10.2) results in
w0 = 35.29,  w1 = −0.5060    (10.10)
Since ŷ = w0 + w1x, the proposed model is

T(x) ≈ 35.29 − 0.506x    (10.11)
A plot of the above best fit along with the original thermocouple data is shown in Fig. 10.5. One can notice that even though the model fit does not pass through any of the data points, it still fits the data quite well.
Now, dT/dx = w1 = −0.506 °C/cm. Therefore, the heat flux is given by
q = −k dT/dx = 44.5 × 0.506/10⁻² = 2251.7 W/m²    (10.12)
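The direct least-squares fit and the resulting flux can be reproduced in a few lines of code. The following is a minimal sketch (Python with NumPy, not part of the original text); the variable names are illustrative.

```python
import numpy as np

# Thermocouple data from Table 10.1
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])            # location in cm
T = np.array([34.51, 34.16, 32.50, 32.24, 30.41])  # temperature in deg C

# Direct (closed-form) least-squares solution for T = w0 + w1*x
w1 = np.sum((x - x.mean()) * (T - T.mean())) / np.sum((x - x.mean()) ** 2)
w0 = T.mean() - w1 * x.mean()
print(w0, w1)          # approximately 35.29 and -0.506

# Heat flux: q = -k dT/dx, converting the slope from per cm to per m
k = 44.5               # W/m K
q = -k * w1 / 1e-2     # W/m^2
print(q)               # approximately 2251.7 W/m^2
```

The same slope and intercept can also be obtained with np.polyfit(x, T, 1), which solves the identical least-squares problem.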
10.3.3 Neural networks
In the above example, we used a linear hypothesis function, as our knowledge of the physics of the problem warranted it. But what about highly nonlinear phenomena involving multiple variables? For a long time, this was solved either by some high order polynomial or power-law regression. These methods work well when there is only a small amount of data or when the nonlinearity is not severe. However, in modern times, with “big” data available, there is scope to apply more complex and highly nonlinear hypothesis functions. One such class of hypothesis function that is very powerful in the nonlinearities it can express is the artificial neural network.
FIGURE 10.6
Pictorial representation of the linear model.
The earliest artificial neural networks were biologically inspired. McCulloch and Pitts in 1943 were among the first to try and abstract the processes in a biological neuron to their mathematical and computational equivalents. In this early view of the human mind, the brain was simply a computational device that discerns and learns patterns through unit computational processes taking place in a simple computational unit—the neuron. Whether this view is accurate or not is a subject of much debate, but the outcome of the work of several brilliant minds has been the refinement of neural networks as a simple and powerful technique for approximating highly nonlinear relations. In this book, we avoid the biological analogy (and the ensuing controversy) and simply view the neural network as a powerful class of hypothesis functions.
To understand how neural networks function, consider a pictorial representation (Fig. 10.6) of the linear hypothesis function we discussed previously.
Notice the following features of the figure:
•Each circle gives as output a scalar variable. We call these circles neurons. These are the fundamental building blocks of neural networks.
•The computation takes place sequentially from left to right. We call the first layer of neurons the input layer and the final layer as the output layer.
•The neurons in the input layer are multiplied by the parameters represented by the connecting lines. These parameters w are called weights. Biologically, they are supposed to represent the strength of the connections between neurons. Mathematically, it is clear that the higher this weight between two neurons, the stronger the connection between these two “concepts” or variables.
•The final neuron has both inputs and outputs, and the output is given by a weighted sum of the inputs: ŷ = w0 × 1 + w1 × x.
We will call the above type of neuron in a linear model a linear neuron. The linear neuron gives as output simply the weighted sum of the inputs, ŷ = w0 + w1x. For convenience, we use the notation that x0 = 1 and x1 = x. The constant "neuron" with x0 = 1 is called the bias unit. We can now write the output as
ŷ = w0 x0 + w1 x1 = ∑_{i=0}^{n−1} wi xi    (10.13)
where n is the number of neuron connections coming in. That is,
Output of Linear Neuron = ∑ Weight × (Input to Neuron)    (10.14)
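In code, this weighted sum is just a dot product once the bias input x0 = 1 is prepended. A minimal sketch in Python/NumPy (illustrative only, not from the original text):

```python
import numpy as np

def linear_neuron(w, x):
    """Weighted sum of inputs; x excludes the bias input x0 = 1."""
    x = np.concatenate(([1.0], x))   # prepend the bias unit x0 = 1
    return np.dot(w, x)

# Example with w0 = 35.29 and w1 = -0.506 from the slab fit above
print(linear_neuron(np.array([35.29, -0.506]), np.array([5.0])))  # ~32.76
```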
The neural network works on a very similar principle as the linear model with just two small modifications:
1.Nonlinearity
2.Hidden layers
Let us look at these modifications in detail.
Nonlinearity—If we have only linear operations in the neuron, we can only represent linear functions. In order to represent nonlinear functions, we just need to make a small change to the linear neuron output given by Eq. (10.14):
Output of Neuron = g(∑ Weight × (Input to Neuron))    (10.15)
where g is some nonlinear function. This function g is known as the nonlinearity or the activation function, because biologically it was supposed to represent the fact that a neuron is activated nonlinearly based on the sum of its incoming signals. We represent the output of this nonlinear neuron as â. This output can be seen as the combination of a linear and a nonlinear step as follows:
Linear step:     z = ∑_{i=0}^{n−1} wi xi
Nonlinear step:  â = g(z)                    (10.16)
In summary, we can write the output of the neuron as â = g(∑_{i=0}^{n−1} wi xi). Note that this is the most general expression for any neuron in any neural network, no matter how complicated. A very common choice for the nonlinear activation function is the so-called sigmoid function, which is calculated as
σ(z) = 1/(1 + exp(−z))    (10.17)
where σ is the symbol representing the sigmoid. Fig. 10.7 shows the variation of the sigmoid with z.
It can be seen both mathematically and graphically that the sigmoid nonlinearity varies between 0 and 1. The output nearly vanishes for highly negative inputs and asymptotes to 1 for highly positive inputs. This is supposed to represent the biological “waking up” or activation of the neuron. Small or negative signals are insufficient to activate it, while large signals make the neuron fire. Regardless of the biological origins, the sigmoid is a commonly used nonlinearity, and, for the rest of the chapter, it will be assumed that whenever we use g, we mean the sigmoid nonlinearity σ . Let us now consider an example of the computation inside an artificial neuron.
Example 10.2: Fig. 10.8 shows a single neuron in a neural network. The inputs to this neuron are shown in the figure. Assuming that all connections have a weight of 1, and that the nonlinearity is a sigmoid, determine â, the output of the neuron.
FIGURE 10.7
Output of the Sigmoid nonlinearity.
FIGURE 10.8
Figure for Example 10.2.
Solution: All the weights are given to be wi = 1. The output is given by â = g(∑_{i=0}^{n−1} wi xi). For clarity, we calculate this in two steps.
Linear step—For convenience, we will use our earlier notation that x0 = 1.
z = ∑_{i=0}^{5} wi xi
  = w0 x0 + w1 x1 + w2 x2 + w3 x3 + w4 x4 + w5 x5
  = 1 + 0.1 + 0.2 + 0.3 + 0.4 + 0.5
  = 2.5                                        (10.18)

Nonlinear step

â = g(z)
  = 1/(1 + exp(−2.5))
  = 0.9241                                     (10.19)
As you see, the process is not complicated at all! It is just a nonlinear function applied over a linear model. Nonetheless, the above process is essentially what takes place in every neuron of every neural network, no matter how complex. The only difference would be the nonlinear function being used. In fact, the sigmoid is good enough for most practical purposes.
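The two steps above translate directly into code. Here is a minimal sketch (Python/NumPy, not from the original text) that reproduces Example 10.2:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, Eq. (10.17)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(w, x, g=sigmoid):
    """Generic neuron: nonlinearity applied to the weighted sum of inputs."""
    z = np.dot(w, x)        # linear step, Eq. (10.16)
    return g(z)             # nonlinear step

# Example 10.2: bias input x0 = 1, inputs 0.1 ... 0.5, all weights equal to 1
x = np.array([1.0, 0.1, 0.2, 0.3, 0.4, 0.5])
w = np.ones(6)
print(neuron_output(w, x))  # approximately 0.9241
```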
Hidden layers—The second difference between the linear model shown in Fig. 10.6 and a full neural network is the existence of intermediate layers called “hidden” layers. In the linear model, there were only the input and output layers. However, it can be shown that there are a large number of functions that cannot be approximated by this structure.
The solution is to add intermediate calculation layers in the middle with an arbitrary number of neurons. Fig. 10.9 shows a schematic of such a neural network. The
FIGURE 10.9
A typical deep neural network.
figure may look complicated, but note that the calculation of each neuron's output is exactly the same process as discussed in Example 10.2. Therefore, the calculation of any forward pass through the network is not complicated and can be easily programmed.
Note:
•There are two hidden layers in this network. Each hidden layer has 5 neurons.
•Each connecting line represents a weight. All the weights are parameters of the neural network hypothesis function.
•It is conventional in neural network diagrams not to show the “bias” unit (the unit with a constant neuron with a value of 1). This is the reason why the bias unit in Fig. 10.8 was shown with a dotted line. The lines emerging from the “bias” units are also similarly hidden. We must assume that these exist in every layer.
•The input layer has 4 neurons, meaning that there are 4 components in the input vector x. Similarly, there are 3 components in the output vector y. If we include the bias units, then there are 5 neurons in the input layer and 4 in the output layer.
•Not including the bias weights, there are a total of 4 × 5 + 5 × 5 + 5 × 3 = 60 adjustable weights in this network. (How many weights are there including the bias weights?) If we did not have the hidden layers, we would have had only 4 × 3 = 12 weights. This is one of the purposes of the hidden layers—the more adjustable weights there are, the larger the class of functions we can approximate well.
•The forward pass through this network simply involves systematically calculating the output of each neuron from left to right and progressing through the network.
• Just like it was possible to write the linear model as ŷ = w0 + w1x, it is possible to write an analytical expression for the neural network shown above. This analytical expression would look like ŷ = NN(x; w). Here, as discussed above, x and y have 4 and 3 components respectively, and there are 60 components in w. The analytical expression for this network would obviously be very complicated! Nonetheless, it is possible to write it, and it is very insightful to think of the picture of a neural network as simply a graphical representation of a complicated mathematical function. This idea is used very effectively in an approach called physics-informed neural networks by Raissi et al. (2019) for solving inverse problems.
• Since there are multiple weights, it makes sense to decide on some uniform notation denoting a given weight. For this, notice that for two neurons in adjacent layers, there is a single line or weight connecting them. We will use the notation w_ij^(k) to denote the weight joining the ith neuron in the kth layer with the jth neuron in the next layer.
Given that the neural network is just two simple modifications away from a linear model, what gives it power, and why is it so widely used? The secret behind this is a powerful mathematical theorem called the universal approximation theorem.
Universal approximation theorem—Given sufficient data and neurons, a neural network with even a single hidden layer can approximate any function to any desired accuracy.
FIGURE 10.10
Figure for Example 10.3.
This theorem is the key to understanding the power of neural networks. Regardless of whatever heat transfer problem we are approximating, whether laminar or turbulent, as long as we can collect sufficient data, and as long as we are willing to keep adding neurons (and, therefore, adjustable parameters), we can always get a precise "surrogate" neural network model. There are, however, several caveats to this, which we discuss below in Section 10.4.
Let us look at an example with an extremely simple neural network to understand how a typical neural network works.
Example 10.3: Fig. 10.10 shows a simple neural network. For the given data and weights, do a forward pass through the network and calculate the corresponding output ŷ (assuming that the nonlinearity is a sigmoid). The bias units are not shown but must be assumed to be present.

x = 0.5

w_01^(1) = 1.0    w_02^(1) = 0.8    w_11^(1) = 0.7    w_12^(1) = 0.9

w_01^(2) = 1.0    w_11^(2) = 0.7    w_21^(2) = 0.9
Solution:
For clarity, we draw the network diagram with the bias units and weights explicitly shown in Fig. 10.11. Note that we have split the figure into two parts to label the weights clearly; some weights and connections are omitted from each part for readability.
FIGURE 10.11
Figure for Example 10.3, with the weights and biases shown.
Let us now sequentially calculate the outputs of the neurons one by one. In the first (and only) hidden layer,
â_1^(1) = g(w_01^(1) x0 + w_11^(1) x1)
        = g(1 × 1 + 0.7 × 0.5)
        = g(1.35)
        = 0.7941                                         (10.20)

Similarly,

â_2^(1) = g(w_02^(1) x0 + w_12^(1) x1)
        = g(0.8 × 1 + 0.9 × 0.5)
        = g(1.25)
        = 0.7773                                         (10.21)

In the output layer,

ŷ = g(w_01^(2) x0 + w_11^(2) â_1^(1) + w_21^(2) â_2^(1))
  = g(1 × 1 + 0.7 × 0.7941 + 0.9 × 0.7773)
  = g(2.255)
  = 0.9051                                               (10.22)
The calculation in the above exercise is the essence of a forward pass through any neural network. At the end of this exercise, we hope you understand that forward pass through a network, even if tedious, is extremely straightforward.
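A forward pass like the one in Example 10.3 can be written compactly by organizing the weights of each layer into a matrix. The following is a minimal sketch (Python/NumPy, not from the original text); the helper name forward_pass is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, weights):
    """Propagate input x through the layers; each W maps [1, inputs] -> outputs."""
    a = np.atleast_1d(x)
    for W in weights:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(W @ a)               # linear step followed by the nonlinearity
    return a

# Weights for Example 10.3: rows are neurons, columns are [bias, inputs]
W1 = np.array([[1.0, 0.7],               # hidden neuron 1: w_01, w_11
               [0.8, 0.9]])              # hidden neuron 2: w_02, w_12
W2 = np.array([[1.0, 0.7, 0.9]])         # output neuron: w_01, w_11, w_21

print(forward_pass(0.5, [W1, W2]))       # approximately [0.9051]
```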
We have now discussed everything about neural networks except for the most important thing—"How do we find the optimal weights for a given dataset?" The answer to this was already given implicitly in Step 4 of the learning process given above. To reiterate, we first initialize the weights to random values. Then, with the given weights, we do a forward pass for every data point and obtain a model prediction ŷ. The error J is then calculated, and finally, we correct the weights through some optimization technique.
An important aspect of the weight optimization above is that it usually requires the gradient ∂J/∂w. This gradient calculation is usually the most computationally expensive part of the neural network process and was, in fact, one of the reasons why neural networks were impractical for a long time—finding optimal weights was, and is, time-consuming. The discovery of an efficient algorithm called backpropagation for gradient calculation and its implementation on modern computational architectures is part of the reason for the modern machine learning resurrection. Backpropagation is, in essence, the chain rule of partial differentiation applied to neural networks. Due to its technical nature, we skip it in this chapter. The interested reader can refer to Goodfellow et al. (2016) for further details.
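To make Step 4 concrete, the sketch below fits the single sigmoid neuron of Eq. (10.16) to a toy dataset by gradient descent. It is purely illustrative and not from the original text; the gradient is written out by hand for this one-neuron case rather than obtained by the general backpropagation algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: inputs (with the bias column x0 = 1 prepended) and targets in (0, 1)
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([0.2, 0.4, 0.7, 0.9])

w = np.random.default_rng(0).normal(size=2)   # random initial weights
lr = 0.5                                      # learning rate

for epoch in range(5000):
    y_hat = sigmoid(X @ w)                    # forward pass for all data points
    J = 0.5 * np.mean((y_hat - y) ** 2)       # squared-error cost
    # dJ/dw for the sigmoid neuron: chain rule applied by hand
    grad = X.T @ ((y_hat - y) * y_hat * (1 - y_hat)) / len(y)
    w -= lr * grad                            # gradient-descent update

print(w, J)   # fitted weights and final cost
```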
Let us now summarize our understanding of neural networks through the next example.
