“**Data-driven soft sensors in pulp refining processes using artificial neural networks**,” *BioResources* 19(1), 1030-1057.

#### Abstract

Pulp refining processes are often difficult to describe using linear methodologies, and an artificial neural network (ANN) is sometimes a preferable alternative when assimilating non-linear operating data. In this study, an ANN is used to predict pulp properties, such as shives (wide), fiber length, and freeness. Both traditional process variables (external variables) and refining zone variables (internal variables) are necessary to include as model inputs. The estimation of shives (wide) achieved an R^{2} (coefficient of determination) of 0.9 for the training set and 0.7 for the validation set. The corresponding results for fiber length and freeness are questionable with this methodology. It is shown that the maximum temperature in the flat zone can be modeled using the external variables motor load and production instead of the specific energy. This resulted in an R^{2} of approximately 0.9 for the training sets, while the R^{2} for the validation set did not reach an acceptable level, most likely due to inherent non-linearities in the process. Additional results showed that the consistency profile is difficult to estimate properly using an ANN; a model-driven sensor is preferred instead. The main results from this study indicate that shives (wide) should be the prime candidate when introducing advanced pulp property control concepts.

#### Full Article

**Data-Driven Soft Sensors in Pulp Refining Processes Using Artificial Neural Networks**

Anders Karlström,^{a} Jan Hill,^{b} and Lars Johansson^{c}

*DOI: 10.15376/biores.19.1.1030-1057*

*Keywords: ANN models; Soft sensors; Consistency; Temperature; Pulp property estimation*

*Contact information: a: Chalmers University of Technology, Department of Electrical Engineering, SE-412 96 Gothenburg, Sweden; b: QualTech Jan Hill AB, SE-282 21 Tyringe, Sweden; c: Woodworks Cluster, Skognæringa Kyst, Skolegata 22, Landbruksklyngen bygg P2, NO-7713 Steinkjer, Norway;*

*\* Corresponding author: lars@woodworkscluster.no*

**INTRODUCTION**

Many systems in the pulp and paper industry can be classified as nonlinear. Some of them can be described using first-principles models (FPM). If the nonlinearities are weak, linear models based on gain scheduling can be derived as an alternative. In other situations, the complexity can be almost impossible to describe, which calls for complementary modeling techniques. This is certainly true when producing chemi-thermo-mechanical pulp (CTMP), as the needed input-output variables can be unknown as well as non-linear, noisy, and imprecise (Karlström and Hill 2017a,b,c; Karlström *et al.* 2018a; Karlström and Hill 2018b; Sund *et al.* 2021).

To approach the challenge, a hybrid concept combining model-driven and data-driven soft sensors is used in this paper. These two types of soft sensors are well described by Bohlin (2006) and Kadlec *et al.* (2008), and a short overview in the context of pulp refining processes is given below. The traditional process variables are most often inputs to a model-driven sensor, and such sensors are typically based on FPM, extended Kalman filters, or adaptive observers (Bastin and Dochain 1990; Chruy 1997; Jos de Assis and Filho 2000).

In refining processes, the physical models (FPM) are of special interest, as non-linear states can be derived (Miles and May 1990, 1991; Harkonen *et al.* 1999, 2003; Huhtanen 2004; Eriksson 2005; Karlström *et al.* 2008; Eriksson 2009). However, such models are often not suitable for online applications.

By measuring the temperature profile inside the refining zone, Karlström and Eriksson (2014a) introduced a simplified entropy modeling concept to cope with defibration/fibrillation in three-phase systems. This model allowed for online estimation of internal variables, such as the consistency profile, backward and forward flowing steam (the direction of the steam depends on where the maximum temperature is located (Karlström and Eriksson 2014b, 2014c, 2014d)), shear forces between chips/pulp and refining segments, *etc.* Thereby, together with the traditional process variables (also called external variables, such as motor load, production (also called throughput), and dilution water feed rates), the number of process variables expanded considerably. This also allowed for a broader use of data-driven soft sensors, where, for instance, parametric black-box models have been used successfully in several processes (Ljung 1987; Principe *et al.* 2000; Wold *et al.* 2001; Jolliffe 2002). However, parametric models are often based on linear methodology, requiring routines for updating the model parameters to cover an entire process operation window that also changes over time (Berg and Karlström 2005; Eriksson 2005; Eriksson 2009; Karlström *et al.* 2015b, 2016a,b). This limits the use of such models (Karlström *et al.* 2022). Therefore, data-driven models based on artificial neural networks (ANN), with multiple layers of neurons, can be an alternative to handle inherent non-linearities (Demuth and Beale 2004; Rajesh and Ray 2006).

In this paper, the authors focused on ANN models trained with the gradient- or Jacobian-based Levenberg-Marquardt method (LM) (Levenberg 1944; Marquardt 1963), an algorithm for solving non-linear least-squares problems. Other backpropagation methods can be used as well (Hagan and Menhaj 1994; Björck 1996; Hagan *et al.* 1996; Kermani *et al.* 2005; Bonnans *et al.* 2006), but LM has proven to be both robust and relatively fast, which is favorable when handling large industrial data sets.

In this paper, the authors concentrated the analysis on the estimation of a) the maximum temperatures in the flat zone (FZ) and the conical zone (CD); b) consistencies out from FZ and CD; and c) the pulp properties shives (wide), fiber length, and freeness. It is worth mentioning that the measured shives can be wide or long, where the term “wide” refers to the thickness of the fibers. The term “long” refers to the length of the fibers on average. Here, the focus is on the measured shives (wide), as the fiber length is also being used as a measure.

**EXPERIMENTAL**

**Materials and Methods**

This section introduces the special process conditions in a full-scale CTMP production line, followed by information about the ANN structure used. Some of the possible model input candidates are thereafter examined in more detail. The output candidates are temperatures and consistencies in the refining zones and pulp properties, with specific criteria for the selected training and validation sets.

**Fundamentals**

In this study, the production line consisted of an RGP82CD (with a capacity of about 36 T/h and a normal operation of 16 to 20 MW). Three small 6 MW refiners (RLP54) ran in parallel but the focus was the CD-refiner.

The CD-refiner consists of two serially linked refining zones called the flat zone (FZ) and the conical zone (CD), see Fig. 1. In each refining zone, refining segments were mounted on a stationary and a rotating disc. The chips were fed into the FZ refining zone and moved towards the periphery of the CD by centrifugal forces.

**Fig. 1.** A schematic drawing of a CD refiner. The vertical FZ is directly linked to the CD *via* an expanding point. Distance between the stationary and rotating segments (plate gaps) is represented by Δ_{CD} and Δ_{FZ}. *For clarification, the pulp is introduced from below in the figure, i.e. at the label “Δ_{FZ}”.*

The segments in Fig. 1 have a specific pattern with several bars to defibrate chips and to fibrillate the pulp along the way from the center (Fig. 2a) to the periphery of the refiner (Fig. 2b).

**Fig. 2.** Sensor arrays with eight temperature measurements each, mounted along the radius and between refining segments in a CD82-refiner, see Fig. 1.

It is a real challenge to describe the final pulp quality (Ferritsius *et al.* 2018; Ferritsius 2021). As Fig. 2 shows, the temperature profiles were measured using sensor arrays mounted between two segments in each refining zone. Eight temperature sensors were used in each zone. Thereby, the temperature profile can be seen as an internal variable vector that is measured together with traditional process variables (external variables) such as production rate, dilution water flows, motor load, and distance between the refining segments.

By using the temperature profile, it is possible to get access to other important internal variables utilizing a non-linear physical model derived by Karlström and Eriksson (2014a,b,c,d). This model enables fast dynamic follow-up (ranging from 0.5 to 1 s) of *e.g.*, consistency profiles (Karlström and Hill 2017a,b,c). In Fig. 3, a snapshot of the consistency profile is given. It is essential to control the shape of the profile by manipulating the dilution water added to the refining zones. Thereby, the process inside the refining zones becomes observable and controllable, which is essential when controlling both the flat zone and the conical zone simultaneously (Karlström *et al.* 2018b).

**Fig. 3.** Consistency profile for one sample. The outlet consistency in the blow-line (last sample) has been included. Note that the drastic change in the conical zone is a consequence of the position at which the dilution water is added.

Other important profiles are related to the fiber shear force, steam balance, and the fiber residence time in FZ and CD, which affect the final pulp properties as well. However, the pulp properties are still a challenge to derive only by using physical models, which calls for additional efforts such as the use of empirical models (Karlström *et al.* 2022).

The non-linear physical model, which can be classified as a model-driven soft sensor, is preferably combined with empirical models. These models are often classified as data-driven soft sensors according to the Grey-box terminology presented by Bohlin (2006), as shown in Fig. 4.

The output data of interest in this article are given as Eq. 1, where the internal variables can be derived from the model-driven soft sensor, while the pulp properties, *Y_{PULP}* (see Fig. 4), come from the data-driven soft sensor (Karlström *et al.* 2022).

**Fig. 4.** The concept includes a non-linear model based on first principles (model-driven soft sensor) and the data-driven soft sensor, which can be *e.g.* pulp and handsheet models. The idea is to use the outputs for control purposes in *TCtrl:* Temperature control, *CCtrl:* Consistency control, *ECtrl:* Specific energy control and *QCtrl:* Quality control.

**ANN Models**

In this paper, the MATLAB Neural Network Toolbox is referenced for basic information about the ANN concept. The overview written by Demuth and Beale (2004) describes how the ANN can be trained with a backpropagation algorithm based on the Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963). Such a network is well suited to solving non-linear multi-dimensional mapping problems, given consistent data and enough neurons in its hidden layer (Hagan and Menhaj 1994; Björck 1996; Kermani *et al.* 2005; Rajesh and Ray 2006).

The hidden layers in the authors’ ANN were based on a hyperbolic tangent sigmoid transfer function. To prevent overfitting, five layers were used as an initial setting, although it is possible to expand the size depending on the size of the input matrix. The number of hidden neurons in each layer was set to ten according to the MATLAB default setting. In this paper, the focus is not on the ANN methodology. Instead, the Levenberg-Marquardt algorithm is used for all cases, as it is fast and robust for non-linear least-squares curve fitting (Gavin 2020).
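For orientation, the Levenberg-Marquardt weight update can be written in the form commonly found in the general literature (*e.g.*, Hagan and Menhaj 1994). The notation below is assumed here for illustration and is not taken from the article: **w** is the vector of network weights, **e** the vector of residuals between measured and estimated outputs, **J** the Jacobian of **e** with respect to **w**, and μ a damping factor.

```latex
\Delta \mathbf{w} = -\left(\mathbf{J}^{T}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{T}\mathbf{e},
\qquad
\mathbf{J} = \frac{\partial \mathbf{e}}{\partial \mathbf{w}}
```

For small μ the step approaches a Gauss-Newton step, whereas for large μ it approaches a short gradient-descent step, which is one way to understand why the method is often described as both fast and robust.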

**Input Candidates to the Data-Driven Soft Sensor**

As indicated in Fig. 4, the model-driven soft sensor provides several candidates that can be used as additional inputs to a data-driven soft sensor.

Temperature profiles (> 20 variables); consistency profiles (> 20 variables); backward and forward flowing steam (> 20 variables); water content profiles (> 20 variables); forces on bars (> 20 variables); and defibration and thermodynamic work in each refining zone (> 4 variables) are just a few examples of internal input candidates. In this paper, the internal variables were represented by the matrix in Fig. 4. For a more comprehensive overview of other possible candidates, see Karlström and Eriksson (2014a,b,c,d). Note that both consistency and steam are used as inputs because there was both backward and forward flowing steam in the refining zone.

It is also natural to combine such inputs with traditional external variables like specific energy, dilution water added to FZ and CD, as well as the distance between the segments (also called plate gaps), as visualized in Fig. 1.

The external input candidates are quite reliable as measures and constitute the matrix *X_{ext}*, which means that the complete set of independent variables used in the data-driven soft sensor can be represented by Eq. 2:

From a practical perspective, it is impossible to introduce all inputs from *X_{extended}* in the ANN modeling procedure. However, some candidates are more relevant than others (Karlström and Hill 2017a,b,c; Bengtsson *et al.* 2019, 2020, 2021), see below:

External variables (*X_{ext}*)

- Specific energy (SpE) or the split into motor load (Mload) and production (Prod);
- Dilution water to FZ and CD (DilwFZ, DilwCD);
- Distance between segments in FZ and CD (GapFZ, GapCD);
- The amount of sawmill chips (Sawmill), which affects *e.g.*, the inlet consistency to FZ.

Internal variables

- Backward flowing steam in FZ (primarily two positions selected close to the maximum temperature: Steam5 and Steam8; the numbers correspond to the positions of the temperature measurements in Fig. 2, counted from the bottom of the pictures);
- Consistency out from FZ as well as the consistency near the periphery in CD (CFZ, CCD);
- Fiber residence time in FZ and CD (RFZ, RCD);
- Temperature profile with a focus on maximum temperature in FZ and CD (TmaxFZ and TmaxCD) and periphery temperature in CD (TCDper).

To handle outliers in the input/output data sets, pre-processing was performed to reject data from process stops and traditional process failures. This was handled by removing all measurements deviating more than 3 standard deviations from the mean. Data affected by other failures, such as malfunctions in measurement devices, were rejected as well.
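The 3-standard-deviation rule described above can be sketched for a single signal as follows. This is a minimal illustration, not the authors' pre-processing code; the function name `reject_outliers` and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def reject_outliers(values, n_sigma=3.0):
    """Keep only samples within n_sigma standard deviations of the mean.

    Hypothetical helper: the article only states the 3-sigma rule, not how
    it was implemented.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return list(values)  # constant signal: nothing to reject
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


# Illustrative signal: 29 normal samples and one spike from a process stop.
signal = [10.0] * 29 + [100.0]
cleaned = reject_outliers(signal)
```

In a multivariable data set, the same test would typically be applied per signal and the complete time stamp dropped whenever any signal fails it, so that inputs and outputs stay aligned.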

Another obstacle to handle is the fact that using both internal and external variables as inputs can introduce (larger or smaller) process-related dependencies. To analyze such situations and detect multicollinearities, the Variance Inflation Factors (VIF) are worth studying,

$$VIF_{k} = \frac{1}{1 - R_{k}^{2}} \tag{3}$$

In Eq. 3, *R*_{k}^{2} represents the coefficient of determination obtained by regressing the *k*^{th} input on the remaining inputs, *i.e.*, *VIF* quantifies how much the variance is inflated (Karlström *et al.* 2022). At this stage, it is not known whether the dependencies are linear or non-linear, but if a standalone model is considered, handling the collinearities is a good initiative when analyzing the input data in the modeling procedure.

A *VIF_{k}* = 1 means that there is no linear correlation between the *k*^{th} input and the other remaining inputs. If *VIF_{k}* > 4, a general rule is that further analysis should be performed, while *VIF_{k}* > 10 indicates serious multicollinearities, which may call for further analysis and perhaps a modified set of inputs (Belsley *et al.* 1980). In general, if only external variables are used, then the collinearities are not severe, which will be shown below. However, when using internal variables, the collinearities occur naturally, meaning that several aspects must be considered. For example, if the aim is to derive “standalone” estimators where the contribution from each independent variable is analyzed, the collinearities in *X_{extended}* become cumbersome for all data-driven soft sensors. In contrast, if the aim is to find a data-driven soft sensor that can use information from the model-driven soft sensor, we can accept that the *VIF* between the internal variables can exceed the rule of thumb if the data-driven soft sensor is not used as a standalone estimator.
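The variance inflation factor of each input, defined by regressing that input on the remaining inputs, can be sketched in plain Python as below. The sketch relies on the standard identity that the VIF of input *k* equals the *k*^{th} diagonal element of the inverse correlation matrix of the inputs; the function names and the example data are hypothetical and not taken from the article.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix; columns[k] is the time series of input k."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]  # zero if constant
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif(columns):
    """VIF_k = 1 / (1 - R_k^2), read off the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[k][k] for k in range(len(columns))]


# Two illustrative inputs with a correlation of 0.8.
factors = vif([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]])
```

With a correlation of 0.8 between two inputs, both factors are 1 / (1 - 0.64), roughly 2.8, which is below the thresholds of 4 and 10 quoted in the text. Inputs that are nearly exact functions of each other make the correlation matrix close to singular and the factors very large.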

This paper analyzes whether the external variables in *X_{ext}* suffice as model inputs or if additional information from the internal variables is needed when estimating the outputs in Eq. 1.

**Estimation of Internal Variables in Data-Driven Soft Sensors**

As stated above, some of the internal variables in Eq. 1 can be worth studying as output candidates as well. This is especially true if the aim is to develop simulation tools for pulp refining processes.

A natural set of internal variables to study in more detail will include the maximum temperatures and the consistencies in FZ and CD. These variables are central when it comes to simulation tools as well as advanced control concepts (Karlström and Hill 2018b). In this paper, the following internal variables are of special interest:

To minimize the risk for input collinearities in the model, the focus here is placed on the external variables in *X_{ext}* as inputs when estimating , while *X_{extended}* in Eq. 2 can be needed when estimating . The latter statement will be discussed further below.

Normally, the operators use the specific energy as a measure of how the defibration/fibrillation of the chips/fibers is performed. This is, however, a rather rough approach, as the specific energy is dependent on both the production and the motor load, *i.e.*, the force distribution related to the steam and fiber-to-bar interaction along the radius of the refining segments (Karlström and Eriksson 2014a,b,c,d). Therefore, there is motivation to analyze both specific energy as well as the split between motor load and production as model inputs when estimating internal variables in Eq. 4.
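As a worked illustration of why the specific energy alone is a rough measure, recall that it is conventionally defined as motor load divided by production. The numbers below are hypothetical values chosen for illustration only, with production assumed to be in tonnes per hour.

```latex
SpE = \frac{Mload}{Prod},
\qquad
\frac{18\ \text{MW}}{30\ \text{t/h}} = \frac{12\ \text{MW}}{20\ \text{t/h}} = 0.6\ \text{MWh/t}
```

Two clearly different operating points thus share the same specific energy, so an ANN that only sees SpE cannot distinguish them, whereas the split into motor load and production retains that information.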

**Pulp Property Estimation in Data-Driven Soft Sensors**

As shown in Fig. 5, small data sets can be sampled before the latency chest, but such sampling procedures are difficult to perform on a regular basis as they are based on tedious laboratory-based measurement procedures. The pulp samples become a set of under-sampled variables, which must be expanded into an oversampled data set of traditional process variables (normally sampled with a fast-sampling rate) to obtain a common time frame (Karlström *et al.* 2016a,b). This undermines the possibility of obtaining enough data to cover the entire operating window over the segment lifetime.

To obtain enough data for modeling purposes, large data sets spread over a long period must be available from measurement devices. For practical reasons it is common that such measurement devices are positioned after the latency chest, see Fig. 5.

This paper focuses on the estimation of the commonly used pulp properties,

$$Y_{PULP} = \left[\, Y_{Shives} \;\; Y_{FL} \;\; Y_{Freeness} \,\right] \tag{5}$$

where the subscripts “*Shives, FL*, and *Freeness*” correspond to shives (wide), fiber length, and freeness. Here freeness is the total volume of water discharged from a side orifice of a specific configuration while the pulp suspension drains freely under gravity.

**Fig. 5.** Schematic drawing of the sampling points for blow-line samples before the latency chest and the pulp property measurements after the latency chest.

**Fig. 6.** Measured and filtered pulp properties, a) shives (wide) and b) fiber length, according to a moving average filter based on a horizon of 4000 samples. The sampling rate was 6 s.

Karlström *et al.* (2015a,b, 2022) highlighted several obstacles to overcome when preparing data for modeling purposes. Some of them are related to the absence of relevant signals for long periods (sometimes longer than 3 h). Other complications are associated with the position of the device after the latency chest, which introduces a long time constant (20 min). They also indicated that mixing problems in the chest can sometimes occur. This makes it hard to guarantee that pulp property variations caused by changes in the refining conditions can be derived properly.

Finally, it is worth mentioning that the pulp property measurement device often has a non-equidistant sampling rate with a varying interval of 15 to 20 min, which needs to be taken into consideration as well (Sund *et al.* 2021).
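One simple way to bring such non-equidistant pulp measurements onto the fast time frame of the process variables is a zero-order hold, where each fast time stamp takes the most recent pulp measurement. The article does not state which expansion method was used, so the hold strategy and the name `hold_resample` are assumptions made for this sketch.

```python
from bisect import bisect_right


def hold_resample(sample_times, sample_values, grid_times):
    """Zero-order hold of irregular samples onto a regular time grid.

    sample_times must be sorted (seconds). Grid points that precede the
    first sample get None, since no measurement is available yet.
    """
    resampled = []
    for t in grid_times:
        idx = bisect_right(sample_times, t) - 1  # latest sample at or before t
        resampled.append(sample_values[idx] if idx >= 0 else None)
    return resampled


# Illustrative pulp samples taken 15 and 20 min apart, held on a 10 min grid.
held = hold_resample([0, 900, 2100], [1.0, 2.0, 3.0], [0, 600, 1200, 1800])
```

A hold avoids using future measurements, which matters if the resulting model is intended for online use; interpolation would be an alternative for purely offline analysis.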

In Fig. 6, two of the pulp properties in *Y_{PULP}* (see Fig. 4) are included, and it is obvious that the noisy signals must be properly analyzed before using them as outputs. Here, filtering of the signals is most likely required, which can be performed in many ways. In this paper, a moving average filter (a low-pass FIR filter, as seen in Fig. 6) was used to handle the noisy data. A set of filter horizons (2500, 4000, 8000, and 10000 samples) was analyzed. However, the main analysis was performed using 4000 samples as a reference, which corresponds to a time horizon of about 7 h.
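The moving average filter and the conversion from filter horizon to hours can be sketched as follows. Whether the authors used a causal or a centered window is not stated; a causal window with a shortened start-up phase is assumed here, and the function name is hypothetical.

```python
def moving_average(signal, horizon):
    """Causal moving average; the first samples use a shorter window."""
    filtered = []
    running_sum = 0.0
    for i, value in enumerate(signal):
        running_sum += value
        if i >= horizon:
            running_sum -= signal[i - horizon]  # drop the sample leaving the window
        filtered.append(running_sum / min(i + 1, horizon))
    return filtered


SAMPLE_PERIOD_S = 6   # sampling rate stated in the caption of Fig. 6
HORIZON = 4000        # reference filter horizon in samples
horizon_hours = HORIZON * SAMPLE_PERIOD_S / 3600  # 24000 s, i.e. about 6.7 h

smoothed = moving_average([1, 2, 3, 4, 5], horizon=2)
```

The running sum keeps the cost per sample constant, which is convenient for horizons of several thousand samples.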

**Criteria for the Modeling Procedure**

As the Levenberg-Marquardt algorithm can end up finding a local minimum, which is not necessarily the global minimum, it was not expected to get one optimal model. Instead, 200 iterations were run and the best models fulfilling specific criteria were selected for training (*train*) and validation (*val*) sets. Thereby, an ensemble of models constituted the basis when selecting the best structure.

The training of data was primarily performed based on 4200 h (70%) of the time series in Fig. 6, while the rest, 1800 h, was allocated to the validation of the models. Additional information concerning the potential input/output data can be found in Appendix A.
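A split of this kind can be sketched as below, assuming that the first 70% of the time series is used for training and the remainder for validation. The chronological ordering and the name `chronological_split` are assumptions; the article only gives the durations.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time series into a leading training part and a trailing validation part."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# One entry per hour for a 6000 h series gives 4200 h and 1800 h.
hours = list(range(6000))
train_hours, val_hours = chronological_split(hours)
```

Keeping the validation data as one contiguous block avoids leaking filtered, strongly autocorrelated samples from the training period into the validation period.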

To find a good combination of input candidates, the coefficient of determination was employed, as given in Eq. 6,

$$R^{2} = 1 - \frac{SS_{res}}{SS_{tot}} \tag{6}$$

where *SS_{res}* represents the sum of the squared residuals from the regression, while *SS_{tot}* denotes the sum of the squared differences from the mean of the dependent variable for *n* observations (Draper and Smith 1998).
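The coefficient of determination described above, one minus the ratio of the squared residuals to the squared deviations from the mean, can be computed as in the following sketch. The function and variable names are chosen here for illustration.

```python
def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, estimated))
    ss_tot = sum((y - mean_measured) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Note that this measure can become negative on a validation set when the model performs worse than simply predicting the mean, which is worth keeping in mind when reading low validation scores.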

The data quality was expected to be relatively unreliable; thus, the thresholds were set to:

The criteria for the validation sets were deliberately set to lower values compared with the training sets.

When it comes to the estimation of the internal variables, the non-linearity effects in both the temperature profile and the model-driven soft sensor (which estimates *CFZ* and *CCD*) can be considerable. This motivates an R^{2} = 0.2 for the validation set, while a slightly higher value is expected (0.5) when analyzing the pulp properties.
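Selecting the models of an ensemble that meet such thresholds can be sketched as follows. The validation thresholds of 0.2 and 0.5 come from the text; the training threshold of 0.8 used in the example is a hypothetical placeholder, since the training criteria are not reproduced in this passage.

```python
def select_models(scores, min_train, min_val):
    """Return the indices of models whose R^2 meets both thresholds.

    scores: list of (r2_train, r2_val) tuples, one per trained model.
    """
    return [i for i, (r2_train, r2_val) in enumerate(scores)
            if r2_train >= min_train and r2_val >= min_val]


# Illustrative ensemble of three models (values are made up).
ensemble_scores = [(0.90, 0.60), (0.95, 0.30), (0.70, 0.70)]
accepted_pulp = select_models(ensemble_scores, min_train=0.8, min_val=0.5)
accepted_internal = select_models(ensemble_scores, min_train=0.8, min_val=0.2)
```

The looser validation threshold for internal variables admits models that the pulp property criterion would reject, which mirrors the reasoning given in the text.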

Because the criteria in Eq. 7 were used as a first requirement to fulfill the thresholds, a selection procedure was introduced based on the weights of the obtained models in each ensemble *i* = 1,..,*n* to find a proper estimate. This was performed by using the obtained R^{2} vector according to *j* = 1 (the training set), *j* = 2 (the validation set), or *j* = 3 (the hypotenuse of the two), which is given as Eq. 8:
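Since the weighting expression itself is not reproduced here, the following is only a sketch of one plausible reading: each accepted model receives a weight proportional to its R^{2} for the chosen criterion *j*, normalized over the ensemble, and the ensemble estimate is the weighted sum of the individual model outputs. The normalization and all function names are assumptions.

```python
from math import hypot


def ensemble_weights(scores, j):
    """Normalized weights from (r2_train, r2_val) pairs.

    j = 1: training R^2, j = 2: validation R^2,
    j = 3: hypotenuse sqrt(r2_train^2 + r2_val^2).
    """
    if j == 1:
        metric = [r2_train for r2_train, _ in scores]
    elif j == 2:
        metric = [r2_val for _, r2_val in scores]
    elif j == 3:
        metric = [hypot(r2_train, r2_val) for r2_train, r2_val in scores]
    else:
        raise ValueError("j must be 1, 2, or 3")
    total = sum(metric)
    return [m / total for m in metric]


def weighted_estimate(predictions, weights):
    """predictions[i][t] is the output of model i at time step t."""
    steps = range(len(predictions[0]))
    return [sum(w * p[t] for w, p in zip(weights, predictions)) for t in steps]


scores = [(0.9, 0.6), (0.6, 0.8)]   # made-up R^2 values for two models
w_train = ensemble_weights(scores, j=1)
w_hyp = ensemble_weights(scores, j=3)
estimate = weighted_estimate([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
```

Under this reading, criterion *j* = 1 favors models that fit the training data best, while *j* = 3 balances training and validation performance.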

**RESULTS AND DISCUSSION**

This section is divided into two subsections covering the results obtained when applying the approach described above to estimate the internal variables in Eq. 4 and the pulp properties in Eq. 5. The sets of independent input candidates of interest therefore differ between the two. Some of the most important input candidates are given as time series in Appendix A, to “assure visually” that the excitation levels in the inputs are large enough to contribute to the expected changes in the selected outputs.

**Estimation of Internal Variables in Data-Driven Soft Sensors**

When the internal variables in FZ are analyzed, the external variables will be in focus as inputs, while a mix of external and internal variables will be used when analyzing the internal variables in CD as well as the pulp properties. This is motivated from a causality perspective, as all internal variables in FZ are a consequence of the external variables, while both the external and the internal variables in FZ might affect the predictions in the CD-zone, but not *vice versa*.

To analyze the internal variables in Eq. 4, the goal is to find an appropriate combination of inputs. According to the combinations in Table 1, the inputs describing FZ will be based on external variables such as specific energy, motor load, production, the dilution water added, and the amount of sawmill chips to the refiner. As can be seen in Table 1, it is obvious from a *VIF* perspective that Case B through Case E provide good candidates for the input matrix *X_{ext}*, while Case A has been included only to show the collinear effects.

**Table 1.** Variance Inflated Factors for the Different Model Inputs

Notes: SpE = Specific energy; Mload = Motor load, Prod = Production, Sawmill = Percentage of sawmill chips to refiner, DilwFZ = dilution water to FZ, GapFZ = Distance between refining segments in FZ

**The Maximum Temperature in FZ (T_{maxFZ})**

As shown in Fig. 7, which shows the R^{2} for the validation data *versus* the R^{2} for the training data, the ANN cannot provide good models when using the input data from Case B.

**Fig. 7.** Estimated R^{2} for *T*_{maxFZ} using the approach based on Case B in Table 1. Dashed lines: black marks the required area and red the accepted area according to Eq. 7

If the motor load and the production (Case C) are used as inputs instead of the specific energy, much better fits are obtained, as shown in Fig. 8. It is also shown that a slightly better estimation will be obtained if the dilution water to FZ (Case D) is included as an input. However, the R^{2} value for the validation set was not increased, and the question is whether the response in time domain was short enough.

**Fig. 8.** Estimated R^{2} for *T*_{maxFZ} using the approach based on a) Case C and b) Case D in Table 1; Dashed lines represent the accepted criteria in Eq. 7

By studying the ensemble of models in the time domain, the major dynamics seem to have been covered, as shown in Fig. 9a. This is of course promising, and using the criterion *j* = 1 in Eq. 8, the response became even more pronounced, see Fig. 9b (at this stage not all criteria in Eq. 8 are compared; this will be done later in the text). Hence, when estimating *T*_{maxFZ}, the most important result so far is that the ANN provided the best estimates if the motor load and production were included as inputs, instead of using the specific energy as a model input.

**Fig. 9.** Filtered and estimated *T*_{maxFZ} for Case D in Table 1: a) ensemble of models related to Fig. 8; b) response using the weighted training set (*j* = 1 in Eq. 8)

To complete the discussion about possible input combinations from Table 1, Case E in Fig. 10 was also studied. As can be seen, the inputs from Case E, which add the distance between the segments, provided slightly better models. This is good news and motivates using a setup according to Case E as a reference when estimating the maximum temperature in FZ.

To verify the robustness in the results in Fig. 10, another 200 iterations were also included to find complementary models (Set 2). As shown, the cluster of R^{2} will not be improved outside the original region. Instead, the ensemble is located almost in the same interval as the original set (Set 1). This indicates, to some extent, a robustness in the used methodology.

The R^{2} value from the validation set was, however, surprisingly low, and this undermines the use of the models derived although the time responses seem to be acceptable.

**Fig. 10.** Estimated R^{2} for *T*_{maxFZ} using the approach based on two ensemble sets for Case E in Table 1

**The Consistency Modeling in FZ (CFZ)**

When it comes to the consistency modeling in the flat zone, none of the input combinations in Table 1 provided acceptable models. This could be perceived as a surprise, but the complex process conditions inside the refining zone are difficult to capture using only the external variables.

As an example, the external variables cover neither phenomena such as the split between backward and forward flowing steam nor the variable distributed shear forces, where inherent physical conditions must be considered. Hence, although the input matrix was expanded by including the maximum temperature in FZ as well, a good estimation of the consistency was not obtained.

**The Maximum Temperature in CD (T_{maxCD})**

Compared with the modeling of the maximum temperature in FZ (Case E in Table 1), the models for the first temperature measurement in CD (see Fig. 1 (right figure)) need an even more complex input structure.

It is natural to expand the input structure in Case E by introducing the dilution water feed rate together with the distance between the refining segment in CD, but this was not enough, as no models fulfilled the criteria in Eq. 7.

Without going through all the details regarding different combinations and their inability to create good models, it can be concluded that the use of the maximum temperature in FZ as an additional input seems to be important to achieve acceptance according to the criteria in Eq. 7. This indicates that external variables as model inputs are not enough when estimating the maximum temperature in CD. This, of course, is a drawback from an empirical modeling perspective, as temperature measurements and most likely the model-driven soft sensor must be used when estimating internal variables in the conical zone.

Hence, when Case E in Table 1 is expanded by adding the dilution water and plate gap in CD together with the maximum temperature in FZ as inputs, a good estimate of *T*_{maxCD} can be provided, as shown in Fig. 11. This is a bit of a surprise, as the position for the maximum temperature in FZ is close to the periphery of the segment and thereby close to the maximum temperature in CD.

However, the correlation coefficient between them was only 0.5. The covariation was thereby unexpectedly small, which can be seen by comparing Fig. 9 and Fig. 11 as well.

**Fig. 11.** Filtered and estimated *T*_{maxCD} for Case E in Table 1, extended with the dilution water feed rate to CD (DilwCD), the plate gap in CD (GapCD), and the maximum temperature in FZ (*T*_{maxFZ}): a) ensemble of models; b) response using the weighted training set (*j* = 1 in Eq. 8).

**The Consistency Modeling in CD (CCD)**

When it comes to the consistency modeling in the conical zone, none of the input combinations mentioned above could be used, which indicates that strong process non-linearities were present.

In summary, the maximum temperature in FZ can be estimated with quite good accuracy using external variables as inputs, whereas the model-driven soft sensor becomes vital when estimating other internal variables.

**Estimation of Pulp Property in Data-Driven Soft Sensors**

When it comes to the modeling of the pulp properties in *Y_{PULP}*, we have access to all internal variables from the model-driven soft sensor as well as the traditional external variables, that is, to the entire palette of possible inputs to the ANN models.

Here, the specific energy is used instead of the motor load and the production when estimating pulp properties. This is because pulp properties are dependent on all process conditions represented by both external and internal variables.

Because we are not using the data-driven sensor as a standalone model in this case, a strict methodology regarding the VIF does not need to be followed. However, it is still interesting to see how the collinearities are reflected by the VIF in Table 2.

**Table 2.** Variance Inflated Factors for the Different Model Inputs

*SpE = Specific energy, Gap = Distance between refining segments in FZ and CD, CFZ and CCD = Consistency in FZ and CD, RFZ and RCD = Fiber residence time in FZ and CD, Steam5 and Steam8 = Backward flowing steam in 5^{th} and 8^{th} position in FZ, T_{maxFZ} = Maximum temperature in FZ, TCDper = Periphery temperature in CD, Sawmill = Percentage of sawmill chips to refiner*

**Shives (Wide)**

When estimating shives (wide) using the inputs based on Case 1 in Table 2, the criteria for pulp properties in Eq. 7 are fulfilled for some of the models, as shown in Fig. 12a. It is also worth mentioning that the use of just external variables in Case 2 does not result in any acceptable models, even though the specific energy is split into motor load and production as inputs.

Notably, when adding the sawmill chip content as an input (Case 4), one model, represented by the green dot in Fig. 12a, meets the criteria in Eq. 7. Therefore, also in this case it is indicated that the sawmill chip content is vital to include as an input in all models.

When reducing the model complexity of Case 1 in Fig. 12, it can be seen that Case 5 through Case 7 seem to provide similar performance as the ensemble of models in Case 1. A closer look at the plots in Fig. 12 shows that the best fits did not exceed an R^{2} of 0.88 for the training sets. Using the procedure described in Eq. 8, the cases can be further compared. As can be seen in Table 3, the training set of Case 1 outperformed Case 5 through Case 7, even though the number of models was smaller (15 against 24). This is, to some extent, expected, as Case 1 can handle more input combinations and thereby include more process non-linearities.

**Fig. 12.** Estimated R^{2} for shives (wide): a) Case 1 (all input candidates included; red: accepted), Case 2 (external inputs only; no models accepted), and Case 4 (external inputs including the sawmill chip content; green: accepted); b) Case 5; c) Case 6; d) Case 7. See Table 2 for the corresponding input combinations.

**Table 3.** R^{2} for the Training and the Validation Sets Based on a Normalized Weight on the Ensemble of Models from Case 1, and Case 5 Through Case 7