An efficient Bayesian experimental calibration of dynamic thermal models

Experimental calibration of dynamic thermal models is required for model predictive control and for the characterization of building energy performance. In these applications, the uncertainty assessment of the parameter estimates is decisive; this is why a Bayesian calibration procedure (selection, calibration and validation) is presented. The calibration is based on an improved Metropolis-Hastings algorithm suitable for linear and Gaussian state-space models. The procedure, illustrated on a real house experiment, shows that the algorithm is more robust to initial conditions than a maximum likelihood optimization with a quasi-Newton algorithm. Furthermore, when the data are not informative enough, the use of prior distributions helps to regularize the problem.

Keywords: Bayesian calibration, model selection and validation, dynamic thermal models, real house experiment, Metropolis-Hastings algorithm, robust gradient and Hessian computation, change of variables, prior distribution selection, identifiability

Nomenclature

Notations
$x, y, z$: Scalars
$\mathbf{x}, \mathbf{y}, \mathbf{z}$: Vectors
$\mathbf{A}, \mathbf{B}, \mathbf{C}$: Matrices
$\mathbb{R}^q$: Space of dimension $q$

Notational conventions
$\mathbf{A}^{\mathrm{T}}$: Matrix transpose
$\mathbf{A}^{-1}$: Matrix inverse
$\mathbf{A}^{-1/2} = (\mathbf{A}^{1/2})^{-1}$
$\mathbf{A}^{-\mathrm{T}/2} = (\mathbf{A}^{-1/2})^{\mathrm{T}}$
$\det(\mathbf{A})$: Determinant of the matrix $\mathbf{A}$
$\mathrm{tr}(\mathbf{A})$: Trace of the matrix $\mathbf{A}$
$\dot{\mathbf{x}}$: Time derivative of vector $\mathbf{x}$
$\partial\mathbf{x}/\partial\theta$: Partial derivative of $\mathbf{x}$ with respect to $\theta$
$\mathrm{diag}(a_1, a_2, \dots, a_N)$: Diagonal matrix with diagonal values $a_1, a_2, \dots, a_N$
$\mathbb{E}[\cdot]$: Expected value
$p(\mathbf{x})$: Probability density function (pdf) of a random variable $\mathbf{x}$
$p(\mathbf{x}|\mathbf{y})$: Conditional pdf of vector $\mathbf{x}$ given vector $\mathbf{y}$
$\mathbf{x} \sim p(\mathbf{x})$: Random variable $\mathbf{x}$ with probability distribution $p(\mathbf{x})$
$\propto$: Proportional
$\approx$: Approximately equal
$\mathbf{x}_{1:N}$: Set of values $\mathbf{x}_{1:N} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N]$

1. Introduction

The existing methods for characterizing building energy performance and the energy savings provided by retrofitting are not relevant (Turner & Frankel 2008, De Wilde 2014).
The energy performance estimation of buildings and associated systems must be independent of weather conditions and user behavior. Starting from this observation, the Efficiency Valuation Organization has developed the International Performance Measurement and Verification Protocol (IPMVP) (EVO 2014). The idea is to construct a physical model which characterizes the intrinsic thermal dynamics of the building and relates inputs to outputs measured on-site. Hence, the gap between the energy use given by the pre-retrofit and post-retrofit models represents the energy saved by the refurbishment.

Minimizing heat losses from buildings is the most obvious way to reduce the heating and cooling demand, but the efficiency and sustainability of the energy chain, from production to the HVAC systems, must also be improved. Nowadays, the dominant paradigm is that the energy sources need to respond to all requests at any moment. This strategy will become more complex as the share of renewable energy sources in the energy mix increases. Therefore, supply and demand must become more flexible by using demand response mechanisms and energy storage (European Commission 2016). In order to adapt the demand to the production, the energy demand must be known. Physical models characterizing the thermal dynamics of buildings, associated with model predictive control, can be used to forecast the energy demand while maintaining indoor comfort (Hazyuk et al. 2012a, Hazyuk et al. 2012b, Ghiaus & Hazyuk 2010).

Two important societal needs are thus identified: the estimation of building energy demand and the estimation of the energy savings brought by energy conservation measures. These two needs face the same scientific obstacle: the experimental estimation of the physical parameters that govern the dynamic thermal behavior of buildings.
Such models can be obtained by considering the energy balance between buildings and their surroundings, and the energy balance in buildings can be modelled by using thermal networks (Naveros 2016, Ghiaus 2013). Stochastic state-space models are obtained by first transforming thermal networks into deterministic state-space models; noise terms are then added to represent the deviations between the differential algebraic equations and the true variations of the states. Stochastic state-space models relate inputs to outputs, where the dynamics of the states are given by the parameters; hence, by knowing the inputs and the parameters, the output of the system can be simulated. However, this direct problem requires the parameters to be known, so the inverse problem of parameter estimation must be solved first.

The interest in parameter estimation for dynamic thermal models is not new (Nielsen & Nielsen 1984) and experiments of various scales have been used to test the validity of different approaches (Bloem 1994, Baker & van Dijk 2008, Jiménez 2014). When making predictions or decisions from an identified model, it is essential to assess the uncertainties in the parameter estimates, and this should be done by taking into account all the information available. From this perspective, Bayesian estimation is receiving more and more attention, for instance, in the quantification of energy savings from retrofit (Heo et al. 2012, Heo et al. 2015, Tian et al. 2016 and Li et al. 2016), in the calibration of energy models (Chong & Lam 2015, Chong & Poh Lam 2017, Zayane 2011), in the estimation of the thermal characteristics of a wall (Berger et al. 2016) and in the estimation of heating consumption (Kristensen et al. 2017). Bayesian methods, such as the Metropolis-Hastings (MH) algorithm, are usually employed for low-dimensional problems because, as the number of parameters increases, it becomes more and more difficult to tune the algorithms properly.
The paper compares the three phases of an experimental model identification (selection, calibration and validation) from a Bayesian and a frequentist point of view. More precisely, the paper treats the problem of parameter estimation of dynamic thermal models when the model structure is known. First, the choice of a model structure is discussed and then an implementation of the second-order Metropolis-Hastings algorithm for linear and Gaussian state-space models is proposed. Tools and guidelines for tuning and diagnosing the algorithm are also presented. Next, different criteria are presented to assess the performance of models and to guide the selection of a model structure in agreement with the data. The whole procedure is tested on a real test case where the differences between the Bayesian and the frequentist approach are illustrated.

2. Twin houses experiment

The twin houses experiment is a real outdoor experiment conducted by the Fraunhofer Institute near Munich during April and May 2014. The test building is an unoccupied single-family house (twin house O5) with a 100 m² ground floor, a cellar and an attic space; a full description of the house and the experiment is given by Strachan et al. (2016). The experiment was designed such that the south zone (Figure 1) has only two boundary conditions: the external temperature and the adjacent spaces (cellar, attic, north zone). The adjacent spaces were held at 22 °C with blinds closed to reduce the chance of overheating, and the doors separating the north and south zones were sealed off. The electric heaters in the south zone were synchronized on a Randomly Ordered Logarithmic Binary Sequence (ROLBS) to maintain a similar temperature in the different rooms (800 W in the living room, 500 W in the south bedroom and 500 W in the bathroom).
The ROLBS signal was designed for three reasons: to maximize the temperature difference with the boundary conditions, to excite the range of time constants from 1 hour to 90 hours and to decorrelate the heating signal from the solar radiation. The heaters are lightweight, with a time response estimated at around 1 or 2 minutes by the Fraunhofer Institute (Strachan et al. 2016) and a split between convective and radiative heat gains of 70/30 %. Mechanical ventilation was set to supply a volume flow rate of 60 m³/h into the living room and to extract 30 m³/h from the bathroom and from the south bedroom.

Figure 1: Layout of the twin house O5

The stratification of the air in each room of the south zone is measured with temperature sensors at 10 cm, 110 cm and 170 cm from the ground. This level of detail is not required to fit a simple dynamic thermal model; therefore, the temperature of each room is taken as the average of the three heights, and the south zone temperature $T_s$ (°C) is an average of the spaces (living room, south bedroom, corridor and bathroom) weighted by their respective surfaces. The boundary temperature $T_z$ (°C) is also a weighted average of the temperatures in the different spaces (kitchen, lobby and north bedroom). A weather station near the house provides the outside air temperature $T_o$ (°C) and the global solar irradiance measured on a horizontal surface $\dot{Q}_{gh}$ (W/m²). The heating in the south zone is done by three electric heaters; therefore $\dot{Q}_h$ (W) represents the total heat input injected.

Buildings are dynamic systems which may be modelled by using thermal networks, from which state-space models can be deduced (Ghiaus 2013). This procedure is used to find a suitable model of the twin house, where the south zone (in green on Figure 1) is considered as the main thermal zone and the north zone as an adjacent space.

3. Experimental calibration process

The experimental calibration process is decomposed into three phases: selection, calibration and validation (Ljung 2002). First, a set of likely model structures characterized by an unknown physical parameter vector $\boldsymbol{\theta}$ is chosen based on a-priori knowledge (physics, experiment details, etc.). Then, the calibration assesses how these model structures relate to the observed data and to physical considerations. The calibration consists of finding a set of parameters which best represents the input-output relationship of a model through the observed data, $\mathbf{u}_{1:k} = [\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_k]$ and $\mathbf{y}_{1:k} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k]$. Finally, the performances of the calibrated models are evaluated to select the model which is best suited for its intended use.

The experimental calibration process is usually treated either from a frequentist or from a Bayesian point of view. In Bayesian estimation, the unknown parameters are treated as random variables with a certain prior distribution $p(\boldsymbol{\theta})$, which represents the prior belief before looking at the data (Dahlin 2016). Then, all the information available in the data is summarized in the likelihood function $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$. The prior belief and the information in the data are combined through Bayes' theorem to compute the posterior distribution:

$$p(\boldsymbol{\theta}|\mathbf{y}_{1:k}) = \frac{p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(\mathbf{y}_{1:k})} \propto p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta}) \qquad (1)$$

where $p(\mathbf{y}_{1:k})$ is a normalization constant independent of the parameters.

The posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$ contains all the statistical information about $\boldsymbol{\theta}$; the most probable value of the posterior distribution gives the maximum a posteriori (MAP) estimate:

$$\boldsymbol{\theta}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}}\big(p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta})\big) \qquad (2)$$

If only the information in the data is considered, maximizing the likelihood function $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ gives the maximum likelihood (ML) estimate:

$$\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}}\big(p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\big) \qquad (3)$$

The ML estimate can be seen as a MAP estimate with a uniform prior distribution, $p(\boldsymbol{\theta}) \propto 1$ (Sarkka 2013).
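The relation between (2) and (3) can be illustrated on a toy problem that is not part of the twin houses study: estimating the mean of Gaussian observations with known standard deviation under a Gaussian prior. The function names, the data and the hyper-parameters `mu0` and `tau` below are assumptions chosen for illustration only.

```python
def ml_estimate(y):
    """ML estimate of a Gaussian mean: the sample average."""
    return sum(y) / len(y)


def map_estimate(y, sigma, mu0, tau):
    """MAP estimate of a Gaussian mean with a N(mu0, tau^2) prior.

    Observations are assumed i.i.d. N(theta, sigma^2). The posterior is
    Gaussian, so its mode is a precision-weighted average of the data
    and of the prior mean.
    """
    n = len(y)
    precision = n / sigma**2 + 1.0 / tau**2
    return (sum(y) / sigma**2 + mu0 / tau**2) / precision


y = [1.0, 2.0, 3.0]                                          # hypothetical data
theta_ml = ml_estimate(y)
theta_map = map_estimate(y, sigma=1.0, mu0=0.0, tau=1.0)     # informative prior
theta_flat = map_estimate(y, sigma=1.0, mu0=0.0, tau=1e6)    # nearly flat prior
```

With the informative prior the MAP estimate is pulled towards `mu0`, whereas with a very wide prior it practically coincides with the ML estimate, in line with the remark that ML is a MAP estimate under a uniform prior.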
Two philosophies exist for computing these estimates and their uncertainties. The frequentist approach relies on the fact that, as the number of observations increases, the influence of the prior distribution becomes negligible compared to the likelihood and the posterior distribution can be approximated by a Gaussian distribution (Gelman, Carlin, et al. 2014). ML estimation is popular because it requires only point estimates of the posterior modes, and the corresponding uncertainties are determined by asymptotic properties. ML estimates are usually found by optimization routines, as in the CTSM-R package (CTSM-R Development Team 2015). This strategy has proven to be efficient in numerous cases (Naveros et al. 2014, Himpe & Janssens 2015, Nespoli et al. 2015, Andersen et al. 2014, Bacher & Madsen 2011, Váňa et al. 2013). However, the asymptotic theory does not hold for a small number of observations, which is often the case in real experiments. From a Bayesian point of view, all the statistical information is summarized in the posterior distribution, so no asymptotic assumption is required.

Bayesian and frequentist methods are compared in the three phases of the calibration process in order to illustrate the differences and to take advantage of both methods.

3.1. Model selection

The choice of an appropriate model structure is the most crucial part of experimental calibration according to Ljung (2002). The model structure should be representative of the real physical system and also in agreement with the measured data. Based on a-priori knowledge of the experiment, a set of models of increasing complexity is defined, which allows a forward selection between nested models, i.e. the simplest model can be recovered by setting the extra parameters to zero (Bacher & Madsen 2011). In this paper, a distinction is made between model selection and model comparison, because model comparison requires calibration to be discussed first; it is therefore presented later.
To illustrate the Bayesian experimental calibration process, only the two most promising model candidates are presented. These two models, illustrated in Figure 2, have been selected from a larger set of models which is not presented here due to lack of space. The smallest thermal network in Figure 2, the 3 states model, $\mathcal{M}_3$, is obtained by not taking into account the part enclosed by the dotted line, whereas the 4 states model, $\mathcal{M}_4$, is obtained by adding this part to $\mathcal{M}_3$, such that $\mathcal{M}_3 \subset \mathcal{M}_4$.

Figure 2: 3 states model, $\mathcal{M}_3$, and 4 states model, $\mathcal{M}_4$

In Figure 2, the nodes represent:
• 𝑥 : the temperature of the building envelope (°C),
• 𝑥 : the indoor air temperature (°C),
• 𝑥 : the temperature of the medium (internal walls, furniture, etc.) (°C),
• 𝑥 : the temperature of the sensor (°C).

When the sensor node is used, it is selected as the model output and is equivalent to the measured south zone air temperature $T_s$ (°C); otherwise the indoor air temperature node is the model output.

The different heat transfers are characterized by the thermal resistances:
• 𝑅 : between the outside and the middle of the building envelope (K/W),
• 𝑅 : between the middle of the building envelope and the south zone (K/W),
• 𝑅 : between the south zone and the medium (K/W),
• 𝑅 : between the air in the north zone and the south zone (K/W),
• 𝑅 : between the south zone and the sensor (K/W).

The accumulation of energy is modeled by the thermal capacities:
• 𝐶 : for the building envelope (J/K),
• 𝐶 : for the air in the south zone (J/K),
• 𝐶 : for the medium (internal walls, furniture, etc.) (J/K),
• 𝐶 : for the sensor.

The parameters $a_W$ and $a_I$ are respectively the effective area through which the solar radiation enters the building envelope and the effective window area of the building.
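State-space models can be easily deduced from thermal networks by considering the heat balance in each temperature node (Ghiaus 2013):

$$\dot{\mathbf{x}} = \mathbf{A}(\boldsymbol{\theta})\mathbf{x} + \mathbf{B}(\boldsymbol{\theta})\mathbf{u}, \qquad \mathbf{y}_k = \mathbf{C}\mathbf{x}_k \qquad (4)$$

where $\mathbf{A}$ is the state matrix, $\mathbf{B}$ the input matrix, $\mathbf{C}$ the output matrix, $\boldsymbol{\theta}$ the parameter vector, $\mathbf{u}$ the input vector and $\mathbf{y}_k$ the output vector, which can be measured at the discrete time instants $t_k$, e.g. $\mathbf{x}_k = \mathbf{x}(t_k)$.

Both models from Figure 2 share the same input vector

$$\mathbf{u} = [\,T_o \quad T_z \quad a_W\dot{Q}_{gh} \quad a_I\dot{Q}_{gh} + \dot{Q}_h\,] \qquad (5)$$

The data of the twin houses experiment are provided with a constant sampling time $\Delta_t$; therefore, the linear time invariant continuous model (4) is discretized and additive noise terms are introduced to describe the deviation between the discrete system and the true variation of the state

$$\mathbf{x}_{k+1} = \mathbf{A_d}(\boldsymbol{\theta})\mathbf{x}_k + \mathbf{B_{d0}}(\boldsymbol{\theta})(\mathbf{u}_k + \boldsymbol{\alpha}\Delta_t) - \mathbf{B_{d1}}(\boldsymbol{\theta})\boldsymbol{\alpha} + \mathbf{w}_k, \qquad \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{v}_k \qquad (6)$$

where $\mathbf{w}_k$ and $\mathbf{v}_k$ are white noise processes with respective covariances $\boldsymbol{\Sigma}_{\mathbf{w}}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_{\mathbf{v}}(\boldsymbol{\theta})$, and

$$\mathbf{A_d}(\boldsymbol{\theta}) = e^{\mathbf{A}\Delta_t} \qquad (6.a)$$

$$\mathbf{B_{d0}}(\boldsymbol{\theta}) = \mathbf{A}^{-1}(\mathbf{A_d} - \mathbf{I})\mathbf{B} \qquad (6.b)$$

$$\mathbf{B_{d1}}(\boldsymbol{\theta}) = \mathbf{A}^{-1}\big(-\mathbf{A}^{-1}(\mathbf{A_d} - \mathbf{I}) + \mathbf{A_d}\Delta_t\big)\mathbf{B} \qquad (6.c)$$

If the input is assumed constant in the time interval $\Delta_t$ (zero order hold), then $\boldsymbol{\alpha} = \mathbf{0}$, whereas if the input is assumed to vary linearly (first order hold), then (Kristensen & Madsen 2003)

$$\boldsymbol{\alpha} = \frac{\mathbf{u}_{k+1} - \mathbf{u}_k}{\Delta_t} \qquad (7)$$

In order to make the notation lighter, the parameter dependence in $\mathbf{A_d}$, $\mathbf{B_{d0}}$, $\mathbf{B_{d1}}$, $\boldsymbol{\Sigma}_{\mathbf{w}}$ and $\boldsymbol{\Sigma}_{\mathbf{v}}$ is omitted, such that $\mathbf{A_d} = \mathbf{A_d}(\boldsymbol{\theta})$.

Some parameters of the state-space model (6) may not be known; they then have to be estimated from the measured data.

3.2. Bayesian calibration with Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a general method for constructing posterior distributions. The main idea is to simulate a Markov chain which has been constructed such that it has the posterior distribution as its stationary distribution (Sarkka 2013). The Metropolis-Hastings (MH) algorithm is the most common type of MCMC method due to its simplicity.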
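MH is an iterative scheme, where a new candidate $\boldsymbol{\theta}^*$ is suggested from a proposal distribution $q(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1})$ given the previous one $\boldsymbol{\theta}^{i-1}$. The candidate is then accepted or rejected according to the acceptance probability

$$\alpha = \min\left\{1, \frac{p(\mathbf{y}_{1:k}|\boldsymbol{\theta}^*)\,p(\boldsymbol{\theta}^*)}{p(\mathbf{y}_{1:k}|\boldsymbol{\theta}^{i-1})\,p(\boldsymbol{\theta}^{i-1})}\,\frac{q(\boldsymbol{\theta}^{i-1}|\boldsymbol{\theta}^*)}{q(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1})}\right\} \qquad (8)$$

where the ratio $q(\boldsymbol{\theta}^{i-1}|\boldsymbol{\theta}^*)/q(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1})$ corrects the asymmetry in the proposal distribution.

If the candidate $\boldsymbol{\theta}^*$ significantly increases the posterior probability, the candidate is always accepted. However, unlike in optimization algorithms, a candidate which decreases the posterior probability can still be accepted; this allows the MH algorithm to explore the regions of high posterior probability. Hence, by its stochastic nature, the MH algorithm may escape from local extrema, which is a problem for many optimization algorithms used for ML estimation (Dahlin 2016).

The performance of the MH algorithm is highly dependent on the choice of the proposal distribution. A commonly used choice is the Gaussian random walk,

$$q(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1}) = \mathcal{N}(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1}) \qquad (9)$$

where $\mathcal{N}(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1})$ is a Gaussian probability density function of a random variable $\boldsymbol{\theta}^* \in \mathbb{R}^{N_p}$ with mean $\boldsymbol{\theta}^{i-1} \in \mathbb{R}^{N_p}$ and covariance $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1} \in \mathbb{R}^{N_p \times N_p}$,

$$\mathcal{N}(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1}) = \frac{1}{(2\pi)^{N_p/2}\,|\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1}|^{1/2}} \exp\left(-\frac{1}{2}(\boldsymbol{\theta}^* - \boldsymbol{\theta}^{i-1})^{\mathrm{T}}(\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{i-1})^{-1}(\boldsymbol{\theta}^* - \boldsymbol{\theta}^{i-1})\right) \qquad (10)$$

Finding a suitable covariance matrix $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}$ is a hard task which involves many trials and becomes unrealistic for high-dimensional problems (Sarkka 2013). The Markov chain should converge to the stationary distribution in a reasonable time (burn-in phase) and should not be highly autocorrelated, such that the number of iterations needed to explore the stationary distribution is minimized.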
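The accept/reject step (8) with the random walk proposal (9) can be sketched in a few lines for a scalar parameter. This is only a minimal illustration: the target `log_target` is a hypothetical stand-in for a log-posterior (an unnormalized Gaussian), and the function names, step size and seed are assumptions, not values from the paper. Because the random walk is symmetric, the proposal ratio in (8) equals one.

```python
import math
import random


def log_target(theta):
    """Unnormalized log-density of N(2, 0.5^2); stands in for a log-posterior."""
    return -0.5 * ((theta - 2.0) / 0.5) ** 2


def random_walk_mh(log_post, theta0, step, n_iter, seed=0):
    """Scalar Metropolis-Hastings with a Gaussian random walk proposal."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain, n_accepted = [], 0
    for _ in range(n_iter):
        candidate = rng.gauss(theta, step)          # proposal (9), scalar case
        lp_candidate = log_post(candidate)
        # Acceptance probability (8); the q-ratio is 1 for a symmetric proposal.
        alpha = math.exp(min(0.0, lp_candidate - lp))
        if rng.random() < alpha:
            theta, lp = candidate, lp_candidate     # accept the candidate
            n_accepted += 1
        chain.append(theta)                         # a rejection repeats the old value
    return chain, n_accepted / n_iter


chain, acceptance_rate = random_walk_mh(log_target, theta0=0.0, step=1.0, n_iter=20000)
burn_in = 1000
posterior_mean = sum(chain[burn_in:]) / len(chain[burn_in:])
```

Working with log-densities and comparing against `exp(min(0, ...))` avoids overflow when the candidate is much more probable than the current value. Changing `step` shows the trade-off described above: very large steps are mostly rejected, very small steps are mostly accepted but explore slowly.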
The performance and the tuning of the MH algorithm can be respectively improved and simplified by using the gradient and the Hessian of the posterior distribution in order to construct a better proposal distribution (Dahlin 2016). The next section shows a robust and accurate method for computing the likelihood, gradient and Hessian for the linear Gaussian state-space model (6).

3.2.1. Construction of the posterior and proposal distribution

The construction of the posterior distribution (1) requires the evaluation of the likelihood $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ and of the prior distribution $p(\boldsymbol{\theta})$. The challenging part is the evaluation of the likelihood, because the prior distribution is usually chosen such that it is easy to evaluate (Sarkka 2013). For a state-space model, the likelihood can be computed by using the prediction error decomposition

$$p(\mathbf{y}_{1:k}|\boldsymbol{\theta}) = p(\mathbf{y}_1|\boldsymbol{\theta}) \prod_{k=2}^{N} p(\mathbf{y}_k|\mathbf{y}_{1:k-1}, \boldsymbol{\theta}) \qquad (11)$$

where the predictive likelihood can be computed recursively by

$$p(\mathbf{y}_k|\mathbf{y}_{1:k-1}, \boldsymbol{\theta}) = \int p(\mathbf{y}_k|\mathbf{x}_k, \boldsymbol{\theta})\, p(\mathbf{x}_k|\mathbf{y}_{1:k-1}, \boldsymbol{\theta})\, \mathrm{d}\mathbf{x}_k \qquad (11.a)$$

with $p(\mathbf{y}_k|\mathbf{x}_k, \boldsymbol{\theta})$ and $p(\mathbf{x}_k|\mathbf{y}_{1:k-1}, \boldsymbol{\theta})$ representing respectively the measurement model and the predictive distribution of the state.

To avoid computational inaccuracy and instability, the logarithm of the unnormalized posterior distribution (right-hand side of (1)), named log-posterior, is computed instead. Since not all the states are observed, the parameter estimation problem also requires the state estimation problem to be solved (Dahlin 2016). For a linear Gaussian state-space model, the integral in (11.a) can be computed in closed form by the Kalman filter. The gradient and an approximation of the Hessian are obtained by differentiating the Kalman filter equations with respect to the unknown parameters, which gives the so-called sensitivity equations. Therefore, $N_p$ sensitivity equations have to be computed in parallel to the Kalman filter, where $N_p$ is the number of unknown parameters.
However, this strategy is numerically unstable due to rounding errors, i.e., the state covariance matrix $\mathbf{P}$ may cease to be symmetric and positive definite, which leads to the failure of the computational process. This problem can be solved by using a robust square root implementation (Kulikova & Tsyganova 2016, Tsyganova & Kulikova 2012), where only the square root factor $\mathbf{P}^{1/2}$ is propagated instead of the full state covariance matrix. The upper triangular matrix $\mathbf{P}^{1/2}$ is obtained by Cholesky decomposition, such that $\mathbf{P} = \mathbf{P}^{\mathrm{T}/2}\mathbf{P}^{1/2}$.

The recursion starts by updating the mean value of the prior state $\mathbf{x}_{k|k-1}$ and the prior square root factor $\mathbf{P}_{k|k-1}^{1/2}$ with the current measurements

$$\mathbf{x}_{k|k} = \mathbf{x}_{k|k-1} + \mathbf{K}_k\bar{\mathbf{e}}_k \qquad (12)$$

with the standardized residuals

$$\bar{\mathbf{e}}_k = \mathbf{S}_k^{-\mathrm{T}/2}(\mathbf{y}_k - \mathbf{C_d}\mathbf{x}_{k|k-1}) \qquad (13)$$

The normalized Kalman gain $\mathbf{K}_k$ and the square root factor of the residual covariance, $\mathbf{S}_k^{1/2}$, are directly read from the post-array, in the right-hand side of equation (14)

$$\mathbf{Q}^{\mathrm{MU}} \underbrace{\begin{bmatrix} \boldsymbol{\Sigma}_{\mathbf{v}}^{1/2} & 0 \\ \mathbf{P}_{k|k-1}^{1/2}\mathbf{C_d}^{\mathrm{T}} & \mathbf{P}_{k|k-1}^{1/2} \end{bmatrix}}_{\text{Pre-array}} = \underbrace{\begin{bmatrix} \mathbf{S}_k^{1/2} & \mathbf{K}_k^{\mathrm{T}} \\ 0 & \mathbf{P}_{k|k}^{1/2} \end{bmatrix}}_{\text{Post-array}} \qquad (14)$$

where $\mathbf{Q}^{\mathrm{MU}}$ is an orthogonal rotation matrix obtained by QR decomposition of the pre-array, such that the post-array is upper triangular; this notation is used throughout the paper.
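For a model with one state and one output, the array update (14) reduces to a 2×2 problem in which the orthogonal matrix is a single Givens rotation. The sketch below is an illustration under that scalar assumption, not the implementation used in the paper; the function name and the numerical values are hypothetical. It shows that the quantities read from the post-array agree with the conventional Kalman filter expressions.

```python
import math


def sqrt_measurement_update(p_sqrt, c, sigma_v):
    """Scalar array measurement update.

    Pre-array:  [[sigma_v,    0     ],
                 [p_sqrt * c, p_sqrt]]
    A Givens rotation zeroes the lower-left entry, which yields the
    upper triangular post-array [[s_sqrt, k_bar], [0, p_post_sqrt]].
    """
    a, b = sigma_v, p_sqrt * c             # first column of the pre-array
    r = math.hypot(a, b)
    cos_t, sin_t = a / r, b / r            # rotation [[cos, sin], [-sin, cos]]
    s_sqrt = cos_t * a + sin_t * b         # square root of the residual variance
    k_bar = sin_t * p_sqrt                 # normalized Kalman gain
    p_post_sqrt = cos_t * p_sqrt           # square root of the updated variance
    return s_sqrt, k_bar, p_post_sqrt


# Hypothetical values: prior variance P = 4, output gain c = 1, noise std 1.
s_sqrt, k_bar, p_post_sqrt = sqrt_measurement_update(p_sqrt=2.0, c=1.0, sigma_v=1.0)

# Conventional Kalman filter quantities for comparison.
P, c, R = 4.0, 1.0, 1.0
S = c * P * c + R                          # residual variance
P_post = P - (P * c) ** 2 / S              # updated variance
```

In the general matrix case the same triangularization would be carried out by a QR decomposition of the pre-array; only factors of the covariances are ever formed, which is what keeps the updated covariance symmetric and positive definite.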
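The partial derivatives of the quantities in the post-array (14) are computed with

$$\frac{\partial \mathbf{S}_k^{1/2}}{\partial\theta_i} = \left((\boldsymbol{\mathcal{L}}_i^{\ddagger})^{\mathrm{T}} + \boldsymbol{\mathcal{D}}_i^{\ddagger} + \boldsymbol{\mathcal{U}}_i^{\ddagger}\right)\mathbf{S}_k^{1/2} \qquad (15)$$

$$\frac{\partial \mathbf{K}_k^{\mathrm{T}}}{\partial\theta_i} = \mathbf{Y}_i + \left((\boldsymbol{\mathcal{L}}_i^{\ddagger})^{\mathrm{T}} - \boldsymbol{\mathcal{L}}_i^{\ddagger}\right)\mathbf{K}_k^{\mathrm{T}} + \left(\mathbf{V}_i\mathbf{S}_k^{-1/2}\right)^{\mathrm{T}}\mathbf{P}_{k|k}^{1/2} \qquad (16)$$

$$\frac{\partial \mathbf{P}_{k|k}^{1/2}}{\partial\theta_i} = \left((\boldsymbol{\mathcal{L}}_i^{\dagger})^{\mathrm{T}} + \boldsymbol{\mathcal{D}}_i^{\dagger} + \boldsymbol{\mathcal{U}}_i^{\dagger}\right)\mathbf{P}_{k|k}^{1/2} \qquad (17)$$

where $\boldsymbol{\mathcal{L}}_i^{\dagger}$, $\boldsymbol{\mathcal{D}}_i^{\dagger}$, $\boldsymbol{\mathcal{U}}_i^{\dagger}$, $\boldsymbol{\mathcal{L}}_i^{\ddagger}$, $\boldsymbol{\mathcal{D}}_i^{\ddagger}$ and $\boldsymbol{\mathcal{U}}_i^{\ddagger}$ are obtained by first multiplying the partial derivative of the pre-array in (14) by the orthogonal rotation matrix $\mathbf{Q}^{\mathrm{MU}}$

$$\mathbf{Q}^{\mathrm{MU}} \begin{bmatrix} \dfrac{\partial \boldsymbol{\Sigma}_{\mathbf{v}}^{1/2}}{\partial\theta_i} & 0 \\ \dfrac{\partial (\mathbf{P}_{k|k-1}^{1/2}\mathbf{C_d}^{\mathrm{T}})}{\partial\theta_i} & \dfrac{\partial \mathbf{P}_{k|k-1}^{1/2}}{\partial\theta_i} \end{bmatrix} = \begin{bmatrix} \mathbf{X}_i & \mathbf{Y}_i \\ \mathbf{V}_i & \mathbf{W}_i \end{bmatrix} \qquad (18)$$

and then the right-hand side of (18) is multiplied by the inverse of the post-array of (14)

$$\mathbf{G}_i = \begin{bmatrix} \mathbf{X}_i & \mathbf{Y}_i \\ \mathbf{V}_i & \mathbf{W}_i \end{bmatrix} \begin{bmatrix} \mathbf{S}_k^{1/2} & \mathbf{K}_k^{\mathrm{T}} \\ 0 & \mathbf{P}_{k|k}^{1/2} \end{bmatrix}^{-1} \qquad (19)$$

The matrices $\boldsymbol{\mathcal{L}}_i^{\dagger}$, $\boldsymbol{\mathcal{D}}_i^{\dagger}$ and $\boldsymbol{\mathcal{U}}_i^{\dagger}$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the submatrix of $\mathbf{G}_i$ from row and column $N_y + 1$ to row and column $N_y + N_x$. The matrices $\boldsymbol{\mathcal{L}}_i^{\ddagger}$, $\boldsymbol{\mathcal{D}}_i^{\ddagger}$ and $\boldsymbol{\mathcal{U}}_i^{\ddagger}$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the submatrix of $\mathbf{G}_i$ from the first row and column to row and column $N_y$ (Tsyganova & Kulikova 2012).

The partial derivative of the state mean update (12) is computed by

$$\frac{\partial \mathbf{x}_{k|k}}{\partial\theta_i} = \frac{\partial \mathbf{x}_{k|k-1}}{\partial\theta_i} + \frac{\partial \mathbf{K}_k}{\partial\theta_i}\bar{\mathbf{e}}_k + \mathbf{K}_k\frac{\partial \bar{\mathbf{e}}_k}{\partial\theta_i} \qquad (20)$$

with

$$\frac{\partial \bar{\mathbf{e}}_k}{\partial\theta_i} = -\mathbf{S}_k^{-\mathrm{T}/2}\left(\left(\frac{\partial \mathbf{S}_k^{1/2}}{\partial\theta_i}\right)^{\mathrm{T}}\bar{\mathbf{e}}_k + \frac{\partial \mathbf{C_d}}{\partial\theta_i}\mathbf{x}_{k|k-1} + \mathbf{C_d}\frac{\partial \mathbf{x}_{k|k-1}}{\partial\theta_i}\right) \qquad (21)$$

The posterior state mean and square root factor are propagated forward in time by the state equation, similar to (6),

$$\mathbf{x}_{k+1|k} = \mathbf{A_d}\mathbf{x}_{k|k} + \mathbf{B_{d0}}(\mathbf{u}_k + \boldsymbol{\alpha}\Delta_t) - \mathbf{B_{d1}}\boldsymbol{\alpha} \qquad (22)$$

and by

$$\mathbf{Q}^{\mathrm{TU}} \begin{bmatrix} \mathbf{P}_{k|k}^{1/2}\mathbf{A_d}^{\mathrm{T}} \\ \boldsymbol{\Sigma}_{\mathbf{w}}^{1/2} \end{bmatrix} = \begin{bmatrix} \mathbf{P}_{k+1|k}^{1/2} \\ 0 \end{bmatrix} \qquad (23)$$

The partial derivative of the state equation (22) is

$$\frac{\partial \mathbf{x}_{k+1|k}}{\partial\theta_i} = \frac{\partial \mathbf{A_d}}{\partial\theta_i}\mathbf{x}_{k|k} + \mathbf{A_d}\frac{\partial \mathbf{x}_{k|k}}{\partial\theta_i} + \frac{\partial \mathbf{B_{d0}}}{\partial\theta_i}(\mathbf{u}_k + \boldsymbol{\alpha}\Delta_t) - \frac{\partial \mathbf{B_{d1}}}{\partial\theta_i}\boldsymbol{\alpha} \qquad (24)$$

where the partial derivatives of the state and input discrete matrices with respect to the continuous parameters are computed by (Mbalawata et al.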
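2013)

$$\underbrace{\begin{bmatrix} \mathbf{A_d} & 0 \\ \dfrac{\partial \mathbf{A_d}}{\partial\theta_i} & \mathbf{A_d} \end{bmatrix}}_{\mathbf{A_{dM}}} = \exp\Bigg(\underbrace{\begin{bmatrix} \mathbf{A} & 0 \\ \dfrac{\partial \mathbf{A}}{\partial\theta_i} & \mathbf{A} \end{bmatrix}}_{\mathbf{A_M}}\Delta_t\Bigg) \qquad (24.a)$$

$$\begin{bmatrix} \mathbf{B_{d0}} \\ \dfrac{\partial \mathbf{B_{d0}}}{\partial\theta_i} \end{bmatrix} = \mathbf{A_M}^{-1}(\mathbf{A_{dM}} - \mathbf{I})\begin{bmatrix} \mathbf{B} \\ \dfrac{\partial \mathbf{B}}{\partial\theta_i} \end{bmatrix} \qquad (24.b)$$

$$\begin{bmatrix} \mathbf{B_{d1}} \\ \dfrac{\partial \mathbf{B_{d1}}}{\partial\theta_i} \end{bmatrix} = \mathbf{A_M}^{-1}\big(-\mathbf{A_M}^{-1}(\mathbf{A_{dM}} - \mathbf{I}) + \mathbf{A_{dM}}\Delta_t\big)\begin{bmatrix} \mathbf{B} \\ \dfrac{\partial \mathbf{B}}{\partial\theta_i} \end{bmatrix} \qquad (24.c)$$

The partial derivative of the square root factor in the post-array (23)

$$\frac{\partial \mathbf{P}_{k+1|k}^{1/2}}{\partial\theta_i} = \left(\boldsymbol{\mathcal{L}}_i^{\mathrm{T}} + \boldsymbol{\mathcal{D}}_i + \boldsymbol{\mathcal{U}}_i\right)\mathbf{P}_{k+1|k}^{1/2} \qquad (25)$$

requires first multiplying the partial derivative of the pre-array (23) by the orthogonal rotation matrix $\mathbf{Q}^{\mathrm{TU}}$

$$\mathbf{Q}^{\mathrm{TU}} \begin{bmatrix} \dfrac{\partial (\mathbf{P}_{k|k}^{1/2}\mathbf{A_d}^{\mathrm{T}})}{\partial\theta_i} \\ \dfrac{\partial \boldsymbol{\Sigma}_{\mathbf{w}}^{1/2}}{\partial\theta_i} \end{bmatrix} = \begin{bmatrix} \mathbf{X}_i \\ \mathbf{N}_i \end{bmatrix} \qquad (26)$$

where the matrices $\boldsymbol{\mathcal{L}}_i$, $\boldsymbol{\mathcal{D}}_i$ and $\boldsymbol{\mathcal{U}}_i$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the matrix product $\mathbf{X}_i\mathbf{P}_{k+1|k}^{-1/2}$.

The log-likelihood is recursively computed with

$$\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}) = -\frac{1}{2}\sum_{k=1}^{N}\left(N_y\ln(2\pi) + \ln\det\big(\mathbf{S}_k^{\mathrm{T}/2}\mathbf{S}_k^{1/2}\big) + \bar{\mathbf{e}}_k^{\mathrm{T}}\bar{\mathbf{e}}_k\right) \qquad (27)$$

where the standardized innovations $\bar{\mathbf{e}}_k$ and the square root of the innovation covariance matrix $\mathbf{S}_k^{1/2}$ are computed respectively by (13) and (14). The gradient and the Hessian approximation of (27) are respectively obtained with

$$\frac{\partial \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial\theta_i} = -\sum_{k=1}^{N}\left(\mathrm{tr}\left(\mathbf{S}_k^{-1/2}\frac{\partial \mathbf{S}_k^{1/2}}{\partial\theta_i}\right) + \bar{\mathbf{e}}_k^{\mathrm{T}}\frac{\partial \bar{\mathbf{e}}_k}{\partial\theta_i}\right) \qquad (28)$$

$$-\frac{\partial^2 \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j} \approx \sum_{k=1}^{N}\left(\left(\frac{\partial \bar{\mathbf{e}}_k}{\partial\theta_i}\right)^{\mathrm{T}}\frac{\partial \bar{\mathbf{e}}_k}{\partial\theta_j} + \mathrm{tr}\left(\mathbf{S}_k^{-1/2}\frac{\partial \mathbf{S}_k^{1/2}}{\partial\theta_i}\mathbf{S}_k^{-1/2}\frac{\partial \mathbf{S}_k^{1/2}}{\partial\theta_j}\right)\right) \qquad (29)$$

It has been shown in this section that the parameter estimation problem is also a state estimation problem. For a linear Gaussian state-space model, the log-likelihood with its gradient and Hessian approximation can be computed by a square root version of the Kalman filter. This numerically stable strategy only requires running the square root Kalman filter and $N_p$ sensitivity equations forward in time.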
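As an illustration of the prediction error decomposition (11) and of the recursion behind (27), the sketch below evaluates the log-likelihood of a one-state model with a scalar Kalman filter. It is deliberately simplified and rests on assumptions that differ from the paper: it uses the conventional covariance form instead of the square root form, a zero order hold discretization in the spirit of (6.a) and (6.b), and hypothetical function names and numerical values.

```python
import math


def discretize_zoh(a, b, dt):
    """Scalar zero order hold discretization: a_d = exp(a*dt), b_d0 = (a_d - 1)/a * b."""
    a_d = math.exp(a * dt)
    b_d0 = (a_d - 1.0) / a * b
    return a_d, b_d0


def kalman_loglik(y, u, a_d, b_d, c, q, r, x0, p0):
    """Log-likelihood of a scalar linear Gaussian state-space model.

    x0, p0 are the mean and variance of the state before the first
    observation; q and r are the process and measurement noise variances.
    """
    x, p = x0, p0
    loglik = 0.0
    for y_k, u_k in zip(y, u):
        # Predictive distribution of the output: N(c*x, s)
        s = c * p * c + r
        e = y_k - c * x
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(s) + e * e / s)
        # Measurement update
        k = p * c / s
        x = x + k * e
        p = (1.0 - k * c) * p
        # Time update
        x = a_d * x + b_d * u_k
        p = a_d * p * a_d + q
    return loglik


# Hypothetical first-order model: dx/dt = -(x - u)/tau with tau = 1 h, sampled every 0.1 h.
a_d, b_d0 = discretize_zoh(a=-1.0, b=1.0, dt=0.1)
ll_one = kalman_loglik(y=[1.0], u=[0.0], a_d=a_d, b_d=b_d0,
                       c=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0)
```

Each term added to `loglik` is one predictive density of (11); in the scalar case `e * e / s` plays the role of the squared standardized residual in (27).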
The gradient and the Hessian of the log-posterior distribution are easily computed with

$$\mathbf{g}(\boldsymbol{\theta}) = \frac{\partial \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} + \frac{\partial \ln p(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \qquad (30)$$

$$\hat{\mathbf{H}}(\boldsymbol{\theta}) = -\frac{\partial^2 \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^2} - \frac{\partial^2 \ln p(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^2} \qquad (31)$$

and are used in the MH algorithm to construct an efficient proposal distribution (Dahlin 2016)

$$q(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{i-1}) = \mathcal{N}\big(\boldsymbol{\theta}^*\,\big|\,\boldsymbol{\theta}^{i-1} + \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\theta}^{i-1})\,\mathbf{g}(\boldsymbol{\theta}^{i-1}),\ \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\theta}^{i-1})\big) \qquad (32)$$

where $\boldsymbol{\Sigma}$ is a diagonal matrix which controls the step length of the proposal distribution.

Using the geometric information of the posterior distribution has the advantage of steering the Markov chain towards areas of high posterior probability (Nemeth 2014). This reduces the burn-in phase, since the Markov chain takes larger steps when it is far from the posterior mode and smaller steps as it gets closer. This approach saves both user time and computational time: firstly, because the covariance matrix is given by the inverse of the Hessian approximation, so only the step length has to be tuned; secondly, because it improves the mixing of the Markov chain, so the MH algorithm needs fewer iterations (Dahlin 2016, Nemeth 2014).

The proposal distribution (32) is based on Newton-type optimization; consequently, the suggested candidates $\boldsymbol{\theta}^*$ are unconstrained and can violate the physical meaning of the system. A simple solution would be to reject candidates outside specified bounds, but this could increase the autocorrelation of the Markov chain if too many candidates are rejected. A better solution is to reparametrize the model (Dahlin 2016).

3.2.2. Reparametrization of the model

The idea is to use non-linear functions to transform a constrained problem into an unconstrained one. In this way, the proposal distribution cannot suggest candidates which are outside the bounds. The constrained parameters $\boldsymbol{\theta}$ are transformed to unconstrained parameters $\boldsymbol{\eta}$ by one-to-one invertible functions $\boldsymbol{\eta} = \mathbf{f}(\boldsymbol{\theta})$.
Two parametrizations are used: 1) the log transform

$$\eta_i = \ln(\theta_i) \qquad (33)$$

constrains $\theta_i$ to the open interval $]0, +\infty[$, and 2) the following transformation (Team Stan Development 2015)

$$\eta_j = \mathrm{logit}\left(\frac{\theta_j - \theta_j^{\min}}{\theta_j^{\max} - \theta_j^{\min}}\right) \qquad (34)$$

constrains $\theta_j$ to the open interval $]\theta_j^{\min}, \theta_j^{\max}[$, where the logit function is

$$\mathrm{logit}(z) = \ln\left(\frac{z}{1 - z}\right) \qquad (34.a)$$

In the acceptance probability (8), the unnormalized posterior distribution is computed in the constrained space whereas the proposal distribution is evaluated in the unconstrained one. To make the acceptance probability consistent, the log-posterior distribution is transformed to the unconstrained space by using the Jacobian adjustment (Gelman, Carlin, et al. 2014)

$$\ln p(\boldsymbol{\eta}|\mathbf{y}_{1:k}) = \ln p(\boldsymbol{\theta}|\mathbf{y}_{1:k}) + \ln|\det(\mathbf{J})| \qquad (35)$$

where $|\det(\mathbf{J})|$ is the absolute value of the determinant of the Jacobian matrix $\mathbf{J}$; $|\det(\mathbf{J})|$ adjusts for the distortion caused by the non-linear transformation; $\mathbf{J}$ is the Jacobian matrix of the inverse transform $\boldsymbol{\theta} = \mathbf{f}^{-1}(\boldsymbol{\eta})$, such that

$$\mathbf{J} = \frac{\partial\boldsymbol{\theta}}{\partial\boldsymbol{\eta}} \qquad (35.a)$$

The Jacobian matrix is diagonal if each transformed parameter only depends on a single untransformed parameter, which simplifies the determinant computation to the product of the diagonal elements.

The gradient (30) and the Hessian (31) are with respect to $\boldsymbol{\theta}$, so they need to be multiplied by the Jacobian $\mathbf{J}$ in order to be with respect to $\boldsymbol{\eta}$ (chain rule). The partial derivative of the new term in (35) also needs to be taken into account. The gradient and the Hessian in the unconstrained space are obtained by

$$\mathbf{g}(\boldsymbol{\eta}) = \mathbf{J}\,\mathbf{g}(\boldsymbol{\theta}) + \frac{\partial \ln|\det(\mathbf{J})|}{\partial\boldsymbol{\eta}} \qquad (36)$$

$$\hat{\mathbf{H}}(\boldsymbol{\eta}) = \mathbf{J}\,\hat{\mathbf{H}}(\boldsymbol{\theta})\,\mathbf{J} + \frac{\partial \ln|\det(\mathbf{J})|}{\partial\boldsymbol{\eta}} \qquad (37)$$

A problem arises with this type of reparametrization when $\theta$ gets close to a bound: its corresponding Jacobian term goes towards zero, so $\mathbf{g}(\boldsymbol{\eta})$ and $\hat{\mathbf{H}}(\boldsymbol{\eta})$ are unreliable. Moreover, the Hessian estimate $\hat{\mathbf{H}}(\boldsymbol{\eta})$ could become ill-conditioned, which prevents its inversion.
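The two transforms (33) and (34), their inverses, and the log-Jacobian term added in (35) can be sketched for a single parameter as follows. The function names are illustrative assumptions, and the bounds in the usage lines are arbitrary example values; only the formulas come from the text.

```python
import math


def log_transform(theta):
    """(33): maps ]0, +inf[ to the real line."""
    return math.log(theta)


def log_inverse(eta):
    return math.exp(eta)


def log_log_jacobian(eta):
    """ln|d theta / d eta| for theta = exp(eta)."""
    return eta


def logit_transform(theta, lower, upper):
    """(34): maps ]lower, upper[ to the real line."""
    z = (theta - lower) / (upper - lower)
    return math.log(z / (1.0 - z))


def logit_inverse(eta, lower, upper):
    return lower + (upper - lower) / (1.0 + math.exp(-eta))


def logit_log_jacobian(eta, lower, upper):
    """ln|d theta / d eta| for the inverse of (34)."""
    s = 1.0 / (1.0 + math.exp(-eta))
    return math.log((upper - lower) * s * (1.0 - s))


# Example with arbitrary bounds: round trip and Jacobian term.
lower, upper = 1e-4, 0.2
eta = logit_transform(0.05, lower, upper)
theta_back = logit_inverse(eta, lower, upper)
log_jac = logit_log_jacobian(eta, lower, upper)
```

For a candidate `eta`, the log-posterior in the unconstrained space is then the log-posterior evaluated at the back-transformed value plus the matching log-Jacobian, as in (35). The Jacobian term `s * (1 - s)` vanishes as `eta` goes to plus or minus infinity, which is the degeneracy near the bounds discussed above.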
This issue is solved in optimization by adding a penalty function (Kristensen & Madsen 2003) which increases the gradient near the bounds, but this strategy is not suitable here because the proposal distribution (32) has a stochastic part, which means that candidates can still be projected towards the bounds. Instead, a prior distribution $p(\boldsymbol{\theta})$ is used to assign a low probability near the bounds. The details of the MH algorithm with gradient and Hessian information for constrained parameters are given in Algorithm 1.

Algorithm 1: Second order Metropolis-Hastings (Dahlin 2016)
Inputs: $N$ (number of iterations), $\boldsymbol{\theta}^0$ (initial parameters), $\boldsymbol{\Sigma}$ (step length)
Output: $\boldsymbol{\eta}^{1:N}$ (samples from the posterior distribution)
1. Transformation to the unconstrained parameter space $\boldsymbol{\eta}^0 = \mathbf{f}(\boldsymbol{\theta}^0)$ with (33) and (34)
2. Compute: $\ln p(\boldsymbol{\eta}^0|\mathbf{y}_{1:k})$, $\mathbf{g}(\boldsymbol{\eta}^0)$ and $\hat{\mathbf{H}}(\boldsymbol{\eta}^0)$ with (35), (36) and (37)
3. for $i = 1$ to $N$
4.   Suggest a new candidate $\boldsymbol{\eta}^* \sim \mathcal{N}\big(\boldsymbol{\eta}^{i-1} + \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\eta}^{i-1})\,\mathbf{g}(\boldsymbol{\eta}^{i-1}),\ \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\eta}^{i-1})\big)$
5.   Compute: $\ln p(\boldsymbol{\eta}^*|\mathbf{y}_{1:k})$, $\mathbf{g}(\boldsymbol{\eta}^*)$ and $\hat{\mathbf{H}}(\boldsymbol{\eta}^*)$ with (35), (36) and (37)
6.   Compute the acceptance probability $\alpha$ with (8)
7.   Generate a uniform random variable $u \sim \mathcal{U}(0,1)$ and set
8.   if $u \le \alpha$: accept the new candidate
9.     $\{\boldsymbol{\eta}^i, \ln p(\boldsymbol{\eta}^i|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^i), \hat{\mathbf{H}}(\boldsymbol{\eta}^i)\} \leftarrow \{\boldsymbol{\eta}^*, \ln p(\boldsymbol{\eta}^*|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^*), \hat{\mathbf{H}}(\boldsymbol{\eta}^*)\}$
10.  else: reject the new candidate
11.    $\{\boldsymbol{\eta}^i, \ln p(\boldsymbol{\eta}^i|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^i), \hat{\mathbf{H}}(\boldsymbol{\eta}^i)\} \leftarrow \{\boldsymbol{\eta}^{i-1}, \ln p(\boldsymbol{\eta}^{i-1}|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^{i-1}), \hat{\mathbf{H}}(\boldsymbol{\eta}^{i-1})\}$
12.  end if
13. end for

3.2.3. Choice of prior distribution

The knowledge of possible parameter values before anything has been observed is represented probabilistically by the prior distribution $p(\boldsymbol{\theta})$. A prior distribution which is relevant to the experiment, to the physical nature of the problem, or for other reasons, has to be specified by the user. Three categories of prior information are considered: non-informative, weakly informative and informative (see Gelman et al.
2014, for a complete discussion on the subject).

Non-informative prior distributions attempt not to affect the posterior distribution, such that only the information in the data is relevant; this is the idea behind ML estimation. However, these flat or almost flat prior distributions put more probability mass outside the expected range of values than inside, which can have unforeseen effects on the posterior distribution (Dahlin 2016), especially for small data sets. Moreover, non-informative prior distributions such as $\mathcal{U}(-\infty, +\infty)$ may be improper (they do not integrate to one), so they cannot be expressed as a probability density function. In some cases, a proper posterior distribution can be obtained with an improper prior distribution, but the result must be interpreted with care (Gelman, Carlin, et al. 2014).

Weakly informative prior distributions provide sufficient information to keep the parameters in a reasonable range and, unlike informative prior distributions, they are not likely to outweigh the likelihood. For cases where the data set is too short or not informative enough, a weakly informative prior distribution contains enough information to regularize the posterior distribution and to prevent identifiability issues; the curvature around the expected solution is increased (Team Stan Development 2015).

For the parameters transformed with the logarithm (33), the prior distributions are $\theta_i \sim \mathcal{G}(a, b)$, where $\mathcal{G}(a, b)$ denotes a Gamma distribution with shape $a$ and expected value $b$. The hyper-parameters $a$ and $b$ are chosen such that the probability near zero is low and that the distribution covers the expected range of values (Figure 3). For the transformed parameters with lower and upper bounds, the prior distributions are $\theta_j \sim \beta(2, 2, \theta_j^{\min}, \theta_j^{\max})$, where $\beta(a, b, \theta_j^{\min}, \theta_j^{\max})$ is a Beta distribution with shape hyper-parameters $a$ and $b$, and lower and upper bounds $\theta_j^{\min}$ and $\theta_j^{\max}$.
The Beta distribution with $a = b = 2$ is symmetric and assigns low probabilities to values near the bounds (Figure 3).

Figure 3: Prior pdf, $\mathcal{G}(2, 0.03)$ in blue and $\beta(2, 2, 10^{-4}, 2 \cdot 10^{-1})$ in red

3.2.4. Tuning the algorithm

The choice of the prior distribution is an important decision which can strongly influence the posterior distribution (Dahlin 2016), but the exploration of the posterior distribution depends on the tuning of the algorithm. The use of the Hessian approximation reduces the tuning of the proposal distribution to the choice of the step length matrix $\boldsymbol{\Sigma}$ (32). Separate step lengths can be used for each parameter but, to simplify the tuning, a single step length is used here, such that $\boldsymbol{\Sigma} = \varepsilon\mathbf{I}$.

The step length directly affects the acceptance rate of the MH algorithm (the percentage of accepted candidates at stationarity). Too large a $\boldsymbol{\Sigma}$ produces broad jumps which are more likely to be rejected, which increases the autocorrelation of the Markov chains and gives a low acceptance rate. On the contrary, if the step length is too small, short jumps are likely to be accepted, which gives a higher acceptance rate but limits the exploration of the posterior distribution to a small neighborhood, which also increases the autocorrelation. Consequently, the acceptance rate alone is not a reliable indicator of the algorithm performance. A better solution is to look at the mixing of the Markov chains at stationarity, which can be quantified by the integrated autocorrelation time (IACT) (Dahlin 2016)

$$\mathrm{IACT}(\theta_j^{N_b:N}) = 1 + 2\sum_{l=1}^{L} \hat{\rho}_l(\theta_j^{N_b:N}) \qquad (38)$$

where $\hat{\rho}_l$ denotes the autocorrelation coefficient at lag $l$ of $\theta_j^{N_b:N}$, and $\theta_j^{N_b:N}$ is the Markov chain of $\theta_j$ from the burn-in time $N_b$ to the last iteration $N$. The number of lags $L$ is determined as the first index for which $|\hat{\rho}_L(\theta_j^{N_b:N})| < 2/\sqrt{N - N_b}$, that is, when the autocorrelation coefficient becomes statistically insignificant.
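As an illustration of (38), the sketch below computes the IACT of a chain from which the burn-in has already been removed. It is a plain-Python sketch under the assumption that the sum includes the first insignificant lag $L$, as (38) is written; the function name `iact` and the synthetic test chains are hypothetical.

```python
import math
import random

def iact(chain):
    """Integrated autocorrelation time of a burn-in-free chain, following (38)."""
    n = len(chain)
    mean = sum(chain) / n
    centred = [x - mean for x in chain]
    variance = sum(c * c for c in centred)
    threshold = 2.0 / math.sqrt(n)  # significance bound on the autocorrelation
    total = 0.0
    for lag in range(1, n):
        rho = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / variance
        total += rho
        if abs(rho) < threshold:  # first insignificant lag, taken as L
            break
    return 1.0 + 2.0 * total

# Synthetic chains: uncorrelated draws versus a strongly correlated AR(1) process
rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

print(iact(white), iact(ar1))
```

A well-mixed chain gives an IACT close to one, whereas a slowly mixing chain gives a much larger value, which is why the step length is tuned to minimize it.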
The IACT represents the number of iterations between two uncorrelated samples; therefore, the step length should be chosen such that it minimizes the IACT. The number of iterations $N$ should be chosen sufficiently large that, once the burn-in phase has been removed, the number of samples left is sufficient to represent the posterior distribution. Tools for diagnosing convergence are discussed in the next section.

3.2.5. Convergence diagnosis

The procedure of Gelman et al. (2014) is used here and presented briefly. It consists of simulating $M$ Markov chains of $N$ samples, where the starting points of the $M$ Markov chains are randomly sampled from the prior distributions. The first step is to inspect visually the trace plots of the different Markov chains to determine the burn-in time $N_b$ and to check whether they converge to the same posterior distributions. The $M$ Markov chains, with the burn-in phase removed, are split in two to give $2M$ chains of length $(N - N_b)/2$; then the variations between and within the $2M$ chains are compared (Gelman, Carlin, et al. 2014). Stationarity implies that the first and the second half of each sequence come from the same distribution. Good mixing requires that the variance within chains is close to the variance between chains; this is quantified by the potential scale reduction $\hat{R}$ (see Gelman et al. 2014 for computational details). The number of iterations $N$ should be increased until $\hat{R}$ is near one, or at least $\hat{R} < 1.1$. The mixing of the Markov chains can also be quantified by the effective sample size (ESS), which approximates the number of independent samples in the $2M$ sequences. Gelman et al. (2014) suggest that the ESS should be at least $5 \times 2M$. If the aforementioned criteria are satisfied, the samples from the $2M$ sequences can be used to estimate the posterior distribution.

3.3. Model validation

Once different models have been calibrated, how can their reliability be assessed?
The purpose of a model is to reproduce an input-output relationship; an intuitive way to start is by looking at what the model is not able to reproduce, the residuals $\bar{\mathbf{e}}$ (13) (Ljung 2002). A plot of the residuals against the data helps to understand which features are not properly described by the model and what might be responsible for them. The noise terms in model (6) are assumed to be white noise sequences, which implies that the residuals should be white noise as well. A white noise sequence is uncorrelated, normally distributed with zero mean, and has its power uniformly distributed over all frequencies (Madsen 2007). These properties are assessed by plotting the autocorrelation function (ACF) and the cumulated periodogram (CP) with their respective 95% confidence intervals. Furthermore, the residuals should be independent of the past inputs, which is tested by plotting the cross-correlation function (CCF).

The reliability of the model is also tested on a data set which has not been used for the calibration (validation data set). New values for the inputs are introduced in the model and the simulated output is compared to the measured one. If the identification data set is informative enough, i.e. the different dynamics of the system are observable in the data, the model should be representative of the system and the simulation should therefore be close to the measurement. Model validation assesses the reliability of a model and gives insight into the model order selection and directions for improvement; but how should the best model be selected?

3.4. Model comparison

In section 3.1, the selection of a model structure based on insights from the experiment has been presented. This section discusses the agreement of calibrated models with measured data and how to compare different models in order to select the most appropriate one.
The model fit to the data is summarized by the log-likelihood $\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ (27); the prior distribution is not relevant for assessing the accuracy of a model. The best model is not necessarily characterized by the highest log-likelihood value because, as the complexity of a model increases, the number of degrees of freedom increases and the parameters adjust themselves to fit a particular realization of the noise (overfitting) (Ljung 2002). In order to adjust for overfitting, Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) penalize the log-likelihood as a function of the complexity of the model:

$$\mathrm{AIC} = -2\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}) + 2N_p \qquad (39)$$

$$\mathrm{BIC} = -2\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}) + N_p \ln N_s \qquad (40)$$

where $\hat{\boldsymbol{\theta}}$ is the ML estimate, $N_p$ the number of parameters and $N_s$ the sample size. The smallest AIC or BIC among different models indicates the most appropriate model. For nested models, like $\mathcal{M}_3$ and $\mathcal{M}_4$, the likelihood ratio test (LRT) can be used (Bacher & Madsen 2011)

$$\mathrm{LRT} = -2\big(\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}^{\mathcal{M}_3}) - \ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}^{\mathcal{M}_4})\big) \qquad (41)$$

with $\hat{\boldsymbol{\theta}}^{\mathcal{M}_3}$ and $\hat{\boldsymbol{\theta}}^{\mathcal{M}_4}$ the ML estimates of models $\mathcal{M}_3$ and $\mathcal{M}_4$. As the number of samples $N_s$ goes to infinity, the LRT converges to a $\chi^2$ distributed variable with $(N_p^{\mathcal{M}_4} - N_p^{\mathcal{M}_3})$ degrees of freedom. Usually, a p-value of the LRT below 0.05 indicates that the improvement of the larger model $\mathcal{M}_4$ over $\mathcal{M}_3$ is significant and consequently that the model $\mathcal{M}_4$ should be preferred.

These criteria are based on the point estimate $\hat{\boldsymbol{\theta}}$ and not on the posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$; a more Bayesian criterion is given by the Watanabe-Akaike Information Criterion (WAIC) (Gelman, Hwang, et al. 2014)

$$\mathrm{WAIC} = -2\sum_{k=1}^{N_s}\Big(\mathrm{mean}\big(\ln p(\mathbf{y}_k|\boldsymbol{\theta})\big) - \mathrm{var}\big(\ln p(\mathbf{y}_k|\boldsymbol{\theta})\big)\Big) \qquad (42)$$

where $\ln p(\mathbf{y}_k|\boldsymbol{\theta})$ is the log-likelihood at time instant $k$, evaluated for each of the $(N - N_b)$ samples used to approximate the posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$, so that the pointwise log-likelihoods form an array of size $(N - N_b) \times N_s$. Unlike the AIC and BIC, the WAIC is penalized by the dispersion of the log-likelihood.
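The criteria (39) to (42) are simple enough to sketch directly. The following is an illustrative implementation, not the authors' code: the function names are hypothetical, the chi-squared tail probability is computed with a standard recurrence on the incomplete gamma function for integer degrees of freedom, and the WAIC follows (42) exactly as written above, using the sample variance over posterior draws.

```python
import math
import statistics

def aic(log_lik, n_params):
    """Akaike's Information Criterion (39)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_samples):
    """Bayesian Information Criterion (40)."""
    return -2.0 * log_lik + n_params * math.log(n_samples)

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-squared variable (integer dof >= 1)."""
    if x <= 0.0:
        return 1.0
    h = 0.5 * x
    if dof % 2:
        prob, a = math.erfc(math.sqrt(h)), 0.5  # closed form for 1 dof
    else:
        prob, a = math.exp(-h), 1.0             # closed form for 2 dof
    while a < 0.5 * dof:  # add two degrees of freedom per step
        prob += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return prob

def lrt_p_value(log_lik_small, log_lik_large, extra_params):
    """p-value of the likelihood ratio test (41) for nested models."""
    statistic = -2.0 * (log_lik_small - log_lik_large)
    return chi2_sf(statistic, extra_params)

def waic(pointwise_log_lik):
    """WAIC as in (42); rows are posterior samples, columns are time instants."""
    total = 0.0
    for column in zip(*pointwise_log_lik):
        total += statistics.mean(column) - statistics.variance(column)
    return -2.0 * total

# Hypothetical log-likelihoods for a 15-parameter and an 18-parameter model
print(aic(-100.0, 15), aic(-95.0, 18), lrt_p_value(-100.0, -95.0, 3))
```

In this hypothetical example the larger model gains five log-likelihood units for three extra parameters, and the LRT p-value decides whether that gain is significant.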
Evaluating these criteria on the same data set as the one used for the calibration introduces a bias in the model selection process; it is therefore advised to evaluate them with the validation data set instead. This point is illustrated in the next section for a real test case.

4. Application to the Twin houses experiment

4.1. Model comparison

The capabilities of the second order MH (Algorithm 1) are now tested on the twin houses experiment presented in section 2. The purpose is to calibrate the models $\mathcal{M}_3$ and $\mathcal{M}_4$ (Figure 2), where the south zone temperature (green zone in Figure 1) is the output and two boundary conditions are considered: the outdoor temperature (°C) and the north zone temperature $T_n$ (°C). The vectors of unknown parameters $\boldsymbol{\theta}$ for $\mathcal{M}_3$ and $\mathcal{M}_4$ are given in Table 1 and Table 2 respectively. The process noise covariance is defined by $\boldsymbol{\Sigma}_{\mathbf{w}}(\boldsymbol{\theta}) = \mathrm{diag}(\sigma_{w_{11}}^2, \sigma_{w_{22}}^2, \sigma_{w_{33}}^2)$ and the measurement noise variance by $\Sigma_v(\boldsymbol{\theta}) = \sigma_v$. The standard deviation $\sigma_w$ of the state $x_s$ in $\mathcal{M}_4$ has been fixed to $10^{-6}$ instead of being given an informative prior distribution.

This problem has been investigated by different authors, who used relatively similar optimization strategies but different model structures. De Coninck et al. (2015) and Rehab & André (2015) used second order thermal models but with different structures and inputs. Himpe & Janssens (2015) used a model with four states which was correlated with the HVAC system and the solar radiation. They improved their model by scaling the system noise with respect to the heater and solar radiation signals. No validation data set was used to demonstrate the effectiveness of the model; only the improvement of the residuals is shown.

With a zero-order hold assumption, the model is not able to capture the fast dynamics of the heating signal. Indeed, the time response of the electric heaters is estimated to be between 1 and 2 minutes, which is faster than the sampling time of the data (10 minutes).
This issue is solved by considering that the inputs vary linearly between two samples (first-order hold). Around 24 days of data are used, where the first 14 days are the identification data set (ROLBS sequence, Figure 4) and the last 10 days are the validation data set (Figure 5); the inputs and outputs are detailed with the description of the experiment.

Figure 4: Identification data set, output, standardized residuals of $\mathcal{M}_3$ (blue) and $\mathcal{M}_4$ (orange), and inputs

Figure 5: Validation data set

A unique step length has been used in the proposal distribution, with $\boldsymbol{\Sigma} = \varepsilon\mathbf{I}$, where $\varepsilon = 0.3$ has been selected such that it minimizes the IACT. This tuning gives an acceptance rate of approximately 30% for $\mathcal{M}_3$ and 25% for $\mathcal{M}_4$. The diagnosis procedure presented in section 3.2.5 has been applied with $M = 6$ Markov chains whose initial parameter values are randomly sampled from their respective prior distributions (Table 1 and Table 2); this is illustrated by the trace plot of the first thousand iterations in Figure 6. The first 500 samples of $N = 5500$ are discarded as burn-in for the model $\mathcal{M}_3$, whereas the first 1500 samples of $N = 6500$ are discarded for $\mathcal{M}_4$. The chains at stationarity for both models are split in two to give 12 chains of 2500 samples, which are used to quantify the mixing of the MH algorithm. The results are summarized in Table 1 and Table 2, with the worst values highlighted in red and the best in green. The worst potential scale reductions $\hat{R}$ are below the threshold of 1.1 and the worst effective sample sizes (ESS) are easily above $5 \times 12$. Consequently, the $12 \times 2500$ samples can be used to approximate the posterior distributions of the parameters. The posterior distributions of the 6 simulated Markov chains are represented in Figure 7 for $\mathcal{M}_3$ and in Figure 8 for $\mathcal{M}_4$ with different colors; the black line is the global approximation of the posterior distributions using all samples.
Table 1: Prior distributions, posterior modes and diagnosis tests (worst: red and best: green) of $\mathcal{M}_3$. Values highlighted in the original are marked with *; parameter subscripts and the exponents shown as 10^… are missing from the source.
Parameter | Prior distribution | Posterior mode | $\hat{R}$ | ESS | IACT min | IACT max
R | β(2, 2, 10^-4, 2·10^-1) | 5.53·10^-2 | 1.0100 | 1.22·10^3 | 18.86 | 29.47
R | β(2, 2, 10^-4, 2·10^-1) | 2.09·10^-3 | 1.0039* | 1.29·10^3 | 18.25 | 29.50
R | β(2, 2, 10^-4, 10^-1) | 2.31·10^-3 | 1.0045 | 1.25·10^3 | 17.81 | 30.81
R | β(2, 2, 10^-4, 10^-1) | 4.98·10^-3 | 1.0086 | 1.16·10^3 | 18.79 | 28.16
C/10^8 | β(2, 2, 10^-3, 5·10^-1) | 3.12·10^-2 | 1.0049 | 1.31·10^3 | 19.46 | 27.42
C/10^8 | β(2, 2, 10^-4, 10^-1) | 6.73·10^-3 | 1.0080 | 1.28·10^3 | 17.49 | 32.01
C/10^8 | β(2, 2, 10^-2, 5) | 1.38·10^-1 | 1.0061 | 1.49·10^3* | 17.18 | 24.58
a | β(2, 2, 10^-1, 5) | 1.06 | 1.0132 | 1.19·10^3 | 18.90 | 24.12
a | β(2, 2, 10^-1, 5) | 1.24 | 1.0143 | 1.39·10^3 | 18.40 | 22.99*
σ | G(2, 0.03) | 1.02·10^-1 | 1.0065 | 1.26·10^… | 19.06 | 30.88
σ | G(2, 0.03) | 1.38·10^-2 | 1.0229* | 5.28·10^…* | 36.29* | 54.90*
σ | G(2, 0.03) | 1.82·10^-2 | 1.0084 | 1.15·10^… | 22.12 | 33.57
σ | G(2, 0.03) | 1.69·10^-2 | 1.0179 | 6.26·10^2 | 29.39 | 50.03
x | β(2, 2, 15, 45) | 29.20 | 1.0146 | 1.15·10^… | 16.43* | 24.49
x | β(2, 2, 15, 45) | 29.45 | 1.0099 | 1.33·10^… | 18.09 | 27.53

Table 2: Prior distributions, posterior modes and diagnosis tests (worst: red and best: green) of $\mathcal{M}_4$. Values highlighted in the original are marked with *; parameter subscripts and the exponents shown as 10^… are missing from the source.
Parameter | Prior distribution | Posterior mode | $\hat{R}$ | ESS | IACT min | IACT max
R | β(2, 2, 10^-4, 10^-1) | 4.93·10^-2 | 1.0073 | 1.06·10^3 | 24.17 | 41.44
R | β(2, 2, 10^-4, 10^-1) | 1.83·10^-3 | 1.0127 | 7.26·10^2 | 30.34 | 58.48
R | β(2, 2, 10^-4, 10^-1) | 2.13·10^-3 | 1.0136 | 6.04·10^2 | 34.27 | 66.75
R | β(2, 2, 10^-3, 5·10^-2) | 5.35·10^-3 | 1.0099 | 7.41·10^2 | 31.15 | 53.36
R | β(2, 2, 10^-4, 10^-1) | 5.27·10^-3 | 1.0094 | 9.89·10^2 | 24.93 | 35.84
C/10^8 | β(2, 2, 10^-3, 5·10^-1) | 2.52·10^-2 | 1.0176 | 5.23·10^2 | 37.62 | 69.45
C/10^8 | β(2, 2, 10^-4, 10^-2) | 5.27·10^-3 | 1.0182 | 5.69·10^2 | 35.82 | 89.49
C/10^8 | β(2, 2, 10^-2, 5) | 1.46·10^-1 | 1.0083 | 1.13·10^3 | 23.32 | 32.37
C/10^8 | β(2, 2, 10^-5, 10^-3) | 5.81·10^-5 | 1.0090 | 6.15·10^2 | 37.72 | 83.21
a | β(2, 2, 10^-1, 5) | 9.07·10^-1 | 1.0181 | 6.33·10^2 | 36.30 | 69.06
a | β(2, 2, 10^-1, 5) | 1.17 | 1.0126 | 1.16·10^…* | 22.65 | 28.41*
σ | G(2, 0.03) | 9.87·10^-2 | 1.0095 | 9.69·10^2 | 24.21 | 41.21
σ | G(2, 0.03) | 2.61·10^-2 | 1.0248 | 4.08·10^…* | 43.92* | 86.54
σ | G(2, 0.03) | 3.18·10^-2 | 1.0285* | 4.35·10^2 | 41.75 | 110.54*
σ | G(2, 0.03) | 1.15·10^-2 | 1.0233 | 4.31·10^2 | 42.91 | 81.03
x | β(2, 2, 15, 45) | 28.83 | 1.0124 | 1.02·10^… | 27.08 | 35.02
x | β(2, 2, 15, 45) | 28.31 | 1.0070* | 1.08·10^… | 24.45 | 34.56
x | β(2, 2, 15, 45) | 29.64 | 1.0099 | 9.49·10^… | 21.95* | 44.21

Figure 6: Trace plot of the 6 Markov chains in the first thousand iterations (ℳ)

Figure 7: Posterior distribution of $\mathcal{M}_3$ from the 6 Markov chains

Figure 8: Posterior distribution of $\mathcal{M}_4$ from the 6 Markov chains

The reliability of the calibrated models is tested by residual analysis. The least correlated standardized residuals $\mathbf{e}$ (13) are shown in Figure 4; the ACF, CCF and CP of models $\mathcal{M}_3$ and $\mathcal{M}_4$ are shown in Figure 9 and Figure 10 respectively. The red lines delimit the 95% confidence intervals and the lags are the number of sample shifts between the two signals. Hence, to validate the hypothesis, no more than 5% of the lags may cross these limits. For both models, the inputs are not correlated with the standardized residuals, which means that the models are able to explain all input-output relationships. However, the white noise hypothesis is rejected for the model $\mathcal{M}_3$; the highest standardized residual values (Figure 4) are correlated with the switches of the heating signal and the solar radiation.

Figure 9: ACF, CP and CCF of the standardized residuals, $\mathcal{M}_3$

Figure 10: ACF, CP and CCF of the standardized residuals, $\mathcal{M}_4$

The reliability of the model is also tested by comparing the measured south zone temperature of the validation data set with the simulated output. A clear advantage of Bayesian estimation is that it is possible to simulate directly from the posterior distribution, which gives a simulated output with all the uncertainties.
This is very useful for model predictive control: the weather forecast is introduced in the estimated model to predict the indoor temperature. Afterwards, the trade-off between comfort and energy saving is chosen by taking either the lowest temperature prediction, such that the HVAC system is sure to maintain the comfort, or the highest predicted temperature, such that the HVAC system uses the minimal amount of energy. The simulation from the posterior distribution is plotted in Figure 11. The measured south zone temperature is always included in the simulated output distribution for both models, but the dispersion is larger for the model $\mathcal{M}_4$.

In order to select the most appropriate model, the performances of both models (section 3.4) are summarized in Table 3. For the identification data set, the model $\mathcal{M}_4$ should be accepted against $\mathcal{M}_3$; the AIC, BIC and WAIC are smaller for $\mathcal{M}_4$ than for $\mathcal{M}_3$, and the p-value of the LRT confirms this choice. However, the criteria evaluated on the validation data set indicate that the model $\mathcal{M}_3$ should be preferred; the log-likelihood of the model $\mathcal{M}_3$ is higher and less dispersed than the log-likelihood of the model $\mathcal{M}_4$, as shown in Figure 11.

Table 3: Performance criteria
Criterion | Identification, $\mathcal{M}_3$ | Identification, $\mathcal{M}_4$ | Validation, $\mathcal{M}_3$ | Validation, $\mathcal{M}_4$
AIC | −7.59·10^3 | −8.02·10^3 | −5.26·10^3 | −5.21·10^3
BIC | −7.51·10^3 | −7.91·10^3 | −5.19·10^3 | −5.11·10^3
LRT p-value | 0 (identification) | | 1 (validation) |
WAIC | −7.58·10^3 | −8.01·10^3 | −5.27·10^3 | −5.23·10^3

Figure 11: Left: measured (black) and simulated south zone temperature with the validation data set; right: log-likelihoods for the validation data set ($\mathcal{M}_3$: blue, $\mathcal{M}_4$: orange)

The performance gap between both models is significant for the identification data set in comparison to the validation data set. In the identification data set, the ROLBS introduces an unconventional dynamic which is not representative of the intended use of the building.
A more complex model is required to fit the fast variations of the south zone air temperature, but these fast variations are not present in conventional use and therefore the extra complexity of the model $\mathcal{M}_4$ is not required. As mentioned in section 3.4, using the same information from the data for calibration and for selection may be misleading (Gelman, Hwang, et al. 2014). In this case, it is also illustrated by the whiteness improvement of the residuals for the model $\mathcal{M}_4$. It can be concluded that the model $\mathcal{M}_3$ is more representative of the south zone and consequently, only the model $\mathcal{M}_3$ is considered in the remainder of the paper.

4.2. Physical interpretation of the results

The posterior distributions are compared against the building characteristics, which are available in Strachan et al. (2016). The envelope thermal resistance is given by

$$R_w = \Big(\sum_{j=1}^{3} U_j S_j + U_{wi} S_{wi}\Big)^{-1} = 4.47 \cdot 10^{-2}\ \mathrm{K/W} \qquad (43)$$

where $U_j$ and $S_j$ are the U-values and surfaces of the different walls (south, east and west) and $U_{wi}$ and $S_{wi}$ are the U-value and surface of the windows. The envelope thermal resistance is estimated with the posterior distributions of $R_o$ and $R_i$ such that

$$R_w = R_o + R_i \in [4.11 \cdot 10^{-2},\ 9.38 \cdot 10^{-2}] \qquad (43.a)$$

The resistance $R_w$ belongs to the estimated interval (43.a), but this interval is wide, which means that the uncertainties are large. The thermal resistance between the south and the north zone is given by

$$R_z = \big(U_{z1} S_{z1} + U_{z2} S_{z2} + 3 U_{door} S_{door}\big)^{-1} = 1.53 \cdot 10^{-2}\ \mathrm{K/W} \qquad (44)$$

where $U_{z1}$ and $S_{z1}$ are the U-value and the surface of the north wall of the living room, $U_{z2}$ and $S_{z2}$ are the U-value and the surface of the north wall of the bathroom and the corridor, and $U_{door}$ and $S_{door}$ are the U-value and the surface of the doors. The estimated range from the posterior distribution of $R_z$ is $[4.59 \cdot 10^{-3},\ 5.52 \cdot 10^{-3}]$, which is far smaller than the value computed in (44).
This gap could mean that the infiltration between the two zones is significant. The envelope thermal capacity is defined by

$$C_w = \sum_{j=1}^{3} C_j S_j = 1.30 \cdot 10^{\ldots}\ \mathrm{J/K} \qquad (45)$$

with $C_j$ the heat capacity. The thermal capacity of the windows is negligible compared to $C_w$. The estimated posterior distribution of $C_w$ covers the range $[2.21 \cdot 10^{6},\ 4.47 \cdot 10^{6}]$, which is less than half of the expected value (45). The thermal capacity of the medium $C_m$ consists of the inner walls of the south zone but also of parts of the ceiling and ground floor, such that

$$C_m = C_{iw_1} + C_{iw_2} + C_{ground} + C_{ceiling} = 8.00 \cdot 10^{\ldots}\ \mathrm{J/K} \qquad (46)$$

where the subscripts $iw_1$ and $iw_2$ denote respectively the east wall of the living room and the other light walls. The estimated range for $C_m$ is $[1.22 \cdot 10^{7},\ 1.54 \cdot 10^{7}]$, which is the expected order of magnitude since (46) is overestimated by taking into account the whole volume of the ground floor and the ceiling. The thermal capacity of the indoor air is simply given by

$$C_i = \rho_a c_a V = 1.79 \cdot 10^{\ldots}\ \mathrm{J/K} \qquad (47)$$

with $c_a$ and $\rho_a$ the specific heat and the density of the air, and $V$ the volume of the south zone. In this case as well, the estimated posterior distribution of $C_i$ is consistent with the order of magnitude of the expected value, where $C_i \in [6.60 \cdot 10^{5},\ 6.90 \cdot 10^{5}]$.

Determining prior knowledge on the convective resistance between the indoor air and the medium is not an easy task and it is not of main interest in this study. The parameters $a_I$ and $a_W$ are interpreted as effective areas because $Q_{gh}$ is measured on a horizontal surface. Nevertheless, it is interesting to see that $a_I$ is larger than $a_W$, which shows the importance of direct solar radiation into the south zone. The time constants $\boldsymbol{\tau}$ of the continuous system (4) are computed by

$$\boldsymbol{\tau} = -\frac{1}{\boldsymbol{\lambda}} \qquad (48)$$

where $\boldsymbol{\lambda}$ are the eigenvalues of the state matrix, and $\boldsymbol{\tau}$ and $\boldsymbol{\lambda}$ are real-valued.
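Equation (48) can be illustrated on a two-state system, for which the eigenvalues have a closed form. The sketch below is only an example under stated assumptions: the function name is hypothetical, and the resistances and capacities are made-up values for a generic air and medium network, not the calibrated parameters of the paper.

```python
import math

def time_constants_2x2(a11, a12, a21, a22):
    """Time constants tau = -1/lambda of dx/dt = A x for a stable 2x2 matrix A."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det
    if disc < 0.0:
        raise ValueError("complex eigenvalues: no purely real time constants")
    root = math.sqrt(disc)
    eigenvalues = ((trace - root) / 2.0, (trace + root) / 2.0)
    if max(eigenvalues) >= 0.0:
        raise ValueError("the system is not asymptotically stable")
    return tuple(-1.0 / lam for lam in eigenvalues)  # fastest first

# Hypothetical RC network: air node (C_air) coupled to a medium node (C_med)
# through R_med, and the medium coupled to the outdoor air through R_out.
R_med, R_out = 5e-3, 5e-2   # K/W, assumed values
C_air, C_med = 7e5, 1.4e7   # J/K, assumed values
A = (-1.0 / (R_med * C_air), 1.0 / (R_med * C_air),
     1.0 / (R_med * C_med), -(1.0 / R_med + 1.0 / R_out) / C_med)

fast, slow = time_constants_2x2(*A)
print(fast / 3600.0, slow / 3600.0)  # converted from seconds to hours
```

With physical parameters in SI units the time constants come out in seconds, hence the division by 3600 to express them in hours.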
The time constants of the model $\mathcal{M}_3$, given in Table 4, are consistent with the fast dynamics of the air and the slow accumulation of energy in the medium. The performances of the second-order MH are compared to a ML estimation in the next section. Furthermore, the regularization effect of the prior distribution is illustrated by an identifiability analysis.

Table 4: Time constants of model $\mathcal{M}_3$
Time constant | Range [hours]
τ | [1.57·10^-1, 1.66·10^-1]
τ | [2.19, 3.75]
τ | [25.84, 34.34]

4.3. Performance comparison with maximum likelihood estimation

Table 5: ML estimation with the same random initial parameters as the MH algorithm. The values in bold in the original (marked here with *) are the parameters close to their boundaries; parameter subscripts are missing from the source.
Parameter | MLE 1 | MLE 2 | MLE 3 | MLE 4 | MLE 5 | MLE 6
R | 1.98·10^-1* | 1.02·10^-4* | 1.98·10^-1* | 1.98·10^-1* | 1.62·10^-1* | 1.98·10^-1*
R | 1.40·10^-3 | 4.78·10^-2 | 1.40·10^-3 | 1.18·10^-3 | 1.35·10^-3 | 1.42·10^-3
R | 2.24·10^-2 | 1.12·10^-3 | 2.19·10^-2 | 3.32·10^-3 | 2.91·10^-3 | 2.08·10^-2
R | 2.94·10^-3 | 5.76·10^-3 | 2.92·10^-3 | 4.00·10^-3 | 8.61·10^-2* | 2.87·10^-3
C/10^8 | 6.50·10^-2 | 1.01·10^-3* | 6.46·10^-2 | 1.19·10^-1 | 6.94·10^-2 | 6.39·10^-2
C/10^8 | 6.75·10^-3 | 6.86·10^-3 | 6.76·10^-3 | 6.72·10^-3 | 6.74·10^-3 | 6.81·10^-3
C/10^8 | 3.03·10^-1 | 1.24·10^-1 | 3.12·10^-1 | 1.33·10^-2* | 2.3 | 3.29·10^-1
a | 1.01 | 3.15 | 1.02 | 1.74 | 1.08 | 1.03
a | 1.23 | 1.27 | 1.24 | 1.27 | 1.23 | 1.25
σ | 5.83·10^-2 | 1.14·10^-2 | 5.50·10^-2 | 8.71·10^-2 | 5.09·10^-2 | 4.67·10^-2
σ | 3.28·10^-5 | 1.15·10^-8* | 1.30·10^-2 | 1.32·10^-2 | 2.90·10^-5 | 2.51·10^-2
σ | 6.45·10^-1 | 6.58·10^-2 | 6.41·10^-1 | 1.57·10^-5 | 9.79·10^-2 | 6.38·10^-1
σ | 1.90·10^-2 | 1.70·10^-2 | 1.63·10^-2 | 1.63·10^-2 | 1.90·10^-2 | 1.38·10^-8*
x | 29.60 | 36.68 | 29.59 | 29.54 | 29.70 | 29.60
x | 44.47* | 29.61 | 44.50* | 29.95 | 25.00 | 44.57*

The performance of the second-order MH algorithm is now tested against a ML estimation with a quasi-Newton optimization.
The log-likelihood and its gradient are supplied to the unconstrained MATLAB function fminunc (https://fr.mathworks.com/help/optim/ug/fminunc.html), with the optimality tolerance and step tolerance fixed to $10^{-8}$, which uses a BFGS approximation of the Hessian. The same parameter transformations are used except for the standard deviations; they are bounded between $10^{-8}$ and 5 because it has been observed that, with the logarithm transformation (33), the standard deviations can come too close to zero, which causes numerical instabilities. The penalty function given by Kristensen & Madsen (2003) is used to repel the parameters from the bounds. The ML estimation is repeated 6 times with the same initial parameters as for the MH algorithm; the ML parameter estimates are given in Table 5. The results are highly dependent on the initial conditions for most of the parameters; some of them differ by several orders of magnitude. It can be concluded in this case that the second-order MH algorithm has a better global convergence than the ML optimization.

A closer look is given to the parameter $\sigma_w$, which seems to be unidentifiable compared to the parameter $C_i$. The profiles of the log-likelihood and log-posterior are plotted in Figure 12. These profiles are obtained by maximizing the log-likelihood and the log-posterior with respect to all parameters except $\sigma_w$ and $C_i$, which are fixed for each optimization to different values (x-axis, Figure 12). Confidence intervals for the profile likelihood can be computed by (Madsen & Thyregod 2012)

$$\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}_{\sim i}) - \ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}) > -\tfrac{1}{2}\chi^2_{1-\alpha}(p) \qquad (49)$$

where $\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}_{\sim i})$ is the maximum log-likelihood with respect to all parameters except $\theta_i$, $\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}})$ is the maximum log-likelihood and $\chi^2_{1-\alpha}(p)$ is the $(1-\alpha)$ quantile of a chi-squared distribution with $p$ degrees of freedom. The 95% confidence intervals given by $-\tfrac{1}{2}\chi^2_{0.95}(1) = -1.92$ are represented in Figure 12 by the dotted black lines.
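The threshold in (49) and the resulting interval can be checked numerically. The sketch below is illustrative only: it uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, and it reads the interval off a profile evaluated on a grid; the function names and the quadratic test profile are hypothetical.

```python
from statistics import NormalDist

def profile_threshold(level=0.95):
    """Return -0.5 * chi-squared quantile (1 dof) at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    return -0.5 * z * z

def profile_interval(values, profile_log_lik, level=0.95):
    """Smallest and largest grid values whose profile satisfies (49)."""
    best = max(profile_log_lik)
    cut = profile_threshold(level)
    inside = [v for v, ll in zip(values, profile_log_lik) if ll - best > cut]
    return min(inside), max(inside)

# Hypothetical quadratic profile centred on 1.0 with standard deviation 0.1
grid = [i / 1000.0 for i in range(500, 1501)]
profile = [-(v - 1.0) ** 2 / (2.0 * 0.1 ** 2) for v in grid]

print(round(profile_threshold(), 2))
print(profile_interval(grid, profile))
```

For a flat profile, such as the one reported for the noise standard deviation, many grid values stay above the threshold and the interval becomes very wide, which is the signature of a poorly identifiable parameter.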
The profile of the log-posterior is similarly computed by $\ln p(\hat{\boldsymbol{\theta}}_{\sim i}|\mathbf{y}_{1:k}) - \ln p(\hat{\boldsymbol{\theta}}|\mathbf{y}_{1:k})$. These profiles are computed around the posterior modes given in Table 1; the idea is to visualize the quantity of information given by the data and the regularization brought by the prior distribution. The profile of the log-likelihood of $\sigma_w$ is asymmetric and almost flat around its maximum, which means that any value in the flat region has a negligible effect on the log-likelihood; this explains the dispersion of the ML estimation in Table 5. The profile of the log-posterior shows how the prior distribution ($\mathcal{G}(2, 0.03)$, Figure 3) increases the curvature, especially towards zero, and regularizes the identifiability problem. For the parameter $C_i$, the information from the data outweighs the prior distribution, which means that the prior distribution has only a small effect on the posterior distribution.

Figure 12: Profile log-likelihood (black) and profile log-posterior (blue). The dashed lines represent the 95% confidence intervals of the log-likelihood and the red dots the respective maxima of the curves

5. Conclusion

The estimation of building energy demand and building energy performance is possible through experimental calibration of dynamic thermal models. Making decisions or predictions from the calibrated model requires taking into account all the uncertainties of the estimates; Bayesian calibration fits this purpose by estimating the posterior distributions of the parameters.

This paper compares the three phases of an experimental calibration (selection, calibration and validation) from a Bayesian and a frequentist point of view. More specifically, improvements on the Metropolis-Hastings algorithm using gradient and Hessian information (second-order Metropolis-Hastings) are presented.
It is shown that the gradient of a linear and Gaussian model can be computed exactly by a robust square root version of the Kalman filter, and a Hessian estimate is proposed with low extra computational burden. A combination of change of variables and prior distributions is also proposed, which constrains the parameters to a physical range. These improvements on the Metropolis-Hastings algorithm considerably facilitate its tuning: only a step length and the prior distributions have to be specified.

Two models with 15 and 18 unknown parameters respectively have been easily calibrated with the improved Metropolis-Hastings algorithm using a unique step length, which illustrates the gain of this method over a classical random walk Metropolis-Hastings. Furthermore, it is shown that the second-order Metropolis-Hastings algorithm is more robust to the initial conditions than a maximum likelihood estimation with a quasi-Newton algorithm, and it is illustrated through an identifiability analysis that the prior distributions act as regularization when the data are not informative enough.

It is highlighted that model selection criteria should be computed on a different data set than the one used for the calibration to avoid a biased selection. Indeed, in this experiment the unconventional excitation generated by the heaters implies that a more complex model should be selected, but this extra complexity is not required for a more conventional use of the HVAC system.

1. Acknowledgements

This work was financially supported by BPI France in the FUI Project COMETE. Thank you to Dr. Paul Strachan for making the Twin houses data available online; to the Dynastee network, particularly to Henrik Madsen, Peder Bacher and Rune Juhl for their statistical guidelines; as well as to Dr. Johan Dahlin and Dr. Maria Kulikova for their time and expertise.

2. References

Andersen, P.D. et al., 2014.
Characterization of heat dynamics of an arctic low-energy house with floor heating. Building Simulation, 7(6), pp.595–614.
Bacher, P. & Madsen, H., 2011. Identifying suitable models for the heat dynamics of buildings. Energy and Buildings, 43(7), pp.1511–1522. Available at: http://dx.doi.org/10.1016/j.enbuild.2011.02.005.
Baker, P.H. & van Dijk, H.A.L., 2008. PASLINK and dynamic outdoor testing of building components. Building and Environment, 43(2), pp.143–151.
Berger, J. et al., 2016. Bayesian inference for estimating thermal properties of a historic building wall. Building and Environment, 106, pp.327–339. Available at: http://dx.doi.org/10.1016/j.buildenv.2016.06.037.
Bloem, J., 1994. Institute for Systems Engineering and Informatics Workshop on Application of System Identification.
Chong, A. & Lam, K.P., 2015. Uncertainty analysis and parameter estimation of HVAC systems in building energy models. Proceedings of BS2015: 14th Conference of International Building Performance Simulation Association, pp.2788–2795.
Chong, A. & Poh Lam, K., 2017. A Comparison of MCMC Algorithms for the Bayesian Calibration of Building Energy Models. Building Simulation 2017 Conference, pp.494–503.
De Coninck, R. et al., 2015. Toolbox for development and validation of grey-box building models for forecasting and control. Journal of Building Performance Simulation, 1493(July), pp.1–16. Available at: http://www.tandfonline.com/doi/full/10.1080/19401493.2015.1046933.
CTSM-R Development Team, 2015. Continuous Time Stochastic Modelling in R, User’s Guide and Reference Manual. Available at: http://ctsm.info/.
Dahlin, J., 2016. Accelerating Monte Carlo methods for Bayesian inference in dynamical models. (1754). Available at: http://www.johandahlin.com/publications-files/phd-dahlin-thesis-final.pdf.
European Commission, 2016.
An EU strategy on heating and cooling 2016, EVO, 2014. International Performance Measurement and Verification Protocol Core Concepts, Gelman, A., Carlin, J.B., et al., 2014. Bayesian Data Analysis, Gelman, A., Hwang, J. & Vehtari, A., 2014. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), pp.997–1016. Ghiaus, C., 2013. Causality issue in the heat balance method for calculating the design heating and cooling load. Energy, 50, pp.292–301. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0360544212007864 [Accessed January 7, 2015]. Ghiaus, C. & Hazyuk, I., 2010. Calculation of optimal thermal load of intermittently heated buildings. Energy and Buildings, 42(8), pp.1248–1258. Available at: http://dx.doi.org/10.1016/j.enbuild.2010.02.017. Hazyuk, I., Ghiaus, C. & Penhouet, D., 2012a. Optimal temperature control of intermittently heated buildings using Model Predictive Control: Part I – Building modeling. Building and Environment, 51, pp.379–387. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0360132311003933 [Accessed January 5, 2015]. Hazyuk, I., Ghiaus, C. & Penhouet, D., 2012b. Optimal temperature control of intermittently heated buildings using Model Predictive Control: Part II - Control algorithm. Building and Environment, 51, pp.388–394. Available at: http://dx.doi.org/10.1016/j.buildenv.2011.11.008. Heo, Y. et al., 2015. Scalable methodology for large scale building energy improvement: Relevance of calibration in model-based retrofit analysis. Building and Environment, 87, pp.342–350. Available at: http://dx.doi.org/10.1016/j.buildenv.2014.12.016. Heo, Y., Choudhary, R. & Augenbroe, G.A., 2012. Calibration of building energy models for retrofit analysis under uncertainty. Energy and Buildings, 47, pp.550–560. Available at: http://dx.doi.org/10.1016/j.enbuild.2011.12.029. Himpe, E. & Janssens, A., 2015. 
Characterisation of the thernial performance of a test house based on dynamic measurements. Energy Procedia, 78, pp.3294–3299. Available at: http://dx.doi.org/10.1016/j.egypro.2015.11.739. Jiménez, M.J., 2014. Reliable building energy performance characterisation based on full scale dynamic measurements, Kristensen, M.H. et al., 2017. Bayesian Calibration Of Residential Building Clusters Using A Single Geometric Building Representation Department of Engineering , Aarhus University , 8000 Aarhus C , DK Department of Engineering , University of Cambridge , Cambridge CB2 1PZ , UK AffaldVa. Kristensen, N.R. & Madsen, H., 2003. Continuous Time Stochastic Modelling. Mathematics Guide. , pp.1–32. Kulikova, M. V. & Tsyganova, J. V., 2016. A unified square-root approach for the score and Fisher information matrix computation in linear dynamic systems. Mathematics and Computers in Simulation, 119, pp.128– 141. Available at: http://dx.doi.org/10.1016/j.matcom.2015.07.007. Li, Q., Augenbroe, G. & Brown, J., 2016. Assessment of linear emulators in lightweight Bayesian calibration of dynamic building energy models for parameter estimation and performance prediction. Energy and Buildings, 124, pp.194–202. Ljung, L., 2002. System identification: theory for the user (second edition). Automatica, 38(2), pp.375–378. Madsen, H., 2007. Time series analysis, Madsen, H. & Thyregod, P., 2012. Introduction to General and Generalized Linear Models, Available at: http://dx.doi.org/10.1111/j.1751-5823.2011.00179_8.x. Mbalawata, I.S., Särkkä, S. & Haario, H., 2013. Parameter estimation in stochastic differential equations with Markov chain Monte Carlo and non-linear Kalman filtering. Computational Statistics, 28(3), pp.1195–1223. Naveros, I., 2016. Modelling Heat Transfer for Energy Efficiency Assessment of Buildings: Identification of Physical Parameters. Naveros, I. et al., 2014. Setting up and validating a complex model for a simple homogeneous wall. Energy and Buildings, 70, pp.303–317. 
Available at: http://linkinghub.elsevier.com/retrieve/pii/S0378778813007937 [Accessed January 13, 2015]. Nemeth, C.J., 2014. Parameter Estimation for State Space Models using Sequential Monte Carlo Algorithms. , (November). Nespoli, L., Medici, V. & Rudel, R., 2015. Grey-Box System Identification of Building Thermal Dynamics Using only Smart Meter and Air Temperature Data. Building Simulation Conference, p.Nespoli. Nielsen, A.A. & Nielsen, B.K., 1984. A dynamic test method for the thermal performance of small houses. Proc. ACEEE Conf., Santa Cruz, CA, 1984, American Council for an Energy-Efficient Economy, CA, pp.207–220. Rehab, I. & André, P., 2015. Energy Performance Charactirisation of T He Test Case “Twin House” in Holzkirchen , Based on Trnsys Simulation and Grey Box Model. Building Simulation Conference, pp.2401–2408. Sarkka, S., 2013. Bayesian Filtering and Smoothing. Cambridge University Press, p.254. Available at: http://dl.acm.org/citation.cfm?id=2534502%5Cnhttp://ebooks.cambridge.org/ref/id/CBO9781139344203. Strachan, P. et al., 2016. Empirical Whole Model Validation Modelling Specification Validation of Building Energy Simulation Tools, Team Stan Development, 2015. Stan Modeling Language: User’s Guide and Reference Manual. Version 2.7.0. Interaction Flow Modeling Language, pp.1–534. Tian, W. et al., 2016. Identifying informative energy data in Bayesian calibration of building energy models. Energy and Buildings, 119, pp.363–376. Available at: http://dx.doi.org/10.1016/j.enbuild.2016.03.042. Tsyganova, Y. V. & Kulikova, M.V., 2012. On efficient parametric identification methods for linear discrete stochastic systems. Automation and Remote Control, 73(6), pp.962–975. Available at: http://link.springer.com/10.1134/S0005117912060033. Turner, C. & Frankel, M., 2008. Energy Performance of LEED ® for New Construction Buildings. New Buildings Institute, pp.1–46. Váňa, Z. et al., 2013. 
Building semi-physical modeling : On selection of the model complexity ˇ a c a n. Proc. European Control Conference. De Wilde, P., 2014. The gap between predicted and measured energy performance of buildings: A framework for investigation. Automation in Construction, 41, pp.40–49. Available at: http://dx.doi.org/10.1016/j.autcon.2014.02.009. Zayane, C., 2011. Identification d ’un modèle de comportement thermique de bâtiment à partir de sa courbe de charge. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Statistics arXiv (Cornell University)

An efficient Bayesian experimental calibration of dynamic thermal models

Statistics, Volume 2019 (1904) – Apr 12, 2019

ISSN: 0360-5442
eISSN: ARCH-3347
DOI: 10.1016/j.energy.2018.03.168

Abstract

Experimental calibration of dynamic thermal models is required for model predictive control and characterization of building energy performance. In these applications, the uncertainty assessment of the parameter estimates is decisive; this is why a Bayesian calibration procedure (selection, calibration and validation) is presented. The calibration is based on an improved Metropolis-Hastings algorithm suitable for linear and Gaussian state-space models. The procedure, illustrated on a real house experiment, shows that the algorithm is more robust to initial conditions than a maximum likelihood optimization with a quasi-Newton algorithm. Furthermore, when the data are not informative enough, the use of prior distributions helps to regularize the problem.

Keywords: Bayesian calibration, model selection and validation, dynamic thermal models, real house experiment, Metropolis-Hastings algorithm, robust gradient and Hessian computation, change of variables, prior distribution selection, identifiability

Nomenclature

Notations
$x, y, z$: Scalars
$\mathbf{x}, \mathbf{y}, \mathbf{z}$: Vectors
$\mathbf{A}, \mathbf{B}, \mathbf{C}$: Matrices
$\mathbb{R}^{q}$: Space of dimension $q$

Notational conventions
$\mathbf{A}^{T}$: Matrix transpose
$\mathbf{A}^{-1}$: Matrix inverse
$\mathbf{A}^{-1/2} = (\mathbf{A}^{1/2})^{-1}$
$\mathbf{A}^{-T/2} = (\mathbf{A}^{-1/2})^{T}$
$\det(\mathbf{A})$: Determinant of the matrix $\mathbf{A}$
$\mathrm{tr}(\mathbf{A})$: Trace of the matrix $\mathbf{A}$
$\dot{\mathbf{x}}$: Time derivative of vector $\mathbf{x}$
$\partial\mathbf{x}/\partial\theta$: Partial derivative of $\mathbf{x}$ with respect to $\theta$
$\mathrm{diag}(a_1, a_2, \ldots, a_N)$: Diagonal matrix with diagonal values $a_1, a_2, \ldots, a_N$
$\mathbb{E}[\cdot]$: Expected value
$p(\mathbf{x})$: Probability density function (pdf) of a random variable $\mathbf{x}$
$p(\mathbf{x}|\mathbf{y})$: Conditional pdf of vector $\mathbf{x}$ given vector $\mathbf{y}$
$\mathbf{x} \sim p(\mathbf{x})$: Random variable $\mathbf{x}$ with probability distribution $p(\mathbf{x})$
$\propto$: Proportional
$\approx$: Approximately equal
$1{:}N$: Set of values, $\mathbf{x}_{1:N} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$

1. Introduction

The existing methods for characterizing building energy performance and energy saving provided by retrofitting are not relevant (Turner & Frankel 2008, De Wilde 2014).
The energy performance estimation of buildings and associated systems must be independent of weather conditions and user behavior. Based on this requirement, the Efficiency Valuation Organization has developed the International Performance Measurement and Verification Protocol (IPMVP) (EVO 2014). The idea is to construct a physical model which characterizes the intrinsic thermal dynamics of the building and relates inputs to outputs measured on-site. Hence, the gap between the energy use given by the pre-retrofit and post-retrofit models represents the energy saved by the refurbishment.

Minimizing heat losses from buildings is the most obvious solution to reduce the heating and cooling demand, but the efficiency and sustainability of the energy chain, from the production to the HVAC systems, must also be improved. Nowadays, the dominant paradigm is that the energy sources need to respond to all requests at any moment. This strategy will become more complex as the share of renewable energy sources in the energy mix increases. Therefore, supply and demand must become more flexible by using demand response mechanisms and energy storage (European Commission 2016). In order to adapt the demand to the production, the energy demand must be known. Physical models characterizing the thermal dynamics of buildings, associated with model predictive control, can be used to forecast the energy demand while maintaining indoor comfort (Hazyuk et al. 2012a, Hazyuk et al. 2012b, Ghiaus & Hazyuk 2010).

Two important societal needs are thus identified: the estimation of building energy demand and the estimation of energy savings brought by energy conservation measures. These two needs share the same scientific obstacle: the experimental estimation of the physical parameters of the dynamic thermal behavior of buildings.
Such models can be obtained by considering the energy balance between buildings and their surroundings; the energy balance in buildings can be modelled by using thermal networks (Naveros 2016, Ghiaus 2013). Stochastic state-space models are obtained by first transforming thermal networks into deterministic state-space models, and then adding noise terms to represent the deviations between the differential algebraic equations and the true variations of the states. Stochastic state-space models relate inputs to outputs, where the dynamics of the states are given by the parameters; hence, by knowing inputs and parameters, the output of the system can be simulated. However, this direct problem requires the parameters to be known, so the inverse problem of parameter estimation must be solved first.

The interest in parameter estimation for dynamic thermal models is not new (Nielsen & Nielsen 1984) and experiments of various scales have been used to test the validity of different approaches (Bloem 1994, Baker & van Dijk 2008, Jiménez 2014). When making predictions or decisions from an identified model, it is essential to assess the uncertainties in the parameter estimates, taking into account all the information available. From this perspective, Bayesian estimation is receiving more and more attention, for instance in the quantification of energy savings from retrofit (Heo et al. 2012, Heo et al. 2015, Tian et al. 2016, Li et al. 2016), in the calibration of energy models (Chong & Lam 2015, Chong & Poh Lam 2017, Zayane 2011), in the estimation of the thermal characteristics of a wall (Berger et al. 2016) and in the estimation of heating consumption (Kristensen et al. 2017). Bayesian methods, such as the Metropolis-Hastings (MH) algorithm, are usually employed for low-dimensional problems because, as the number of parameters increases, it becomes more and more difficult to tune the algorithms properly.
The paper compares the three phases of an experimental model identification (selection, calibration and validation) from a Bayesian and a frequentist point of view. More precisely, the paper treats the problem of parameter estimation of dynamic thermal models when the model structure is known. First, the choice of a model structure is discussed and then an implementation of the second-order Metropolis-Hastings algorithm for linear and Gaussian state-space models is proposed. Tools and guidelines for tuning and diagnosing the algorithm are also presented. Next, different criteria are presented to assess the performance of models and to guide the selection of a model structure in agreement with the data. The whole procedure is tested on a real test case where the differences between the Bayesian and the frequentist approach are illustrated.

2. Twin houses experiment

The twin houses experiment is a real outdoor experiment conducted by the Fraunhofer Institute near Munich during April and May 2014. The building is an unoccupied single family house (twin house O5) with a 100 m² ground floor, a cellar and an attic space; a full description of the house and the experiment is given by Strachan et al. (2016). The experiment was designed such that the south zone (Figure 1) has only two boundary conditions: the external temperature and the adjacent spaces (cellar, attic, north zone). The adjacent spaces were held at 22 °C with blinds closed to reduce the chance of overheating, and the doors separating the north and south zones were sealed off. The electric heaters in the south zone were synchronized on a Randomly Ordered Logarithmic Binary Sequence (ROLBS) to maintain a similar temperature in the different rooms (800 W in the living room, 500 W in the south bedroom and 500 W in the bathroom).
The ROLBS signal was designed for three reasons: to maximize the temperature difference with the boundary conditions, to excite the range of time constants from 1 hour to 90 hours and to decorrelate the heating signal from the solar radiation. The heaters are lightweight, with a response time estimated at around 1 or 2 minutes by the Fraunhofer Institute (Strachan et al. 2016) and a split between convective and radiative heat gains of 70/30 %. Mechanical ventilation was set to supply a volume flow rate of 60 m³/h into the living room and to extract 30 m³/h in the bathroom and the south bedroom.

Figure 1: Layout of the twin house O5

The stratification of the air in each room of the south zone is measured with temperature sensors at 10 cm, 110 cm and 170 cm from the ground. This level of detail is not required to fit a simple dynamic thermal model; therefore the temperature of each room is taken as the average over the three heights, and the south zone temperature $T_s$ (°C) is the average of the spaces (living room, south bedroom, corridor and bathroom) weighted by their respective surface areas. The boundary temperature $T_z$ (°C) is likewise a weighted average of the temperatures in the different adjacent spaces (kitchen, lobby and north bedroom). A weather station near the house provides the outside air temperature $T_o$ (°C) and the global solar irradiance measured on a horizontal surface $\dot{Q}_{gh}$ (W/m²). The heating in the south zone is provided by three electric heaters; therefore $\dot{Q}_h$ (W) represents the total heat input.

Buildings are dynamic systems which may be modelled by thermal networks, from which state-space models can be deduced (Ghiaus 2013). This procedure is used to find a suitable model of the twin house, where the south zone (in green in Figure 1) is considered as the main thermal zone and the north zone as an adjacent space.

3. Experimental calibration process

The experimental calibration process is decomposed into 3 phases: selection, calibration and validation (Ljung 2002). First, a set of likely model structures, characterized by an unknown physical parameter vector $\boldsymbol{\theta}$, is chosen based on a-priori knowledge (physics, experiment details, etc.). Then, the calibration assesses how these model structures relate to observed data and to physical considerations. The calibration consists of finding a set of parameters which best represents the input-output relationship of a model through observed data, $\mathbf{u}_{1:k} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k]$ and $\mathbf{y}_{1:k} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_k]$. Finally, the performances of the calibrated models are evaluated to select the model which is best suited for its intended use.

The experimental calibration process is usually treated either from a frequentist or from a Bayesian point of view. In Bayesian estimation, the unknown parameters are treated as random variables with a certain prior distribution $p(\boldsymbol{\theta})$, which represents the prior belief before looking at the data (Dahlin 2016). Then, all the information available in the data is summarized in the likelihood function $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$. The prior belief and the data information are combined in Bayes’ theorem to compute the posterior distribution:

$$p(\boldsymbol{\theta}|\mathbf{y}_{1:k}) = \frac{p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(\mathbf{y}_{1:k})} \propto p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta}) \tag{1}$$

where $p(\mathbf{y}_{1:k})$ is a normalization constant independent of the parameters.

The posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$ contains all the statistical information about $\boldsymbol{\theta}$; the most probable value of the posterior distribution gives the maximum a posteriori (MAP) estimate:

$$\boldsymbol{\theta}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}} \big(p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\,p(\boldsymbol{\theta})\big) \tag{2}$$

If only the information in the data is considered, maximizing the likelihood function $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ gives the maximum likelihood (ML) estimate:

$$\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} \big(p(\mathbf{y}_{1:k}|\boldsymbol{\theta})\big) \tag{3}$$

The ML estimate can be seen as a MAP estimate with a uniform prior distribution, $p(\boldsymbol{\theta}) \propto 1$ (Sarkka 2013).
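The difference between (2) and (3) can be illustrated on a toy problem. The sketch below is not the paper's model: it assumes a single scalar parameter observed directly with Gaussian noise and a Gaussian prior, and every name and number in it (`ys`, `sigma`, `mu0`, `tau`, `argmax_on_grid`) is hypothetical.

```python
import math

def log_likelihood(theta, ys, sigma):
    """ln p(y_{1:k} | theta) for y_n = theta + v_n, v_n ~ N(0, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - 0.5 * ((y - theta) / sigma) ** 2 for y in ys)

def log_prior(theta, mu0, tau):
    """ln p(theta) for a Gaussian prior N(mu0, tau^2)."""
    return (-0.5 * math.log(2 * math.pi * tau ** 2)
            - 0.5 * ((theta - mu0) / tau) ** 2)

def argmax_on_grid(f, lo, hi, n=20001):
    """Brute-force maximiser on a regular grid (adequate for one parameter)."""
    grid = [lo + (hi - lo) * j / (n - 1) for j in range(n)]
    return max(grid, key=f)

ys = [2.9, 3.4, 3.1]              # three hypothetical measurements
sigma, mu0, tau = 0.5, 2.0, 0.5   # assumed noise level and prior

# Eq. (3): maximise the likelihood only.
theta_ml = argmax_on_grid(lambda t: log_likelihood(t, ys, sigma), 0.0, 5.0)

# Eq. (2): maximise likelihood times prior (sum of the logarithms).
theta_map = argmax_on_grid(
    lambda t: log_likelihood(t, ys, sigma) + log_prior(t, mu0, tau), 0.0, 5.0)

print(theta_ml, theta_map)
```

With only three observations, the prior pulls the MAP estimate away from the sample mean towards `mu0`; replacing `log_prior` by a constant makes the two estimates coincide, which is the uniform-prior case mentioned above.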
Two philosophies exist for computing these estimates and their uncertainties. The frequentist approach relies on the fact that, as the number of observations increases, the influence of the prior distribution becomes negligible compared to the likelihood and the posterior distribution can be approximated by a Gaussian distribution (Gelman, Carlin, et al. 2014). ML estimation is popular because it requires only point estimates of the posterior modes, and the corresponding uncertainties are determined by asymptotic properties. ML estimates are usually found by optimization routines, as in the CTSM-R package (CTSM-R Development Team 2015). This strategy has proven to be efficient in numerous cases (Naveros et al. 2014, Himpe & Janssens 2015, Nespoli et al. 2015, Andersen et al. 2014, Bacher & Madsen 2011, Váňa et al. 2013). However, the asymptotic theory does not hold for a small number of observations, which is often the case in real experiments. From a Bayesian point of view, all the statistical information is summarized in the posterior distribution, thus no asymptotic assumption is needed. Bayesian and frequentist methods are compared in the three phases of the calibration process in order to illustrate the differences and to take advantage of both methods.

3.1. Model selection

The choice of an appropriate model structure is the most crucial part of experimental calibration according to Ljung (2002). The model structure should be representative of the real physical system and also in agreement with the measured data. Based on a-priori knowledge of the experiment, a set of models of increasing complexity is defined, which allows a forward selection between nested models, i.e. the simplest model can be recovered by setting the extra parameters to zero (Bacher & Madsen 2011). In this paper, model selection is distinguished from model comparison because model comparison requires calibration to be discussed first; it is therefore presented later.
To illustrate the Bayesian experimental calibration process, only the two most promising model candidates are presented. These two models, illustrated in Figure 2, have been selected from a larger set of models which is not presented here for lack of space. The smallest thermal network in Figure 2, the 3 states model $\mathcal{M}_3$, is obtained by ignoring the part surrounded by the dotted line, whereas the 4 states model $\mathcal{M}_4$ is obtained by adding this part to $\mathcal{M}_3$, such that $\mathcal{M}_3 \subset \mathcal{M}_4$.

Figure 2: 3 states model $\mathcal{M}_3$ and 4 states model $\mathcal{M}_4$

In Figure 2, the nodes represent:
• $x$: the temperature of the building envelope (°C),
• $x_i$: the indoor air temperature (°C),
• $x$: the temperature of the medium (internal walls, furniture, etc.) (°C),
• $x_s$: the temperature of the sensor (°C).

When the node $x_s$ is used, it is selected as the model output and is equivalent to the measured south zone air temperature $T_s$ (°C); otherwise $x_i$ is the model output.

The different heat transfers are characterized by the thermal resistances:
• $R$: between the outside and the middle of the building envelope (K/W),
• $R$: between the middle of the building envelope and the south zone (K/W),
• $R$: between the south zone and the medium (K/W),
• $R$: between the air in the north zone and the south zone (K/W),
• $R$: between the south zone and the sensor (K/W).

The accumulation of energy is modeled by the thermal capacities:
• $C$: for the building envelope (J/K),
• $C$: for the air in the south zone (J/K),
• $C$: for the medium (internal walls, furniture, etc.) (J/K),
• $C$: for the sensor.

The parameters $a_W$ and $a_I$ are respectively the effective area through which the solar radiation enters the building envelope and the effective window area of the building.
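State-space models can be easily deduced from thermal networks by considering the heat balance in each temperature node (Ghiaus 2013):

$$\dot{\mathbf{x}} = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{x} + \mathbf{B}(\boldsymbol{\theta})\,\mathbf{u}, \qquad \mathbf{y}_k = \mathbf{C}\,\mathbf{x}_k \tag{4}$$

where $\mathbf{A}$ is the state matrix, $\mathbf{B}$ the input matrix, $\mathbf{C}$ the output matrix, $\boldsymbol{\theta}$ the parameter vector, $\mathbf{u}$ the input vector and $\mathbf{y}_k$ the output vector, which can be measured at discrete time instants $t_k$, with the notation $\mathbf{x}_k = \mathbf{x}(t_k)$.

Both models from Figure 2 share the same input vector

$$\mathbf{u} = \begin{bmatrix} T_o & T_z & a_W \dot{Q}_{gh} & a_I \dot{Q}_{gh} + \dot{Q}_h \end{bmatrix}^{T} \tag{5}$$

The data of the twin houses experiment are provided with a constant sampling time $\Delta t$; therefore, the linear time invariant continuous model (4) is discretized and additive noise terms are introduced to describe the deviation between the discrete system and the true variation of the state

$$\begin{aligned} \mathbf{x}_{k+1} &= \mathbf{A}_d(\boldsymbol{\theta})\,\mathbf{x}_k + \mathbf{B}_{d0}(\boldsymbol{\theta})\,(\mathbf{u}_k + \boldsymbol{\alpha}\,\Delta t) - \mathbf{B}_{d1}(\boldsymbol{\theta})\,\boldsymbol{\alpha} + \mathbf{w}_k \\ \mathbf{y}_k &= \mathbf{C}\,\mathbf{x}_k + \mathbf{v}_k \end{aligned} \tag{6}$$

where $\mathbf{w}_k$ and $\mathbf{v}_k$ are white noise processes with respective covariances $\boldsymbol{\Sigma}_w(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_v(\boldsymbol{\theta})$, and

$$\mathbf{A}_d(\boldsymbol{\theta}) = e^{\mathbf{A}\Delta t} \tag{6.a}$$

$$\mathbf{B}_{d0}(\boldsymbol{\theta}) = \mathbf{A}^{-1}(\mathbf{A}_d - \mathbf{I})\,\mathbf{B} \tag{6.b}$$

$$\mathbf{B}_{d1}(\boldsymbol{\theta}) = \mathbf{A}^{-1}\big(-\mathbf{A}^{-1}(\mathbf{A}_d - \mathbf{I}) + \mathbf{A}_d\,\Delta t\big)\,\mathbf{B} \tag{6.c}$$

If the input is assumed constant in the time interval $\Delta t$ (zero order hold), then $\boldsymbol{\alpha} = \mathbf{0}$, whereas if the input is assumed to vary linearly (first order hold), then (Kristensen & Madsen 2003)

$$\boldsymbol{\alpha} = \frac{\mathbf{u}_{k+1} - \mathbf{u}_k}{\Delta t} \tag{7}$$

In order to make the notation lighter, the parameter dependence in $\mathbf{A}_d$, $\mathbf{B}_{d0}$, $\mathbf{B}_{d1}$, $\boldsymbol{\Sigma}_w$ and $\boldsymbol{\Sigma}_v$ is omitted, such that $\mathbf{A}_d = \mathbf{A}_d(\boldsymbol{\theta})$.

Some parameters of the state-space model (6) may not be known; they then have to be estimated from measured data.

3.2. Bayesian calibration with Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a general method for approximating posterior distributions by sampling. The main idea is to simulate a Markov chain which has been constructed such that it has the posterior distribution as its stationary distribution (Sarkka 2013). The Metropolis-Hastings (MH) algorithm is the most common type of MCMC method due to its simplicity.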
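MH is an iterative scheme, where a new candidate $\boldsymbol{\theta}^{*}$ is suggested from a proposal distribution $q(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1})$ given the previous one $\boldsymbol{\theta}^{i-1}$. The candidate is then accepted or rejected according to the acceptance probability

$$\alpha = \min\left\{1,\ \frac{p(\mathbf{y}_{1:k}|\boldsymbol{\theta}^{*})\,p(\boldsymbol{\theta}^{*})}{p(\mathbf{y}_{1:k}|\boldsymbol{\theta}^{i-1})\,p(\boldsymbol{\theta}^{i-1})}\ \frac{q(\boldsymbol{\theta}^{i-1}|\boldsymbol{\theta}^{*})}{q(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1})}\right\} \tag{8}$$

where the ratio $q(\boldsymbol{\theta}^{i-1}|\boldsymbol{\theta}^{*}) / q(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1})$ corrects for the asymmetry of the proposal distribution.

If the candidate $\boldsymbol{\theta}^{*}$ increases the posterior probability sufficiently, it is always accepted. However, as opposed to optimization algorithms, a candidate which decreases the posterior probability can still be accepted; this allows the MH algorithm to explore the regions of high posterior probability rather than only climb towards a single mode. Hence, by its stochastic nature, the MH algorithm may escape from local extrema, which is a problem for many optimization algorithms used for ML estimation (Dahlin 2016).

The performance of the MH algorithm is highly dependent on the choice of the proposal distribution. A commonly used choice is the Gaussian random walk,

$$q(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1}) = \mathcal{N}(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}^{i-1}) \tag{9}$$

where $\mathcal{N}(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}^{i-1})$ is the Gaussian probability density function of a random variable $\boldsymbol{\theta}^{*} \in \mathbb{R}^{N_p}$ with mean $\boldsymbol{\theta}^{i-1} \in \mathbb{R}^{N_p}$ and covariance $\boldsymbol{\Sigma}^{i-1} \in \mathbb{R}^{N_p \times N_p}$,

$$\mathcal{N}(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1}, \boldsymbol{\Sigma}^{i-1}) = \frac{1}{(2\pi)^{N_p/2}\,|\boldsymbol{\Sigma}^{i-1}|^{1/2}} \exp\left(-\frac{1}{2}(\boldsymbol{\theta}^{*} - \boldsymbol{\theta}^{i-1})^{T} (\boldsymbol{\Sigma}^{i-1})^{-1} (\boldsymbol{\theta}^{*} - \boldsymbol{\theta}^{i-1})\right) \tag{10}$$

Finding a suitable covariance matrix $\boldsymbol{\Sigma}$ is a hard task which involves many trials and becomes unrealistic for high-dimensional problems (Sarkka 2013). The Markov chain should converge to the stationary distribution in a reasonable time (burn-in phase) and should not be highly autocorrelated, so that the number of iterations needed to explore the stationary distribution is minimized.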
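The random-walk scheme described by (8) and (9) can be sketched for a single parameter as follows. This is a minimal illustration, not the implementation used in the paper: the function name `random_walk_mh`, the fixed scalar step `step` standing in for the covariance, and the Gaussian toy target are all assumptions. Because the random walk is symmetric, the proposal ratio in (8) equals one and drops out.

```python
import math
import random

def random_walk_mh(log_post, theta0, step, n_iter, seed=0):
    """Scalar random-walk Metropolis-Hastings.

    log_post : unnormalised log-posterior, ln p(y|theta) + ln p(theta)
    step     : standard deviation of the Gaussian proposal (9)
    Returns the chain and the fraction of accepted candidates.
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        cand = rng.gauss(theta, step)          # candidate from (9)
        lp_cand = log_post(cand)
        # Acceptance probability (8); symmetric proposal, so q-ratio = 1.
        alpha = math.exp(min(0.0, lp_cand - lp))
        if rng.random() < alpha:
            theta, lp = cand, lp_cand
            accepted += 1
        chain.append(theta)                    # a rejected step repeats theta
    return chain, accepted / n_iter

# Hypothetical target: posterior N(1.0, 0.5^2), chain started far away.
target = lambda t: -0.5 * ((t - 1.0) / 0.5) ** 2
chain, rate = random_walk_mh(target, theta0=5.0, step=0.5, n_iter=20000, seed=1)
burned = chain[2000:]                          # discard the burn-in phase
print(sum(burned) / len(burned), rate)
```

Changing `step` by an order of magnitude in either direction is an easy way to see the tuning problem: a very small step gives a high acceptance rate but a strongly autocorrelated chain, while a very large step leads to most candidates being rejected.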
The performance of the MH algorithm can be improved, and its tuning simplified, by using the gradient and Hessian of the posterior distribution to construct a better proposal distribution (Dahlin 2016). The next section shows a robust and accurate method for computing the likelihood, gradient and Hessian for linear Gaussian state-space models (6).

3.2.1. Construction of the posterior and proposal distribution

The construction of the posterior distribution (1) requires the evaluation of the likelihood $p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ and the prior distribution $p(\boldsymbol{\theta})$. The challenging part is the evaluation of the likelihood because the prior distribution is usually chosen such that it is easy to evaluate (Sarkka 2013). For a state-space model, the likelihood can be computed by using the prediction error decomposition

$$p(\mathbf{y}_{1:k}|\boldsymbol{\theta}) = p(\mathbf{y}_1|\boldsymbol{\theta}) \prod_{n=2}^{k} p(\mathbf{y}_n|\mathbf{y}_{1:n-1}, \boldsymbol{\theta}) \tag{11}$$

where the predictive likelihood can be computed recursively by

$$p(\mathbf{y}_n|\mathbf{y}_{1:n-1}, \boldsymbol{\theta}) = \int p(\mathbf{y}_n|\mathbf{x}_n, \boldsymbol{\theta})\, p(\mathbf{x}_n|\mathbf{y}_{1:n-1}, \boldsymbol{\theta})\, \mathrm{d}\mathbf{x}_n \tag{11.a}$$

with $p(\mathbf{y}_n|\mathbf{x}_n, \boldsymbol{\theta})$ and $p(\mathbf{x}_n|\mathbf{y}_{1:n-1}, \boldsymbol{\theta})$ representing respectively the measurement model and the predictive distribution of the state.

To avoid computational inaccuracy and instability, the logarithm of the unnormalized posterior distribution (right-hand side of (1)), named log-posterior, is computed instead. Since not all the states are observed, the parameter estimation problem also requires solving the state estimation problem (Dahlin 2016). For a linear Gaussian state-space model, the integral in (11.a) can be computed in closed form by the Kalman filter. The gradient and an approximation of the Hessian are obtained by differentiating the Kalman filter equations with respect to the unknown parameters; the resulting equations are referred to as the sensitivity equations. Therefore, $N_p$ sensitivity equations have to be computed in parallel to the Kalman filter, where $N_p$ is the number of unknown parameters.
However, this strategy is numerically unstable due to rounding errors, i.e., the state covariance matrix $\mathbf{P}$ may cease to be symmetric and positive definite, which leads to the failure of the computational process. This problem can be solved by using a robust square root implementation (Kulikova & Tsyganova 2016, Tsyganova & Kulikova 2012), where only the square root factor $\mathbf{P}^{1/2}$ is propagated instead of the full state covariance matrix. The upper triangular matrix $\mathbf{P}^{1/2}$ is obtained by Cholesky decomposition, such that $\mathbf{P} = \mathbf{P}^{T/2}\mathbf{P}^{1/2}$.

The recursion starts by updating the mean value of the prior state $\mathbf{x}_{k|k-1}$ and the prior square root factor $\mathbf{P}^{1/2}_{k|k-1}$ with the current measurements

$$\mathbf{x}_{k|k} = \mathbf{x}_{k|k-1} + \mathbf{K}_k\,\bar{\mathbf{e}}_k \tag{12}$$

with the standardized residuals

$$\bar{\mathbf{e}}_k = \mathbf{S}_k^{-1/2}\,(\mathbf{y}_k - \mathbf{C}_d\,\mathbf{x}_{k|k-1}) \tag{13}$$

where the output matrix is unchanged by the discretization, $\mathbf{C}_d = \mathbf{C}$. The normalized Kalman gain $\mathbf{K}_k$ and the square root factor of the residual covariance, $\mathbf{S}_k^{1/2}$, are directly read from the post-array, in the right-hand side of equation (14)

$$\mathbf{Q}^{MU} \underbrace{\begin{bmatrix} \boldsymbol{\Sigma}_v^{1/2} & \mathbf{0} \\ \mathbf{P}^{1/2}_{k|k-1}\mathbf{C}_d^{T} & \mathbf{P}^{1/2}_{k|k-1} \end{bmatrix}}_{\text{Pre-array}} = \underbrace{\begin{bmatrix} \mathbf{S}_k^{1/2} & \mathbf{K}_k^{T} \\ \mathbf{0} & \mathbf{P}^{1/2}_{k|k} \end{bmatrix}}_{\text{Post-array}} \tag{14}$$

where $\mathbf{Q}^{MU}$ is an orthogonal rotation matrix obtained by QR decomposition of the pre-array, such that the post-array is upper triangular; this notation is used throughout the paper.
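For one state and one output, the array update (14) reduces to a single plane rotation, which makes it easy to check by hand. The sketch below is a scalar illustration under that assumption, not the general matrix algorithm of the cited references; the function name `sqrt_measurement_update` and the numerical values are hypothetical.

```python
import math

def sqrt_measurement_update(x_prior, p_sqrt_prior, y, c, r_sqrt):
    """Scalar square-root measurement update, following (12)-(14).

    x_prior, p_sqrt_prior : prior state mean and square root of its variance
    y, c, r_sqrt          : measurement, output coefficient, square root of
                            the measurement noise variance
    """
    # Pre-array of (14), written row by row for the 2x2 scalar case.
    row1 = (r_sqrt, 0.0)
    row2 = (p_sqrt_prior * c, p_sqrt_prior)

    # Plane (Givens) rotation that zeroes the lower-left entry.
    rho = math.hypot(row1[0], row2[0])
    cos_t, sin_t = row1[0] / rho, row2[0] / rho

    s_sqrt = cos_t * row1[0] + sin_t * row2[0]        # S^(1/2)
    k_norm = cos_t * row1[1] + sin_t * row2[1]        # normalized gain
    lower_left = -sin_t * row1[0] + cos_t * row2[0]   # zero by construction
    p_sqrt_post = -sin_t * row1[1] + cos_t * row2[1]  # P_{k|k}^(1/2)

    e_bar = (y - c * x_prior) / s_sqrt                # standardized residual (13)
    x_post = x_prior + k_norm * e_bar                 # state update (12)
    return x_post, p_sqrt_post, s_sqrt, e_bar, lower_left

# Hypothetical numbers: prior variance 4, noise variance 1, direct observation.
x_post, p_sqrt_post, s_sqrt, e_bar, lower_left = sqrt_measurement_update(
    x_prior=20.0, p_sqrt_prior=2.0, y=21.0, c=1.0, r_sqrt=1.0)
print(x_post, p_sqrt_post ** 2, s_sqrt ** 2)
```

The conventional Kalman update gives the same numbers for this case (innovation variance 4 + 1 = 5, gain 4/5, posterior variance 4 - 0.8 * 4 = 0.8), but here the variance is only ever handled through its square root, so it cannot become negative through rounding.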
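The partial derivatives of the quantities in the post-array (14) are computed with

$$\frac{\partial \mathbf{S}_k^{1/2}}{\partial \theta_i} = \big((\boldsymbol{\mathcal{L}}_i^{\ddagger})^{T} + \boldsymbol{\mathcal{D}}_i^{\ddagger} + \boldsymbol{\mathcal{U}}_i^{\ddagger}\big)\,\mathbf{S}_k^{1/2} \tag{15}$$

$$\frac{\partial \mathbf{K}_k^{T}}{\partial \theta_i} = \mathbf{Y}_i + \big((\boldsymbol{\mathcal{L}}_i^{\ddagger})^{T} - \boldsymbol{\mathcal{L}}_i^{\ddagger}\big)\,\mathbf{K}_k^{T} + \big(\mathbf{V}_i\,\mathbf{S}_k^{-1/2}\big)^{T}\,\mathbf{P}^{1/2}_{k|k} \tag{16}$$

$$\frac{\partial \mathbf{P}^{1/2}_{k|k}}{\partial \theta_i} = \big((\boldsymbol{\mathcal{L}}_i^{\dagger})^{T} + \boldsymbol{\mathcal{D}}_i^{\dagger} + \boldsymbol{\mathcal{U}}_i^{\dagger}\big)\,\mathbf{P}^{1/2}_{k|k} \tag{17}$$

where $\boldsymbol{\mathcal{L}}_i^{\dagger}$, $\boldsymbol{\mathcal{D}}_i^{\dagger}$, $\boldsymbol{\mathcal{U}}_i^{\dagger}$, $\boldsymbol{\mathcal{L}}_i^{\ddagger}$, $\boldsymbol{\mathcal{D}}_i^{\ddagger}$ and $\boldsymbol{\mathcal{U}}_i^{\ddagger}$ are obtained by first multiplying the orthogonal rotation matrix $\mathbf{Q}^{MU}$ by the partial derivative of the pre-array in (14)

$$\mathbf{Q}^{MU} \begin{bmatrix} \dfrac{\partial \boldsymbol{\Sigma}_v^{1/2}}{\partial \theta_i} & \mathbf{0} \\ \dfrac{\partial (\mathbf{P}^{1/2}_{k|k-1}\mathbf{C}_d^{T})}{\partial \theta_i} & \dfrac{\partial \mathbf{P}^{1/2}_{k|k-1}}{\partial \theta_i} \end{bmatrix} = \begin{bmatrix} \mathbf{X}_i & \mathbf{Y}_i \\ \mathbf{V}_i & \mathbf{W}_i \end{bmatrix} \tag{18}$$

and then multiplying the result (18) by the inverse of the post-array of (14)

$$\mathbf{G} = \begin{bmatrix} \mathbf{X}_i & \mathbf{Y}_i \\ \mathbf{V}_i & \mathbf{W}_i \end{bmatrix} \begin{bmatrix} \mathbf{S}_k^{1/2} & \mathbf{K}_k^{T} \\ \mathbf{0} & \mathbf{P}^{1/2}_{k|k} \end{bmatrix}^{-1} \tag{19}$$

The matrices $\boldsymbol{\mathcal{L}}_i^{\dagger}$, $\boldsymbol{\mathcal{D}}_i^{\dagger}$ and $\boldsymbol{\mathcal{U}}_i^{\dagger}$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the submatrix of $\mathbf{G}$ from row and column $N_y + 1$ to row and column $N_y + N_x$. The matrices $\boldsymbol{\mathcal{L}}_i^{\ddagger}$, $\boldsymbol{\mathcal{D}}_i^{\ddagger}$ and $\boldsymbol{\mathcal{U}}_i^{\ddagger}$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the submatrix of $\mathbf{G}$ from the first row and column to row and column $N_y$ (Tsyganova & Kulikova 2012).

The partial derivative of the state mean update (12) is computed by

$$\frac{\partial \mathbf{x}_{k|k}}{\partial \theta_i} = \frac{\partial \mathbf{x}_{k|k-1}}{\partial \theta_i} + \frac{\partial \mathbf{K}_k}{\partial \theta_i}\,\bar{\mathbf{e}}_k + \mathbf{K}_k\,\frac{\partial \bar{\mathbf{e}}_k}{\partial \theta_i} \tag{20}$$

with

$$\frac{\partial \bar{\mathbf{e}}_k}{\partial \theta_i} = -\mathbf{S}_k^{-1/2} \left( \frac{\partial \mathbf{S}_k^{1/2}}{\partial \theta_i}\,\bar{\mathbf{e}}_k + \frac{\partial \mathbf{C}_d}{\partial \theta_i}\,\mathbf{x}_{k|k-1} + \mathbf{C}_d\,\frac{\partial \mathbf{x}_{k|k-1}}{\partial \theta_i} \right) \tag{21}$$

The posterior state mean and square root factor are propagated forward in time by a state equation similar to (6)

$$\mathbf{x}_{k+1|k} = \mathbf{A}_d\,\mathbf{x}_{k|k} + \mathbf{B}_{d0}\,(\mathbf{u}_k + \boldsymbol{\alpha}\,\Delta t) - \mathbf{B}_{d1}\,\boldsymbol{\alpha} \tag{22}$$

and by

$$\mathbf{Q}^{TU} \begin{bmatrix} \mathbf{P}^{1/2}_{k|k}\,\mathbf{A}_d^{T} \\ \boldsymbol{\Sigma}_w^{1/2} \end{bmatrix} = \begin{bmatrix} \mathbf{P}^{1/2}_{k+1|k} \\ \mathbf{0} \end{bmatrix} \tag{23}$$

The partial derivative of the state equation (22) is

$$\frac{\partial \mathbf{x}_{k+1|k}}{\partial \theta_i} = \frac{\partial \mathbf{A}_d}{\partial \theta_i}\,\mathbf{x}_{k|k} + \mathbf{A}_d\,\frac{\partial \mathbf{x}_{k|k}}{\partial \theta_i} + \frac{\partial \mathbf{B}_{d0}}{\partial \theta_i}\,(\mathbf{u}_k + \boldsymbol{\alpha}\,\Delta t) - \frac{\partial \mathbf{B}_{d1}}{\partial \theta_i}\,\boldsymbol{\alpha} \tag{24}$$

where the partial derivatives of the discrete state and input matrices with respect to the continuous parameters are computed by (Mbalawata et al. 2013)

$$\underbrace{\begin{bmatrix} \mathbf{A}_d & \mathbf{0} \\ \dfrac{\partial \mathbf{A}_d}{\partial \theta_i} & \mathbf{A}_d \end{bmatrix}}_{\mathbf{A}_{IM}} = \exp\Bigg( \underbrace{\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \dfrac{\partial \mathbf{A}}{\partial \theta_i} & \mathbf{A} \end{bmatrix}}_{\mathbf{A}_{M}} \Delta t \Bigg) \tag{24.a}$$

$$\begin{bmatrix} \mathbf{B}_{d0} \\ \dfrac{\partial \mathbf{B}_{d0}}{\partial \theta_i} \end{bmatrix} = \mathbf{A}_{M}^{-1}\,(\mathbf{A}_{IM} - \mathbf{I}) \begin{bmatrix} \mathbf{B} \\ \dfrac{\partial \mathbf{B}}{\partial \theta_i} \end{bmatrix} \tag{24.b}$$

$$\begin{bmatrix} \mathbf{B}_{d1} \\ \dfrac{\partial \mathbf{B}_{d1}}{\partial \theta_i} \end{bmatrix} = \mathbf{A}_{M}^{-1}\,\big(-\mathbf{A}_{M}^{-1}(\mathbf{A}_{IM} - \mathbf{I}) + \mathbf{A}_{IM}\,\Delta t\big) \begin{bmatrix} \mathbf{B} \\ \dfrac{\partial \mathbf{B}}{\partial \theta_i} \end{bmatrix} \tag{24.c}$$

The partial derivative of the square root factor in the post-array (23)

$$\frac{\partial \mathbf{P}^{1/2}_{k+1|k}}{\partial \theta_i} = \big(\boldsymbol{\mathcal{L}}_i^{T} + \boldsymbol{\mathcal{D}}_i + \boldsymbol{\mathcal{U}}_i\big)\,\mathbf{P}^{1/2}_{k+1|k} \tag{25}$$

requires first multiplying the orthogonal rotation matrix $\mathbf{Q}^{TU}$ by the partial derivative of the pre-array (23)

$$\mathbf{Q}^{TU} \begin{bmatrix} \dfrac{\partial (\mathbf{P}^{1/2}_{k|k}\,\mathbf{A}_d^{T})}{\partial \theta_i} \\ \dfrac{\partial \boldsymbol{\Sigma}_w^{1/2}}{\partial \theta_i} \end{bmatrix} = \begin{bmatrix} \mathbf{Z}_i \\ \ast \end{bmatrix} \tag{26}$$

where $\mathbf{Z}_i$ denotes the first $N_x$ rows of the product, and the matrices $\boldsymbol{\mathcal{L}}_i$, $\boldsymbol{\mathcal{D}}_i$ and $\boldsymbol{\mathcal{U}}_i$ are respectively the strictly lower triangular, diagonal, and strictly upper triangular parts of the matrix product $\mathbf{Z}_i\,\mathbf{P}^{-1/2}_{k+1|k}$.

The log-likelihood is recursively computed with

$$\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}) = \sum_{n=1}^{k} \left( -\frac{N_y}{2}\ln(2\pi) - \frac{1}{2}\ln\det\big(\mathbf{S}_n^{T/2}\mathbf{S}_n^{1/2}\big) - \frac{1}{2}\,\bar{\mathbf{e}}_n^{T}\bar{\mathbf{e}}_n \right) \tag{27}$$

where the standardized innovations $\bar{\mathbf{e}}_n$ and the square root of the innovation covariance matrix $\mathbf{S}_n^{1/2}$ are computed at each time step $n$ by (13) and (14) respectively. The gradient and the Hessian approximation of (27) are respectively obtained with

$$\frac{\partial \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial \theta_i} = -\sum_{n=1}^{k} \left[ \mathrm{tr}\left(\mathbf{S}_n^{-1/2}\,\frac{\partial \mathbf{S}_n^{1/2}}{\partial \theta_i}\right) + \bar{\mathbf{e}}_n^{T}\,\frac{\partial \bar{\mathbf{e}}_n}{\partial \theta_i} \right] \tag{28}$$

$$-\frac{\partial^2 \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial \theta_i\,\partial \theta_j} \approx \sum_{n=1}^{k} \left[ \mathrm{tr}\left(\frac{\partial \bar{\mathbf{e}}_n}{\partial \theta_i}\,\frac{\partial \bar{\mathbf{e}}_n^{T}}{\partial \theta_j}\right) + \mathrm{tr}\left(\mathbf{S}_n^{-1/2}\,\frac{\partial \mathbf{S}_n^{1/2}}{\partial \theta_i}\,\mathbf{S}_n^{-1/2}\,\frac{\partial \mathbf{S}_n^{1/2}}{\partial \theta_j}\right) \right] \tag{29}$$

It has been shown in this section that the parameter estimation problem is also a state estimation problem. For a linear Gaussian state-space model, the log-likelihood with its gradient and Hessian approximation can be computed by a square root version of the Kalman filter. This numerically stable strategy only requires running the square root Kalman filter and $N_p$ sensitivity equations forward in time.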
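To make the link between the discretization (6.a)-(6.b), the filter recursion and the log-likelihood (27) concrete, the sketch below evaluates (27) for a one-state model with zero order hold. It is an illustration under strong simplifications, not the paper's implementation: it uses the conventional variance-form Kalman filter rather than the array form, and the single-resistance, single-capacity model, the function names (`discretize`, `simulate`, `log_likelihood`) and all numerical values are assumptions.

```python
import math

def discretize(R, C, dt):
    """Zero order hold discretization of dx/dt = a*x + b*u, a = -1/(RC), b = 1/(RC)."""
    a, b = -1.0 / (R * C), 1.0 / (R * C)
    ad = math.exp(a * dt)              # (6.a)
    bd0 = math.expm1(a * dt) / a * b   # (6.b), scalar form of A^-1 (A_d - I) B
    return ad, bd0

def simulate(R, C, u, dt, x0):
    """Noise-free outputs y_n = x_n of the discretized model."""
    ad, bd0 = discretize(R, C, dt)
    x, ys = x0, []
    for un in u:
        ys.append(x)
        x = ad * x + bd0 * un
    return ys

def log_likelihood(R, C, u, y, dt, q, r, x0, p0):
    """Prediction error decomposition (11), scalar version of (27)."""
    ad, bd0 = discretize(R, C, dt)
    x, p, ll = x0, p0, 0.0
    for un, yn in zip(u, y):
        s = p + r                      # innovation variance (output matrix = 1)
        e = yn - x                     # innovation
        ll += -0.5 * (math.log(2 * math.pi) + math.log(s) + e * e / s)
        gain = p / s
        x, p = x + gain * e, (1.0 - gain) * p      # measurement update
        x, p = ad * x + bd0 * un, ad * ad * p + q  # time update
    return ll

dt = 3600.0                                              # hourly data
u = [10 + 5 * math.sin(2 * math.pi * n / 24) for n in range(200)]
y = simulate(0.01, 1e7, u, dt, x0=20.0)                  # "true" R and C
ll_true = log_likelihood(0.01, 1e7, u, y, dt, 1e-4, 1e-4, 20.0, 1e-2)
ll_wrong = log_likelihood(0.02, 1e7, u, y, dt, 1e-4, 1e-4, 20.0, 1e-2)
print(ll_true, ll_wrong)
```

With noise-free synthetic data, the innovations vanish at the parameters used for the simulation, so the log-likelihood is larger there than at the perturbed resistance. In an MH iteration this function would be called once per candidate.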
The gradient and the Hessian of the log-posterior distribution are easily computed with

$$\mathbf{g}(\boldsymbol{\theta}) = \frac{\partial \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} + \frac{\partial \ln p(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \tag{30}$$

$$\hat{\mathbf{H}}(\boldsymbol{\theta}) = -\frac{\partial^2 \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^2} - \frac{\partial^2 \ln p(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^2} \tag{31}$$

and are used in the MH algorithm to construct an efficient proposal distribution (Dahlin 2016)

$$q(\boldsymbol{\theta}^{*}|\boldsymbol{\theta}^{i-1}) = \mathcal{N}\big(\boldsymbol{\theta}^{*}\,\big|\,\boldsymbol{\theta}^{i-1} + \boldsymbol{\Sigma}\,\hat{\mathbf{H}}^{-1}(\boldsymbol{\theta}^{i-1})\,\mathbf{g}(\boldsymbol{\theta}^{i-1}),\ \boldsymbol{\Sigma}\,\hat{\mathbf{H}}^{-1}(\boldsymbol{\theta}^{i-1})\big) \tag{32}$$

where $\boldsymbol{\Sigma}$ is a diagonal matrix which controls the step length of the proposal distribution.

Using the geometric information of the posterior distribution has the advantage of steering the Markov chain towards areas of high posterior probability (Nemeth 2014). This reduces the burn-in phase, since the Markov chain takes larger steps when it is far from the posterior mode and smaller steps as it gets closer. This approach saves both user time and computational time: firstly, because the covariance matrix is given by the inverse of the Hessian approximation, so that only the step length has to be tuned; secondly, because it improves the mixing of the Markov chain, so that the MH algorithm needs fewer iterations (Dahlin 2016, Nemeth 2014).

The proposal distribution (32) is based on Newton-type optimization; consequently, the suggested candidates $\boldsymbol{\theta}^{*}$ are unconstrained and can violate the physical meaning of the system. A simple solution would be to reject candidates outside specified bounds, but this could increase the autocorrelation of the Markov chain if too many candidates are rejected. A better solution is to reparametrize the model (Dahlin 2016).

3.2.2. Reparametrization of the model

The idea is to use non-linear functions to transform a constrained problem into an unconstrained one. In this way, the proposal distribution cannot suggest candidates which are outside the bounds. The constrained parameters $\boldsymbol{\theta}$ are transformed to unconstrained parameters $\boldsymbol{\eta}$ by one-to-one invertible functions $\boldsymbol{\eta} = \mathbf{f}(\boldsymbol{\theta})$.
Two parametrizations are used: 1) the log transform

$$\eta_i = \ln(\theta_i) \qquad (33)$$

constrains $\theta_i$ to the open interval $]0,+\infty[$, and 2) the following transformation (Team Stan Development 2015)

$$\eta_j = \mathrm{logit}\left(\frac{\theta_j - \theta_j^{\min}}{\theta_j^{\max} - \theta_j^{\min}}\right) \qquad (34)$$

constrains $\theta_j$ to the open interval $]\theta_j^{\min},\theta_j^{\max}[$, where the logit function is

$$\mathrm{logit}(z) = \ln\left(\frac{z}{1-z}\right) \qquad (34.a)$$

In the acceptance probability (8), the unnormalized posterior distribution is computed in the constrained space whereas the proposal distribution is evaluated in the unconstrained one. To make the acceptance probability consistent, the log-posterior distribution is expressed in the unconstrained space by using the Jacobian adjustment (Gelman, Carlin, et al. 2014)

$$\ln p(\boldsymbol{\eta}|\mathbf{y}_{1:k}) = \ln p(\boldsymbol{\theta}|\mathbf{y}_{1:k}) + \ln|\det(\mathbf{J})| \qquad (35)$$

where $|\det(\mathbf{J})|$ is the absolute value of the determinant of the Jacobian matrix $\mathbf{J}$; $|\det(\mathbf{J})|$ adjusts for the distortion caused by the non-linear transformation; $\mathbf{J}$ is the Jacobian matrix of the inverse transform $\boldsymbol{\theta} = \mathbf{f}^{-1}(\boldsymbol{\eta})$, such that

$$\mathbf{J} = \frac{\partial \boldsymbol{\theta}}{\partial \boldsymbol{\eta}} \qquad (35.a)$$

The Jacobian matrix is triangular if each transformed parameter only depends on a single untransformed parameter (here it is diagonal, since the transforms (33) and (34) act element-wise), which simplifies the determinant computation to the product of the diagonal elements. The gradient (30) and the Hessian (31) are with respect to $\boldsymbol{\theta}$, so they need to be multiplied by the Jacobian $\mathbf{J}$ in order to be with respect to $\boldsymbol{\eta}$ (chain rule). The partial derivative of the new term in (35) also needs to be taken into account. The gradient and the Hessian in the unconstrained space are obtained by

$$\mathbf{g}(\boldsymbol{\eta}) = \mathbf{J}\,\mathbf{g}(\boldsymbol{\theta}) + \frac{\partial \ln|\det(\mathbf{J})|}{\partial \boldsymbol{\eta}} \qquad (36)$$

$$\hat{\mathbf{H}}(\boldsymbol{\eta}) = \mathbf{J}\,\hat{\mathbf{H}}(\boldsymbol{\theta})\,\mathbf{J} + \frac{\partial \ln|\det(\mathbf{J})|}{\partial \boldsymbol{\eta}} \qquad (37)$$

A problem arises with this type of reparametrization when $\theta$ gets close to a bound: its corresponding Jacobian term goes towards zero, so $\mathbf{g}(\boldsymbol{\eta})$ and $\hat{\mathbf{H}}(\boldsymbol{\eta})$ become unreliable. Moreover, the Hessian estimate $\hat{\mathbf{H}}(\boldsymbol{\eta})$ could become ill-conditioned, which prevents its inversion.
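The two transforms and the Jacobian adjustment (35) can be sketched for a single parameter as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and `log_post_theta` stands for any user-supplied log-posterior evaluated in the constrained space.

```python
import math


def to_unconstrained_log(theta):
    """Transform (33): theta in ]0, +inf[ -> eta in R."""
    return math.log(theta)


def from_unconstrained_log(eta):
    """Inverse of (33) and ln|d theta / d eta| = eta."""
    return math.exp(eta), eta


def to_unconstrained_logit(theta, lo, hi):
    """Transform (34): theta in ]lo, hi[ -> eta in R."""
    z = (theta - lo) / (hi - lo)
    return math.log(z / (1.0 - z))


def from_unconstrained_logit(eta, lo, hi):
    """Inverse of (34) and the log of its Jacobian term."""
    s = 1.0 / (1.0 + math.exp(-eta))  # may overflow for very negative eta
    theta = lo + (hi - lo) * s
    log_jac = math.log(hi - lo) + math.log(s) + math.log(1.0 - s)
    return theta, log_jac


def log_posterior_unconstrained(eta, log_post_theta, inverse):
    """Equation (35): add ln|det(J)| to the constrained log-posterior."""
    theta, log_jac = inverse(eta)
    return log_post_theta(theta) + log_jac


# Hypothetical bounds, only to show the round trip
eta = to_unconstrained_logit(0.05, 1e-4, 2e-1)
theta, log_jac = from_unconstrained_logit(eta, 1e-4, 2e-1)
print(eta, theta, log_jac)
```

Because the Jacobian term of the logit transform is proportional to s(1 - s), it vanishes as the parameter approaches either bound, which is the degeneracy described above.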
This issue is solved in optimization by adding a penalty function (Kristensen & Madsen 2003) which increases the gradient near the bounds, but this strategy is not suitable here because the proposal distribution (32) has a stochastic part, which means that candidates can still be projected towards the bounds. Instead, a prior distribution $p(\boldsymbol{\theta})$ is used to assign a low probability near the bounds. The details of the MH algorithm with gradient and Hessian information for constrained parameters are given in Algorithm 1.

Algorithm 1: Second order Metropolis-Hastings (Dahlin 2016)
Inputs: $N$ (number of iterations), $\boldsymbol{\theta}^{0}$ (initial parameters), $\boldsymbol{\Sigma}$ (step length)
Output: $\boldsymbol{\eta}^{1:N}$ (samples from the posterior distribution)
1. Transformation to the unconstrained parameter space $\boldsymbol{\eta}^{0} = \mathbf{f}(\boldsymbol{\theta}^{0})$ with (33) and (34)
2. Compute: $\ln p(\boldsymbol{\eta}^{0}|\mathbf{y}_{1:k})$, $\mathbf{g}(\boldsymbol{\eta}^{0})$ and $\hat{\mathbf{H}}(\boldsymbol{\eta}^{0})$ with (35), (36) and (37)
3. for $i = 1$ to $N$
4.    Suggest a new candidate $\boldsymbol{\eta}^{*} \sim \mathcal{N}\left(\boldsymbol{\eta}^{i-1} + \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\eta}^{i-1})\,\mathbf{g}(\boldsymbol{\eta}^{i-1}),\; \boldsymbol{\Sigma}\hat{\mathbf{H}}^{-1}(\boldsymbol{\eta}^{i-1})\right)$
5.    Compute: $\ln p(\boldsymbol{\eta}^{*}|\mathbf{y}_{1:k})$, $\mathbf{g}(\boldsymbol{\eta}^{*})$ and $\hat{\mathbf{H}}(\boldsymbol{\eta}^{*})$ with (35), (36) and (37)
6.    Compute the acceptance probability $\alpha$ with (8)
7.    Generate a uniform random variable $u \sim \mathcal{U}(0,1)$ and set
8.    if $u \leq \alpha$: accept the new candidate
9.       $\{\boldsymbol{\eta}^{i}, \ln p(\boldsymbol{\eta}^{i}|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^{i}), \hat{\mathbf{H}}(\boldsymbol{\eta}^{i})\} \leftarrow \{\boldsymbol{\eta}^{*}, \ln p(\boldsymbol{\eta}^{*}|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^{*}), \hat{\mathbf{H}}(\boldsymbol{\eta}^{*})\}$
10.   else: reject the new candidate
11.      $\{\boldsymbol{\eta}^{i}, \ln p(\boldsymbol{\eta}^{i}|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^{i}), \hat{\mathbf{H}}(\boldsymbol{\eta}^{i})\} \leftarrow \{\boldsymbol{\eta}^{i-1}, \ln p(\boldsymbol{\eta}^{i-1}|\mathbf{y}_{1:k}), \mathbf{g}(\boldsymbol{\eta}^{i-1}), \hat{\mathbf{H}}(\boldsymbol{\eta}^{i-1})\}$
12.   end if
13. end for

3.2.3. Choice of prior distribution

The knowledge of possible parameter values before anything has been observed is represented probabilistically by the prior distribution $p(\boldsymbol{\theta})$. The user has to specify a prior distribution that is consistent with the experiment, the physical nature of the problem, or other available knowledge. Three categories of prior information are considered: non-informative, weakly informative and informative (see Gelman et al.
2014, for a complete discussion on the subject).

Non-informative prior distributions attempt not to affect the posterior distribution, so that only the information in the data is relevant; this is the idea behind ML estimation. However, these flat or almost flat prior distributions put more probability mass outside the expected range of values than inside, which can have unforeseen effects on the posterior distribution (Dahlin 2016), especially for small data sets. Moreover, non-informative prior distributions such as $\mathcal{U}(-\infty,+\infty)$ may be improper (they do not integrate to one), so they cannot be expressed as a probability density function. In some cases, a proper posterior distribution can be obtained with an improper prior distribution, but the result must be interpreted with care (Gelman, Carlin, et al. 2014).

Weakly informative prior distributions provide sufficient information to keep the parameters in a reasonable range and, unlike informative prior distributions, they are not likely to outweigh the likelihood. When the data set is too short or not informative enough, a weakly informative prior distribution contains enough information to regularize the posterior distribution and to prevent identifiability issues: the curvature around the expected solution is increased (Team Stan Development 2015).

For the parameters transformed with the logarithm (33), the prior distributions are $\theta_i \sim \mathcal{G}(a,b)$, where $\mathcal{G}(a,b)$ denotes a Gamma distribution with shape $a$ and expected value $b$. The hyper-parameters $a$ and $b$ are chosen such that the probability near zero is low and the distribution covers the expected range of values (Figure 3). For the transformed parameters with lower and upper bounds, the prior distributions are $\theta_j \sim \beta(2,2,\theta_j^{\min},\theta_j^{\max})$, where $\beta(a,b,\theta_j^{\min},\theta_j^{\max})$ is a Beta distribution with shape hyper-parameters $a$ and $b$, and lower and upper bounds $\theta_j^{\min}$ and $\theta_j^{\max}$.
The Beta distribution with $a = b = 2$ is symmetric and assigns low probabilities to values near the bounds (Figure 3).

Figure 3: Prior pdf, $\mathcal{G}(2,0.03)$ in blue and $\beta(2,2,10^{-4},2\cdot 10^{-1})$ in red

3.2.4. Tuning the algorithm

The choice of the prior distribution is an important decision which can strongly influence the posterior distribution (Dahlin 2016), but the exploration of the posterior distribution depends on the tuning of the algorithm. The use of the Hessian approximation reduces the tuning of the proposal distribution (32) to the choice of the step length matrix $\boldsymbol{\Sigma}$. Separate step lengths can be used for each parameter but, to simplify the tuning, a single step length is used here, such that $\boldsymbol{\Sigma} = \varepsilon\mathbf{I}$.

The step length directly affects the acceptance rate of the MH algorithm (the percentage of accepted candidates at stationarity). If the step length is too large, the proposal produces broad jumps which are more likely to be rejected; this increases the autocorrelation of the Markov chains and gives a low acceptance rate. Conversely, if the step length is too small, short jumps are likely to be accepted, which gives a higher acceptance rate. However, the exploration of the posterior distribution is then limited to a small neighborhood, which also increases the autocorrelation. Consequently, the acceptance rate alone is not a reliable indicator of the algorithm performance. A better solution is to look at the mixing of the Markov chains at stationarity, which can be quantified by the integrated autocorrelation time (IACT) (Dahlin 2016)

$$\mathrm{IACT}(\theta_j^{N_b:N}) = 1 + 2\sum_{l=1}^{L} \hat{\rho}_l(\theta_j^{N_b:N}) \qquad (38)$$

where $\hat{\rho}_l$ denotes the autocorrelation coefficient at lag $l$ of $\theta_j^{N_b:N}$, and $\theta_j^{N_b:N}$ is the Markov chain of $\theta_j$ from the burn-in time $N_b$ to the last iteration $N$. The number of lags $L$ is determined as the first index for which $|\hat{\rho}_L(\theta_j^{N_b:N})| < 2/\sqrt{N - N_b}$, when the autocorrelation coefficient becomes statistically insignificant.
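The IACT estimate (38) can be sketched as below for one parameter, assuming the burn-in phase has already been removed from the chain. The function name `iact` is illustrative. One detail is an assumption: the sum is truncated just before the first insignificant lag, which differs from including that lag by less than twice the threshold.

```python
import math
import random


def iact(chain):
    """Integrated autocorrelation time of a chain at stationarity, as in (38)."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [value - mean for value in chain]
    variance_sum = sum(c * c for c in centered)
    threshold = 2.0 / math.sqrt(n)  # significance limit of the autocorrelation
    total = 0.0
    for lag in range(1, n):
        rho = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance_sum
        if abs(rho) < threshold:
            break  # first statistically insignificant lag: stop summing
        total += rho
    return 1.0 + 2.0 * total


# Synthetic strongly correlated chain (first-order autoregressive process),
# used here only as a stand-in for a slowly mixing Markov chain
rng = random.Random(0)
chain = [0.0]
for _ in range(3999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))
print(iact(chain))
```

In practice the function would be evaluated for several candidate step lengths, keeping the one that gives the smallest value.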
The IACT represents the number of iterations between two uncorrelated samples; therefore, the step length should be chosen such that it minimizes the IACT. The number of iterations $N$ should be chosen sufficiently large that, once the burn-in phase has been removed, the remaining samples are sufficient to represent the posterior distribution. Tools for diagnosing convergence are discussed in the next section.

3.2.5. Convergence diagnosis

The procedure of Gelman et al. (2014) is used here and presented briefly. It consists of simulating $M$ Markov chains of $N$ samples, where the starting points of the $M$ Markov chains are randomly sampled from the prior distributions. The first step is to inspect visually the trace plots of the different Markov chains, in order to determine the burn-in time $N_b$ and to check that they converge to the same posterior distributions. The $M$ Markov chains, with the burn-in phase removed, are split in two to give $2M$ chains of length $(N - N_b)/2$; then the variations between and within the $2M$ chains are compared (Gelman, Carlin, et al. 2014). Stationarity implies that the first and the second half of each sequence come from the same distribution. Good mixing requires that the variance within chains is close to the variance between chains; this is quantified by the potential scale reduction $\hat{R}$ (see Gelman et al. 2014 for computational details). The number of iterations $N$ should be increased until $\hat{R}$ is near one, or at least $\hat{R} < 1.1$. The mixing of the Markov chains can also be quantified by the effective sample size (ESS), which approximates the number of independent samples in the $2M$ sequences. Gelman et al. (2014) suggest that the ESS should be at least $5 \times 2M$. If the aforementioned criteria are satisfied, the samples from the $2M$ sequences can be used to estimate the posterior distribution.

3.3. Model validation

Once different models have been calibrated, how can their reliability be assessed?
The purpose of a model is to reproduce an input-output relationship; an intuitive way to start is to look at what the model is not able to reproduce, i.e. the residuals $\bar{\mathbf{e}}$ (13) (Ljung 2002). A plot of the residuals and the data shows which features are not properly described by the model and what might be responsible for them. The noise terms in model (6) are assumed to be white noise sequences, which implies that the residuals should be white noise as well. A white noise sequence is uncorrelated, normally distributed with zero mean, and uniformly distributed over all frequencies (Madsen 2007). These properties are assessed by plotting the autocorrelation function (ACF) and the cumulated periodogram (CP) with their respective 95% confidence intervals. Furthermore, the residuals should be independent of the past inputs, which is tested by plotting the cross-correlation function (CCF).

The reliability of the model is also tested on a data set which has not been used for the calibration (validation data set). New values for the inputs are introduced in the model and the simulated output is compared to the measured one. If the identification data set is informative enough, i.e. the different dynamics of the system are observable in the data, the model should be representative of the system and therefore the simulation should be close to the measurement. The model validation assesses the reliability of a model and gives insight into the model order selection and directions for improvement; but how should the best model be selected?

3.4. Model comparison

In section 3.1, the selection of a model structure based on insights from the experiment has been presented. This section discusses the agreement of calibrated models with measured data and how to compare different models in order to select the most appropriate one.
The model fit to the data is summarized by the log-likelihood $\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta})$ (27); the prior distribution is not relevant for assessing the accuracy of a model. The best model is not necessarily the one with the highest log-likelihood value: as the complexity of a model increases, the number of degrees of freedom increases and the parameters adjust themselves to fit a particular realization of the noise (overfitting) (Ljung 2002). In order to adjust for overfitting, Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) penalize the log-likelihood as a function of the complexity of the model:

$$\mathrm{AIC} = -2\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\mathrm{ML}}) + 2N_p \qquad (39)$$

$$\mathrm{BIC} = -2\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\mathrm{ML}}) + N_p \ln N_s \qquad (40)$$

where $\boldsymbol{\theta}_{\mathrm{ML}}$ is the ML estimate, $N_p$ the number of parameters and $N_s$ the sample size. The smallest AIC or BIC among different models indicates the most appropriate model. For nested models, like $\mathcal{M}_3$ and $\mathcal{M}_4$, the likelihood ratio test (LRT) can be used (Bacher & Madsen 2011)

$$\mathrm{LRT} = -2\left(\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\mathrm{ML}}^{\mathcal{M}_3}) - \ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\mathrm{ML}}^{\mathcal{M}_4})\right) \qquad (41)$$

with $\boldsymbol{\theta}_{\mathrm{ML}}^{\mathcal{M}_3}$ and $\boldsymbol{\theta}_{\mathrm{ML}}^{\mathcal{M}_4}$ the ML estimates of models $\mathcal{M}_3$ and $\mathcal{M}_4$. As the number of samples $N_s$ goes to infinity, the LRT converges to a $\chi^2$-distributed variable with $(N_p^{\mathcal{M}_4} - N_p^{\mathcal{M}_3})$ degrees of freedom. Usually, a p-value of the LRT below 0.05 indicates that the improvement of the larger model $\mathcal{M}_4$ over $\mathcal{M}_3$ is significant and consequently that the model $\mathcal{M}_4$ should be preferred.

These criteria are based on the point estimate $\boldsymbol{\theta}_{\mathrm{ML}}$ and not on the posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$; a more Bayesian criterion is given by the Watanabe-Akaike Information Criterion (WAIC) (Gelman, Hwang, et al. 2014)

$$\mathrm{WAIC} = -2\sum_{k=1}^{N_s}\left(\mathrm{mean}\left(\ln p(\mathbf{y}_k|\boldsymbol{\theta})\right) - \mathrm{var}\left(\ln p(\mathbf{y}_k|\boldsymbol{\theta})\right)\right) \qquad (42)$$

where $\ln p(\mathbf{y}_k|\boldsymbol{\theta}) \in \mathbb{R}^{(N-N_b)\times N_s}$ is the log-likelihood at time instant $k$ and $(N - N_b)$ is the number of samples used to approximate the posterior distribution $p(\boldsymbol{\theta}|\mathbf{y}_{1:k})$. Unlike the AIC and BIC, the WAIC is penalized by the dispersion of the log-likelihood.
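The four criteria can be sketched with the Python standard library as below. This is an illustration under stated assumptions: the function names are hypothetical, the chi-squared tail probability is computed for integer degrees of freedom with the recurrence of the incomplete gamma function, the WAIC follows (42) as written using the sample variance across posterior draws, and all numerical inputs are made up.

```python
import math
import statistics


def aic(loglik, n_params):
    """Equation (39)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_samples):
    """Equation (40)."""
    return -2.0 * loglik + n_params * math.log(n_samples)


def chi2_sf(x, dof):
    """P(X > x) for a chi-squared variable with integer dof >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if dof % 2:  # odd dof: start from dof = 1
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:        # even dof: start from dof = 2
        q, a = math.exp(-half), 1.0
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < dof / 2.0 - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lrt_p_value(loglik_small, loglik_large, extra_params):
    """p-value of the likelihood ratio test (41) for nested models."""
    lrt = -2.0 * (loglik_small - loglik_large)
    return chi2_sf(lrt, extra_params)


def waic(pointwise_loglik):
    """Equation (42); pointwise_loglik[s][k] is ln p(y_k | theta_s)."""
    n_times = len(pointwise_loglik[0])
    total = 0.0
    for k in range(n_times):
        column = [row[k] for row in pointwise_loglik]
        total += statistics.fmean(column) - statistics.variance(column)
    return -2.0 * total


# Made-up log-likelihoods for two nested models differing by 3 parameters
print(aic(-100.0, 15), bic(-100.0, 15, 2000))
print(lrt_p_value(-100.0, -96.0, 3))
```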
Evaluating these criteria on the same data set as the one used for the calibration introduces a bias in the model selection process; it is therefore advised to evaluate them with the validation data set instead. This point is illustrated in the next section for a real test case.

4. Application to the Twin houses experiment

4.1. Model comparison

The capabilities of the second order MH (Algorithm 1) are now tested on the twin houses experiment presented in section 2. The purpose is to calibrate the models $\mathcal{M}_3$ and $\mathcal{M}_4$ (Figure 2), where the south zone temperature (green zone in Figure 1) is the output and two boundary conditions are considered: the outdoor temperature $T$ (°C) and the north zone temperature $T_n$ (°C). The vectors of unknown parameters $\boldsymbol{\theta}$ for $\mathcal{M}_3$ and $\mathcal{M}_4$ are respectively given in Table 1 and Table 2. The process noise covariance is defined by $\boldsymbol{\Sigma}_{\mathbf{w}}(\boldsymbol{\theta}) = \mathrm{diag}(\sigma_{w_{11}}^{2}, \sigma_{w_{22}}^{2}, \sigma_{w_{33}}^{2})$ and the measurement noise variance by $\Sigma_{v}(\boldsymbol{\theta}) = \sigma_v^{2}$. The standard deviation $\sigma_w$ of the state $x_s$ in $\mathcal{M}_4$ has been fixed to $10^{-6}$ instead of using an informative prior distribution.

This problem has been investigated by different authors, who used relatively similar optimization strategies but different model structures. De Coninck et al. (2015) and Rehab & André (2015) used second order thermal models but with different structures and inputs. Himpe & Janssens (2015) used a model with four states which was correlated with the HVAC system and the solar radiation. They improved their model by scaling the system noise with respect to the heater and solar radiation signals. However, they did not use a validation data set to demonstrate the effectiveness of the model; only the improvement of the residuals is shown.

With a zero-order hold assumption, the model cannot capture the fast dynamics of the heating signal. Indeed, the response time of the electric heaters is estimated at between 1 and 2 minutes, which is shorter than the sampling time of the data (10 minutes).
This issue is solved by considering that the inputs vary linearly between two samples (first-order hold). Around 24 days of data are used, where the first 14 days are the identification data set (ROLBS sequence, Figure 4) and the last 10 days are the validation data set (Figure 5); the details of the inputs and outputs are given in the section presenting the experiment.

Figure 4: Identification data set, output, standardized residuals of $\mathcal{M}_3$ (blue) and $\mathcal{M}_4$ (orange), and inputs

Figure 5: Validation data set

A single step length has been used in the proposal distribution, with $\boldsymbol{\Sigma} = \varepsilon\mathbf{I}$, where $\varepsilon = 0.3$ has been selected such that it minimizes the IACT. This tuning gives an acceptance rate of approximately 30% for $\mathcal{M}_3$ and 25% for $\mathcal{M}_4$. The diagnosis procedure presented in section 3.2.5 has been applied with $M = 6$ Markov chains, with initial parameter values randomly sampled from their respective prior distributions (Table 1 and Table 2); this is illustrated by the trace plot of the first thousand iterations in Figure 6. The first 500 samples of $N = 5500$ are discarded as burn-in for the model $\mathcal{M}_3$, whereas the first 1500 samples of $N = 6500$ are discarded for $\mathcal{M}_4$. The chains at stationarity for both models are split in two to give 12 chains of 2500 samples, which are used to quantify the mixing of the MH algorithm. The results are summarized in Table 1 and Table 2, with the worst values highlighted in red and the best in green. The worst potential scale reductions $\hat{R}$ are below the threshold of 1.1 and the worst effective sample sizes (ESS) are well above $5 \times 12$. Consequently, the $12 \times 2500$ samples can be used to approximate the posterior distributions of the parameters. The posterior distributions of the 6 simulated Markov chains are represented in Figure 7 for $\mathcal{M}_3$ and in Figure 8 for $\mathcal{M}_4$ with different colors; the black line is the global approximation of the posterior distributions using all samples.
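The split-chain diagnostic described in section 3.2.5 can be sketched for one parameter as follows. The formula is the potential scale reduction as described by Gelman et al. (2014); the function name `split_rhat` and the synthetic chains are illustrative, and chains of equal length with the burn-in phase already removed are assumed.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Potential scale reduction of one parameter from M chains split in two."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    half_means = [statistics.fmean(h) for h in halves]
    # Within-chain variance: average of the variances of the 2M half chains
    within = statistics.fmean([statistics.variance(h) for h in halves])
    # Between-chain variance: n times the variance of the 2M half-chain means
    between = n * statistics.variance(half_means)
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


# Synthetic example: six well-mixed chains, then six chains stuck around
# two different values
rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(6)]
stuck = [[rng.gauss(3.0 * (m % 2), 1.0) for _ in range(1000)] for m in range(6)]
print(split_rhat(mixed), split_rhat(stuck))
```

Values close to one indicate that the half chains are indistinguishable from one another, whereas chains that have not converged to the same distribution give values clearly above the 1.1 threshold.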
Table 1: Prior distributions, posterior modes and diagnosis tests (worst: red and best: green) of ℳ ̂ ̂ ESS Prior distributions Posterior modes 𝑅 Min/Max IACT −4 −1 −2 3 ( ) 𝑅 𝛽 2,2,10 ,2∙ 10 5.53∙ 10 1.0100 1.22∙ 10 18.86 29.47 −4 −1 −3 3 𝑅 𝛽 (2,2,10 ,2∙ 10 ) 2.09∙ 10 𝟏 .𝟎𝟎𝟑𝟗 1.29 ∙ 10 18.25 29.50 −4 −1 −3 3 𝑅 𝛽 (2,2,10 ,10 ) 2.31∙ 10 1.0045 1.25∙ 10 17.81 30.81 −4 −1 −3 3 𝑅 𝛽 (2,2,10 ,10 ) 4.98 ∙ 10 1.0086 1.16∙ 10 18.79 28.16 8 −3 −1 −2 3 ⁄ ( ) 𝐶 10 𝛽 2,2,10 ,5 ∙ 10 3.12∙ 10 1.0049 1.31∙ 10 19.46 27.42 8 −4 −1 −3 3 𝐶 ⁄10 𝛽 (2,2,10 ,10 ) 6.73∙ 10 1.0080 1.28∙ 10 17.49 32.01 8 −2 −1 𝟑 𝐶 ⁄10 𝛽 (2,2,10 ,5) 1.38∙ 10 1.0061 𝟏 .𝟒𝟗 ∙ 𝟏𝟎 17.18 24.58 −1 3 𝑎 𝛽 (2,2,10 ,5) 1.06 1.0132 1.19 ∙ 10 18.90 24.12 −1 3 𝑎 𝛽 (2,2,10 ,5) 1.24 1.0143 1.39 ∙ 10 18.40 𝟐𝟐 .𝟗𝟗 −1 𝜎 𝒢 (2,0.03) 1.02∙ 10 1.0065 1.26∙ 10 19.06 30.88 −2 𝜎 𝒢 (2,0.03) 1.38∙ 10 𝟏 .𝟎𝟐𝟐𝟗 𝟓 .𝟐𝟖 ∙ 𝟏𝟎 𝟑𝟔 .𝟐𝟗 𝟓𝟒 .𝟗𝟎 −2 𝜎 𝒢 (2,0.03) 1.82∙ 10 1.0084 1.15∙ 10 22.12 33.57 −2 2 𝜎 𝒢 (2,0.03) 1.69 ∙ 10 1.0179 6.26∙ 10 29.39 50.03 𝑥 𝛽 (2,2,15,45) 29.20 1.0146 1.15∙ 10 𝟏𝟔 .𝟒𝟑 24.49 𝑥 𝛽 (2,2,15,45) 29.45 1.0099 1.33∙ 10 18.09 27.53 Table 2: Prior distributions, posterior modes and diagnosis tests (worst: red and best: green) of ℳ ̂ ̂ Prior distributions Posterior modes ESS 𝑅 Min/Max IACT −4 −1 −2 3 ( ) [ ] 𝑅 𝛽 2,2,10 ,10 4.93 ∙ 10 1.0073 1.06∙ 10 24.17 41.44 −4 −1 −3 2 ( ) 1.0127 [ ] 𝑅 𝛽 2,2,10 ,10 1.83∙ 10 7.26∙ 10 30.34 58.48 −4 −1 −3 2 𝛽 (2,2,10 ,10 ) 2.13∙ 10 1.0136 6.04∙ 10 [ ] 𝑅 34.27 66.75 −3 −2 −3 2 𝛽 (2,2,10 ,5 ∙ 10 ) 5.35∙ 10 1.0099 7.41∙ 10 [ ] 𝑅 31.15 53.36 −4 −1 −3 2 𝛽 (2,2,10 ,10 ) 5.27∙ 10 1.0094 9.89 ∙ 10 [ ] 𝑅 24.93 35.84 𝑁𝑝 8 −3 −1 −2 2 𝐶 ⁄10 𝛽 (2,2,10 ,5 ∙ 10 ) 2.52∙ 10 1.0176 5.23∙ 10 [37.62 69.45] 8 −4 −2 −3 2 [ ] 𝐶 ⁄10 𝛽 (2,2,10 ,10 ) 5.27∙ 10 1.0182 5.69 ∙ 10 35.82 89.49 8 −2 −1 3 ⁄ ( ) 1.0083 [ ] 𝐶 10 𝛽 2,2,10 ,5 1.46∙ 10 1.13∙ 10 23.32 32.37 8 −5 −3 −5 2 𝐶 ⁄10 𝛽 (2,2,10 ,10 ) 5.81∙ 10 1.0090 6.15∙ 10 [ ] 37.72 83.21 −1 −1 2 𝛽 (2,2,10 ,5) 9.07∙ 10 1.0181 6.33∙ 10 [ ] 𝑎 36.30 69.06 −1 𝛽 (2,2,10 ,5) 1.17 
1.0126 𝟏 .𝟏𝟔 ∙ 𝟏𝟎 [ ] 𝑎 22.65 𝟐𝟖 .𝟒𝟏 −2 2 𝜎 𝒢 (2,0.03) 9.87 ∙ 10 1.0095 9.69 ∙ 10 [ ] 24.21 41.21 −2 𝜎 𝒢 (2,0.03) 2.61∙ 10 1.0248 𝟒 .𝟎𝟖 ∙ 𝟏𝟎 [𝟒𝟑 .𝟗𝟐 86.54] −2 2 𝜎 𝒢 (2,0.03) 3.18∙ 10 𝟏 .𝟎𝟐𝟖𝟓 4.35∙ 10 [41.75 𝟏𝟏𝟎 .𝟓𝟒 ] −2 2 𝜎 𝒢 (2,0.03) 1.15∙ 10 1.0233 4.31∙ 10 [42.91 81.03] 𝑥 𝛽 (2,2,15,45) 28.83 1.0124 1.02∙ 10 [27.08 35.02] 𝑥 𝛽 (2,2,15,45) 28.31 𝟏 .𝟎𝟎𝟕𝟎 1.08∙ 10 [24.45 34.56] 𝑥 𝛽 (2,2,15,45) 29.64 1.0099 9.49 ∙ 10 [𝟐𝟏 .𝟗𝟓 44.21] Figure 6: Trace plot of the 6 Markov chains in the first thousand iterations (ℳ ) Figure 7: Posterior distribution of ℳ from the 6 Markov chains Figure 8: Posterior distribution of ℳ from the 6 Markov chains The reliability of the calibrated models is tested by residual analysis. The least correlated standardized residuals 𝐞 (13) are shown in Figure 4, the ACF, CCF and CP of models ℳ and ℳ are shown respectively in 3 4 Figure 9 and in Figure 10. The red lines delimit the 95% confidence intervals and the lags are the number sample shifts between the two signals. Hence, to validate the hypothesis, 5% of the lags must not cross these limits. For both models, the inputs are not correlated with the standardized residuals which means that the models are able to explain all input-output relationships. However, the white noise hypothesis is rejected for the model ℳ ; the highest standardized residual values (Figure 4) are correlated with the switches of the heating signal and the solar radiations. Figure 9: ACF, CP and CCF of the standardized residuals, ℳ Figure 10: ACF, CP and CCF of the standardized residuals, ℳ 4 The reliability of the model is also tested by comparing the measured south zone temperature of the validation data set with the simulated output. A clear advantage of Bayesian estimation is that it is possible to simulate directly from the posterior distribution, which gives a simulated output with all the uncertainties. 
This is very useful for model predictive control: the weather forecast is introduced in the estimated model to predict the indoor temperature. Afterwards, the trade-off between comfort and energy saving is chosen by taking either the lowest predicted temperature, such that the HVAC system is sure to maintain the comfort, or the highest predicted temperature, such that the HVAC system uses the minimal amount of energy. The simulation from the posterior distribution is plotted in Figure 11. The measured south zone temperature is always included in the simulated output distribution for both models, but the dispersion is larger for the model $\mathcal{M}_4$.

In order to select the most appropriate model, the performances of both models (section 3.4) are summarized in Table 3. For the identification data set, the model $\mathcal{M}_4$ should be accepted against $\mathcal{M}_3$; the AIC, BIC and WAIC are smaller for $\mathcal{M}_4$ than for $\mathcal{M}_3$, and the p-value of the LRT confirms this choice. However, the criteria evaluated on the validation data set indicate that the model $\mathcal{M}_3$ should be preferred; the log-likelihood of the model $\mathcal{M}_3$ is higher and less dispersed than the log-likelihood of the model $\mathcal{M}_4$, as shown in Figure 11.

Table 3: Performance criteria
                Identification data set                      Validation data set
                $\mathcal{M}_3$        $\mathcal{M}_4$        $\mathcal{M}_3$        $\mathcal{M}_4$
AIC             $-7.59\cdot 10^{3}$    $-8.02\cdot 10^{3}$    $-5.26\cdot 10^{3}$    $-5.21\cdot 10^{3}$
BIC             $-7.51\cdot 10^{3}$    $-7.91\cdot 10^{3}$    $-5.19\cdot 10^{3}$    $-5.11\cdot 10^{3}$
LRT p-value     0                                             1
WAIC            $-7.58\cdot 10^{3}$    $-8.01\cdot 10^{3}$    $-5.27\cdot 10^{3}$    $-5.23\cdot 10^{3}$

Figure 11: Left: measured (black) and simulated south zone temperature with the validation data set; right: log-likelihoods for the validation data set ($\mathcal{M}_3$: blue, $\mathcal{M}_4$: orange)

The performance gap between both models is significant for the identification data set in comparison to the validation data set. In the identification data set, the ROLBS introduces unconventional dynamics which are not representative of the intended use of the building.
A more complex model is required to fit the fast variations of the south zone air temperature, but these fast variations are not present in conventional use and therefore the extra complexity of the model $\mathcal{M}_4$ is not required. As mentioned in section 3.4, using the same information from the data for calibration and for selection may be misleading (Gelman, Hwang, et al. 2014). In this case, it is also illustrated by the whiteness improvement of the residuals for the model $\mathcal{M}_4$. It can be concluded that the model $\mathcal{M}_3$ is more representative of the south zone and consequently only the model $\mathcal{M}_3$ is considered in the remainder of the paper.

4.2. Physical interpretation of the results

The posterior distributions are compared against the building characteristics, which are available in Strachan et al. (2016). The envelope thermal resistance is given by

$$R_w = \left(\sum_{j=1}^{3} U_j S_j + U_{wi} S_{wi}\right)^{-1} = 4.47\cdot 10^{-2}\ \mathrm{K/W} \qquad (43)$$

where $U_j$ and $S_j$ are the U-values and surfaces of the different walls (south, east and west) and $U_{wi}$ and $S_{wi}$ are the U-value and surface of the windows. The envelope thermal resistance is estimated with the posterior distributions of $R_o$ and $R_i$ such that

$$R_w = R_o + R_i \in [4.11\cdot 10^{-2},\ 9.38\cdot 10^{-2}] \qquad (43.a)$$

The resistance $R_w$ belongs to the estimated interval (43.a), but this interval is wide, which means that the uncertainties are large.

The thermal resistance between the south and the north zone is given by

$$R_z = \left(U_{z1} S_{z1} + U_{z2} S_{z2} + 3\,U_{door} S_{door}\right)^{-1} = 1.53\cdot 10^{-2}\ \mathrm{K/W} \qquad (44)$$

where $U_{z1}$ and $S_{z1}$ are the U-value and the surface of the north wall of the living room, $U_{z2}$ and $S_{z2}$ are the U-value and the surface of the north wall of the bathroom and the corridor, and $U_{door}$ and $S_{door}$ are the U-value and the surface of the doors. The estimated range from the posterior distribution of $R_z$ is $[4.59\cdot 10^{-3},\ 5.52\cdot 10^{-3}]$, which is far smaller than the value computed in (44).
This gap could mean that the infiltration between the two zones are significant. The envelope thermal capacity is defined by 3 𝐶 = ∑ 𝐶 𝑆 = 1.30∙ 10 J/K (45) 𝑤 𝑗 𝑗 𝑗 =1 with 𝐶 the heat capacity. The thermal capacity of the windows is negligible as compared to 𝐶 . 𝑗 𝑤 6 6 The estimated posterior distribution of 𝐶 covers the following range of values [ ], which is 𝑤 2.21∙ 10 4.47∙ 10 more than half less than the expected value (45). The thermal capacity of the medium 𝐶 consists of the inner walls of the south zone but also of parts of the ceiling and ground floor, such as 𝐶 = 𝐶 + 𝐶 + 𝐶 + 𝐶 = 8.00∙ 10 J/K (46) 𝑚 𝑖 𝑤 𝑖 𝑤 𝑑𝑛𝑢𝑟𝑜𝑔 𝑖𝑙𝑒𝑔𝑖𝑐𝑛 1 2 where the subscripts 𝑖𝑤 and 𝑖𝑤 denote respectively the east wall of the living room and the other light walls. 1 2 7 7 The estimated range for 𝐶 is [ ] which is the expected order of magnitude since (46) is 1.22∙ 10 1.54∙ 10 overestimated by taking into account all the volume of the ground floor and the ceiling. The thermal capacity of the indoor air is simply given by 𝐶 = 𝜌 𝑐 𝑉 = 1.79 ∙ 10 J/K (47) 𝑖 𝑎 𝑎 with 𝑐 and 𝜌 the specific heat and the density of the air, and 𝑉 the volume of the south zone. 𝑎 𝑎 In this case as well, the estimated posterior distribution of 𝐶 is consistent with the order of magnitude of the 5 5 expected value, where 𝐶 ∈ [ ]. 6.60∙ 10 6.90 ∙ 10 Determining prior knowledge on the convective resistance 𝑅 between the indoor air and the medium is not an easy task and it is not of main interest in this study. The parameters 𝑎 and 𝑎 are interpreted as effective areas 𝐼 𝑊 because 𝑄 is measured on a horizontal surface. Nevertheless, it is interesting to see that 𝑎 is superior to 𝑎 𝑔 ℎ 𝐼 𝑊 which shows the importance of direct solar radiations into the south zone. The time constants 𝛕 of the continuous system (4) are computed by (48) 𝛕 = − where 𝛌 are the eigenvalues of the state matrix and 𝛕 ,𝛌 ∈ ℝ ,. 
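As a small worked example of the relation between eigenvalues and time constants, the sketch below assumes that (48) is the usual relation in which each time constant is the negative inverse of a real, negative eigenvalue of the state matrix. It is restricted to a two-state system so that the eigenvalues can be written in closed form, whereas the models of the paper have more states; the function name and the resistance and capacity values are hypothetical.

```python
import math


def time_constants_2x2(a11, a12, a21, a22):
    """Time constants of a stable two-state system, in the time unit of 1/A."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det
    if disc < 0.0:
        raise ValueError("complex eigenvalues: no real time constants")
    root = math.sqrt(disc)
    eigenvalues = [(trace - root) / 2.0, (trace + root) / 2.0]
    if any(lam >= 0.0 for lam in eigenvalues):
        raise ValueError("unstable or marginally stable system")
    return sorted(-1.0 / lam for lam in eigenvalues)


# Hypothetical indoor air / medium network: capacities in J/K, resistances in K/W
c_air, c_medium = 7.0e5, 1.4e7
r_air_medium, r_medium_out = 2.0e-3, 5.0e-2
taus = time_constants_2x2(
    -1.0 / (c_air * r_air_medium), 1.0 / (c_air * r_air_medium),
    1.0 / (c_medium * r_air_medium),
    -1.0 / (c_medium * r_air_medium) - 1.0 / (c_medium * r_medium_out),
)
print([tau / 3600.0 for tau in taus])  # seconds converted to hours
```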
The time constants of the model ℳ , given in Table 4, are consistent in the range of the fast dynamics of the air and the slow accumulation of energy in the medium. The performances of the second-order MH are compared to a ML estimation in the next section. Furthermore, the regularization effect of the prior distribution is illustrated by identifiability analysis. Table 4: Time constants of model ℳ Time constant [hours] −1 −1 𝜏 [ ] 1.57∙ 10 1.66∙ 10 𝜏 [2.19 3.75] [ ] 𝜏 25.84 34.34 4.3. Performance comparison with maximum likelihood estimation Table 5: ML estimation with the same random initial parameters as the MH algorithm, the values in bold represent the parameters closed to their boundaries MLE 1 MLE 2 MLE 3 MLE 4 MLE 5 MLE 6 −𝟏 −𝟒 −𝟏 −𝟏 −𝟏 −𝟏 𝑅 𝟏 .𝟗𝟖 ∙ 𝟏𝟎 𝟏 .𝟎𝟐 ∙ 𝟏𝟎 𝟏 .𝟗𝟖 ∙ 𝟏𝟎 𝟏 .𝟗𝟖 ∙ 𝟏𝟎 𝟏 .𝟔𝟐 ∙ 𝟏𝟎 𝟏 .𝟗𝟖 ∙ 𝟏𝟎 −3 −2 −3 −3 −3 −3 𝑅 1.40∙ 10 4.78 ∙ 10 1.40∙ 10 1.18∙ 10 1.35∙ 10 1.42∙ 10 −2 −3 −2 −3 −3 −2 𝑅 2.24∙ 10 1.12∙ 10 2.19 ∙ 10 3.32∙ 10 2.91 ∙ 10 2.08∙ 10 −3 −3 −3 −3 −3 −𝟐 𝑅 2.94 ∙ 10 5.76 ∙ 10 2.92 ∙ 10 4.00∙ 10 𝟖 .𝟔𝟏 ∙ 𝟏𝟎 2.87 ∙ 10 8 −2 −𝟑 −2 −1 −2 −2 𝐶 ⁄10 6.50 ∙ 10 𝟏 .𝟎𝟏 ∙ 𝟏𝟎 6.46∙ 10 1.19 ∙ 10 6.94 ∙ 10 6.39 ∙ 10 8 −3 −3 −3 −3 −3 −3 𝐶 ⁄10 6.75 ∙ 10 6.86 ∙ 10 6.76 ∙ 10 6.72∙ 10 6.74∙ 10 6.81∙ 10 8 −1 −1 −1 −𝟐 −1 𝐶 ⁄10 3.03∙ 10 1.24∙ 10 3.12∙ 10 𝟏 .𝟑𝟑 ∙ 𝟏𝟎 2.3 3.29 ∙ 10 𝑎 1.01 3.15 1.02 1.74 1.08 1.03 𝑎 1.23 1.27 1.24 1.27 1.23 1.25 −2 −2 −2 −2 −2 −2 5.83 ∙ 10 1.14∙ 10 5.50∙ 10 8.71∙ 10 5.09 ∙ 10 4.67 ∙ 10 −5 −𝟖 −2 −2 −5 −2 3.28 ∙ 10 𝟏 .𝟏𝟓 ∙ 𝟏𝟎 1.30∙ 10 1.32∙ 10 2.90 ∙ 10 2.51∙ 10 −1 −2 −1 −5 −2 −1 6.45 ∙ 10 6.58 ∙ 10 6.41∙ 10 1.57 ∙ 10 9.79 ∙ 10 6.38∙ 10 −2 −2 −2 −2 −2 −𝟖 𝜎 1.90 ∙ 10 1.70∙ 10 1.63∙ 10 1.63∙ 10 1.90 ∙ 10 𝟏 .𝟑𝟖 ∙ 𝟏𝟎 𝑥 29.60 36.68 29.59 29.54 29.70 29.60 𝑥 𝟒𝟒 .𝟒𝟕 29.61 𝟒𝟒 .𝟓𝟎 29.95 25.00 𝟒𝟒 .𝟓𝟕 The performance of the second-order MH algorithm is now tested against a ML estimation with a quasi-Newton optimization. 
The log-likelihood and its gradient are supplied to MATLAB's unconstrained optimization function fminunc (https://fr.mathworks.com/help/optim/ug/fminunc.html; optimality tolerance and step tolerance fixed to $10^{-8}$), which uses a BFGS approximation of the Hessian. The same parameter transformations are used except for the standard deviations; they are bounded between $10^{-8}$ and 5 because it has been observed that, with the logarithm transformation (33), the standard deviations can get too close to zero, which causes numerical instabilities. The penalty function given by Kristensen & Madsen (2003) is used to repel the parameters from the bounds. The ML estimation is repeated 6 times with the same initial parameters as for the MH algorithm; the ML parameter estimates are given in Table 5. The results are highly dependent on the initial conditions for most of the parameters; some of them differ by several orders of magnitude. It can be concluded in this case that the second-order MH algorithm has a better global convergence than the ML optimization.

A closer look is given to the parameter $\sigma_w$, which seems to be unidentifiable compared to the parameter $C_i$. The profiles of the log-likelihood and log-posterior are plotted in Figure 12. These profiles are obtained by maximizing the log-likelihood and the log-posterior with respect to all parameters except $\sigma_w$ and $C_i$, which are fixed for each optimization to different values (x-axis, Figure 12). Confidence intervals for the profile likelihood can be computed by (Madsen & Thyregod 2012)

$$\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\sim i}) - \ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}}) > -\frac{1}{2}\chi^{2}_{1-\alpha}(p) \qquad (49)$$

where $\ln p(\mathbf{y}_{1:k}|\boldsymbol{\theta}_{\sim i})$ is the maximum log-likelihood with respect to all parameters except $\theta_i$, $\ln p(\mathbf{y}_{1:k}|\hat{\boldsymbol{\theta}})$ is the maximum log-likelihood and $\chi^{2}_{1-\alpha}(p)$ is the $(1-\alpha)$ quantile of the chi-squared distribution with $p$ degrees of freedom, for a confidence level $1-\alpha$. The 95% confidence intervals given by $-\frac{1}{2}\chi^{2}_{0.95}(1) = -1.92$ are represented in Figure 12 by the dotted black lines.
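The threshold of (49) for a single parameter of interest can be checked numerically with the Python standard library, using the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable. The sketch below also shows how an interval could be read from a profile evaluated on a grid; the function names and the quadratic profile are illustrative only.

```python
from statistics import NormalDist


def profile_threshold(confidence=0.95):
    """Half of the negated chi-squared quantile with one degree of freedom."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return -0.5 * z * z


def profile_interval(grid, profile_loglik, confidence=0.95):
    """Smallest and largest grid values satisfying the inequality (49)."""
    max_loglik = max(profile_loglik)  # maximum over the grid
    threshold = profile_threshold(confidence)
    inside = [g for g, ll in zip(grid, profile_loglik) if ll - max_loglik > threshold]
    return min(inside), max(inside)


print(round(profile_threshold(0.95), 2))

# Hypothetical quadratic profile centred on 1.0 with standard error 0.1
grid = [0.5 + 0.001 * i for i in range(1001)]
profile = [-((g - 1.0) ** 2) / (2.0 * 0.1 ** 2) for g in grid]
print(profile_interval(grid, profile))
```

For a nearly flat profile, the inequality holds over a wide range of values, which is how a poorly identifiable parameter shows up in this kind of plot.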
The profile of the log-posterior is similarly computed by $\ln p(\boldsymbol{\theta}_{\sim i}|\mathbf{y}_{1:k}) - \ln p(\hat{\boldsymbol{\theta}}|\mathbf{y}_{1:k})$. These profiles are computed around the posterior modes given in Table 1; the idea is to visualize the quantity of information given by the data and the regularization brought by the prior distribution. The profile of the log-likelihood of $\sigma_w$ is asymmetric and almost flat around its maximum, which means that any value in the flat region has a negligible effect on the log-likelihood; this explains the dispersion of the ML estimation in Table 5. The profile of the log-posterior shows how the prior distribution ($\mathcal{G}(2,0.03)$, Figure 3) increases the curvature, especially towards zero, and regularizes the identifiability problem. For the parameter $C_i$, the information from the data outweighs the prior distribution, which means that the prior distribution has only a small effect on the posterior distribution.

Figure 12: Profile log-likelihood (black) and profile log-posterior (blue). The dashed lines represent the 95% confidence intervals of the log-likelihood and the red dots the respective maximums of the curves

5. Conclusion

The estimation of building energy demand and building energy performance is possible through experimental calibration of dynamic thermal models. Making decisions or predictions from the calibrated model requires taking into account all the uncertainties of the estimates; Bayesian calibration fits this purpose by estimating the posterior distributions of the parameters.

This paper compares the three phases of an experimental calibration (selection, calibration and validation) from a Bayesian and a frequentist point of view. More specifically, proposed improvements on the Metropolis-Hastings algorithm, using gradient and Hessian information (second-order Metropolis-Hastings), are presented.
It is shown that the gradient of a linear and Gaussian model can be computed exactly by a robust square-root version of the Kalman filter, and a Hessian estimate is proposed with low extra computational burden. A combination of a change of variables and prior distributions is also proposed, which constrains the parameters to a physical range. These improvements to the Metropolis-Hastings algorithm considerably facilitate its tuning: only a step length and the prior distributions have to be specified. Two models with 15 and 18 unknown parameters, respectively, have been easily calibrated with the improved Metropolis-Hastings algorithm using a single step length, which illustrates the gain of this method over a classical random-walk Metropolis-Hastings. Furthermore, it is shown that the second-order Metropolis-Hastings algorithm is more robust to the initial conditions than a maximum likelihood estimation with a quasi-Newton algorithm, and it is illustrated through an identifiability analysis that the prior distributions act as a regularization when the data are not informative enough. It is highlighted that model selection criteria should be computed on a different data set than the one used for the calibration to avoid a biased selection. Indeed, in this experiment the unconventional excitation generated by the heaters implies that a more complex model should be selected, but this extra complexity is not required for a more conventional use of the HVAC system.

Acknowledgements

This work was financially supported by BPI France in the FUI Project COMETE. Thank you to Dr. Paul Strachan for making the Twin houses data available online; to the Dynastee network, particularly to Henrik Madsen, Peder Bacher and Rune Juhl for their statistical guidelines; as well as to Dr. Johan Dahlin and Dr. Maria Kulikova for their time and expertise.

References

Andersen, P.D. et al., 2014.
Characterization of heat dynamics of an arctic low-energy house with floor heating. Building Simulation, 7(6), pp.595–614.
Bacher, P. & Madsen, H., 2011. Identifying suitable models for the heat dynamics of buildings. Energy and Buildings, 43(7), pp.1511–1522. Available at: http://dx.doi.org/10.1016/j.enbuild.2011.02.005.
Baker, P.H. & van Dijk, H.A.L., 2008. PASLINK and dynamic outdoor testing of building components. Building and Environment, 43(2), pp.143–151.
Berger, J. et al., 2016. Bayesian inference for estimating thermal properties of a historic building wall. Building and Environment, 106, pp.327–339. Available at: http://dx.doi.org/10.1016/j.buildenv.2016.06.037.
Bloem, J., 1994. Institute for Systems Engineering and Informatics: Workshop on Application of System Identification.
Chong, A. & Lam, K.P., 2015. Uncertainty analysis and parameter estimation of HVAC systems in building energy models. Proceedings of BS2015: 14th Conference of International Building Performance Simulation Association, (Equation 1), pp.2788–2795.
Chong, A. & Poh Lam, K., 2017. A Comparison of MCMC Algorithms for the Bayesian Calibration of Building Energy Models for Building Simulation 2017 Conference, pp.494–503.
De Coninck, R. et al., 2015. Toolbox for development and validation of grey-box building models for forecasting and control. Journal of Building Performance Simulation, 1493(July), pp.1–16. Available at: http://www.tandfonline.com/doi/full/10.1080/19401493.2015.1046933.
CTSM-R Development Team, 2015. Continuous Time Stochastic Modelling in R, User’s Guide and Reference Manual. Www.Ctsm.Info. Available at: http://ctsm.info/.
Dahlin, J., 2016. Accelerating Monte Carlo methods for Bayesian inference in dynamical models, (1754). Available at: http://www.johandahlin.com/publications-files/phd-dahlin-thesis-final.pdf.
European Commission, 2016. An EU strategy on heating and cooling 2016.
EVO, 2014. International Performance Measurement and Verification Protocol Core Concepts.
Gelman, A., Carlin, J.B., et al., 2014. Bayesian Data Analysis.
Gelman, A., Hwang, J. & Vehtari, A., 2014. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), pp.997–1016.
Ghiaus, C., 2013. Causality issue in the heat balance method for calculating the design heating and cooling load. Energy, 50, pp.292–301. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0360544212007864 [Accessed January 7, 2015].
Ghiaus, C. & Hazyuk, I., 2010. Calculation of optimal thermal load of intermittently heated buildings. Energy and Buildings, 42(8), pp.1248–1258. Available at: http://dx.doi.org/10.1016/j.enbuild.2010.02.017.
Hazyuk, I., Ghiaus, C. & Penhouet, D., 2012a. Optimal temperature control of intermittently heated buildings using Model Predictive Control: Part I – Building modeling. Building and Environment, 51, pp.379–387. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0360132311003933 [Accessed January 5, 2015].
Hazyuk, I., Ghiaus, C. & Penhouet, D., 2012b. Optimal temperature control of intermittently heated buildings using Model Predictive Control: Part II - Control algorithm. Building and Environment, 51, pp.388–394. Available at: http://dx.doi.org/10.1016/j.buildenv.2011.11.008.
Heo, Y. et al., 2015. Scalable methodology for large scale building energy improvement: Relevance of calibration in model-based retrofit analysis. Building and Environment, 87, pp.342–350. Available at: http://dx.doi.org/10.1016/j.buildenv.2014.12.016.
Heo, Y., Choudhary, R. & Augenbroe, G.A., 2012. Calibration of building energy models for retrofit analysis under uncertainty. Energy and Buildings, 47, pp.550–560. Available at: http://dx.doi.org/10.1016/j.enbuild.2011.12.029.
Himpe, E. & Janssens, A., 2015. Characterisation of the thermal performance of a test house based on dynamic measurements. Energy Procedia, 78, pp.3294–3299. Available at: http://dx.doi.org/10.1016/j.egypro.2015.11.739.
Jiménez, M.J., 2014. Reliable building energy performance characterisation based on full scale dynamic measurements.
Kristensen, M.H. et al., 2017. Bayesian Calibration Of Residential Building Clusters Using A Single Geometric Building Representation.
Kristensen, N.R. & Madsen, H., 2003. Continuous Time Stochastic Modelling. Mathematics Guide, pp.1–32.
Kulikova, M. V. & Tsyganova, J. V., 2016. A unified square-root approach for the score and Fisher information matrix computation in linear dynamic systems. Mathematics and Computers in Simulation, 119, pp.128–141. Available at: http://dx.doi.org/10.1016/j.matcom.2015.07.007.
Li, Q., Augenbroe, G. & Brown, J., 2016. Assessment of linear emulators in lightweight Bayesian calibration of dynamic building energy models for parameter estimation and performance prediction. Energy and Buildings, 124, pp.194–202.
Ljung, L., 2002. System identification: theory for the user (second edition). Automatica, 38(2), pp.375–378.
Madsen, H., 2007. Time series analysis.
Madsen, H. & Thyregod, P., 2012. Introduction to General and Generalized Linear Models. Available at: http://dx.doi.org/10.1111/j.1751-5823.2011.00179_8.x.
Mbalawata, I.S., Särkkä, S. & Haario, H., 2013. Parameter estimation in stochastic differential equations with Markov chain Monte Carlo and non-linear Kalman filtering. Computational Statistics, 28(3), pp.1195–1223.
Naveros, I., 2016. Modelling Heat Transfer for Energy Efficiency Assessment of Buildings: Identification of Physical Parameters.
Naveros, I. et al., 2014. Setting up and validating a complex model for a simple homogeneous wall. Energy and Buildings, 70, pp.303–317. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0378778813007937 [Accessed January 13, 2015].
Nemeth, C.J., 2014. Parameter Estimation for State Space Models using Sequential Monte Carlo Algorithms, (November).
Nespoli, L., Medici, V. & Rudel, R., 2015. Grey-Box System Identification of Building Thermal Dynamics Using only Smart Meter and Air Temperature Data. Building Simulation Conference.
Nielsen, A.A. & Nielsen, B.K., 1984. A dynamic test method for the thermal performance of small houses. Proc. ACEEE Conf., Santa Cruz, CA, 1984, American Council for an Energy-Efficient Economy, CA, pp.207–220.
Rehab, I. & André, P., 2015. Energy Performance Characterisation of the Test Case “Twin House” in Holzkirchen, Based on Trnsys Simulation and Grey Box Model. Building Simulation Conference, pp.2401–2408.
Särkkä, S., 2013. Bayesian Filtering and Smoothing. Cambridge University Press, p.254. Available at: http://dl.acm.org/citation.cfm?id=2534502 and http://ebooks.cambridge.org/ref/id/CBO9781139344203.
Strachan, P. et al., 2016. Empirical Whole Model Validation Modelling Specification Validation of Building Energy Simulation Tools.
Team Stan Development, 2015. Stan Modeling Language: User’s Guide and Reference Manual. Version 2.7.0. Interaction Flow Modeling Language, pp.1–534.
Tian, W. et al., 2016. Identifying informative energy data in Bayesian calibration of building energy models. Energy and Buildings, 119, pp.363–376. Available at: http://dx.doi.org/10.1016/j.enbuild.2016.03.042.
Tsyganova, Y. V. & Kulikova, M.V., 2012. On efficient parametric identification methods for linear discrete stochastic systems. Automation and Remote Control, 73(6), pp.962–975. Available at: http://link.springer.com/10.1134/S0005117912060033.
Turner, C. & Frankel, M., 2008. Energy Performance of LEED ® for New Construction Buildings. New Buildings Institute, pp.1–46.
Váňa, Z. et al., 2013. Building semi-physical modeling: On selection of the model complexity. Proc. European Control Conference.
De Wilde, P., 2014. The gap between predicted and measured energy performance of buildings: A framework for investigation. Automation in Construction, 41, pp.40–49. Available at: http://dx.doi.org/10.1016/j.autcon.2014.02.009.
Zayane, C., 2011. Identification d’un modèle de comportement thermique de bâtiment à partir de sa courbe de charge.

Journal: Statistics, arXiv (Cornell University)

Published: Apr 12, 2019
