Enforcing Mean Reversion in State Space Models for Prawn Pond Water Quality Forecasting

Joel Janek Dabrowski, Ashfaqur Rahman, Daniel Edward Pagendam, Andrew George

Corresponding author; St Lucia, QLD, 4067, Australia.
Email addresses: joel.dabrowski@data61.csiro.au (Joel Janek Dabrowski), ashfaqur.rahman@data61.csiro.au (Ashfaqur Rahman), dan.pagendam@data61.csiro.au (Daniel Edward Pagendam), andrew.george@data61.csiro.au (Andrew George)

Preprint submitted to Computers and Electronics in Agriculture, February 27, 2020. arXiv:2002.11228v1 [stat.AP] 26 Feb 2020

The contribution of this study is a novel approach to introduce mean reversion in multi-step-ahead forecasts of state-space models. This approach is demonstrated in a prawn pond water quality forecasting application. The mean reversion constrains forecasts by gradually drawing them to an average of previously observed dynamics. This corrects deviations in forecasts caused by irregularities such as chaotic, non-linear, and stochastic trends. The key features of the approach are that (1) it enforces mean reversion, (2) it provides a means to model both short and long-term dynamics, (3) it can apply mean reversion to selected structural state-space components, and (4) it is simple to implement. Our mean reversion approach is demonstrated on various state-space models and compared with several time-series models on a prawn pond water quality dataset. Results show that mean reversion reduces long-term forecast errors by over 60% to produce the most accurate models in the comparison.

Keywords: Long term forecasting, Multi-step ahead forecasting, Mean reversion, Forecast constraint, Kalman filter

1. Introduction

In aquaculture prawn farming, managing water quality is key for maximising quantity, quality, and health of the stock. For example, high levels of prawn mortality can occur due to anoxia and hypoxia if dissolved oxygen (DO) drops to extreme values (Robertson, 2006). By forecasting important water quality variables, farmers are provided with the tools to take preemptive measures that encourage favourable pond conditions.

Long-term forecasting can be a challenging task with complex environmental processes such as prawn ponds. In this study, we take advantage of the fact that many natural processes exhibit some form of mean reversion. This is commonly found where the process seeks a state of equilibrium. For example, the long-term trend (a week or more) of pond water temperature typically varies within some bounds. These bounds are maintained as the underlying process seeks thermodynamic equilibrium within a changing environment. Without knowledge of the underlying process, the longer-term dynamics can appear as a slowly varying stochastic trend.

Forecasting such processes can be challenging when stochastic trends cause forecasts to deviate. Models should realistically incorporate some form of constraint or bounds. Our hypothesis is that such a constraint can be imposed by modelling the stochastic variations with a fixed attractor distribution that long-term trends are drawn towards. In this form, the long-term behaviour of the process may have some stable, marginal distribution when integrated over time (long periods of time or just the recent past).

In this study we propose a novel approach to introduce an attractor distribution in non-stationary state-space models. The attractor distribution models previously observed dynamics. Mean reversion is enforced through introducing pseudo-observations into the Kalman filter during forecasting. These pseudo-observations are samples of the attractor distribution mean. The result is that the filtering operation during forecasting naturally draws the forecasts towards the mean of the previously observed dynamics.

The proposed approach can model both short and long-term dynamics and it allows for the selection of which state space components should be mean reverting. Furthermore, the approach is easily implemented using the standard Kalman filter and it has broad appeal as it addresses problems that are found in many domains other than aquaculture.

Our contributions are: (1) we provide an approach to enforcing mean reversion in state-space models (to our knowledge, no other studies have introduced any form of mean reversion into state space models for constraining forecasts), (2) we demonstrate this approach on several state-space models in a real-world aquaculture application, and (3) we compare our approach with several time series models.

This paper is organised as follows: In section 2, we review related forecasting literature. Section 3 provides an overview of the linear dynamic system (LDS) and the Kalman filter with the purpose of introducing our mean reversion approach described in section 4. The aquaculture problem and datasets used in this study are presented in section 5. In section 6 we demonstrate how our approach is applied to state space forecasting models and results are provided in section 7. In section 8 a comparison of our approach with several forecasting methods is provided. The study is concluded in section 9.

2. Related Work

2.1. Forecasting Models

Many industries and disciplines rely on multi-step-ahead forecasting. A wide range of forecasting methods exist in the literature (Gooijer & Hyndman, 2006). Statistical models include state-space models, regression models, exponential smoothing, Box-Jenkins models (such as the autoregressive moving average (ARMA) model), long memory models, autoregressive conditional heteroscedastic (ARCH), and generalised ARCH (GARCH) models. Nonlinear machine learning models have also been extensively explored for forecasting.
Neural networks in particular have a relatively large body of literature (Zhang & Qi, 2005; Zhang et al., 1998; Ruiz et al., 2018).

State-space models are generative, probabilistic, interpretable, and flexible (Durbin & Koopman, 2012). As generative models, they are able to handle missing data and forecasting functionality is inherent. As probabilistic models, they provide a natural representation of uncertainty in a forecast. State-space models are interpretable as they are designed based on structural analysis of a problem and naturally incorporate explanatory variables. This is in contrast with data driven models such as neural networks and ARMA models, which are considered as black-box models.

2.2. Multi-Step-Ahead Forecasting

Multi-step-ahead forecasting is a challenging task as it requires a complete model of the short and longer-term dynamics. Short-term modelling is required to model the dynamics between the forecast time-steps. Longer-term modelling is required to model the dynamics across the several time-step forecasts.

The general approach to long-term forecasting is to model the long-term trend of the time series and ignore short term dynamics. Such models can be obtained using time series analysis methods such as regression models, state-space models, Box-Jenkins models, and recurrent neural networks (Kandil et al., 2001; Soman et al., 2010; Granger & Jeon, 2007). It is however possible to combine long and short-term forecasts as discussed in the review presented by Andrawis et al. (2011). The authors note that there seems to be little work in the literature relating to such combinations, despite their effectiveness.

The approach we present in this study does not require combining long and short-term models. Rather, it provides a means to naturally include both short-term and long-term dynamics in a single model. The short-term dynamics are modelled directly in the state-space model. The long-term dynamics are modelled using mean reversion and the attractor distribution.

2.3. Mean Reversion

Many phenomena should realistically be modelled with some form of limiting distribution for long-term forecasts. For example, interest rates are often modelled through the use of mean-reverting stochastic processes, such as the Ornstein-Uhlenbeck process (e.g. the Vasicek model (Vasicek, 1977) or the CIR model (Cox et al., 1985)). The dynamics are limited to Brownian motion with a tendency towards the origin (Pavliotis, 2014). Though Brownian motion is not stationary, a linear damping term in the Ornstein-Uhlenbeck process can cause the process to become stationary. The generalised Ornstein-Uhlenbeck process is a natural continuous time analogue of the AR(1) process with random i.i.d. components (Rao et al., 2012).

The ARMA model also exhibits mean reversion, but the moving-average allows for mean-reversion to occur more gradually. In general, AR and ARMA models are limited to modelling only stationary sequences (Box et al., 2015). Non-stationary components such as trend and seasonality are removed from the time series through differencing such as in the Autoregressive Integrated Moving Average (ARIMA) model.

The ARMA and ARIMA models may be framed as state-space models (Durbin & Koopman, 2012). In general, state-space models are not limited to stationary series and provide expressive power through latent variables. State-space models are however not necessarily mean reverting. Our proposed approach provides the means to enforce mean reversion in state-space models.

2.4. Water Quality Modelling

In water quality modelling applications, several ecosystem-based models have been proposed for variables such as DO (Ginot & Hervé, 1994; Lu & Piedrahita, 1996; Madsen et al., 2007; Xu & Xu, 2016). These are complex multivariable models that require precisely determined parameters pertaining to biological and physical processes. Various data-driven approaches have also been used for modelling and forecasting water quality variables. These include neural networks (Zhang et al., 2019; Ta & Wei, 2018; Ren et al., 2018; Dabrowski et al., 2018a; de Canete et al., 2016; Schmid & Koskiaho, 2006; Dogan et al., 2009; Ranković et al., 2010; Basant et al., 2010; He et al., 2011; Ahmed, 2017) and other machine learning models (Shi et al., 2019; Xu et al., 2017; Olyaie et al., 2017; Duan et al., 2016).

Dabrowski et al. (2018b) describe two data-driven state-space models for modelling DO, pH, and temperature in prawn ponds. These models provide a compromise between ecosystem models and machine learning models. They are data-driven unlike ecosystem models, and are not black-box models like many machine learning models. The proposed mean reversion approach is tested on these models in the context of forecasting water quality variables.

3. The Linear Dynamic System and Filtering

3.1. The Linear Dynamic System

The linear dynamic system (LDS) is a state-space model that assumes linear-Gaussian dynamics (Barber, 2012; Thrun et al., 2005; Murphy, 2012). Consider a system comprising a latent or hidden variable $h_t$ that evolves over time, $t = 1, \ldots, T$. The system provides an observable variable $v_t$ from which measurements can be made. The observable variable is considered to have been emitted from the latent variable $h_t$. Assuming a first order Markov process, the graphical model describing this system is illustrated in Figure 1. The edges between the latent variables describe the transition distribution $p(h_t \mid h_{t-1})$. The edges between the latent and observable variables describe the emission distribution $p(v_t \mid h_t)$.

Figure 1: Graphical model representation of the latent dynamic model such as the linear dynamic system. [Diagram: a chain of latent variables $h_{t-1} \to h_t \to h_{t+1}$ linked by $p(h_t \mid h_{t-1})$, each emitting an observation $v_{t-1}$, $v_t$, $v_{t+1}$ through $p(v_t \mid h_t)$.]

Linear-Gaussian assumptions in the LDS result in the following state-space equations (Petris et al., 2009; Grewal & Andrews, 2015)

$$h_t = A h_{t-1} + \eta^h_t \tag{1}$$
$$v_t = B h_t + \eta^v_t \tag{2}$$

The variable $h_t$ is the state vector, $A$ is the state transition matrix, and $\eta^h_t \sim \mathcal{N}(0, \Sigma^h)$ is the state noise vector (where $\Sigma$ denotes a covariance matrix). The variable $v_t$ is the observation vector, $B$ is the emission or measurement matrix, and $\eta^v_t \sim \mathcal{N}(0, \Sigma^v)$ is the measurement noise vector.

In continuous time, state-space equations are given by (Grewal & Andrews, 2015; Zarchan & Musoff, 2000; Durbin & Koopman, 2012)

$$\dot{h}(t) = \bar{A} h(t) + \eta^h(t) \tag{3}$$
$$v(t) = \bar{B} h(t) + \eta^v(t) \tag{4}$$

where $\bar{A}$ and $\bar{B}$ denote the continuous time state and emission matrices.

3.2. The Kalman Filter (KF)

Inference in the LDS involves calculating $p(h_t \mid v_{1:t})$, which is the probability distribution over the current latent variable given all past observations (Barber, 2012; Murphy, 2012). The linear-Gaussian assumption allows for a closed-form inference algorithm known as the Kalman filter (KF) (Kalman, 1960). The filtered distribution is represented as a Gaussian with mean $f_t$ and covariance $F_t$. The KF algorithm recursively repeats a prediction and update step. In the prediction step, the Gaussian distributions $p(h_t \mid v_{1:t-1})$ and $p(v_t \mid v_{1:t-1})$ are computed. The mean and covariance relating to $p(h_t \mid v_{1:t-1})$ are given by

$$\mu^h_t = A f_{t-1} \tag{5}$$
$$\Sigma^{hh}_t = A F_{t-1} A^T + \Sigma^h \tag{6}$$

The mean and covariance relating to $p(v_t \mid v_{1:t-1})$ are given by

$$\mu^v_t = B \mu^h_t \tag{7}$$
$$\Sigma^{vv}_t = B \Sigma^{hh}_t B^T + \Sigma^v \tag{8}$$

Additionally, the cross-covariance between the latent and observed variables is given by

$$\Sigma^{hv}_t = \Sigma^{hh}_t B^T \tag{9}$$

The predictions are updated with the latest observations to provide the parameters for the filtered distribution. These parameters are given by

$$f_t = \mu^h_t + K_t (v_t - \mu^v_t) \tag{10}$$
$$F_t = (I - K_t B) \Sigma^{hh}_t \tag{11}$$

where $I$ is the identity matrix and $K_t$ is the Kalman gain given by

$$K_t = \Sigma^{hv}_t (\Sigma^{vv}_t)^{-1} \tag{12}$$
$$\phantom{K_t} = (\Sigma^{hh}_t B^T)(B \Sigma^{hh}_t B^T + \Sigma^v)^{-1} \tag{13}$$

3.3. Forecasting with the LDS

The filtered distribution is computed at each time using equations (10) and (11) with observations $v_t$. During forecasting, the prediction equations (5), (6), (7), and (8) are used with no observations. For multiple forecasts into the future, $f_{t-1}$ and $F_{t-1}$ in equations (5) and (6) can be replaced with $\mu^h_{t-1}$ and $\Sigma^{hh}_{t-1}$ respectively. Multiple forecasts are thus generated by sequentially sampling from the model.

Any forecasts made for times $t + i$, $i > 0$ are calculated based on the dynamics of the model at time $t$. These dynamics are contained in the filtered distribution at time $t$. If the filtered distribution at time $t$ is not representative of the long-term trend, long-term forecasts may be inaccurate.

3.4. Nonlinear and Non-Gaussian Filtering

The Kalman filter is a closed form solution for a linear-Gaussian model. If a system is nonlinear or non-Gaussian, approximate filtering methods such as the extended Kalman filter (EKF), the unscented Kalman filter (UKF) (Julier & Uhlmann, 1997), or Monte Carlo methods such as the particle filter (Gordon et al., 1993) and ensemble Kalman filter (enKF) (Evensen, 1994) are required. In this study the EKF is used. The EKF approximates a nonlinear function by linearising around the current state mean estimate (Zarchan & Musoff, 2000).
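The prediction and update recursions of Section 3.2, and the observation-free forecasting of Section 3.3, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names (`kf_predict`, `kf_update`, `kf_forecast`) and the constant-velocity example values are assumptions introduced here, and the matrix inverse is used directly for readability rather than numerical robustness.

```python
import numpy as np

def kf_predict(f_prev, F_prev, A, Sigma_h):
    """Prediction step: equations (5) and (6)."""
    mu_h = A @ f_prev
    Sigma_hh = A @ F_prev @ A.T + Sigma_h
    return mu_h, Sigma_hh

def kf_update(mu_h, Sigma_hh, v, B, Sigma_v):
    """Update step: equations (7) to (12)."""
    mu_v = B @ mu_h                                # (7)
    Sigma_vv = B @ Sigma_hh @ B.T + Sigma_v        # (8)
    Sigma_hv = Sigma_hh @ B.T                      # (9)
    K = Sigma_hv @ np.linalg.inv(Sigma_vv)         # (12)
    f = mu_h + K @ (v - mu_v)                      # (10)
    F = (np.eye(len(mu_h)) - K @ B) @ Sigma_hh     # (11)
    return f, F

def kf_forecast(f, F, A, Sigma_h, steps):
    """Multi-step forecast: repeat the prediction step with no observations."""
    means = []
    for _ in range(steps):
        f, F = kf_predict(f, F, A, Sigma_h)
        means.append(f)
    return np.array(means), F

# Hypothetical constant-velocity example: state = [level, slope].
A_example = np.array([[1.0, 1.0], [0.0, 1.0]])
forecast_means, forecast_cov = kf_forecast(
    np.array([0.0, 1.0]), np.eye(2), A_example, 0.01 * np.eye(2), steps=3
)
```

Because nothing corrects the state during `kf_forecast`, the level simply extrapolates along the last filtered slope, which is the behaviour that the mean reversion approach is intended to constrain.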
4. Mean Reversion and the Attractor Distribution

4.1. Forecast Deviation In State-Space Models

State-space time series models comprise several distinct components such as trend, seasonal, and noise (disturbances) (Durbin & Koopman, 2012; Commandeur & Koopman, 2007; West & Harrison, 1997; Hyndman et al., 2008; Harvey, 1990; Petris et al., 2009). The trend component is often represented in the form of a polynomial model. Models such as the first-order-polynomial Dynamic Linear Model (DLM) in particular perform well for relatively short-term forecasting but can fail in longer term forecasts (West & Harrison, 1997). Irregularities such as slowly varying stochastic trends can shift the forecast trajectory off course. Mean reversion corrects the deviant forecast by drawing it back towards the attractor distribution mean.

4.2. Attractor Distribution and the Central Limit

The proposed approach is to use an attractor distribution to draw the forecasts to the mean of a distribution that approximates the central limit. Spall & Wall (1984) proved the central limit theorem for the Kalman filter under certain conditions. These conditions include the standard Kalman filter assumptions as well as uniform complete observability and controllability. The intention of the study was to investigate the asymptotic nature of the Kalman filter. Aliev & Ozbek (1999) furthered this study by investigating the convergence rate of the central limit theorem for the Kalman filter.

To approximate the mean of the central limit distribution, the average over all filtered posterior distributions (see Section 3.2) is computed up to time $t$. That is

$$f_\infty \approx \frac{1}{t} \sum_{i=1}^{t} f_i. \tag{14}$$

This approximation is used as the mean of the attractor distribution.

It is also possible to compute a weighted average where more emphasis is given to recent dynamics. A geometric progression can be used to obtain an exponential weighted average as follows

$$f_\infty \approx \frac{\sum_{i=1}^{t} f_i (1 - \lambda)^{t-i}}{\sum_{i=1}^{t} (1 - \lambda)^{t-i}}, \tag{15}$$

where $\lambda$ is some constant in the range $0 < \lambda \le 1$. This provides a form of exponential smoothing (Brown, 1959; Holt, 1957; Winters, 1960) in the mean reversion. Note that the form $f_\infty \approx \lambda \sum_{i=1}^{t} f_i (1 - \lambda)^{t-i}$ can be used if $\lambda$ and $t$ are chosen such that $\lambda \sum_{i=1}^{t} (1 - \lambda)^{t-i} \approx 1$.

4.3. Mean Reversion Through Filtering

To draw the forecast to the attractor distribution mean, it is proposed that the forecasts be filtered with the attractor distribution as an observable variable. That is, set $v_t = f_\infty$ as a pseudo-observation during forecasting. The filtered distribution can be written as (Thrun et al., 2005)

$$p(h_t \mid v_{1:t}) \propto p(v_t \mid h_t)\, p(h_t \mid v_{1:t-1}) \tag{16}$$

The first term can be viewed as a likelihood of the observation given the model state. The second term can be viewed as a prior describing the predicted model state given previous observations. By using the attractor distribution as the observable variable, the likelihood describes the probability of the attractor distribution given the current model state. If this likelihood is low, it implies a mismatch between what the model is forecasting and what is expected asymptotically.

To understand how filtering draws the forecast to the attractor distribution, consider the Kalman filter update equation (10). The filtered mean is the current prediction $\mu^h_t$, updated with a weighted difference between the observation $v_t$ and the prediction $\mu^v_t$. The weighting factor for the error is the Kalman gain. Equation (10) provides a mechanism to correct the model prediction with an observable variable $v_t$. If $v_t$ is the attractor distribution, the forecast will be corrected according to the attractor distribution.

4.4. Parameters

To define the emission matrix $B$ for the attractor distribution pseudo-observations, consider that $B$ provides a mapping from the space of $h_t$ to the space of $v_t$. The matrix $B$ can be manipulated to map only certain components from the latent variable space. Non-zero values can be placed in $B$ corresponding to components which should be mean reverting in nature. For example, non-zero values could be placed in $B$ corresponding to trend components that should exhibit mean reversion behaviour. Zeros can be placed in $B$ corresponding to components which should not be mean reverting in nature. For example, seasonal components may be left to oscillate throughout a forecast. A demonstration of this is presented in Section 6.

To define the measurement noise covariance $\Sigma^v$ for the attractor distribution pseudo-observations, consider that $\Sigma^v$ represents a form of uncertainty of the observation. By adjusting the uncertainty, the rate of convergence of the forecast to the attractor distribution mean can be manipulated. The Kalman gain defines the level of correction. Consider the representation of the Kalman gain in (13). The expression comprises $B$, $\Sigma^{hh}_t$, and $\Sigma^v$. $B$ is defined as discussed above and $\Sigma^{hh}_t$ is computed from the prediction. With these defined, the Kalman gain can thus be adjusted by manipulating $\Sigma^v$. If $\Sigma^v$ is set to zeros, indicating the extreme level of certainty of $v_t$, the Kalman gain reduces as follows

$$K_t = (\Sigma^{hh}_t B^T)(B \Sigma^{hh}_t B^T + 0)^{-1} = (\Sigma^{hh}_t B^T)(B^{-T} (\Sigma^{hh}_t)^{-1} B^{-1}) = B^{-1} \tag{17}$$

If $K_t = B^{-1}$, the filtered mean in (10) reduces to $f_t = v_t$, which is the attractor distribution mean. If $\Sigma^v$ is set to infinite values along its diagonal to indicate an extreme level of uncertainty of $v_t$, (10) reduces to $f_t = \mu^h_t$, which is the mean proposed by the model. That is, with infinite values in $\Sigma^v$, the attractor distribution will be ignored.

By manipulating the uncertainty represented by $\Sigma^v$, the level of correction of the forecasts is controlled. This correction is performed over multiple steps during filtering. The result is that the rate of convergence of a forecast to the attractor distribution mean is determined by $\Sigma^v$.
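The attractor mean of equations (14) and (15) and the pseudo-observation filtering of Sections 4.3 and 4.4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the symbol `lam` for the weighting constant, and the example values are introduced here. It also assumes the pseudo-observation `v_attr` is the attractor mean mapped through the attractor emission matrix, so that only the selected components are observed.

```python
import numpy as np

def attractor_mean(filtered_means, lam=None):
    """Equation (14) when lam is None, otherwise the weighted form (15)."""
    f = np.asarray(filtered_means, dtype=float)    # shape (t, state_dim)
    t = f.shape[0]
    if lam is None:
        return f.mean(axis=0)
    w = (1.0 - lam) ** np.arange(t - 1, -1, -1)    # weights (1 - lam)^(t - i)
    return (w[:, None] * f).sum(axis=0) / w.sum()

def mean_reverting_forecast(f, F, A, Sigma_h, B_attr, v_attr, Sigma_attr, steps):
    """Forecast while filtering on a fixed pseudo-observation v_attr.

    B_attr selects the mean-reverting components; Sigma_attr sets how
    quickly the forecast is drawn to the attractor mean.
    """
    I = np.eye(len(f))
    means = []
    for _ in range(steps):
        mu_h = A @ f                                           # (5)
        Sigma_hh = A @ F @ A.T + Sigma_h                       # (6)
        S = B_attr @ Sigma_hh @ B_attr.T + Sigma_attr          # (8)
        K = Sigma_hh @ B_attr.T @ np.linalg.inv(S)             # (13)
        f = mu_h + K @ (v_attr - B_attr @ mu_h)                # (10)
        F = (I - K @ B_attr) @ Sigma_hh                        # (11)
        means.append(f)
    return np.array(means)

# Hypothetical example: local linear trend [level, slope] drifting upwards,
# with mean reversion enforced on the level only.
A_example = np.array([[1.0, 1.0], [0.0, 1.0]])
B_attr = np.array([[1.0, 0.0]])
reverted = mean_reverting_forecast(
    np.array([5.0, 0.5]), np.eye(2), A_example, 0.01 * np.eye(2),
    B_attr, v_attr=np.array([0.0]), Sigma_attr=np.array([[10.0]]), steps=200
)
```

Increasing `Sigma_attr` in this sketch weakens each correction and lengthens the settling time, while decreasing it pulls the level to the attractor mean more abruptly, mirroring the role described for the pseudo-observation noise covariance.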
5. Datasets

This study fits within a broader context of a system that is being developed for aquaculture prawn farms. Several sensors have been deployed into prawn ponds for monitoring water quality related parameters. These sensors include water quality sensors, hydrophones, spectral reflectance, and weather sensors. The sensor data is uploaded to a central cloud-based system (Senaps). Several decision support tasks are performed on the stored data. The framework of the decision support system is illustrated in Figure 2. In this study, the modelling and forecasting of dissolved oxygen (DO), pH, and temperature in prawn ponds are considered. The mean reversion approach described in this study is applied to data collected within this decision support system.

Figure 2: Aquaculture prawn farm decision support system. [Diagram: sensors (water quality, hydrophone, spectral reflectance, weather), data storage (Senaps), and analytics (modelling, forecasting, estimation, visualisation, warning).]

The dataset used in this study comprises DO, pH, and temperature readings taken from two prawn ponds. The first pond is a large 0.18 ha grow-out pond and the second pond is a small 0.022 ha nursery pond. The samples are taken at 15 minute intervals over a period of 88 days.

The dataset variables are seasonal in nature. Many water quality variables such as DO, pH and temperature follow diurnal fluctuations (Boyd & Tucker, 1998). Carbon dioxide (CO2) is continually produced in the pond through respiration by organisms such as prawn and plankton. During the day, plant-based organisms use solar radiation for photosynthesis. Through photosynthesis, CO2 is absorbed and oxygen is released. Thus, DO increases and CO2 decreases during the day. At night photosynthesis ceases. The result is that DO decreases and CO2 increases at night. CO2 reacts with water to form carbonic acid. Increased acidity reduces the pH levels in the pond. Fluctuating CO2 thus causes fluctuating pH. Furthermore, water temperature naturally fluctuates with the changes in solar radiation over a 24-hour period.

Water quality variables may also vary in an aperiodic manner (Boyd & Tucker, 1998). Irregular variations may be caused by weather-related variations and biological activity such as algal blooms. Such variations can produce the slow varying irregular or nonlinear fluctuations that cause forecast deviations.
6. Applied State-Space Models

Dabrowski et al. (2018b) presented two models for modelling water quality parameters in prawn ponds. The first model is an LDS with a local linear trend component (constant velocity process) and a seasonal component. The second model is a nonlinear model that provides a means to model the seasonal amplitude using a local linear trend component. The UKF was used for inference in this nonlinear model. These models will be used in this study, however the EKF algorithm will be used instead of the UKF algorithm. The intention is to improve the long-term (a week or more) forecasting capability of these models using the proposed mean reversion approach.

6.1. Linear Model

The observations of the linear model are modelled with a seasonal, trend and noise component as follows

$$v_t = \gamma \sin(\omega t) + \alpha_t + \epsilon_t \tag{18}$$

The seasonal component $\gamma \sin(\omega t)$ is modelled with a sinusoid with amplitude $\gamma$, the trend $\alpha_t$ is modelled as a continuous local linear trend model, and the noise $\epsilon_t$ is white Gaussian noise. Let $\beta_t = \gamma \sin(\omega t)$ such that (Dabrowski et al., 2018b)

$$h(t) = \begin{bmatrix} \alpha_t & \dot{\alpha}_t & \beta_t & \dot{\beta}_t \end{bmatrix}^T$$

The state transition matrix in continuous time, denoted by $\bar{A}$, is then given by

$$\bar{A} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -\omega^2 & 0 \end{bmatrix}$$

This matrix is converted to discrete time using a Laplace transform or the Taylor series expansion (Zarchan & Musoff, 2000)

$$A = e^{\bar{A} \Delta t} = I + \bar{A} \Delta t + \frac{(\bar{A} \Delta t)^2}{2!} + \frac{(\bar{A} \Delta t)^3}{3!} + \cdots \tag{19}$$

where $\Delta t$ is the sample rate.

The emission matrix maps the elements from the latent variable space to the observed variable space according to (18). The emission matrix is thus given by

$$B = \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix}$$

The attractor distribution is defined to draw the forecasts to a fixed mean of previously observed dynamics. For the linear model, mean reversion is applied to the trend component. Thus, the attractor distribution is defined to approximate the central limit of $\alpha_t$. The following emission matrix for the attractor distribution can thus be used

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}.$$

In this form, mean reversion is only enforced on $\alpha_t$ and not on the seasonal component $\beta_t$.

With the attractor distribution having a single dimension, the variance $\Sigma^v$ is a real number. The value is manually set to provide reasonable uncertainty bounds and to match the mean reversion settling time with the slowly varying irregular component of the data. As discussed in Section 4, smaller values provide quicker settling times and narrower uncertainty bounds. Larger values provide slower settling times and wider uncertainty bounds. Suitable values can generally be found with a brief search over the sequence $10^i$, $i \in \mathbb{Z}$ and further refined if necessary. A search can also be conducted using repeated random subsampling validation approaches.

6.2. Nonlinear Model

The linear model is independent of the sinusoidal amplitude $\gamma$ in (18) (Dabrowski et al., 2018b). Including the amplitude as a component in the state-space representation results in a nonlinear model. The amplitude is modelled as a latent variable with a constant velocity process such that

$$h(t) = \begin{bmatrix} \alpha_t & \dot{\alpha}_t & \gamma_t & \dot{\gamma}_t & \sin(\omega t) & \cos(\omega t) \end{bmatrix}^T$$

The state transition matrix in continuous time is given by

$$\bar{A} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & -\omega^2 & 0 \end{bmatrix}$$

This matrix is converted to discrete time using (19).

The trend element is added to a product of the amplitude and sinusoidal elements as indicated in (18). This results in a nonlinear emission model. Let $b(h_t) = \gamma_t \sin(\omega t) + \alpha_t$ such that

$$v_t = b(h_t) + \epsilon_t$$

The EKF approach is to approximate the nonlinear function $b(h_t)$ as a linearisation around the current state estimate. This linear approximation is the tangent to $b(h_t)$ at the current state estimate. Thus, the emission matrix is given by (Zarchan & Musoff, 2000)

$$B = \left. \frac{\partial b(h)}{\partial h} \right|_{h = f_t}$$

That is, $B$ is given by the Jacobian

$$B = \begin{bmatrix} \dfrac{\partial b(h)}{\partial \alpha_t} & \dfrac{\partial b(h)}{\partial \dot{\alpha}_t} & \dfrac{\partial b(h)}{\partial \gamma_t} & \dfrac{\partial b(h)}{\partial \dot{\gamma}_t} & \dfrac{\partial b(h)}{\partial \sin(\omega t)} & \dfrac{\partial b(h)}{\partial \cos(\omega t)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \sin(\omega t) & 0 & \gamma_t & 0 \end{bmatrix}$$

With this approximation to $B$, the standard Kalman filter equations given in Section 3.2 can be used. The proposed mean reversion approach is thus directly applicable.

For the nonlinear model, the datasets are assumed to approach a fixed mean offset and a fixed mean seasonal amplitude. The attractor distribution thus approximates the central limit of $\alpha_t$ as well as $\gamma_t$. The emission matrix for the attractor distribution is given by

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}.$$

With a two-dimensional attractor distribution, the variance $\Sigma^v$ is a $2 \times 2$ matrix. This matrix is configured for an isotropic Gaussian with elements along the diagonal. These elements are manually chosen according to the uncertainty bounds and the slowly varying irregular component of the data.
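The discretisation in (19) and the EKF linearisation of Section 6.2 can be illustrated with a short NumPy sketch. Several things here are assumptions made for the example rather than details from the paper: the function names, the truncation order of the Taylor series, the choice of one seasonal cycle per day at 96 samples per day for `omega`, and a unit time step of one sample.

```python
import math
import numpy as np

def discretise(A_cont, dt, order=20):
    """Truncated Taylor series of exp(A_cont * dt), as in equation (19)."""
    M = A_cont * dt
    A = np.eye(A_cont.shape[0])
    term = np.eye(A_cont.shape[0])
    for k in range(1, order + 1):
        term = term @ M / k          # accumulates (A_cont * dt)^k / k!
        A = A + term
    return A

def emission_jacobian(h):
    """Jacobian of b(h) = h[2] * h[4] + h[0] for the nonlinear model.

    State layout assumed: [trend, trend rate, amplitude, amplitude rate,
    sine element, cosine element].
    """
    return np.array([[1.0, 0.0, h[4], 0.0, h[2], 0.0]])

# Assumed diurnal frequency: one cycle per 96 samples (15 minute sampling).
omega = 2.0 * math.pi / 96.0

# Continuous-time transition matrix of the linear model (trend + seasonal).
A_cont = np.array([[0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, -omega**2, 0.0]])
A_disc = discretise(A_cont, dt=1.0)

# Emission matrices of the linear model: observations and attractor.
B_obs = np.array([[1.0, 0.0, 1.0, 0.0]])
B_attr = np.array([[1.0, 0.0, 0.0, 0.0]])
```

In this sketch the trend block of `A_disc` is the familiar constant-velocity matrix and the seasonal block is a rotation at frequency `omega`, so an attractor emission matrix that observes only the first state leaves the oscillation untouched during a mean-reverting forecast.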
7. State-Space Models Results

7.1. Methodology

The datasets are resampled to three samples per day according to Dabrowski et al. (2018b). Resampling simulates handheld sensor readings taken by farmers, where samples are extracted at 05h00, 12h00, and 20h30. Although only 3 of the 96 samples per day are available, the sample rate in the models remains at 96 samples per day. The remaining 93 samples are treated as missing values that are estimated through filtering and smoothing in the state space models. Forecasts are performed and evaluated over all 96 samples per day.

The time series dataset is split into a training and test set. Filtering is performed on the training set. The attractor distributions are obtained from these filtered results. Forecasts are evaluated on the test set. The location of the split between the training and test sets is specifically chosen around some form of inflection point. At these inflection points, a model without mean reversion is more likely to deviate from the global trend.

The forecasts are made over multiple steps to provide long-term forecasts. The number of samples over which the forecasts are made is provided in Table 1.

Table 1: Forecast horizon in number of samples as well as time for the datasets used in this demonstration. The last column provides the sample rate of the sensor used to gather the dataset. Forecast horizons are determined by the selected inflection point in the data.

Dataset      Samples  Time       Frequency
DO           1200     12.5 days  15 min
pH           1000     10.4 days  15 min
Temperature  1100     11.5 days  15 min

The normalised root mean squared error is used to provide an evaluation of the error between the forecast result and the measured data. Let $\hat{y}_t$ denote the forecast and let $y_t$ denote the true value of some time series at time $t$. For a forecast over $N$ samples, the normalised root mean squared error (NRMSE) is given by

$$\epsilon_{\text{nrmse}} = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}}{y_{\max} - y_{\min}} \times 100\% \tag{20}$$

where $y_{\max}$ and $y_{\min}$ are the maximum and minimum dataset values respectively. The NRMSE for a single sample $i$ is given by

$$\epsilon_{\text{nrmse}} = \frac{\sqrt{(y_i - \hat{y}_i)^2}}{y_{\max} - y_{\min}} \times 100\% \tag{21}$$

7.2. Linear Model Results

Plots of the forecasts for the linear model are presented in Figure 3. The horizontal axes describe the sample number. Without mean reversion, the forecast trends deviate from the ground truth as illustrated in Figure 3a. These deviations are due to the inflection point in the long-term trend from which the forecasts extend. Reasonable forecasts are obtained up to the end of the first seasonal cycle where variations in the true trend are minimal. After the first cycle, the forecasts begin to deviate as the true trend changes in a non-linear or stochastic manner.

As indicated in Figure 3b, enforcing mean reversion provides significant improvements to long term forecasts. Mean reversion draws the deviant forecasts back towards the average of the previously observed dynamics.

The blue filled regions plot the standard deviation of the posterior filtered distribution. This represents the uncertainty in the forecast. As expected, the mean reversion reduces the magnitude of the standard deviation through the pseudo-observations from the attractor distribution. The level to which the pseudo-observations affect the standard deviation depends on the attractor distribution covariance $\Sigma^v$.

The plots for the pH dataset in Figure 3b provide insight into the limitations of the mean reversion approach. The long-term forecasts settle to the attractor distribution mean, while the fluctuations in the trend continue to vary. That is, the slowly-varying fluctuations of the data are not perfectly modelled. These fluctuations are treated as stochastic variations, where there is no deterministic function to model them. Instead, they are modelled by the fixed attractor distribution. Note however that the forecast over the first five days (480 samples) is still accurate and is a significant improvement over the model without mean reversion.

A plot of the linear model's latent variables for the dissolved oxygen dataset is presented in Figure 4. Mean reversion is applied to the trend component $\alpha_t$. Without mean reversion, the trend of the forecast continues linearly with a steep gradient. Mean reversion causes the trend to curve back towards the attractor distribution mean. By increasing $\Sigma^v$, the time it takes for the curve to settle can be increased. Decreasing $\Sigma^v$ results in a quicker settling time.

Mean reversion is not applied to the sinusoidal component, $\beta_t$. The seasonal oscillation thus continues throughout the forecast. This demonstrates the key feature of the model where mean reversion is applied to one specific component in the model.

The NRMSE over the complete forecast for all datasets is presented in Table 2. The results show that mean reversion produces significant improvements in forecast ability. Though the NRMSE for the mean reversion in the pH dataset is high, it is a significant improvement over the linear model without mean reversion.

Table 2: NRMSE of the linear model with and without mean reversion (MR) over the entire forecast presented in Figure 6.

Dataset      Without MR  With MR
DO           29.69       16.68
pH           116.69      21.90
Temperature  31.09       16.20

The per-sample NRMSE (equation (21)) for the forecast is plotted in Figure 5. The error for the model without mean reversion increases over the forecast time. This demonstrates that the forecast deviates from the ground truth with increasing forecast reach. For the model with mean reversion, the error remains relatively constant over the entire forecast. This demonstrates that the model performs equally well at short and long-term forecasting. This is especially remarkable as the model is forecasting more than 1000 steps ahead in time.

Figure 3: Linear model forecasts of the dissolved oxygen (mg/l), pH, and temperature (°C) over sample indexes. (a) Linear model forecasts without mean reversion. (b) Linear model forecasts with mean reversion. The red line is a plot of the forecast and the blue filled region is a plot of the forecast standard deviation. The dark grey line is a plot of the sensor data sampled at 15 minute intervals, and the light grey markers indicate sub-samples extracted at 05h00, 12h00, and 20h30. The vertical grey dotted line indicates the start of the forecast. Only the last portion of the historical data is shown.

Figure 4: Plots of the data, filtered mean $f_t$, the trend component $\alpha_t$, and the sinusoidal component $\gamma \sin(\omega t)$ for the linear model on the dissolved oxygen dataset over the sample index. (a) Latent variables for the linear model without mean reversion. (b) Latent variables for the linear model with mean reversion. The gaps in the data plots are due to missing data.

Figure 5: Per-sample NRMSE (equation (21)) for the linear model forecasts on the DO dataset presented in Figure 3. (a) NRMSE for the linear model without mean reversion. (b) NRMSE for the linear model with mean reversion.

7.3. Nonlinear Model Results

Plots of the forecasts for the nonlinear model are presented in Figure 6. As for the linear model, mean reversion provides significant improvement in the forecasts and reduces the uncertainty in the forecast.

As illustrated in Figure 6a, the oscillation component decays over the forecast of the DO dataset. This follows the trend in the data leading up to the forecast, where the oscillation amplitude is decreasing. The trend in the data however does not continue decreasing as it does in the forecast. Mean reversion is thus applied to both the trend component $\alpha_t$ and the amplitude component $\gamma_t$. The re-
6000 6500 7000 7500 8000 8500 6000 6500 7000 7500 8000 8500 10.0 8.8 7.5 8.0 5.0 6500 7000 7500 8000 8500 6500 7000 7500 8000 8500 ◦ ◦ 6500 7000 7500 8000 8500 6500 7000 7500 8000 8500 (a) Nonlinear model forecasts without mean reversion (b) Nonlinear model forecasts with mean reversion Figure 6: Nonlinear model forecasts of dissolved oxygen (mg=l), pH, and temperature ( C) over sample indexes. The red line is a plot of the forecast and the blue lled region is a plot of the forecast standard deviation. The dark grey line is a plot of the sensor data sampled at 15 minute intervals, and the light grey markers indicate sub-samples extracted at 05h00, 12h00, and 20h30. The vertical grey dotted line indicates the start of the forecast. Only the last portion of the historical data are shown. sult is that both of these components are corrected compared to the linear model. This is expected as to provide a more accurate forecast. and sin(!t) are separated in the nonlinear model, whereas in the linear model, they are combined into A plot of the latent variables for the DO dataset a single component. Both the trend and ampli- are presented in Figure 7. The amplitude of the tude components are a ected by the in ection sin(!t) component remains fairly constant when pH pH (%) Temp. (%) DO (%) Temp. ( C) DO (mg/l) pH Temp. (%) pH (%) DO (%) Temp. ( C) DO (mg/l) DLM model is a free-form seasonal model (West & Dataset Without MR With MR Harrison, 1997) with a rst order trend component DO 25.12 14.44 as used in the LDS. Mean reversion using equation pH 87.89 21.84 Temperature: 64.48 16.15 (14) and weighted mean reversion using equation (15) is applied to the trend components in the LDS Table 3: NRMSE of the nonlinear model with and without and DLM models. The weighted mean reversion is mean reversion (MR) over the forecast presented in Figure 6. applied with  = 0:1. 
In tables and gures, models using mean reversion and weighted mean reversion point in the data where the forecast begins. They are denoted by a `MR' and a `WMR' subscript re- both veer o with a steep gradient. Mean rever- spectively. sion is applied to correct and , and draw them t t The SARIMA(5,1,3)(0,1,0)96 model is used on back to the mean. The seasonal component is left all datasets. The model order was chosen accord- to oscillate throughout the forecast. ing to autocorrelation and partial autocorrelation The NRMSE over the entire forecast for all plots. The Prophet model is con gured with a lin- datasets is presented in Table 3. As for the linear ear growth trend, an additive daily seasonal com- model, the mean reversion reduces the error. Com- ponent, and an interval width of 0.8. paring the linear model results in Table 2 and the The set of models are compared on the dissolved nonlinear model results in Table 3, it is clear that oxygen, pH, and temperature datasets. In this com- the nonlinear model achieves the best results. The parison, the datasets are not resampled as was done nonlinear model is however a more complex model. in section 7. All 96 samples per day are used in all models. Each model provides a 10 day (960 sam- A plot of the per-sample NRMSE error (equation ple) forecast from the set of 10 pre-selected random (21)) is presented in Figure 8. As for the linear starting points. Ten days is selected as it represents model, mean reversion reduces the error in the long- a reasonable long-term forecast in this application. term forecasts. The average NRMSE over the 10 forecasts for each model and dataset are presented in Table 4. 8. Time Series Model Comparison The LDS performs poorly over a long-term fore- A comparison between a LDS (Dabrowski et al., cast. However, when using the mean reversion, the 2018b), a dynamic linear model (DLM) (West forecast is signi cantly improved. 
Using weighted & Harrison, 1997), a seasonal autoregressive in- mean reversion provides further improvements on tegrated moving average (SARIMA) model, and the pH and temperature datasets. Facebook's Prophet model (Taylor & Letham, The DLM generally does better than the LDS. It 2018) is performed. is a more complex model and is able to provide a The linear LDS model of (Dabrowski et al., https://www.statsmodels.org 2018b) is used as described in section 6.1. The https://facebook.github.io/prophet/ 15 data data 10 10 5 5 f f t t 5 5 γ γ t t 7.5 7.5 5.0 5.0 7.5 α α t t 5.0 0 2.5 0.5 0.5 sin(ωt) sin(ωt) 0.0 0.0 − 0.5 − 0.5 0 2000 4000 6000 8000 0 2000 4000 6000 8000 (a) Latent variables for the nonlinear model without (b) Latent variables for the nonlinear model with mean mean reversion. reversion. Figure 7: Plots of the data, ltered mean f , the trend component , the sinusoidal component sin(!t), and the amplitude t t component for the nonlinear model over the sample index. The gaps in the data plots are due to missing data. 0 0 7250 7500 7750 8000 8250 8500 7250 7500 7750 8000 8250 8500 0 0 7600 7800 8000 8200 8400 7600 7800 8000 8200 8400 0 0 7400 7600 7800 8000 8200 8400 7400 7600 7800 8000 8200 8400 (a) NRMSE for the nonlinear model without mean re- (b) NRMSE for the nonlinear model with mean rever- version. sion. Figure 8: Per-sample NRMSE (equation (21)) for the nonlinear model forecasts on the DO dataset presented in Figure 6. Temp. (%) pH (%) DO (%) Temp. (%) pH (%) DO (%) Dataset LDS LDS LDS DLM DLM DLM SARIMA Prophet MR WMR MR WMR DO 33.51 14.41 15.98 25.41 10.81 11.08 15.27 16.07 pH 60.14 35.03 27.61 61.74 34.36 24.76 65.13 25.61 Temperature 107.92 38.14 34.33 104.80 36.83 31.86 109.26 71.12 Average 67.19 29.19 25.97 63.98 27.33 22.56 63.22 37.60 Table 4: Average NRMSE error (%) over ten 960-step-ahead forecasts for the set of models and datasets. Mean reversion is denoted by MR. Weighted mean reversion is denoted by WMR. 
more re ned representation of the seasonal curves. This increased complexity comes at a signi cant cost with a 97-dimensional state vector. This can be problematic in hardware where computational power and memory are limited. In comparison with the DLM, the LDS has a 4-dimensional state vec- tor. The LDS thus performs surprisingly well in comparison. The DLM with weighted mean reversion provides the lowest average NRMSE results over all datasets. Other than the pH dataset, the other mean re- version model variants take the second, third and Figure 9: Box-whisker plots comparing the set of models fourth place. For the pH dataset, the Prophet over each dataset for the NRMSE results. model provides highly competitive results and takes second place. The SARIMA model performs well on mean reversion, the LDS and DLM models produce the dissolved oxygen dataset, otherwise it provides results with high NRMSE values and large boxes. similar results to the DLM and LDS models. The large boxes indicate a high variation in the The SARIMA model has rst order di erencing forecast accuracy. Introducing mean reversion or and the Prophet model has a linear growth trend. weighted mean reversion both increases accuracy These components function as linear trend compo- and reduces variation in the forecasts. The result nents. Thus, like state-space models, the SARIMA is a more robust model. and Prophet models are susceptible to forecast de- For the pH and temperature datasets, the viations. Given this, the Prophet model performs DLM model produces boxes which are be- WMR remarkably well. low the LDS, DLM, SARIMA, and Prophet model To illustrate the robustness of the models and the boxes. This indicates some level of statistical signif- statistical signi cance of the results, box-whisker icance that the DLM outperforms these mod- WMR plots are presented in Figure 9. In the absence of els. LDS LDS MR LDS WMR DLM DLM MR DLM WMR SARIMA Prophet Temp. 
(%) pH (%) DO (%) The computation times are presented in Table 5. the prawn pond water quality dataset is presented. These times include the parameter estimation as The results demonstrate that the lowest errors are well as the forecasting operations. All models are obtained when weighted mean reversion is used in implemented in Python and run on a Dual-Core In- the DLM. tel i5 processor. The mean reversion increases the A limitation of the attractor distribution is that processing time as the pseudo samples are required it is stationary. The result is that the long-term to be calculated. Weighted mean reversion further forecast is drawn to a xed mean. In future work, increases computational complexity resulting in fur- a non-stationary attractor distribution could be in- ther increased processing times. Weighted mean vestigated. The result would be that the forecast reversion in the LDS is still however quicker than would be drawn to a particular dynamic rather than the Prophet and SARIMA models. The SARIMA a xed mean. Future work could also include an model has the highest processing time, which is investigation into estimating the attractor distri- primarily due to the parameter estimation opera- bution covariance matrix  using the expectation tion. Compared with the DLM, the Prophet model maximisation algorithm. is more computationally ecient. Finally, though the proposed approach is demon- strated on an aquaculture problem, it is applicable to other problems with similar properties. Future 9. Summary and Conclusion work could include testing the approach on prob- In this study a novel mean reversion approach lems such as weather-related forecasting, electricity is presented for state-space models. The mean re- load forecasting, algal bloom forecasting, and other version is performed using an attractor distribution environmental applications with seasonal data. with a Gaussian form. 
The mean of this distribu- tion is approximated by the average ltered esti- References mate over previously observed samples. This mean Ahmed, A. M. (2017). Prediction of dissolved oxy- provides an approximation of the average dynamics gen in surma river by biochemical oxygen de- over the sequence. To draw a forecast towards the mand and chemical oxygen demand using the mean, ltering is applied with pseudo-observations arti cial neural networks (anns). Journal of King Saud University - Engineering Sciences , obtained from attractor distribution. The result is 29 , 151 { 158. URL: http://www.sciencedirect. that the forecast converges to the attractor distri- com/science/article/pii/S1018363914000385. bution mean in the limit. doi:https://doi.org/10.1016/j.jksues.2014.05.001. We demonstrate the approach with a linear and Aliev, F. A., & Ozbek, L. (1999). Evaluation of conver- gence rate in the central limit theorem for the kalman l- nonlinear LDS in a prawn pond water quality fore- ter. IEEE Transactions on Automatic Control , 44 , 1905{ casting application. Results show a signi cant im- 1909. doi:10.1109/9.793734. provement in long-term forecasts. Furthermore, a Andrawis, R. R., Atiya, A. F., & El-Shishiny, H. comparison between various time series models on (2011). Combination of long term and short term 18 Dataset LDS LDS LDS DLM DLM DLM SARIMA Prophet MR WMR MR WMR DO 1.96 3.61 5.12 31.83 64.38 97.17 556.43 15.22 pH 1.85 3.47 5.18 32.1 64.38 97.22 87.85 17.73 Temperature 2.16 3.95 5.61 31.6 64.53 96.46 190.23 17.51 Table 5: Average processing time in seconds over ten 960-step-ahead forecasts for the set of models and datasets. Mean reversion is denoted by MR. Weighted mean reversion is denoted by WMR. forecasts, with application to tourism demand forecast- article/pii/S0957417416303098. doi:https://doi.org/ ing. International Journal of Forecasting , 27 , 870 10.1016/j.eswa.2016.06.028. { 886. URL: http://www.sciencedirect.com/science/ Commandeur, J. 
J., & Koopman, S. J. (2007). An Intro- article/pii/S0169207010001147. doi:https://doi.org/ duction to State Space Time Series Analysis . Practical 10.1016/j.ijforecast.2010.05.019. Special Section 1: Econometrics. Oxford University Press. Forecasting with Arti cial Neural Networks and Compu- Cox, J. C., Ingersoll, J. E., & Ross, S. A. (1985). A theory tational Intelligence Special Section 2: Tourism Forecast- of the term structure of interest rates. Econometrica , 53 , ing. 385{407. URL: http://www.jstor.org/stable/1911242. Barber, D. (2012). Bayesian Reasoning and Machine Learn- Dabrowski, J. J., Rahman, A., & George, A. (2018a). ing . Bayesian Reasoning and Machine Learning. Cam- Prediction of dissolved oxygen from ph and water tem- bridge University Press. perature in aquaculture prawn ponds. In Proceedings Basant, N., Gupta, S., Malik, A., & Singh, K. P. (2010). of the Australasian Joint Conference on Arti cial In- Linear and nonlinear modeling for simultaneous predic- telligence - Workshops AIW'18 (pp. 2{6). New York, tion of dissolved oxygen and biochemical oxygen de- NY, USA: ACM. URL: http://doi.acm.org/10.1145/ mand of the surface water a case study. Chemo- 3314487.3314488. doi:10.1145/3314487.3314488. metrics and Intelligent Laboratory Systems , 104 , 172 Dabrowski, J. J., Rahman, A., George, A., Arnold, S., & { 180. URL: http://www.sciencedirect.com/science/ McCulloch, J. (2018b). State space models for forecast- article/pii/S0169743910001449. doi:https://doi.org/ ing water quality variables: An application in aquacul- 10.1016/j.chemolab.2010.08.005. ture prawn farming. In Proceedings of the 24th ACM Box, G., Jenkins, G., Reinsel, G., & Ljung, G. (2015). Time SIGKDD International Conference on Knowledge Dis- Series Analysis: Forecasting and Control . Wiley Series covery &#38; Data Mining KDD '18 (pp. 177{185). New in Probability and Statistics. Wiley. York, NY, USA: ACM. URL: http://doi.acm.org/10. Boyd, C. E., & Tucker, C. S. (1998). 
Pond aquaculture water 1145/3219819.3219841. doi:10.1145/3219819.3219841. quality management . Springer US. doi:https://doi.org/ Dogan, E., Sengorur, B., & Koklu, R. (2009). Mod- 10.1007/978-1-4615-5407-3. eling biological oxygen demand of the melen river Brown, R. (1959). Statistical Forecasting for Inventory Con- in turkey using an arti cial neural network tech- trol . McGraw-Hill. URL: https://books.google.com.au/ nique. Journal of Environmental Management , 90 , 1229 books?id=QSYnAAAAMAAJ. doi:https://doi.org/10.1016/ { 1235. URL: http://www.sciencedirect.com/science/ j.ijforecast.2003.09.015. article/pii/S0301479708001588. doi:https://doi.org/ de Canete, J. F., Saz-Orozco, P. D., Baratti, R., Mu- 10.1016/j.jenvman.2008.06.004. las, M., Ruano, A., & Garcia-Cerezo, A. (2016). Soft- Duan, W., He, B., Nover, D., Yang, G., Chen, W., Meng, H., sensing estimation of plant euent concentrations in a Zou, S., & Liu, C. (2016). Water quality assessment and biological wastewater treatment plant using an optimal pollution source identi cation of the eastern poyang lake neural network. Expert Systems with Applications , 63 , basin using multivariate statistical methods. Sustainabil- 8 { 19. URL: http://www.sciencedirect.com/science/ ity , 8 . URL: https://www.mdpi.com/2071-1050/8/2/133. 19 doi:10.3390/su8020133. 10.1016/j.ecoleng.2011.06.022. Durbin, J., & Koopman, S. (2012). Time Series Analysis Holt, C. (1957). Forecasting seasonals and trends by expo- by State Space Methods volume 38 of Oxford Statistical nentially weighted averages (onr memorandum 52/1957). Science Series . (2nd ed.). Oxford University Press. Carnegie Institute of Technology , . Evensen, G. (1994). Sequential data assimilation with Hyndman, R., Koehler, A., Ord, J., & Snyder, R. a nonlinear quasi-geostrophic model using monte (2008). Forecasting with Exponential Smoothing: The carlo methods to forecast error statistics. Journal State Space Approach . Springer Series in Statistics. 
of Geophysical Research: Oceans , 99 , 10143{10162. Berlin, Heidelberg: Springer Berlin Heidelberg. URL: URL: https://agupubs.onlinelibrary.wiley.com/ https://doi.org/10.1007/978-3-540-71918-2. doi:10. doi/abs/10.1029/94JC00572. doi:10.1029/94JC00572. 1007/978-3-540-71918-2. arXiv:https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.10 Julier, 29/94JC00572 S. J., & Uhlmann, . J. K. (1997). New extension of the Ginot, V., & Herv e, J.-C. (1994). Estimating kalman lter to nonlinear systems. In Signal processing, the parameters of dissolved oxygen dynamics in sensor fusion, and target recognition VI (pp. 182{194). shallow ponds. Ecological Modelling , 73 , 169 SPIE volume 3068. URL: http://dx.doi.org/10.1117/ { 187. URL: http://www.sciencedirect.com/ 12.280797. doi:10.1117/12.280797. science/article/pii/0304380094900612. doi:https: Kalman, R. E. (1960). A new approach to linear ltering //doi.org/10.1016/0304-3800(94)90061-2. and prediction problems. Journal of basic Engineering , Gooijer, J. G. D., & Hyndman, R. J. (2006). 25 years of time 82 , 35{45. series forecasting. International Journal of Forecasting , Kandil, M., El-Debeiky, S., & Hasanien, N. (2001). Overview 22 , 443 { 473. URL: http://www.sciencedirect.com/ and comparison of long-term forecasting techniques for science/article/pii/S0169207006000021. doi:https:// a fast developing utility: part i. Electric Power Systems doi.org/10.1016/j.ijforecast.2006.01.001. Twenty Research , 58 , 11 { 17. URL: http://www.sciencedirect. ve years of forecasting. com/science/article/pii/S0378779601000979. Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). doi:https://doi.org/10.1016/S0378-7796(01)00097-9. Novel approach to nonlinear/non-gaussian bayesian state Lu, Z., & Piedrahita, R. H. (1996). Stochastic Modeling of estimation. IEE Proceedings F - Radar and Signal Pro- temperature and dissolved oxygen in strati ed sh ponds . cessing , 140 , 107{113. doi:10.1049/ip-f-2.1993.0015. 
Technical Report Department of Biological and Agricul- Granger, C. W., & Jeon, Y. (2007). Long-term forecast- tural Engineering University of California, Davis, USA. ing and evaluation. International Journal of Forecasting , Thirteenth Annual Technical Report. 23 , 539 { 551. URL: http://www.sciencedirect.com/ Madsen, H. I., Vollertsen, J., & Hvitved-Jacobsen, T. (2007). science/article/pii/S0169207007000908. doi:https:// Modelling the oxygen mass balance of wet detention ponds doi.org/10.1016/j.ijforecast.2007.07.002. receiving highway runo . In G. M. Morrison, & S. Rauch Grewal, M., & Andrews, A. (2015). Kalman Filtering: The- (Eds.), Highway and Urban Environment (pp. 487{497). ory and Practice with MATLAB . John Wiley & Sons, Dordrecht: Springer Netherlands. Inc. doi:10.1002/9781118984987. Murphy, K. P. (2012). Machine learning: a probabilistic Harvey, A. C. (1990). Forecasting, Structural Time Series perspective . MIT press. Models and the Kalman Filter . Cambridge University Olyaie, E., Abyaneh, H. Z., & Mehr, A. D. (2017). Press. doi:10.1017/CBO9781107049994. A comparative analysis among computational intel- He, J., Chu, A., Ryan, M. C., Valeo, C., & Zaitlin, B. ligence techniques for dissolved oxygen prediction (2011). Abiotic in uences on dissolved oxygen in a in delaware river. Geoscience Frontiers , 8 , 517 riverine environment. Ecological Engineering , 37 , 1804 { 527. URL: http://www.sciencedirect.com/science/ { 1814. URL: http://www.sciencedirect.com/science/ article/pii/S1674987116300469. doi:https://doi.org/ article/pii/S0925857411002096. doi:https://doi.org/ 10.1016/j.gsf.2016.04.007. 20 Pavliotis, G. A. (2014). The fokker{planck equation. In doi/abs/10.1061/(ASCE)1084-0699(2006)11:2(188). Stochastic Processes and Applications: Di usion Pro- doi:10.1061/(ASCE)1084-0699(2006)11:2(188). cesses, the Fokker-Planck and Langevin Equations (pp. arXiv:https://ascelibrary.org/doi/pdf/10.1061/(ASCE)1084-0699(2006)11:2(188). 87{137). 
New York, NY: Springer New York. URL: https: Shi, P., Li, G., Yuan, Y., Huang, G., & Kuang, L. (2019). //doi.org/10.1007/978-1-4939-1323-7_4. doi:10.1007/ Prediction of dissolved oxygen content in aquaculture us- 978-1-4939-1323-7_4. ing clustering-based softplus extreme learning machine. Petris, G., Petrone, S., & Campagnoli, P. (2009). Dynamic Computers and Electronics in Agriculture , 157 , 329 Linear Models with R. Use R! New York, NY: Springer { 338. URL: http://www.sciencedirect.com/science/ New York. URL: https://doi.org/10.1007/b135794_1. article/pii/S0168169918310421. doi:https://doi.org/ doi:10.1007/b135794_1. 10.1016/j.compag.2019.01.004. Rankovi c, V., Radulovi c, J., Radojevi c, I., Ostoji c, A., & Soman, S. S., Zareipour, H., Malik, O., & Mandal, P. (2010). Comi c, L. (2010). Neural network modeling of dissolved A review of wind power and wind speed forecasting meth- oxygen in the grua reservoir, serbia. Ecological Modelling , ods with di erent time horizons. In North American 221 , 1239 { 1244. URL: http://www.sciencedirect.com/ Power Symposium 2010 (pp. 1{8). doi:10.1109/NAPS. science/article/pii/S0304380009008692. doi:https:// 2010.5619586. doi.org/10.1016/j.ecolmodel.2009.12.023. Spall, J. C., & Wall, K. D. (1984). Asymptotic distri- Rao, T., Rao, S., & Rao, C. (2012). Time Se- bution theory for the kalman lter state estimator. ries Analysis: Methods and Applications vol- Communications in Statistics - Theory and Meth- ume 30 of Handbook of Statistics . Else- ods , 13 , 1981{2003. URL: https://doi.org/10.1080/ vier. URL: https://www.elsevier.com/books/ 03610928408828808. doi:10.1080/03610928408828808. time-series-analysis-methods-and-applications/ arXiv:https://doi.org/10.1080/03610928408828808. rao/978-0-444-53858-1. Ta, X., & Wei, Y. (2018). Research on a dissolved Ren, Q., Zhang, L., Wei, Y., & Li, D. (2018). 
oxygen prediction method for recirculating aquacul- A method for predicting dissolved oxygen in aqua- ture systems based on a convolution neural network. culture water in an aquaponics system. Com- Computers and Electronics in Agriculture , 145 , 302 puters and Electronics in Agriculture , 151 , 384 { 310. URL: http://www.sciencedirect.com/science/ { 391. URL: http://www.sciencedirect.com/science/ article/pii/S016816991730786X. doi:https://doi.org/ article/pii/S0168169918303181. doi:https://doi.org/ 10.1016/j.compag.2017.12.037. 10.1016/j.compag.2018.06.013. Taylor, S. J., & Letham, B. (2018). Forecasting Robertson, C. (Ed.) (2006). Australian prawn farming man- at scale. The American Statistician , 72 , 37{ ual: health management for pro t . Queensland Depart- 45. URL: https://doi.org/10.1080/00031305. ment of Primary Industries and Fisheries (QDPI&F). 2017.1380080. doi:10.1080/00031305.2017.1380080. Ruiz, L., Rueda, R., Cullar, M., & Pegalajar, M. arXiv:https://doi.org/10.1080/00031305.2017.1380080. (2018). Energy consumption forecasting based Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic on elman neural networks with evolutive opti- Robotics . Intelligent robotics and autonomous agents. mization. Expert Systems with Applications , 92 , MIT Press. 380 { 389. URL: http://www.sciencedirect. Vasicek, O. (1977). An equilibrium characterization of the com/science/article/pii/S0957417417306565. term structure. Journal of Financial Economics , 5 , 177 doi:https://doi.org/10.1016/j.eswa.2017.09.059. { 188. URL: http://www.sciencedirect.com/science/ Schmid, B. H., & Koskiaho, J. (2006). Arti cial neural article/pii/0304405X77900162. doi:https://doi.org/ network modeling of dissolved oxygen in a wetland pond: 10.1016/0304-405X(77)90016-2. The case of hovi, nland. Journal of Hydrologic Engi- West, M., & Harrison, J. (1997). Bayesian Forecast- neering , 11 , 188{192. URL: https://ascelibrary.org/ ing and Dynamic Models . Springer Series in Statistics. 
21 Springer New York. URL: https://doi.org/10.1007/ 0-387-22777-6. doi:10.1007/0-387-22777-6. Winters, P. R. (1960). Forecasting sales by ex- ponentially weighted moving averages. Manage- ment Science , 6 , 324{342. URL: https://doi.org/ 10.1287/mnsc.6.3.324. doi:10.1287/mnsc.6.3.324. arXiv:https://doi.org/10.1287/mnsc.6.3.324. Xu, L., Liu, S., & Li, D. (2017). Prediction of water temperature in prawn cultures based on a mechanism model optimized by an improved arti cial bee colony. Computers and Electronics in Agriculture , 140 , 397 { 408. URL: http://www.sciencedirect.com/science/ article/pii/S0168169916305191. doi:https://doi.org/ 10.1016/j.compag.2017.05.034. Xu, Z., & Xu, Y. J. (2016). A deterministic model for predicting hourly dissolved oxygen change: Development and application to a shallow eutrophic lake. Water , 8 . URL: http://www.mdpi.com/2073-4441/8/2/41. doi:10. 3390/w8020041. Zarchan, P., & Muso , H. (2000). Fundamentals of Kalman Filtering: A Practical Approach . Number v. 190, pt. 1 in Fundamentals of Kalman Filtering: A Practical Ap- proach. American Institute of Aeronautics and Astronau- tics, Incorporated. Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Fore- casting with arti cial neural networks: The state of the art. International Journal of Forecasting , 14 , 35 { 62. URL: http://www.sciencedirect.com/science/ article/pii/S0169207097000447. doi:https://doi.org/ 10.1016/S0169-2070(97)00044-7. Zhang, G., & Qi, M. (2005). Neural network fore- casting for seasonal and trend time series. Eu- ropean Journal of Operational Research , 160 , 501 { 514. URL: http://www.sciencedirect.com/science/ article/pii/S0377221703005484. doi:https://doi.org/ 10.1016/j.ejor.2003.08.037. Decision Support Systems in the Internet Age. Zhang, Y., Fitch, P., Vilas, M. P., & Thorburn, P. J. (2019). Applying multi-layer arti cial neural network and mutual information to the prediction of trends in dissolved oxygen. Frontiers in Environmental Science , 7 , 46. 
URL: https://www.frontiersin.org/article/10. 3389/fenvs.2019.00046. doi:10.3389/fenvs.2019.00046. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Statistics arXiv (Cornell University)

Enforcing Mean Reversion in State Space Models for Prawn Pond Water Quality Forecasting

ISSN
0168-1699
eISSN
ARCH-3347
DOI
10.1016/j.compag.2019.105120

Abstract

The contribution of this study is a novel approach to introduce mean reversion in multi-step-ahead forecasts of state-space models. This approach is demonstrated in a prawn pond water quality forecasting application. The mean reversion constrains forecasts by gradually drawing them to an average of previously observed dynamics. This corrects deviations in forecasts caused by irregularities such as chaotic, non-linear, and stochastic trends. The key features of the approach include (1) it enforces mean reversion, (2) it provides a means to model both short and long-term dynamics, (3) it is able to apply mean reversion to select structural state-space components, and (4) it is simple to implement. Our mean reversion approach is demonstrated on various state-space models and compared with several time-series models on a prawn pond water quality dataset. Results show that mean reversion reduces long-term forecast errors by over 60% to produce the most accurate models in the comparison.

Keywords: Long term forecasting, Multi-step ahead forecasting, Mean reversion, Forecast constraint, Kalman filter

Corresponding author; St Lucia, QLD, 4067, Australia. Email addresses: joel.dabrowski@data61.csiro.au (Joel Janek Dabrowski), ashfaqur.rahman@data61.csiro.au (Ashfaqur Rahman), dan.pagendam@data61.csiro.au (Daniel Edward Pagendam), andrew.george@data61.csiro.au (Andrew George). Preprint submitted to Computers and Electronics in Agriculture, February 27, 2020. arXiv:2002.11228v1 [stat.AP] 26 Feb 2020.

1. Introduction

In aquaculture prawn farming, managing water quality is key for maximising quantity, quality, and health of the stock. For example, high levels of prawn mortality can occur due to anoxia and hypoxia if dissolved oxygen (DO) drops to extreme values (Robertson, 2006). By forecasting important water quality variables, farmers are provided with the tools to take preemptive measures that encourage favourable pond conditions.

Long-term forecasting can be a challenging task with complex environmental processes such as prawn ponds. In this study, we take advantage of the fact that many natural processes exhibit some form of mean reversion. This is commonly found where the process seeks a state of equilibrium. For example, the long-term trend (a week or more) of pond water temperature typically varies within some bounds. These bounds are maintained as the underlying process seeks thermodynamic equilibrium within a changing environment. Without knowledge of the underlying process, the longer-term dynamics can appear as a slowly varying stochastic trend.

Forecasting such processes can be challenging when stochastic trends cause forecasts to deviate. Models should realistically incorporate some form of constraint or bounds. Our hypothesis is that such a constraint can be imposed by modelling the stochastic variations with a fixed attractor distribution that long-term trends are drawn towards. In this form, the long-term behaviour of the process may have some stable, marginal distribution when integrated over time (long periods of time or just the recent past).

In this study we propose a novel approach to introduce an attractor distribution in non-stationary state-space models. The attractor distribution models previously observed dynamics. Mean reversion is enforced through introducing pseudo-observations into the Kalman filter during forecasting. These pseudo-observations are samples of the attractor distribution mean. The result is that the filtering operation during forecasting naturally draws the forecasts towards the mean of the previously observed dynamics.

The proposed approach can model both short and long-term dynamics and it allows for the selection of which state space components should be mean reverting. Furthermore, the approach is easily implemented using the standard Kalman filter and it has broad appeal as it addresses problems that are found in many domains other than aquaculture.

Our contributions are: (1) we provide an approach to enforcing mean reversion in state-space models (to our knowledge, no other studies have introduced any form of mean reversion into state space models for constraining forecasts), (2) we demonstrate this approach on several state-space models in a real-world aquaculture application, and (3) we compare our approach with several time series models.

This paper is organised as follows: In section 2, we review related forecasting literature. Section 3 provides an overview of the linear dynamic system (LDS) and the Kalman filter with the purpose of introducing our mean reversion approach described in section 4. The aquaculture problem and datasets used in this study are presented in section 5. In section 6 we demonstrate how our approach is applied to state space forecasting models and results are provided in section 7. In section 8 a comparison of our approach with several forecasting methods is provided. The study is concluded in section 9.

2. Related Work

2.1. Forecasting Models

Many industries and disciplines rely on multi-step-ahead forecasting. A wide range of forecasting methods exist in the literature (Gooijer & Hyndman, 2006). Statistical models include state-space models, regression models, exponential smoothing, Box-Jenkins models (such as the autoregressive moving average (ARMA) model), long memory models, autoregressive conditional heteroscedastic (ARCH), and generalised ARCH (GARCH) models. Nonlinear machine learning models have also been extensively explored for forecasting. Neural networks in particular have a relatively large body of literature (Zhang & Qi, 2005; Zhang et al., 1998; Ruiz et al., 2018).

State-space models are generative, probabilistic, interpretable, and flexible (Durbin & Koopman, 2012). As generative models, they are able to handle missing data and forecasting functionality is inherent. As probabilistic models, they provide a natural representation of uncertainty in a forecast. State-space models are interpretable as they are designed based on structural analysis of a problem and naturally incorporate explanatory variables. This is in contrast with data driven models such as neural networks and ARMA models, which are considered as black-box models.

2.2. Multi-Step-Ahead Forecasting

Multi-step-ahead forecasting is a challenging task as it requires a complete model of the short and longer-term dynamics. Short-term modelling is required to model the dynamics between the forecast time-steps. Longer-term modelling is required to model the dynamics across the several time-step forecasts.

A combination of long-term and short-term forecasts is presented by Andrawis et al. (2011). The authors note that there seems to be little work in the literature relating to such combinations, despite their effectiveness.

The approach we present in this study does not require combining long-term and short-term models. Rather, it provides a means to naturally include both short-term and long-term dynamics in a single model. The short-term dynamics are modelled directly in the state-space model. The long-term dynamics are modelled using mean reversion and the attractor distribution.

2.3. Mean Reversion

Many phenomena should realistically be modelled with some form of limiting distribution for long-term forecasts. For example, interest rates are often modelled through the use of mean-reverting stochastic processes, such as the Ornstein-Uhlenbeck process (e.g. the Vasicek model (Vasicek, 1977) or the CIR model (Cox et al., 1985)). The dynamics are limited to Brownian motion with a tendency towards the origin (Pavliotis, 2014). Though Brownian motion is not stationary, a linear damping term in the Ornstein-Uhlenbeck process can cause the process to become stationary.
The The general approach to long-term forecasting generalised Ornstein-Uhlenbeck process is a natu- is to model the long-term trend of the time se- ral continuous time analogue of the AR(1) process ries and ignore short term dynamics. Such models with random i.i.d. components (Rao et al., 2012). can be obtained using time series analysis meth- The ARMA model also exhibits mean reversion, ods such as regression models, state-space models, but the moving-average allows for mean-reversion Box-Jenkins models, and recurrent neural networks to occur more gradually. In general, AR and (Kandil et al., 2001; Soman et al., 2010; Granger & ARMA models are limited to modelling only sta- Jeon, 2007). It is however possible to combine long tionary sequences Box et al. (2015). Non-stationary and short-term forecasts as discussed in the review components such as trend and seasonality are re- 3 p(h jh ) moved from the time series through di erencing t t1 h h h t1 t t+1 such as in the Autoregressive Integrated Moving p(v jh ) t t Average (ARIMA) model. v v t1 v t+1 The ARMA and ARIMA models may be framed as state-space models (Durbin & Koopman, 2012). Figure 1: Graphical model representation of the latent dy- namic model such as the linear dynamic system. In general, state-space models are not limited to stationary series and provide expressive power box models like many machine learning models. through latent variables. State-space models are The proposed mean reversion approach is tested however not necessarily mean reverting. Our pro- on these models in the context of forecasting wa- posed approach provides the means to enforce mean ter quality variables. reversion in state-space models. 2.4. Water Quality Modelling 3. The Linear Dynamic System and Filter- In water quality modelling applications, several ing ecosystem-based models have been proposed for variables such as DO (Ginot & Herv e, 1994; Lu & Piedrahita, 1996; Madsen et al., 2007; Xu & Xu, 3.1. 
The Linear Dynamic System 2016). These are complex multivariable models The linear dynamic system (LDS) is a state- that require precisely determined parameters per- space model that assumes linear-Gaussian dynam- taining to biological and physical processes. Vari- ics (Barber, 2012; Thrun et al., 2005; Murphy, ous data-driven approaches have also been used for 2012). Consider a system comprising a latent or modelling and forecasting water quality variables. hidden variable h that evolves over time, t = These include neural networks (Zhang et al., 2019; t 1; : : : ; T . The system provides an observable vari- Ta & Wei, 2018; Ren et al., 2018; Dabrowski et al., able v from which measurements can be made. The 2018a; de Canete et al., 2016; Schmid & Koskiaho, t observable variable is considered to have been emit- 2006; Dogan et al., 2009; Rankovi c et al., 2010; Bas- ted from the latent variable h . Assuming a rst ant et al., 2010; He et al., 2011; Ahmed, 2017) and order Markov process, the graphical model describ- other machine learning models (Shi et al., 2019; Xu ing this system is illustrated in Figure 1. The edges et al., 2017; Olyaie et al., 2017; Duan et al., 2016). between the latent variables describe the transition Dabrowski et al. (2018b) describe two data- distribution p(h jh ). The edges between the la- t t1 driven state-space models for modelling DO, pH, tent and observable variables describe the emission and temperature in prawn ponds. These mod- distribution p(v jh ). t t els provide a compromise between ecosystem mod- els and machine learning models. 
They are data- Linear-Gaussian assumptions in the LDS result driven unlike ecosystem models, and are not black- in the following state-space equations (Petris et al., 4 2009; Grewal & Andrews, 2015) The mean and covariance relating to p(v jv ) t 1:t1 are given by h = Ah +  (1) t t1 v h = B (7) v = Bh +  (2) t t t t vv hh T v = B B +  (8) t t The variable h is the state vector, A is the state h h transition matrix, and   N (0;  ) is the state t Additionally, the cross-covariance between the la- noise vector (where  denotes a covariance ma- tent and observed variables is given by trix). The variable v is the observation vector, hv hh T =  B (9) t t B is the emission or measurement matrix, and v v The predictions are updated with the latest ob- N (0;  ) is the measurement noise vector. servations to provide the parameters for the ltered In continuous time, state-space equations are given distribution. These parameters are given by by (Grewal & Andrews, 2015; Zarchan & Muso , 2000; Durbin & Koopman, 2012) h v f =  + K (v  ) (10) t t t t t hh h(t) = Ah(t) +  (t) (3) F = (I K B) (11) t t v(t) = Bh(t) +  (t) (4) where I is the identity matrix and K is the Kalman where A and B denote the continuous time state gain given by and emission matrices. hv vv 1 K =  ( ) (12) t t hh T hh T v 1 3.2. The Kalman Filter (KF) = ( B )(B B +  ) (13) t t Inference in the LDS involves calculating 3.3. Forecasting with the LDS p(h jv ), which is the probability distribution over t 1:t The ltered distribution is computed at each time the current latent variable given all past observa- using equations (10) and (11) with observations v . tions (Barber, 2012; Murphy, 2012). The linear- During forecasting, the prediction equations (5), Gaussian assumption allows for a closed-form in- (6), (7), and (8) are used with no observations. For ference algorithm known as the Kalman lter (KF) multiple forecasts into the future, f and F t1 t1 (Kalman, 1960). 
The ltered distribution is repre- in equations (5) and (6) can be replaced with t1 sented as a Gaussian with mean f and covariance hh and  respectively. Multiple forecasts are thus t1 F . The KF algorithm recursively repeats a predic- generated by sequentially sampling from the model. tion and update step. In the prediction step, the Any forecasts made for times t + i, i > 0 are Gaussian distributions p(h jv ) and p(v jv ) t 1:t1 t 1:t1 calculated based on the dynamics of the model at are computed. The mean and covariance relating time t. These dynamics are contained in the ltered to p(h jv ) distributions are given by t 1:t1 distribution at time t. If the ltered distribution at = Af (5) t1 time t is not representative of the long-term trend, hh T h = AF A +  (6) t t1 long-term forecasts may be inaccurate. 5 3.4. Nonlinear and Non-Gaussian Filtering 4.2. Attractor Distribution and the Central Limit The proposed approach is to use an attractor dis- The Kalman lter is a closed form solution for tribution to draw the forecasts to the mean of a dis- a linear-Gaussian model. If a system is nonlin- tribution that approximates the central limit. Spall ear or non-Gaussian, approximate ltering methods & Wall (1984) proved the central limit theorem for such as the extended Kalman lter (EKF), the un- the Kalman lter under certain conditions. These scented Kalman lter (UKF) (Julier & Uhlmann, conditions include the standard Kalman lter as- 1997), or Monte Carlo methods such as the particle sumptions as well as uniform complete observability lter (Gordon et al., 1993) and ensemble Kalman and controllability. The intention of the study was lter (enKF) (Evensen, 1994) are required. In this to investigate the asymptotic nature of the Kalman study the EKF is used. The EKF approximates a lter. Aliev & Ozbek (1999) furthered this study nonlinear function by linearising around the current by investigating the convergence rate of the central state mean estimate (Zarchan & Muso , 2000). 
limit theorem for the Kalman lter. To approximate the mean of the central limit dis- 4. Mean Reversion and the Attractor Distri- tribution, the average over all ltered posterior dis- bution tributions (see Section 3.2) is computed up to time t. That is 4.1. Forecast Deviation In State-Space Models f  f : (14) 1 i i=1 State-space time series models are comprised of This approximation is used as the mean of the at- several distinct components such as trend, sea- tractor distribution. sonal, and noise (disturbances) (Durbin & Koop- man, 2012; Commandeur & Koopman, 2007; West It is also possible to compute a weighted average & Harrison, 1997; Hyndman et al., 2008; Harvey, where more emphasis is given to recent dynamics. 1990; Petris et al., 2009). The trend component A geometric progression can be used to obtain an is often represented in the form of a polynomial exponential weighted average as follows model. Especially models such as the rst-order- t ti f (1 ) i=1 f  ; (15) polynomial Dynamic Linear Model (DLM) per- ti (1 ) i=1 form well for relatively short-term forecasting but where  is some constant in the range 0 < can fail in longer term forecasts (West & Harri- 1. This provides a form of exponential smooth- son, 1997). Irregularities such as slowly varying ing (Brown, 1959; Holt, 1957; Winters, 1960) in the stochastic trends can shift the forecast trajectory mean reversion. o course. Mean reversion corrects the deviant fore- cast by drawing it back towards the attractor dis- 1 t ti Note that the form f   f (1 ) can be 1 i i=1 ti tribution mean. used if  and t are chosen such that  (1 )  1. i=1 6 4.3. Mean Reversion Through Filtering space of v . The matrix B can be manipulated to map only certain components from the latent To draw the forecast to the attractor distribu- variable space. 
Non-zero values can be placed in tion mean, it is proposed that the forecasts be l- B corresponding to components which should be tered with the attractor distribution as an observ- mean reverting in nature. For example, non-zero able variable. That is, set v = f as a pseudo- t 1 values could be placed in B corresponding to trend observation during forecasting. The ltered distri- components that should exhibit mean reversion be- bution can be written as (Thrun et al., 2005) haviour. Zeros can be placed in B corresponding p(h jv ) / p(v jh )p(h jv ) (16) t 1:t t t t 1:t1 to components which should not be mean revert- ing in nature. For example, seasonal components The rst term can be viewed as a likelihood of may be left to oscillate throughout a forecast. A the observation given the model state. The second demonstration of this is presented in Section 6. term can be viewed as a prior describing the pre- To de ne the measurement noise covariance dicted model state given previous observations. By for the attractor distribution pseudo-observations, using the attractor distribution as the observable consider that  represents a form of uncertainty variable, the likelihood describes the probability of of the observation. By adjusting the uncertainty, the attractor distribution given the current model the rate of convergence of the forecast to the at- state. If this likelihood is low, it implies a mismatch tractor distribution mean can be manipulated. The between what the model is forecasting and what is Kalman gain de nes the level of correction. Con- expected asymptotically. sider the representation of the Kalman gain in (13). To understand how ltering draws the forecast hh v The expression comprises B,  , and  . B is de- to the attractor distribution, consider the Kalman hh ned as discussed above and  is computed from lter update equation (10). The ltered mean is h the prediction. 
With these de ned, the Kalman the current prediction  , that is updated with a gain can thus be adjusted by manipulating  . If weighted di erence between observation v and the v  is set to zeros, indicating the extreme level of prediction  . The weighting factor for the error is certainty of v , the Kalman gain reduces as follows the Kalman gain. Equation (10) provides a mech- hh T hh T 1 anism to correct the model prediction with an ob- K = ( B )(B B + 0) t t servable variable v . If v is the attractor distribu- t t hh T T hh 1 1 = ( B )(B ( ) B ) t t tion, the forecast will be corrected according to the = B (17) attractor distribution. If K = B , the ltered mean in (10) reduces to 4.4. Parameters f = v , which is the attractor distribution mean. t t To de ne the emission matrix B for the attrac- If  is set to in nite values along its diagonal to tor distribution pseudo-observations, consider that indicate an extreme level of uncertainty of v , (10) B provides a mapping from the space of h to the reduces to f =  , which is mean proposed by t t 7 v the model. That is, with in nite values in  , the Modelling Water quality attractor distribution will be ignored. Forecasting Hydrophone By manipulating the uncertainty represented by Senaps Estimation , the level of correction of the forecasts is con- Spectral re ectance Visualisation trolled. This correction is performed over multiple Weather steps during ltering. The result is that the rate of Warning convergence of a forecast to the attractor distribu- Sensors Data storage Analytics tion mean is determined by  . Figure 2: Aquaculture prawn farm decision support system. 5. Datasets This study ts within a broader context of a system that is being developed for aquaculture temperature follow diurnal uctuations (Boyd & prawn farms. Several sensors have been deployed Tucker, 1998). 
Carbon dioxide (CO ) is continually into prawn ponds for monitoring water quality re- produced in the pond through respiration by organ- lated parameters. These sensors include water isms such as prawn and plankton. During the day, quality sensors, hydrophones, spectral re ectance, plant-based organisms use solar radiation for pho- and weather sensors. The sensor data is uploaded tosynthesis. Through photosynthesis, CO is ab- to a central cloud-based system (Senaps). Sev- sorbed and oxygen is released. Thus, DO increases eral decision support tasks are performed on the and CO decreases during the day. At night pho- stored data. The framework of the decision sup- tosynthesis ceases. The result is that DO decreases port system is illustrated in Figure 2. In this study, and CO increases at night. CO reacts with wa- 2 2 the modelling and forecasting of dissolved oxygen ter to form carbonic acid. Increased acidity reduces (DO), pH, and temperature in prawn ponds are the pH levels in the pond. Fluctuating CO thus considered. The mean reversion approach described causes uctuating pH. Furthermore, water temper- in this study is applied to data collected within this ature naturally uctuates with the changes in solar decision support system. radiation over a 24-hour period. The dataset used in this study comprises of DO, pH, and temperature readings taken from two Water quality variables may also vary in an ape- prawn ponds. The rst pond is a large 0.18ha grow- riodic manner (Boyd & Tucker, 1998). Irregular out pond and the second pond is a small 0.022ha variations may be caused by weather-related vari- nursery pond. The samples are taken at 15 minute ations and biological activity such as algal blooms. intervals over a period of 88 days. Such variations can produce the slow varying irreg- The datasets variables are seasonal in nature. ular or nonlinear uctuations that cause forecast Many water quality variables such as DO, pH and deviations. 8 6. 
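The mean reversion mechanism of Sections 4.3 and 4.4 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-state local linear trend model (level and slope) with a unit time step, a scalar pseudo-observation of the level only (emission matrix [1, 0]), and hypothetical values for the starting state, the noise variances `q`, the attractor mean, and the pseudo-observation variance `r`. The function names `predict`, `update`, and `forecast` are likewise illustrative.

```python
def predict(f, F, q):
    """Prediction step, equations (5)-(6), for A = [[1, 1], [0, 1]].

    f = (level, slope); F = (p11, p12, p22) is the symmetric covariance;
    q = (q_level, q_slope) is the diagonal of the state noise covariance.
    """
    level, slope = f
    p11, p12, p22 = F
    mu = (level + slope, slope)
    sigma = (p11 + 2.0 * p12 + p22 + q[0], p12 + p22, p22 + q[1])
    return mu, sigma


def update(mu, sigma, v, r):
    """Update step, equations (10)-(13), for B = [1, 0] and scalar variance r."""
    level, slope = mu
    s11, s12, s22 = sigma
    k1 = s11 / (s11 + r)  # Kalman gain for the level
    k2 = s12 / (s11 + r)  # Kalman gain for the slope
    err = v - level       # difference between pseudo-observation and prediction
    f = (level + k1 * err, slope + k2 * err)
    F = ((1.0 - k1) * s11, (1.0 - k1) * s12, s22 - k2 * s12)
    return f, F


def forecast(f, F, steps, q, attractor=None, r=1.0):
    """Multi-step forecast of the level.

    With attractor=None only the prediction step is used (Section 3.3).
    Otherwise each prediction is filtered with the attractor mean as a
    pseudo-observation (Section 4.3).
    """
    levels = []
    for _ in range(steps):
        f, F = predict(f, F, q)
        if attractor is not None:
            f, F = update(f, F, attractor, r)
        levels.append(f[0])
    return levels


# Hypothetical filtered state at the forecast origin: level 6.0, drifting upwards.
f0 = (6.0, 0.05)
F0 = (0.05, 0.0, 0.001)
q = (1e-3, 1e-4)

free = forecast(f0, F0, 300, q)                              # no mean reversion
reverting = forecast(f0, F0, 300, q, attractor=5.0, r=1.0)   # with mean reversion
print(round(free[-1], 3), round(reverting[-1], 3))
```

In this sketch the unconstrained forecast follows the initial slope indefinitely, whereas the filtered forecast settles near the assumed attractor mean of 5.0. Reducing `r` shortens the settling time, and a very large `r` effectively switches the mean reversion off, mirroring the two limiting cases discussed in Section 4.4.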
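The two attractor mean estimates of Section 4.2, equations (14) and (15), can be sketched as follows. The sketch treats a single component of the filtered mean, so each $f_i$ is a scalar; the function names and the sample values are illustrative assumptions rather than anything taken from the study.

```python
def attractor_mean(filtered_means):
    """Equation (14): plain average of the filtered means f_1, ..., f_t."""
    return sum(filtered_means) / len(filtered_means)


def weighted_attractor_mean(filtered_means, kappa):
    """Equation (15): exponentially weighted average with 0 < kappa <= 1.

    The weight of f_i is (1 - kappa)**(t - i), so the most recent
    filtered mean f_t always has weight 1.
    """
    t = len(filtered_means)
    weights = [(1.0 - kappa) ** (t - i) for i in range(1, t + 1)]
    weighted_sum = sum(w * f for w, f in zip(weights, filtered_means))
    return weighted_sum / sum(weights)


# Illustrative filtered means for one state component.
history = [1.0, 2.0, 3.0]
print(attractor_mean(history))                 # equal emphasis on all samples
print(weighted_attractor_mean(history, 0.5))   # more emphasis on recent samples
```

Larger values of `kappa` discount older filtered means more strongly; with `kappa = 1` only the most recent filtered mean contributes.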
6. Applied State-Space Models

Dabrowski et al. (2018b) presented two models for modelling water quality parameters in prawn ponds. The first model is an LDS with a local linear trend component (constant velocity process) and a seasonal component. The second model is a nonlinear model that provides a means to model the seasonal amplitude using a local linear trend component. The UKF was used for inference in this nonlinear model. These models will be used in this study, however the EKF algorithm will be used instead of the UKF algorithm. The intention is to improve the long-term (a week or more) forecasting capability of these models using the proposed mean reversion approach.

6.1. Linear Model

The observations of the linear model are modelled with a seasonal, trend and noise component as follows

$$v_t = \alpha \sin(\omega t) + \gamma_t + \epsilon_t \tag{18}$$

The seasonal component $\alpha \sin(\omega t)$ is modelled with a sinusoid with amplitude $\alpha$, the trend $\gamma_t$ is modelled as a continuous local linear trend model, and the noise $\epsilon_t$ is white Gaussian noise. Let $\beta_t = \alpha \sin(\omega t)$ such that (Dabrowski et al., 2018b)

$$h(t) = \begin{bmatrix} \gamma_t & \dot{\gamma}_t & \beta_t & \dot{\beta}_t \end{bmatrix}^T$$

The state transition matrix in continuous time, denoted by $\tilde{A}$, is then given by

$$\tilde{A} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -\omega^2 & 0 \end{bmatrix}$$

This matrix is converted to discrete time using a Laplace transform or the Taylor series expansion (Zarchan & Musoff, 2000)

$$A = e^{\tilde{A} \Delta t} = I + \tilde{A} \Delta t + \frac{(\tilde{A} \Delta t)^2}{2!} + \frac{(\tilde{A} \Delta t)^3}{3!} + \cdots \tag{19}$$

where $\Delta t$ is the sampling interval.

The emission matrix maps the elements from the latent variable space to the observed variable space according to (18). The emission matrix is thus given by

$$B = \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix}$$

The attractor distribution is defined to draw the forecasts to a fixed mean of previously observed dynamics. For the linear model, mean reversion is applied to the trend component. Thus, the attractor distribution is defined to approximate the central limit of $\gamma_t$. The following emission matrix for the attractor distribution can thus be used

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}.$$

In this form, mean reversion is only enforced on $\gamma_t$ and not on the seasonal component $\beta_t$.

With the attractor distribution having a single dimension, the variance $\Sigma^v$ is a real number. The value is manually set to provide reasonable uncertainty bounds and to match the mean reversion settling time with the slowly varying irregular component of the data. As discussed in Section 4, smaller values provide quicker settling times and narrower uncertainty bounds. Larger values provide slower settling times and wider uncertainty bounds. Suitable values can generally be found with a brief search over the sequence $10^i$, $i \in \mathbb{Z}$, and further refined if necessary. A search can also be conducted using repeated random subsampling validation approaches.

6.2. Nonlinear Model

The linear model is independent of the sinusoidal amplitude $\alpha$ in (18) (Dabrowski et al., 2018b). Including the amplitude as a component in the state-space representation results in a nonlinear model. The amplitude is modelled as a latent variable with a constant velocity process such that

$$h(t) = \begin{bmatrix} \gamma_t & \dot{\gamma}_t & \alpha_t & \dot{\alpha}_t & \sin(\omega t) & \cos(\omega t) \end{bmatrix}^T$$

The state transition matrix in continuous time is given by

$$\tilde{A} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & -\omega^2 & 0 \end{bmatrix}$$

This matrix is converted to discrete time using (19).

The trend element is added to a product of the amplitude and sinusoidal elements as indicated in (18). This results in a nonlinear emission model. Let $b(h_t) = \alpha_t \sin(\omega t) + \gamma_t$ such that

$$v_t = b(h_t) + \epsilon_t$$

The EKF approach is to approximate the nonlinear function $b(h_t)$ as a linearisation around the current state estimate. This linear approximation is the tangent to $b(h_t)$ at the current state estimate. Thus, the emission matrix is given by (Zarchan & Musoff, 2000)

$$B = \left. \frac{\partial b(h)}{\partial h} \right|_{h = f_t}$$

That is, $B$ is given by the Jacobian

$$B = \begin{bmatrix} \dfrac{\partial b(h)}{\partial \gamma_t} & \dfrac{\partial b(h)}{\partial \dot{\gamma}_t} & \dfrac{\partial b(h)}{\partial \alpha_t} & \dfrac{\partial b(h)}{\partial \dot{\alpha}_t} & \dfrac{\partial b(h)}{\partial \sin(\omega t)} & \dfrac{\partial b(h)}{\partial \cos(\omega t)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \sin(\omega t) & 0 & \alpha_t & 0 \end{bmatrix}$$

With this approximation to $B$, the standard Kalman filter equations given in Section 3.2 can be used. The proposed mean reversion approach is thus directly applicable.

For the nonlinear model, the datasets are assumed to approach a fixed mean offset and a fixed mean seasonal amplitude. The attractor distribution thus approximates the central limit of $\gamma_t$ as well as $\alpha_t$. The emission matrix for the attractor distribution is given by

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}.$$

With a two-dimensional attractor distribution, the variance $\Sigma^v$ is a two-dimensional matrix. This matrix is configured for an isotropic Gaussian with elements along the diagonal. These elements are manually chosen according to the uncertainty bounds and the slowly varying irregular component of the data.

7. State-Space Models Results

7.1. Methodology

The datasets are resampled to three samples per day according to (Dabrowski et al., 2018b). Resampling simulates handheld sensor readings taken by farmers, where samples are extracted at 05h00, 12h00, and 20h30. Although only 3 of the 96 samples per day are available, the sample rate in the models remains at 96 samples per day. The remaining 93 samples are treated as missing values that are estimated through filtering and smoothing in the state-space models. Forecasts are performed and evaluated over all 96 samples per day.

The time series dataset is split into a training and test set. Filtering is performed on the training set. The attractor distributions are obtained from these filtered results. Forecasts are evaluated on the test set. The location of the split between the training and test sets is specifically chosen around some form of inflection point. At these inflection points, a model without mean reversion is more likely to deviate from the global trend.

The forecasts are made over multiple steps to provide long-term forecasts. The number of samples over which the forecasts are made is provided in Table 1.

| Dataset | Samples | Time | Frequency |
|---|---|---|---|
| DO | 1200 | 12.5 days | 15 min |
| pH | 1000 | 10.4 days | 15 min |
| Temperature | 1100 | 11.5 days | 15 min |

Table 1: Forecast horizon in number of samples as well as time for the datasets used in this demonstration. The last column provides the sample rate of the sensor used to gather the dataset. Forecast horizons are determined by the selected inflection point in the data.

The normalised root mean squared error is used to provide an evaluation of the error between the forecast result and the measured data. Let $\hat{y}_t$ denote the forecast and let $y_t$ denote the true value of some time series at time $t$. For a forecast over $N$ samples, the normalised root mean squared error (NRMSE) is given by

$$\epsilon_{\mathrm{nrmse}} = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}}{y_{\max} - y_{\min}} \times 100\% \tag{20}$$

where $y_{\max}$ and $y_{\min}$ are the maximum and minimum dataset values respectively. The NRMSE for a single sample $i$ is given by

$$\epsilon_{\mathrm{nrmse}} = \frac{\sqrt{(y_i - \hat{y}_i)^2}}{y_{\max} - y_{\min}} \times 100\% \tag{21}$$

7.2. Linear Model Results

Plots of the forecasts for the linear model are presented in Figure 3. The horizontal axes describe the sample number. Without mean reversion, the forecast trends deviate from the ground truth as illustrated in Figure 3a. These deviations are due to the inflection point in the long-term trend from which the forecasts extend. Reasonable forecasts are obtained up to the end of the first seasonal cycle where variations in the true trend are minimal. After the first cycle, the forecasts begin to deviate as the true trend changes in a non-linear or stochastic manner.

Figure 3: Linear model forecasts of the dissolved oxygen (mg/l), pH, and temperature (°C) over sample indexes. (a) Linear model forecasts without mean reversion. (b) Linear model forecasts with mean reversion. The red line is a plot of the forecast and the blue filled region is a plot of the forecast standard deviation. The dark grey line is a plot of the sensor data sampled at 15 minute intervals, and the light grey markers indicate sub-samples extracted at 05h00, 12h00, and 20h30. The vertical grey dotted line indicates the start of the forecast. Only the last portion of the historical data is shown.

As indicated in Figure 3b, enforcing mean reversion provides significant improvements to long-term forecasts. Mean reversion draws the deviant forecasts back towards the average of the previously observed dynamics.

The blue filled regions plot the standard deviation of the posterior filtered distribution. This represents the uncertainty in the forecast. As expected, the mean reversion reduces the magnitude of the standard deviation through the pseudo-observations from the attractor distribution. The level to which the pseudo-observations affect the standard deviation depends on the attractor distribution covariance $\Sigma^v$.

The plots for the pH dataset in Figure 3b provide insight into the limitations of the mean reversion approach. The long-term forecasts settle to the attractor distribution mean, while the fluctuations in the trend continue to vary. That is, the slowly varying fluctuations of the data are not perfectly modelled. These fluctuations are treated as stochastic variations, where there is no deterministic function to model them. Instead, they are modelled by the fixed attractor distribution. Note however that the forecast over the first five days (480 samples) is still accurate and is a significant improvement over the model without mean reversion.

A plot of the linear model's latent variables for the dissolved oxygen dataset is presented in Figure 4. Mean reversion is applied to the trend component $\gamma_t$. Without mean reversion, the trend of the forecast continues linearly with a steep gradient. Mean reversion causes the trend to curve back towards the attractor distribution mean. By increasing $\Sigma^v$, the time it takes for the curve to settle can be increased. Decreasing $\Sigma^v$ results in a quicker settling time.

Figure 4: Plots of the data, filtered mean $f_t$, the trend component $\gamma_t$, and the sinusoidal component $\alpha \sin(\omega t)$ for the linear model on the dissolved oxygen dataset over the sample index. (a) Latent variables for the linear model without mean reversion. (b) Latent variables for the linear model with mean reversion. The gaps in the data plots are due to missing data.

Mean reversion is not applied to the sinusoidal component, $\beta_t$. The seasonal oscillation thus continues throughout the forecast. This demonstrates the key feature of the model where mean reversion is applied to one specific component in the model.

The NRMSE over the complete forecast for all datasets is presented in Table 2. The results show that mean reversion produces significant improvements in forecast ability. Though the NRMSE for the mean reversion in the pH dataset is high, it is a significant improvement over the linear model without mean reversion.

| Dataset | Without MR | With MR |
|---|---|---|
| DO | 29.69 | 16.68 |
| pH | 116.69 | 21.90 |
| Temperature | 31.09 | 16.20 |

Table 2: NRMSE of the linear model with and without mean reversion (MR) over the entire forecast presented in Figure 3.

A plot of the per-sample NRMSE (equation (21)) for the forecast is presented in Figure 5. The error for the model without mean reversion increases over the forecast time. This demonstrates that the forecast deviates from the ground truth with increasing forecast reach. For the model with mean reversion, the error remains relatively constant over the entire forecast. This demonstrates that the model performs equally well at short and long-term forecasting. This is especially remarkable as the model is forecasting more than 1000 steps ahead in time.

Figure 5: Per-sample NRMSE (equation (21)) for the linear model forecasts on the DO dataset presented in Figure 3. (a) NRMSE for the linear model without mean reversion. (b) NRMSE for the linear model with mean reversion.

7.3. Nonlinear Model Results

Plots of the forecasts for the nonlinear model are presented in Figure 6. As for the linear model, mean reversion provides significant improvement in the forecasts and reduces the uncertainty in the forecast.

Figure 6: Nonlinear model forecasts of dissolved oxygen (mg/l), pH, and temperature (°C) over sample indexes. (a) Nonlinear model forecasts without mean reversion. (b) Nonlinear model forecasts with mean reversion. The red line is a plot of the forecast and the blue filled region is a plot of the forecast standard deviation. The dark grey line is a plot of the sensor data sampled at 15 minute intervals, and the light grey markers indicate sub-samples extracted at 05h00, 12h00, and 20h30. The vertical grey dotted line indicates the start of the forecast. Only the last portion of the historical data is shown.

As illustrated in Figure 6a, the oscillation component decays over the forecast of the DO dataset. This follows the trend in the data leading up to the forecast, where the oscillation amplitude is decreasing. The trend in the data however does not continue decreasing as it does in the forecast. Mean reversion is thus applied to both the trend component $\gamma_t$ and the amplitude component $\alpha_t$. The result is that both of these components are corrected to provide a more accurate forecast.

A plot of the latent variables for the DO dataset is presented in Figure 7. The amplitude of the $\sin(\omega t)$ component remains fairly constant when compared to the linear model. This is expected as $\alpha_t$ and $\sin(\omega t)$ are separated in the nonlinear model, whereas in the linear model, they are combined into a single component. Both the trend and amplitude components are affected by the inflection point in the data where the forecast begins. They both veer off with a steep gradient. Mean reversion is applied to correct $\gamma_t$ and $\alpha_t$, and draw them back to the mean. The seasonal component is left to oscillate throughout the forecast.

The NRMSE over the entire forecast for all datasets is presented in Table 3. As for the linear model, the mean reversion reduces the error. Comparing the linear model results in Table 2 and the nonlinear model results in Table 3, it is clear that the nonlinear model achieves the best results. The nonlinear model is however a more complex model.

| Dataset | Without MR | With MR |
|---|---|---|
| DO | 25.12 | 14.44 |
| pH | 87.89 | 21.84 |
| Temperature | 64.48 | 16.15 |

Table 3: NRMSE of the nonlinear model with and without mean reversion (MR) over the forecast presented in Figure 6.

A plot of the per-sample NRMSE (equation (21)) is presented in Figure 8. As for the linear model, mean reversion reduces the error in the long-term forecasts.

8. Time Series Model Comparison

A comparison between an LDS (Dabrowski et al., 2018b), a dynamic linear model (DLM) (West & Harrison, 1997), a seasonal autoregressive integrated moving average (SARIMA) model, and Facebook's Prophet model (Taylor & Letham, 2018) is performed.

The linear LDS model of (Dabrowski et al., 2018b) is used as described in section 6.1. The DLM model is a free-form seasonal model (West & Harrison, 1997) with a first-order trend component as used in the LDS. Mean reversion using equation (14) and weighted mean reversion using equation (15) are applied to the trend components in the LDS and DLM models. The weighted mean reversion is applied with $\kappa = 0.1$. In tables and figures, models using mean reversion and weighted mean reversion are denoted by an 'MR' and a 'WMR' subscript respectively.

The SARIMA(5,1,3)(0,1,0)$_{96}$ model (https://www.statsmodels.org) is used on all datasets. The model order was chosen according to autocorrelation and partial autocorrelation plots. The Prophet model (https://facebook.github.io/prophet/) is configured with a linear growth trend, an additive daily seasonal component, and an interval width of 0.8.

The set of models is compared on the dissolved oxygen, pH, and temperature datasets. In this comparison, the datasets are not resampled as was done in section 7. All 96 samples per day are used in all models. Each model provides a 10 day (960 sample) forecast from the set of 10 pre-selected random starting points. Ten days is selected as it represents a reasonable long-term forecast in this application. The average NRMSE over the 10 forecasts for each model and dataset is presented in Table 4.

The LDS performs poorly over a long-term forecast. However, when using the mean reversion, the forecast is significantly improved. Using weighted mean reversion provides further improvements on the pH and temperature datasets.

The DLM generally does better than the LDS. It is a more complex model and is able to provide a

Figure 7: Plots of the data, filtered mean $f_t$, the trend component $\gamma_t$, the sinusoidal component $\sin(\omega t)$, and the amplitude component $\alpha_t$ for the nonlinear model over the sample index. (a) Latent variables for the nonlinear model without mean reversion. (b) Latent variables for the nonlinear model with mean reversion. The gaps in the data plots are due to missing data.

Figure 8: Per-sample NRMSE (equation (21)) for the nonlinear model forecasts on the DO dataset presented in Figure 6. (a) NRMSE for the nonlinear model without mean reversion. (b) NRMSE for the nonlinear model with mean reversion.

| Dataset | LDS | LDS$_{\mathrm{MR}}$ | LDS$_{\mathrm{WMR}}$ | DLM | DLM$_{\mathrm{MR}}$ | DLM$_{\mathrm{WMR}}$ | SARIMA | Prophet |
|---|---|---|---|---|---|---|---|---|
| DO | 33.51 | 14.41 | 15.98 | 25.41 | 10.81 | 11.08 | 15.27 | 16.07 |
| pH | 60.14 | 35.03 | 27.61 | 61.74 | 34.36 | 24.76 | 65.13 | 25.61 |
| Temperature | 107.92 | 38.14 | 34.33 | 104.80 | 36.83 | 31.86 | 109.26 | 71.12 |
| Average | 67.19 | 29.19 | 25.97 | 63.98 | 27.33 | 22.56 | 63.22 | 37.60 |

Table 4: Average NRMSE error (%) over ten 960-step-ahead forecasts for the set of models and datasets. Mean reversion is denoted by MR. Weighted mean reversion is denoted by WMR.
more re ned representation of the seasonal curves. This increased complexity comes at a signi cant cost with a 97-dimensional state vector. This can be problematic in hardware where computational power and memory are limited. In comparison with the DLM, the LDS has a 4-dimensional state vec- tor. The LDS thus performs surprisingly well in comparison. The DLM with weighted mean reversion provides the lowest average NRMSE results over all datasets. Other than the pH dataset, the other mean re- version model variants take the second, third and Figure 9: Box-whisker plots comparing the set of models fourth place. For the pH dataset, the Prophet over each dataset for the NRMSE results. model provides highly competitive results and takes second place. The SARIMA model performs well on mean reversion, the LDS and DLM models produce the dissolved oxygen dataset, otherwise it provides results with high NRMSE values and large boxes. similar results to the DLM and LDS models. The large boxes indicate a high variation in the The SARIMA model has rst order di erencing forecast accuracy. Introducing mean reversion or and the Prophet model has a linear growth trend. weighted mean reversion both increases accuracy These components function as linear trend compo- and reduces variation in the forecasts. The result nents. Thus, like state-space models, the SARIMA is a more robust model. and Prophet models are susceptible to forecast de- For the pH and temperature datasets, the viations. Given this, the Prophet model performs DLM model produces boxes which are be- WMR remarkably well. low the LDS, DLM, SARIMA, and Prophet model To illustrate the robustness of the models and the boxes. This indicates some level of statistical signif- statistical signi cance of the results, box-whisker icance that the DLM outperforms these mod- WMR plots are presented in Figure 9. In the absence of els. LDS LDS MR LDS WMR DLM DLM MR DLM WMR SARIMA Prophet Temp. 
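To make the size of these improvements concrete, the following sketch computes the relative reduction in average NRMSE using only the "Average" row of Table 4. The dictionary layout and the helper name `relative_reduction` are illustrative choices and are not part of the study's implementation.

```python
# Average NRMSE (%) per model, copied from the "Average" row of Table 4.
average_nrmse = {
    "LDS": 67.19, "LDS_MR": 29.19, "LDS_WMR": 25.97,
    "DLM": 63.98, "DLM_MR": 27.33, "DLM_WMR": 22.56,
    "SARIMA": 63.22, "Prophet": 37.60,
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Compare each mean reversion variant with its own base model.
pairs = [("LDS", "LDS_MR"), ("LDS", "LDS_WMR"),
         ("DLM", "DLM_MR"), ("DLM", "DLM_WMR")]
reductions = {
    f"{base} -> {variant}": relative_reduction(average_nrmse[base],
                                               average_nrmse[variant])
    for base, variant in pairs
}

for label, value in reductions.items():
    print(f"{label}: {value:.1f}% lower average NRMSE")

# Model with the lowest average NRMSE across the three datasets.
best_model = min(average_nrmse, key=average_nrmse.get)
print("lowest average NRMSE:", best_model)
```

On these table values, the weighted variants reduce the average NRMSE of their base models by a little over 60%, and the unweighted variants by somewhat less.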
The computation times are presented in Table 5. These times include the parameter estimation as well as the forecasting operations. All models are implemented in Python and run on a Dual-Core Intel i5 processor. The mean reversion increases the processing time as the pseudo samples are required to be calculated. Weighted mean reversion further increases computational complexity resulting in further increased processing times. Weighted mean reversion in the LDS is still however quicker than the Prophet and SARIMA models. The SARIMA model has the highest processing time, which is primarily due to the parameter estimation operation. Compared with the DLM, the Prophet model is more computationally efficient.

Dataset        LDS     LDS_MR   LDS_WMR   DLM     DLM_MR   DLM_WMR   SARIMA   Prophet
DO             1.96    3.61     5.12      31.83   64.38    97.17     556.43   15.22
pH             1.85    3.47     5.18      32.1    64.38    97.22     87.85    17.73
Temperature    2.16    3.95     5.61      31.6    64.53    96.46     190.23   17.51

Table 5: Average processing time in seconds over ten 960-step-ahead forecasts for the set of models and datasets. Mean reversion is denoted by MR. Weighted mean reversion is denoted by WMR.

9. Summary and Conclusion

In this study a novel mean reversion approach is presented for state-space models. The mean reversion is performed using an attractor distribution with a Gaussian form. The mean of this distribution is approximated by the average filtered estimate over previously observed samples. This mean provides an approximation of the average dynamics over the sequence. To draw a forecast towards the mean, filtering is applied with pseudo-observations obtained from the attractor distribution. The result is that the forecast converges to the attractor distribution mean in the limit.

We demonstrate the approach with a linear and nonlinear LDS in a prawn pond water quality forecasting application. Results show a significant improvement in long-term forecasts. Furthermore, a comparison between various time series models on the prawn pond water quality dataset is presented. The results demonstrate that the lowest errors are obtained when weighted mean reversion is used in the DLM.

A limitation of the attractor distribution is that it is stationary. The result is that the long-term forecast is drawn to a fixed mean. In future work, a non-stationary attractor distribution could be investigated. The result would be that the forecast would be drawn to a particular dynamic rather than a fixed mean. Future work could also include an investigation into estimating the attractor distribution covariance matrix using the expectation maximisation algorithm.

Finally, though the proposed approach is demonstrated on an aquaculture problem, it is applicable to other problems with similar properties. Future work could include testing the approach on problems such as weather-related forecasting, electricity load forecasting, algal bloom forecasting, and other environmental applications with seasonal data.

References

Ahmed, A. M. (2017). Prediction of dissolved oxygen in Surma river by biochemical oxygen demand and chemical oxygen demand using the artificial neural networks (ANNs). Journal of King Saud University - Engineering Sciences, 29, 151–158. URL: http://www.sciencedirect.com/science/article/pii/S1018363914000385. doi:https://doi.org/10.1016/j.jksues.2014.05.001.

Aliev, F. A., & Ozbek, L. (1999). Evaluation of convergence rate in the central limit theorem for the Kalman filter. IEEE Transactions on Automatic Control, 44, 1905–1909. doi:10.1109/9.793734.

Andrawis, R. R., Atiya, A. F., & El-Shishiny, H. (2011). Combination of long term and short term forecasts, with application to tourism demand forecasting. International Journal of Forecasting, 27, 870–886. URL: http://www.sciencedirect.com/science/article/pii/S0169207010001147. doi:https://doi.org/10.1016/j.ijforecast.2010.05.019. Special Section 1: Forecasting with Artificial Neural Networks and Computational Intelligence. Special Section 2: Tourism Forecasting.

Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge University Press.

Basant, N., Gupta, S., Malik, A., & Singh, K. P. (2010). Linear and nonlinear modeling for simultaneous prediction of dissolved oxygen and biochemical oxygen demand of the surface water - a case study. Chemometrics and Intelligent Laboratory Systems, 104, 172–180. URL: http://www.sciencedirect.com/science/article/pii/S0169743910001449. doi:https://doi.org/10.1016/j.chemolab.2010.08.005.

Box, G., Jenkins, G., Reinsel, G., & Ljung, G. (2015). Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics. Wiley.

Boyd, C. E., & Tucker, C. S. (1998). Pond aquaculture water quality management. Springer US. doi:https://doi.org/10.1007/978-1-4615-5407-3.

Brown, R. (1959). Statistical Forecasting for Inventory Control. McGraw-Hill. URL: https://books.google.com.au/books?id=QSYnAAAAMAAJ. doi:https://doi.org/10.1016/j.ijforecast.2003.09.015.

de Canete, J. F., Saz-Orozco, P. D., Baratti, R., Mulas, M., Ruano, A., & Garcia-Cerezo, A. (2016). Soft-sensing estimation of plant effluent concentrations in a biological wastewater treatment plant using an optimal neural network. Expert Systems with Applications, 63, 8–19. URL: http://www.sciencedirect.com/science/article/pii/S0957417416303098. doi:https://doi.org/10.1016/j.eswa.2016.06.028.

Commandeur, J. J., & Koopman, S. J. (2007). An Introduction to State Space Time Series Analysis. Practical Econometrics. Oxford University Press.

Cox, J. C., Ingersoll, J. E., & Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53, 385–407. URL: http://www.jstor.org/stable/1911242.

Dabrowski, J. J., Rahman, A., & George, A. (2018a). Prediction of dissolved oxygen from pH and water temperature in aquaculture prawn ponds. In Proceedings of the Australasian Joint Conference on Artificial Intelligence - Workshops AIW'18 (pp. 2–6). New York, NY, USA: ACM. URL: http://doi.acm.org/10.1145/3314487.3314488. doi:10.1145/3314487.3314488.

Dabrowski, J. J., Rahman, A., George, A., Arnold, S., & McCulloch, J. (2018b). State space models for forecasting water quality variables: An application in aquaculture prawn farming. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining KDD '18 (pp. 177–185). New York, NY, USA: ACM. URL: http://doi.acm.org/10.1145/3219819.3219841. doi:10.1145/3219819.3219841.

Dogan, E., Sengorur, B., & Koklu, R. (2009). Modeling biological oxygen demand of the Melen river in Turkey using an artificial neural network technique. Journal of Environmental Management, 90, 1229–1235. URL: http://www.sciencedirect.com/science/article/pii/S0301479708001588. doi:https://doi.org/10.1016/j.jenvman.2008.06.004.

Duan, W., He, B., Nover, D., Yang, G., Chen, W., Meng, H., Zou, S., & Liu, C. (2016). Water quality assessment and pollution source identification of the eastern Poyang lake basin using multivariate statistical methods. Sustainability, 8. URL: https://www.mdpi.com/2071-1050/8/2/133. doi:10.3390/su8020133.

Durbin, J., & Koopman, S. (2012). Time Series Analysis by State Space Methods, volume 38 of Oxford Statistical Science Series. (2nd ed.). Oxford University Press.

Evensen, G. (1994). Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99, 10143–10162. URL: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/94JC00572. doi:10.1029/94JC00572. arXiv:https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/94JC00572.

Ginot, V., & Hervé, J.-C. (1994). Estimating the parameters of dissolved oxygen dynamics in shallow ponds. Ecological Modelling, 73, 169–187. URL: http://www.sciencedirect.com/science/article/pii/0304380094900612. doi:https://doi.org/10.1016/0304-3800(94)90061-2.

Gooijer, J. G. D., & Hyndman, R. J. (2006). 25 years of time series forecasting. International Journal of Forecasting, 22, 443–473. URL: http://www.sciencedirect.com/science/article/pii/S0169207006000021. doi:https://doi.org/10.1016/j.ijforecast.2006.01.001. Twenty five years of forecasting.

Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F - Radar and Signal Processing, 140, 107–113. doi:10.1049/ip-f-2.1993.0015.

Granger, C. W., & Jeon, Y. (2007). Long-term forecasting and evaluation. International Journal of Forecasting, 23, 539–551. URL: http://www.sciencedirect.com/science/article/pii/S0169207007000908. doi:https://doi.org/10.1016/j.ijforecast.2007.07.002.

Grewal, M., & Andrews, A. (2015). Kalman Filtering: Theory and Practice with MATLAB. John Wiley & Sons, Inc. doi:10.1002/9781118984987.

Harvey, A. C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press. doi:10.1017/CBO9781107049994.

He, J., Chu, A., Ryan, M. C., Valeo, C., & Zaitlin, B. (2011). Abiotic influences on dissolved oxygen in a riverine environment. Ecological Engineering, 37, 1804–1814. URL: http://www.sciencedirect.com/science/article/pii/S0925857411002096. doi:https://doi.org/10.1016/j.ecoleng.2011.06.022.

Holt, C. (1957). Forecasting seasonals and trends by exponentially weighted averages (ONR memorandum 52/1957). Carnegie Institute of Technology.

Hyndman, R., Koehler, A., Ord, J., & Snyder, R. (2008). Forecasting with Exponential Smoothing: The State Space Approach. Springer Series in Statistics. Berlin, Heidelberg: Springer Berlin Heidelberg. URL: https://doi.org/10.1007/978-3-540-71918-2. doi:10.1007/978-3-540-71918-2.

Julier, S. J., & Uhlmann, J. K. (1997). New extension of the Kalman filter to nonlinear systems. In Signal processing, sensor fusion, and target recognition VI (pp. 182–194). SPIE volume 3068. URL: http://dx.doi.org/10.1117/12.280797. doi:10.1117/12.280797.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.

Kandil, M., El-Debeiky, S., & Hasanien, N. (2001). Overview and comparison of long-term forecasting techniques for a fast developing utility: part i. Electric Power Systems Research, 58, 11–17. URL: http://www.sciencedirect.com/science/article/pii/S0378779601000979. doi:https://doi.org/10.1016/S0378-7796(01)00097-9.

Lu, Z., & Piedrahita, R. H. (1996). Stochastic Modeling of temperature and dissolved oxygen in stratified fish ponds. Technical Report, Department of Biological and Agricultural Engineering, University of California, Davis, USA. Thirteenth Annual Technical Report.

Madsen, H. I., Vollertsen, J., & Hvitved-Jacobsen, T. (2007). Modelling the oxygen mass balance of wet detention ponds receiving highway runoff. In G. M. Morrison, & S. Rauch (Eds.), Highway and Urban Environment (pp. 487–497). Dordrecht: Springer Netherlands.

Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press.

Olyaie, E., Abyaneh, H. Z., & Mehr, A. D. (2017). A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware river. Geoscience Frontiers, 8, 517–527. URL: http://www.sciencedirect.com/science/article/pii/S1674987116300469. doi:https://doi.org/10.1016/j.gsf.2016.04.007.

Pavliotis, G. A. (2014). The Fokker-Planck equation. In Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations (pp. 87–137). New York, NY: Springer New York. URL: https://doi.org/10.1007/978-1-4939-1323-7_4. doi:10.1007/978-1-4939-1323-7_4.

Petris, G., Petrone, S., & Campagnoli, P. (2009). Dynamic Linear Models with R. Use R! New York, NY: Springer New York. URL: https://doi.org/10.1007/b135794_1. doi:10.1007/b135794_1.

Ranković, V., Radulović, J., Radojević, I., Ostojić, A., & Čomić, L. (2010). Neural network modeling of dissolved oxygen in the Gruža reservoir, Serbia. Ecological Modelling, 221, 1239–1244. URL: http://www.sciencedirect.com/science/article/pii/S0304380009008692. doi:https://doi.org/10.1016/j.ecolmodel.2009.12.023.

Rao, T., Rao, S., & Rao, C. (2012). Time Series Analysis: Methods and Applications, volume 30 of Handbook of Statistics. Elsevier. URL: https://www.elsevier.com/books/time-series-analysis-methods-and-applications/rao/978-0-444-53858-1.

Ren, Q., Zhang, L., Wei, Y., & Li, D. (2018). A method for predicting dissolved oxygen in aquaculture water in an aquaponics system. Computers and Electronics in Agriculture, 151, 384–391. URL: http://www.sciencedirect.com/science/article/pii/S0168169918303181. doi:https://doi.org/10.1016/j.compag.2018.06.013.

Robertson, C. (Ed.) (2006). Australian prawn farming manual: health management for profit. Queensland Department of Primary Industries and Fisheries (QDPI&F).

Ruiz, L., Rueda, R., Cuéllar, M., & Pegalajar, M. (2018). Energy consumption forecasting based on Elman neural networks with evolutive optimization. Expert Systems with Applications, 92, 380–389. URL: http://www.sciencedirect.com/science/article/pii/S0957417417306565. doi:https://doi.org/10.1016/j.eswa.2017.09.059.

Schmid, B. H., & Koskiaho, J. (2006). Artificial neural network modeling of dissolved oxygen in a wetland pond: The case of Hovi, Finland. Journal of Hydrologic Engineering, 11, 188–192. URL: https://ascelibrary.org/doi/abs/10.1061/(ASCE)1084-0699(2006)11:2(188). doi:10.1061/(ASCE)1084-0699(2006)11:2(188). arXiv:https://ascelibrary.org/doi/pdf/10.1061/(ASCE)1084-0699(2006)11:2(188).

Shi, P., Li, G., Yuan, Y., Huang, G., & Kuang, L. (2019). Prediction of dissolved oxygen content in aquaculture using clustering-based softplus extreme learning machine. Computers and Electronics in Agriculture, 157, 329–338. URL: http://www.sciencedirect.com/science/article/pii/S0168169918310421. doi:https://doi.org/10.1016/j.compag.2019.01.004.

Soman, S. S., Zareipour, H., Malik, O., & Mandal, P. (2010). A review of wind power and wind speed forecasting methods with different time horizons. In North American Power Symposium 2010 (pp. 1–8). doi:10.1109/NAPS.2010.5619586.

Spall, J. C., & Wall, K. D. (1984). Asymptotic distribution theory for the Kalman filter state estimator. Communications in Statistics - Theory and Methods, 13, 1981–2003. URL: https://doi.org/10.1080/03610928408828808. doi:10.1080/03610928408828808. arXiv:https://doi.org/10.1080/03610928408828808.

Ta, X., & Wei, Y. (2018). Research on a dissolved oxygen prediction method for recirculating aquaculture systems based on a convolution neural network. Computers and Electronics in Agriculture, 145, 302–310. URL: http://www.sciencedirect.com/science/article/pii/S016816991730786X. doi:https://doi.org/10.1016/j.compag.2017.12.037.

Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72, 37–45. URL: https://doi.org/10.1080/00031305.2017.1380080. doi:10.1080/00031305.2017.1380080. arXiv:https://doi.org/10.1080/00031305.2017.1380080.

Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. Intelligent robotics and autonomous agents. MIT Press.

Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics, 5, 177–188. URL: http://www.sciencedirect.com/science/article/pii/0304405X77900162. doi:https://doi.org/10.1016/0304-405X(77)90016-2.

West, M., & Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer Series in Statistics. Springer New York. URL: https://doi.org/10.1007/0-387-22777-6. doi:10.1007/0-387-22777-6.

Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342. URL: https://doi.org/10.1287/mnsc.6.3.324. doi:10.1287/mnsc.6.3.324. arXiv:https://doi.org/10.1287/mnsc.6.3.324.

Xu, L., Liu, S., & Li, D. (2017). Prediction of water temperature in prawn cultures based on a mechanism model optimized by an improved artificial bee colony. Computers and Electronics in Agriculture, 140, 397–408. URL: http://www.sciencedirect.com/science/article/pii/S0168169916305191. doi:https://doi.org/10.1016/j.compag.2017.05.034.

Xu, Z., & Xu, Y. J. (2016). A deterministic model for predicting hourly dissolved oxygen change: Development and application to a shallow eutrophic lake. Water, 8. URL: http://www.mdpi.com/2073-4441/8/2/41. doi:10.3390/w8020041.

Zarchan, P., & Musoff, H. (2000). Fundamentals of Kalman Filtering: A Practical Approach. Number v. 190, pt. 1 in Fundamentals of Kalman Filtering: A Practical Approach. American Institute of Aeronautics and Astronautics, Incorporated.

Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35–62. URL: http://www.sciencedirect.com/science/article/pii/S0169207097000447. doi:https://doi.org/10.1016/S0169-2070(97)00044-7.

Zhang, G., & Qi, M. (2005). Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160, 501–514. URL: http://www.sciencedirect.com/science/article/pii/S0377221703005484. doi:https://doi.org/10.1016/j.ejor.2003.08.037. Decision Support Systems in the Internet Age.

Zhang, Y., Fitch, P., Vilas, M. P., & Thorburn, P. J. (2019). Applying multi-layer artificial neural network and mutual information to the prediction of trends in dissolved oxygen. Frontiers in Environmental Science, 7, 46. URL: https://www.frontiersin.org/article/10.3389/fenvs.2019.00046. doi:10.3389/fenvs.2019.00046.
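As a supplementary illustration of the mechanism summarised in Section 9, the sketch below applies a scalar random-walk Kalman filter to a short series and then forecasts with and without mean reversion. It is a simplified stand-in, not the study's equation (14): the local level model, the noise values `q`, `r` and `attractor_var`, the data, and the choice to use the attractor mean itself as the pseudo-observation at every forecast step are all assumptions made for illustration.

```python
def kalman_update(mean, var, obs, obs_var):
    """Scalar Kalman measurement update for a directly observed state."""
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var


def filter_and_forecast(observations, horizon, q=0.01, r=0.25,
                        attractor_var=4.0, mean_reversion=True):
    """Filter a random-walk (local level) model, then forecast `horizon` steps.

    q, r and attractor_var are illustrative values, not taken from the paper.
    """
    mean, var = observations[0], 1.0
    filtered = []
    for y in observations:
        var += q                                    # predict: x_t = x_{t-1} + noise
        mean, var = kalman_update(mean, var, y, r)  # update with the real observation
        filtered.append(mean)

    # Attractor mean: average filtered estimate over the observed samples.
    attractor_mean = sum(filtered) / len(filtered)

    forecast = []
    for _ in range(horizon):
        var += q                                    # predict (mean is unchanged)
        if mean_reversion:
            # Filter with a pseudo-observation placed at the attractor mean.
            mean, var = kalman_update(mean, var, attractor_mean, attractor_var)
        forecast.append(mean)
    return attractor_mean, forecast


# Hypothetical data: a level near 6 that ends on an upward excursion.
data = [6.0, 5.8, 6.2, 6.1, 5.9, 6.0, 6.3, 6.8, 7.4, 8.0]
mu, plain = filter_and_forecast(data, horizon=500, mean_reversion=False)
_, reverted = filter_and_forecast(data, horizon=500, mean_reversion=True)

print(f"attractor mean:                        {mu:.3f}")
print(f"final forecast without mean reversion: {plain[-1]:.3f}")
print(f"final forecast with mean reversion:    {reverted[-1]:.3f}")
```

In this toy setting the unconstrained forecast stays at the last filtered level, which sits above the historical average because of the final excursion, while the mean-reverting forecast is pulled geometrically towards the attractor mean. A larger `attractor_var` weakens the pull and so lengthens the horizon over which short-term dynamics dominate.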

Journal

Statistics, arXiv (Cornell University)

Published: Feb 26, 2020
