Determination of egg storage time at room temperature using a low-cost NIR spectrometer and machine learning techniques

Nowadays, consumers are increasingly concerned about the freshness and quality of food. Poultry egg storage time is an indicator of freshness and quality in industrial and consumer applications, even though egg marking is not always required outside the European Union. Other authors have published work that uses expensive laboratory equipment to determine egg storage time and freshness. In contrast, this paper presents a novel method based on low-cost devices for rapid and non-destructive prediction of egg storage time at room temperature (23±1°C). Eggs were sampled from an H&N brown flock of 49-week-old hens. The samples were scanned daily with a low-cost smartphone-connected near infrared reflectance (NIR) spectrometer for 22 days, starting on the day the eggs were laid. The resulting dataset of 660 samples was randomly split using 10-fold cross-validation in order to compare and optimize two machine learning algorithms. During the optimization, several models were tested to develop a robust calibration model. The best model uses Savitzky-Golay preprocessing and an Artificial Neural Network with ten neurons in one hidden layer. When regressing the storage time of the eggs, tests achieved a coefficient of determination (R-squared) of 0.8319±0.0377 and a root mean squared error (RMSE) of 1.97. Although further work is needed, this technique has shown industrial potential and consumer utility for determining egg freshness with a low-cost spectrometer connected to a smartphone.

Keywords: Non-destructive, chemometrics, freshness, poultry, neural networks.

I. Introduction

Eggs are an affordable source of nutrients in the human diet. Their freshness and quality decline over time and are influenced by the storage conditions.
Degradation can reach the point at which eggs are unfit for human consumption (Abdel-Nour, Ngadi, Prasher, & Karimi, 2011; Akter, Kasim, Omar, & Sazili, 2014; Akyurek & Okur, 2009; Mathew, Olufemi, & Foluke, 2016). Consumers might perceive variability in freshness as a lack of quality; for this reason, it is important to study methods to monitor and preserve egg freshness better (Akter et al., 2014; Karoui et al., 2006). Important and complex changes occur in eggs during storage, and predicting these changes is critical for monitoring egg freshness. They include thinning of the albumen, weakening of the vitelline membrane, and an increase in the water content of the yolk. The foaming properties of the albumen and the emulsifying properties of the yolk are affected by protein concentration, pH, and ionic strength (Karoui et al., 2006).

Storage time, temperature, humidity, air quality, and handling are external factors that cause the degradation of eggs. In particular, storage time is related to two major issues: the reduction of the nutritional value of eggs (Stadelman, Newkirk, & Newby, 1995) and a decrease in freshness that follows a logarithmic relation (Silversides & Scott, 2001). Akter et al. (2014) demonstrated that egg weight, pH, oxidation, and Haugh Units are also adversely affected by increasing storage time at room temperature. In the same work, the authors propose a maximum storage time of 14 days at room temperature.

Freshness can be assessed by physical, biochemical, microbial, and sensory parameters. The Haugh Unit (HU) method is a widely used destructive method for measuring egg quality (Haugh, 1937). However, quality measurements based on HU are biased by the strain and age of the hen (Silversides & Scott, 2001). Liu et al. (2007) demonstrated a high correlation between HU and storage time, with an R-squared of 0.9868.
Sensor technologies are an attractive strategy for non-destructive determination of egg freshness, either at the production plant or in the food industry (Galiş, Dale, Boudry, & Théwis, 2012; Karoui et al., 2006). In recent years, non-destructive techniques for assessing freshness and storage time at room temperature have emerged. These techniques include the electronic nose (Yongwei, Wang, Zhou, & Lu, 2009), ultrasound (M. Aboonajmi, Setarehdan, Akram, Nishizu, & Kondo, 2014), ultraviolet-visible spectroscopy (Y. Liu et al., 2007), and near infrared spectroscopy (Abdel-Nour et al., 2011; Mohammad Aboonajmi, Saberi, Abbasian Najafabadi, & Kondo, 2015; Lin, Zhao, Sun, kun Bi, & Cai, 2015; Zhao et al., 2010). The food industry has used NIR spectroscopy for a long time (Stark, 1996) because it is an accurate, rapid, and non-destructive quality analysis technique (Kumaravelu & Gopal, 2015). Recent works have used NIR to predict storage time as a measure of freshness in Atlantic salmon (Kimiya, Sivertsen, & Heia, 2013), large yellow croaker (Gangying et al., 2015), snow crab (Lorentzen, Rotabakk, Olsen, Skuland, & Siikavuopio, 2016/1), pork (Chen, Cai, Wan, & Zhao, 2011), apples (F. Liu & Tang, 2015), Valerianella locusta (Giovenzana, Beghi, Buratti, Civelli, & Guidetti, 2014), and eggs (Abdel-Nour et al., 2011; Mohammad Aboonajmi et al., 2015; Lin et al., 2015; Zhao et al., 2010). In the past ten years, small hand-held instruments have seen considerable growth (Barton, 2016; Haughey, Galvin-King, Malechaux, & Elliott, 2014). Recently, some low-cost NIR devices have appeared on the market, making NIRS applications affordable and therefore accessible to a wider public (Haughey et al., 2014). NIR spectra are the result of vibrational transitions associated with the chemical bonds present in most organic compounds (dos Santos, Lopo, Páscoa, & Lopes, 2013; Kumaravelu & Gopal, 2015; Teye, Huang, & Afoakwa, 2013).
The resulting spectrum reflects simultaneous changes in all the properties of the sample, which complicates the calibration process (Florkowski, Prussia, Shewfelt, & Brueckner, 2009; Martens & Naes, 1992). Chemometrics has become an essential technique for developing NIR calibration models. Using these techniques, it is possible to process many samples in a short time (Moros, Garrigues, & Guardia, 2010). Multivariate analysis techniques such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) have been widely used to process spectral data (Kumaravelu & Gopal, 2015). Recently, some machine learning techniques have become popular as good alternatives to the classic techniques, since they are based on pattern recognition (Brereton, 2015).

The aim of this study was to assess the potential of a low-cost NIR spectrometer as a non-destructive and rapid technique for egg storage time assessment. A more specific objective was to develop and evaluate a chemometric NIR calibration model based on machine learning techniques for the determination of egg storage time at room temperature.

II. Materials and methods

The overall methodology, shown in Figure 1, consists of three stages: the acquisition of the data (Section 2.1), data partition using a cross-validation technique (Section 2.2), and the optimization of the chemometric model (Section 2.3). Each stage involved several steps. The following subsections describe the methodology in detail.

Figure 1. Diagram of stages and steps in the experimental methodology.

2.1 Data acquisition

Samples were collected using a smartphone-connected NIR spectrometer; each sample was uploaded to a cloud application over the phone's internet connection and stored in a dataset until the end of sample collection.
2.1.1 Smartphone-connected NIR spectrometer

The SCiO™ handheld NIR spectrometer is a device with a built-in light source and a silicon sensor that operates in the short-wavelength NIR range of 740 nm to 1070 nm (Goldring, Sharon, Brodetzki, & Ruf, 2016). The spectral data are transferred to a smartphone via Bluetooth and then to a cloud application (Goldring et al., 2016). Previous works have already pointed to the potential of this low-cost device (Haughey et al., 2014; Schulte, Brink, Gruna, Herzog, & Gruger, 2015). This hardware makes it possible to perform rapid tests on a smartphone (Cartwright, 2016; Das et al., 2015; Pügner, Knobbe, & Grüger, 2016). The device has also been used in research on the detection of counterfeit medicines (Guillemain, Dégardin, & Roggo, 2017; Kaur, 2015; Wilson, Kaur, Allan, Lozama, & Bell, 2017) and for predicting the storage time and expiration of packaged chicken fillets (Weesepoel & Ruth, 2016).

2.1.2 Sample collection

A total of 660 spectral signals between 740 nm and 1070 nm were recorded with a spectral resolution of 1 nm. Spectral data were stored in a cloud-based dataset together with their corresponding reference values for storage time. The spectral curves correspond to 22 days of continuous monitoring of 30 intact-shell brown poultry eggs weighing between 55 g and 65 g. Eggs were collected from a flock of 20,000 H&N hens that were 49 to 52 weeks old. Hens were housed in a stacked cage system and fed a standard ration without laying promoters. Each spectrum used for experimentation was obtained by averaging two repeated measurements taken successively in the same place. Eggs were scanned in the poultry house immediately after being laid (day 0) and then transported to the laboratory in a thermally insulated container. Measurements from day 1 to day 21 were obtained under laboratory conditions monitored hourly at 23±1°C and 90±2% relative humidity.
The interval between measurements was strictly 24 hours. The procedure is simple, and each measurement takes little time. A non-destructive technique was used in this experiment because the aim was to understand how the spectrum of each egg changes over time. Using a research license of SCiO Lab, egg spectral signals were downloaded and imported into Matlab (The MathWorks Inc., Natick, MA) in order to develop and optimize the chemometric models. The dataset used for these experiments will be made publicly available after the manuscript is accepted. During the peer review process, the dataset is available for download at the following private link: https://data.mendeley.com/datasets/6hn67h2trb/draft?a=76022a6d-e2f2-454a-b507-4c1b59d5d0c5

2.2 Data partition

The raw dataset was downloaded and then partitioned using a repeated 10-fold cross-validation technique in order to obtain training, validation, and test subsets for the optimization of the calibration model. Model performance should be evaluated on new data that have not been used for training the model. A good model should be able to make accurate estimations on these test data (Mucherino, Papajorgji, & Pardalos, 2009). Cross-validation is a common technique in machine learning for maximizing the use of available data. In this technique, the dataset is randomly divided into multiple subsets for training and testing the model. Cross-validation is used to avoid overfitting of the model (Kuhn & Johnson, 2013; Refaeilzadeh, Tang, & Liu, 2009). In this work, spectral data were divided into training (calibration), validation, and test subsets using a variation known as repeated cross-validation (Garcia & Filzmoser, 2015; Kuhn & Johnson, 2013). A repeated 10-fold cross-validation technique was chosen. Therefore, data were split into 10 groups, of which nine were used as calibration and validation sets and the remaining one as a test set. This process was repeated 50 times.
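The repeated 10-fold scheme can be illustrated with a short Python sketch. It is not the authors' Matlab code: the function name `repeated_kfold_partitions`, the fixed seed, and the rule that two of the nine non-test folds form the validation set (which reproduces the 70%/20%/10% proportions reported for this study) are assumptions made for illustration.

```python
import random

def repeated_kfold_partitions(n_samples, n_folds=10, n_repeats=50,
                              n_val_folds=2, seed=0):
    """Yield (train, validation, test) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_folds groups of near-equal size.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for test_fold in range(n_folds):
            test = folds[test_fold]
            remaining = [f for k, f in enumerate(folds) if k != test_fold]
            # Assumption: the last n_val_folds remaining folds form the
            # validation set; the other folds form the training set.
            val = [i for f in remaining[-n_val_folds:] for i in f]
            train = [i for f in remaining[:-n_val_folds] for i in f]
            yield train, val, test

partitions = list(repeated_kfold_partitions(660))
train, val, test = partitions[0]
print(len(partitions), len(train), len(val), len(test))  # 500 462 132 66
```

With 660 samples, each fold holds 66 samples, so every one of the 500 partitions contains 462 training, 132 validation, and 66 test samples, and within one repetition each sample is used exactly once for testing.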
The training and test sets were rotated until all folds had been tested. For each fold, the dataset was divided randomly into 462 samples (70%) for training, 132 samples (20%) for validation, and 66 samples (10%) for testing.

2.3 Optimization of calibration model

The relationship between the response in the NIR spectral region and the target is often nonlinear (Bertran et al., 1999). The origin of these nonlinearities is difficult to identify; for this reason, calibration is often performed using multivariate analysis (Martens & Naes, 1992). Developing a chemometric model requires the NIR spectra, the reference values for calibration, and an algorithm to link them (Barton, 2016). The parameters of the model were optimized in three consecutive phases. With the parameters of the best model, an evaluation was then performed on unseen data (test set) in order to estimate future performance on new data. In Phase 1, two modelling algorithms (Sections 2.3.1 and 2.3.2) were tested simultaneously with seven preprocessing techniques (Section 2.3.3). In Phase 2, the parameters of the selected model were tuned to optimize its performance. In Phase 3, the feature selection threshold (Section 2.3.4) was fine-tuned.

2.3.1 Partial Least Squares (PLS)

PLS was introduced by the Swedish statistician Herman Wold (H. Wold, 1985). In chemometrics, PLS regression is a basic method for relating two data matrices by a linear multivariate model. However, this method goes beyond traditional regression, since it can analyze data with incomplete, noisy, and collinear variables (S. Wold, Sjöström, & Eriksson, 2001). This method is widely applied in NIR spectroscopy, where multiple input variables are required.
The accuracy of a PLS regression model improves as the number of relevant variables and the number of observations increase (Hattori & Otsuka, 2017). The PLS model aims to find, in a multidimensional space, the direction in X that explains the direction of maximum variance in Y. PLS regression is suited to problems that have more predictor variables than observations and to cases where there may be multicollinearity among X values (Yu et al., 2017).

2.3.2 Artificial neural networks (ANN)

Artificial neural networks (ANN) are data-modeling tools for analyzing complex relationships between inputs and outputs. In recent years, ANN have become highly relevant in science and research. They are inspired by the human central nervous system, which has numerous cells that work quickly and help in decision making (Cascardi, Micelli, & Aiello, 2017). The Multilayer Perceptron (MLP) is a type of layered neural network with feed-forward connections between consecutive layers. Figure 2 shows the general scheme of an MLP: one input layer, one or more hidden layers, and one output layer. The transfer function of the neurons is commonly a sigmoid function, but other functions can also be used (Kruse et al., 2013; Ruck, Rogers, Kabrisky, Oxley, & Suter, 1990).

Figure 2. Multilayer perceptron representation.

Each neuron receives the output signals of the neurons in the previous layer and provides an output for the next layer. The output layer receives as input the output of the last hidden layer and returns the output of the network (Gardner & Dorling, 1998; Kruse et al., 2013). The number of neurons, the number of layers, and their connections are commonly known as the architecture of the neural network, which is one of the key parameters to be optimized. The architecture depends on the complexity of the problem, and there is no general method for choosing the best one.
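The layer-by-layer computation described above can be sketched in plain Python. This is a minimal, untrained forward pass rather than the network used in the study: the helper names, the random initial weights, the linear output neuron, and the input length of 331 values (one per nanometre from 740 nm to 1070 nm) are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_inputs, n_neurons, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def layer_forward(inputs, weights, biases, activation):
    """Each neuron applies its activation to a weighted sum of all inputs."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, hidden, output):
    """One hidden sigmoid layer followed by a single linear output neuron."""
    h = layer_forward(inputs, *hidden, activation=sigmoid)
    return layer_forward(h, *output, activation=lambda z: z)[0]

rng = random.Random(0)
n_wavelengths = 331                      # assumed input size (740-1070 nm)
hidden = init_layer(n_wavelengths, 10, rng)   # ten hidden neurons
output = init_layer(10, 1, rng)               # one regression output
spectrum = [rng.random() for _ in range(n_wavelengths)]  # placeholder input
print(mlp_forward(spectrum, hidden, output))
```

Changing the architecture in this sketch only means changing the layer sizes passed to `init_layer`, which is why architecture search is usually run as a loop over candidate sizes.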
Choosing a good architecture is an empirical process in which multiple architectures are tested in order to find one that offers satisfactory results (Herrera, Hervas, Otero, & Sánchez, 2004; Rivero, Fernandez-Blanco, Dorado, & Pazos, 2011). Artificial neural networks have been successfully applied to NIR spectrometry for the rapid quantification of wine compounds (Martelo-Vidal & Vázquez, 2015), the evaluation of chemical components and properties of the jujube fruit (Guo, Ni, & Kokot, 2016), and the characterization of blends containing refined and extra virgin olive oils (Aroca-Santos, Cancilla, Pariente, & Torrecilla, 2016).

2.3.3 Preprocessing techniques

NIR spectra can be preprocessed to eliminate physical phenomena that alter the raw spectra. This practice is an integral part of chemometric modeling (Blanco & Villarroya, 2002) and is considered one of the most important steps. The data contained in NIR spectra are a composite of several signals with overlapping information (Blanco & Villarroya, 2002; dos Santos et al., 2013; Rinnan, Berg, & Engelsen, 2009). The proper choice of preprocessing technique is difficult to assess before the model is validated; therefore, NIR spectra preprocessing is performed by trial and error (Rinnan et al., 2009; Xu et al., 2008). In this work, the raw spectral signal and six common preprocessing techniques are analyzed:

1) Raw spectra: An NIR reflectance measurement of a sample includes both the diffusely reflected and the specularly reflected radiation. The latter does not contain relevant information and is commonly minimized by instrument design and sampling geometry. The diffusely reflected light is the primary source of information in the NIR spectra (Rinnan et al., 2009).

2) Savitzky-Golay: This preprocessing technique was proposed in 1964 by the authors after whom it is named (Savitzky & Golay, 1964). The authors popularized a method that includes a smoothing step for the numerical differentiation of a vector.
In this method, a polynomial of order p is fitted to the raw data in a symmetric window of width w, and then the derivative of order d is calculated at the centre point i. This operation is applied sequentially to all spectral points. The width of the window and the degree of the fitted polynomial are decisions that need to be made. The highest derivative that can be determined is the degree of the polynomial (Rinnan et al., 2009). For the initial experiments, the width was set to five, and both the polynomial order and the derivative order were set to two. In subsequent experiments, the width was varied from 3 to 101 (odd numbers only), the polynomial order from one to five, and the derivative order from one to five.

3) Beer-Lambert law: This law suggests a linear relationship between the absorbance of the spectra, computed from the reflectance, and the concentration of components. It can be expressed as shown in Equation 1 (Rinnan et al., 2009):

$$A_\lambda = -\log(R) \cong \epsilon_\lambda \times l \times c \qquad (1)$$

where $A_\lambda$ is the wavelength-dependent absorbance, $R$ is the reflectance, $\epsilon_\lambda$ is the wavelength-dependent molar absorptivity, $l$ is the path length of the light through the sample matrix, and $c$ is the concentration of the components of interest.

4) Standard Normal Variate (SNV): This is a method for scatter correction of NIR data (Barnes, Dhanoa, & Lister, 1989). The formula is expressed in Equation 2.

$$X_{i,snv} = \frac{X_i - \bar{x}}{S} \qquad (2)$$

where $X_{i,snv}$ is the SNV-corrected value at wavelength $i$, $X_i$ is the spectral value at wavelength $i$, $\bar{x}$ is the average of the spectrum of the sample to be corrected, and $S$ is the standard deviation of the spectrum of the sample.

5) Multiplicative Scatter Correction (MSC): This is a widely used preprocessing technique for NIR that was first introduced by Martens et al. (1983). In this technique, undesirable scatter effects are removed from the data matrix prior to data modeling. MSC comprises a first step, the estimation of the correction coefficients, and a second step, the correction of the recorded spectrum. Equations 3 and 4 show the two steps, respectively.
$$X_o = b_o + b_{r,1} \times X_r + e \qquad (3)$$

$$X_c = \frac{X_o - b_o}{b_{r,1}} = X_r + \frac{e}{b_{r,1}} \qquad (4)$$

where $X_c$ is the corrected spectrum, $X_o$ is an original sample spectrum measured by the NIR instrument, and $X_r$ is a reference spectrum used for preprocessing the entire dataset. In most applications, the average spectrum of the calibration set is used as the reference spectrum. $e$ is the unmodeled part of $X_o$, and $b_o$ and $b_{r,1}$ are the scalar parameters, which differ for each sample (Rinnan et al., 2009).

6) First Spectral Derivative (FSD): This technique has been used in analytical spectroscopy for decades due to its ability to remove additive and multiplicative effects from the spectra (dos Santos et al., 2013). Finite difference is the basic method for spectral derivation (Rinnan et al., 2009); thus, the first derivative is calculated as the difference between two subsequent spectral points, as shown in Equation 5.

$$X_{i,fsd} = X_i - X_{i-1} \qquad (5)$$

where $X_{i,fsd}$ denotes the first derivative at wavelength $i$ and $X_i$ is the spectral value at wavelength $i$. The first derivative removes the baseline of the spectra.

7) Second Spectral Derivative (SSD): This technique is also based on finite differences (Rinnan et al., 2009); thus, the second derivative is calculated as the difference between two subsequent points of the processed FSD signal. The formula is shown in Equation 6.

$$X_{i,ssd} = X_{i,fsd} - X_{i-1,fsd} \qquad (6)$$

where $X_{i,ssd}$ represents the second derivative at wavelength $i$. Besides removing the baseline, the second derivative removes the linear trend of the spectra. According to Rinnan et al. (2009), this technique should be avoided in practice, since it is not feasible for most real measurements due to noise inflation.

2.3.4 Feature selection threshold

NIR spectra commonly contain a large amount of information along the wavelength range. For this reason, it is important to apply a technique that reduces this amount of data (Blanco & Villarroya, 2002).
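To make the idea of a feature selection threshold concrete, the following Python sketch ranks wavelengths by the absolute Pearson correlation between each wavelength and the storage time, and keeps the top-ranked share. It is a simplified univariate ranking, not the filter used in the study, and it assumes that the threshold is the percentage of wavelengths retained; the function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(spectra, target, threshold_pct):
    """Return the sorted column indices of the top threshold_pct % features.

    Assumption: features are ranked by |correlation| with the target and the
    threshold is the percentage of features that is kept.
    """
    n_features = len(spectra[0])
    scores = [abs(pearson([row[j] for row in spectra], target))
              for j in range(n_features)]
    n_keep = max(1, round(n_features * threshold_pct / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])

# Toy data: 4 samples x 3 "wavelengths"; storage time in days as target.
spectra = [[1.0, 5.0, 0.3],
           [2.0, 5.0, 0.1],
           [3.0, 5.0, 0.4],
           [4.0, 5.0, 0.1]]
days = [0, 1, 2, 3]
print(select_features(spectra, days, 67))  # [0, 2]
```

The selection only picks columns of the original spectra, so the retained wavelengths keep their physical meaning. In a cross-validated workflow, the ranking would normally be computed on the training subset only.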
These techniques have now become a necessity and a requirement in chemometrics (Guyon, Gunn, Nikravesh, & Zadeh, 2008; Saeys, Inza, & Larrañaga, 2007). In machine learning, properly selecting a small subset of features with lower sensitivity to nonlinearities is usually effective in improving model performance (Leardi, Boggia, & Terrile, 1992; Saeys et al., 2007). Selecting relevant features related to the compound of interest while avoiding interference from others should be a goal when building a robust predictive model. In this study, a feature selection method was applied to select the informative wavelengths. Feature selection does not alter the original representation of the variables; it simply chooses a subset of the best wavelengths for the model (Saeys et al., 2007). A threshold is normally used together with a univariate or multivariate filter technique to evaluate which wavelengths are the best (Szymańska et al., 2015). The present work used a multivariate filter described by Hall (1999) called Correlation-based Feature Selection (CFS). This filter is a simple algorithm that ranks feature importance according to its correlation with the predicted variable. By using this method, the relevant informative wavelengths are selected, and an improvement in model performance is therefore expected.

2.3.5 Model evaluation

All models were evaluated using performance measures such as the coefficient of determination (R-squared) on the validation set of a cross-validation (dos Santos et al., 2013; Viscarra Rossel, 2008). The mean and standard deviation of the R-squared values of 500 models (50 repetitions of 10-fold cross-validation) were calculated for each model configuration in order to decide the best parameters of the model. Final results are reported on the test set of the cross-validation.
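The performance measures used for model evaluation can be computed as in the following Python sketch. It assumes the standard definitions of R-squared, mean absolute error (MAE), and root mean squared error (RMSE), and it assumes the sample standard deviation when summarizing R-squared across models; the function names and the toy values are illustrative only.

```python
import math
import statistics

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def summarize(scores):
    """Mean and sample standard deviation of per-model scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Toy example: storage time in days, actual vs. predicted.
actual = [0, 1, 2, 3, 4]
predicted = [1, 1, 2, 3, 3]
print(r_squared(actual, predicted))  # close to 0.8
print(mae(actual, predicted))        # 0.4
print(rmse(actual, predicted))       # about 0.632
```

In the repeated cross-validation setting, `summarize` would be applied to the list of R-squared values collected from all fitted models of one configuration.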
R-squared, mean absolute error (MAE), and root mean squared error (RMSE) were obtained according to Equations 7-9, respectively. MAE and RMSE represent the difference between the predicted values and the actual values (Armstrong & Collopy, 1992; Hyndman & Koehler, 2006).

$$R\text{-squared} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (7)$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (8)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (9)$$

where $y_i$ is the real value of the i-th observation, $\hat{y}_i$ is the predicted value of the i-th observation, $\bar{y}$ is the average of the real values, and $n$ is the number of observations.

III. Results and Discussion

This study presents a method to estimate egg storage time at room temperature using a smartphone-connected NIR spectrometer and machine learning techniques. Data acquisition and data partition were covered in Sections 2.1 and 2.2. This section therefore presents the results obtained in the three phases of the calibration model optimization.

3.1 Phase 1: Selection of modelling algorithm and pre-processing technique

This phase aimed to choose the modelling algorithm and the preprocessing technique simultaneously. A grid search (Koch et al., 2012; Ma, Zhang, & Wang, 2015) was used to evaluate the models. The PLS and ANN algorithms were trained using the same partition scheme to ensure that both models received exactly the same input data, making the results comparable. Because the feature selection threshold influences the results, the preprocessing techniques were evaluated at all thresholds using correlation-based feature selection (Hall, 1999).

Figure 3. R-squared on the CV validation set obtained with the seven preprocessing techniques at different values of the feature selection threshold. a) Using PLS. b) Using ANN.

The mean R-squared of 50 repeated 10-fold cross-validations achieved using PLS with 10 latent variables can be seen in Figure 3a.
The best results are obtained with preprocessing technique 4 (Standard Normal Variate) at a 70% feature selection threshold and preprocessing technique 6 (First Spectral Derivative) at a 50% feature selection threshold. In both cases, the results are below 0.8. Figure 3b shows the mean R-squared of 50 repeated 10-fold cross-validations achieved using ANN with 10 neurons in one hidden layer. The best results are obtained with preprocessing technique 2 (Savitzky-Golay) at a 90% feature selection threshold and preprocessing technique 7 (Second Spectral Derivative) at a 60% feature selection threshold. In both cases, the R-squared results are above 0.8. The ANN models outperform the PLS models; therefore, ANN was selected for optimization. Although Savitzky-Golay and the Second Spectral Derivative give similar results, the latter should be avoided in practice because of the added noise (Rinnan et al., 2009). Additionally, the Savitzky-Golay technique has tuning parameters that can be useful for optimizing the model (Savitzky & Golay, 1964). Therefore, once the preprocessing technique was chosen, the feature selection threshold was optimized again. The main reason is that the Savitzky-Golay tuning parameters can transform the input pattern, so the selected wavelengths need to be re-evaluated. This parameter is therefore set to 100% for now. Savitzky-Golay widths (odd numbers between 3 and 101) were tested for different configurations of polynomial order and derivative order (from first to fifth). Table 1 shows the mean, standard deviation, minimum, and maximum of the R-squared of 50 repeated 10-fold cross-validations achieved at the best width of each configuration, together with the grouping from a Tukey Honest Significant Difference test (Tukey, 1949). There is an intrinsic redundancy in the hierarchy of Savitzky-Golay derivation.
For each polynomial order, the subsequent derivative order gave the same estimate of the coefficients. For example, for the first-degree polynomial, a first derivative and a second derivative give the same answer. The results of redundant configurations are not presented.

Table 1. Results of Savitzky-Golay parameters at the best width, using odd numbers from 3 to 101

Polynomial degree | Derivative order | Width | Mean ± std | Min | Max | Tukey HSD
First | First | 7 | 0.7644 ± 0.069 | 0.360 | 0.861 | c
Second | First | 7 | 0.7654 ± 0.067 | 0.445 | 0.894 | c
Second | Second | 29 | 0.8204 ± 0.047 | 0.610 | 0.909 | ab
Third | First | 19 | 0.7678 ± 0.065 | 0.472 | 0.880 | c
Third | Second | 21 | 0.8214 ± 0.045 | 0.675 | 0.910 | ab
Third | Third | 53 | 0.8249 ± 0.044 | 0.690 | 0.926 | ab
Fourth | First | 13 | 0.7679 ± 0.073 | 0.267 | 0.899 | c
Fourth | Second | 41 | 0.8222 ± 0.045 | 0.561 | 0.926 | ab
Fourth | Third | 51 | 0.8248 ± 0.043 | 0.618 | 0.914 | ab
Fourth | Fourth | 61 | 0.8199 ± 0.042 | 0.651 | 0.909 | ab
Fifth | First | 5 | 0.7684 ± 0.07 | 0.123 | 0.889 | c
Fifth | Second | 39 | 0.8252 ± 0.045 | 0.624 | 0.931 | a
Fifth | Third | 67 | 0.8276 ± 0.041 | 0.590 | 0.910 | a
Fifth | Fourth | 61 | 0.8189 ± 0.042 | 0.660 | 0.907 | ab
Fifth | Fifth | 67 | 0.8107 ± 0.045 | 0.607 | 0.913 | b

Rows with different letters differ significantly according to Tukey HSD (p < 0.01).

According to the Tukey HSD test, the best Savitzky-Golay configuration is a fifth-degree polynomial with a second or third derivative order. The latter was chosen since it has a greater mean and a smaller standard deviation in test data. Therefore, the optimized Savitzky-Golay technique for this experiment used 67 smoothing points (width), a fifth-degree polynomial, and a third-order derivative.

There were evident differences in the spectra of the eggs as a function of storage time. Figure 4a shows the result of applying the Savitzky-Golay preprocessing technique. As the eggs were stored for longer, the spectra take on different characteristic values.
These differences can also be seen in Figure 4b, which shows the average of the signals corresponding to eggs stored for 0, 7, 14, and 21 days. Visual exploration of the dataset reveals patterns indicating that the spectra of the eggs change as storage time increases. Notice in Figure 4b that, at wavelengths of 860-940 nm, fresh eggs with zero days of storage have a Savitzky-Golay reflectance near zero, while stored eggs oscillate between 1 and -1 over the same wavelengths. After this pattern, fresh eggs have a peak at 933 nm, while stored eggs show a similar peak earlier, at 913 nm. At 983 nm and 999 nm, the difference between eggs stored for 7 days and those stored longer is evident. In a visual exploration of the data, it is difficult to find differences between eggs stored for more than 14 days. According to Akter et al. (2014), eggs should be stored for a maximum of 14 days at room temperature. The small differences between the spectra of eggs stored for more than 14 days may indicate a deterioration in freshness.

Figure 4. Spectral signal processed with optimized Savitzky-Golay at different storage times. 4a) All spectral signals. 4b) Mean spectra at 0, 7, 14, and 21 days of storage.

3.2 Phase 2: Optimization of selected algorithm parameters

During Phase 1, the model using the ANN algorithm obtained better results, so the ANN algorithm was selected. In this phase, the architecture of the neural network was optimized. Architectures of one and two hidden layers were evaluated using a grid search (Koch et al., 2012; Ma et al., 2015). Values from 0 to 200 neurons in each layer, in steps of 10, were evaluated.

Figure 6. R-squared on the 50-times repeated 10-fold CV validation set. a) Mean. b) Standard deviation.

Figure 6b shows the standard deviation of the 50-times repeated 10-fold cross-validated models at each architecture; this measure indicates the stability and repeatability of the proposed model.
The best results are achieved with architectures of one hidden layer with 10 or 20 neurons and with architectures of two hidden layers in which the second layer has 10 or 20 neurons. The best result overall is achieved with an architecture of two hidden layers with 180 and 10 neurons, respectively; however, an architecture of 10 neurons in one hidden layer obtains statistically equal results. Following Abu-Mostafa (1989), the least complex architecture with the same results was selected in this work, namely 10 neurons in one hidden layer.

3.3 Phase 3: Fine tuning of feature selection threshold

The following experiment aims to find the best threshold value for feature selection. Figure 5 shows the influence of the feature selection threshold on the R-squared results.

Figure 5. R-squared results at different values of the feature selection threshold.

Results above 0.8 are obtained with feature selection threshold values from 30 onwards. Figure 5 shows the importance of appropriately selecting features as inputs of the model. A feature selection threshold of 48 was selected, since the R-squared obtained in test data is equal to 0.8319 ± 0.0377. Although evaluating the storage time of poultry eggs involves a complex process, this study shows that egg quality can be predicted using an ANN model with relevant features as input patterns. Our results confirm the statement by Guyon et al. (2008) that appropriate selection of relevant wavelengths is very important and is a good strategy for avoiding the inclusion of uninformative and redundant wavelengths in the predictive model.

3.4 Evaluation of the model in unseen data (test set)

Once all parameters were optimized, the final configuration of the model is Savitzky-Golay preprocessing with a fifth-degree polynomial, a third-order derivative, and a width of 67. The ANN model has 10 neurons in one hidden layer, and the feature selection threshold is 48.
The calibration model using the previously mentioned parameters was tested on unseen data to evaluate its performance and generalization ability. Figure 6a shows a regression plot of the egg storage time predicted by the model against the corresponding actual values. Most predictions present an absolute error of about 2 days, some predictions show an error of about 4 days, and one value has the worst error of 6 days. Figure 6b shows the histogram of absolute errors.

Figure 6. Model performance in test set data. a) Regression fit plot, actual vs. predicted. b) Absolute error histogram.

To evaluate the performance of the model on unseen data, R-squared and RMSE were calculated and can be seen in Table 2. The proposed model has a high significance in all subsets.

Table 2. Performance metrics in training, validation and test sets

                                  training    validation    test
n samples                         462         132           66
R-squared                         0.865       0.862         0.873
Mean Absolute Error               1.834       1.998         1.810
Root Mean Squared Error           2.15        2.19          1.97
F-statistic vs. constant model    3.09e+03    673           449
p-value                           <0.01       <0.01         <0.01

The differences between test and training set performance can indicate the degree of overfitting (Stockwell & Peterson, 2002). However, Table 2 shows that these parameters are similar across the training, validation and test sets. Therefore, we can affirm that the models are well fitted within the context of the calibration set. Although the R-squared of the model is below 0.9, the RMSE is about two days, which gives safety margins to the user.

Some authors have evaluated predictive models for freshness assessment. Lin et al. (2011) obtained an R-squared of 0.879 and an RMSE of 2.443 using NIR spectroscopy for Haugh unit. Sun et al. (2015) used artificial vision and dynamic weighing to obtain an R-squared of 0.8653 and an RMSE of 3.745. Abdel-Nour et al. (2011) achieved an R-squared of 0.89 and an RMSE of 1.65 in the validation dataset for the prediction of storage days, using a lab-grade VIS/NIR spectroradiometer. Our results are similar to those mentioned above, which clearly indicates that there is potential for portable NIR instruments as an alternative to desktop lab instrumentation, in agreement with Haughey et al. (2014). Moreover, recent advances in sensor technology, such as the Changhong (Rateni, Dario, & Cavallo, 2017), which has an embedded NIR sensor, make it possible to create applications using the proposed model. This approach could enable consumers to predict the storage time of poultry eggs.

IV. Conclusions and future work

Our findings show the potential of smartphone-connected devices based on short-wave NIR as an effective method for the evaluation of storage time in poultry eggs. Spectral analysis of eggs is a rapid and nondestructive method for egg storage time determination. Reflectance spectral data of the egg contain relevant information about its storage time, a parameter of freshness. There were noticeable differences in the processed spectra of the eggs as a function of storage time. Suitable predictive models were built with PLS and ANN regression techniques; however, the latter performed better, achieving an R-squared of 0.873 and an RMSE of 1.97 in test set data. This suggests that the spectra obtained with the smartphone-connected NIR spectrometer can be used as a nondestructive method for the assessment of egg storage time, a parameter of quality and freshness. The use of a smartphone-connected NIR spectrometer is recommended for egg storage time assessment. However, further work is needed to assess the long-term reliability of the system. A combination of this method with destructive techniques is recommended. Future work will focus on exhaustive experimentation at room and refrigeration temperatures using this low-cost spectrometer on eggs from hens of diverse ages and strains.
The use of hyperspectral imaging for storage time prediction is also a potential technique to be studied.

Acknowledgements

The authors wish to thank Agrolomas CL for providing samples directly from their poultry houses. A special acknowledgement goes to the CEDIA National Research and Education Network. Iván Ramírez-Morales and Enrique Fernández-Blanco would also like to thank the NVIDIA Research Grants Program for its support. This work is part of the DINTA-UTMACH and RNASA-UDC research groups.

References

Abdel-Nour, N., Ngadi, M., Prasher, S., & Karimi, Y. (2011). Prediction of Egg Freshness and Albumen Quality Using Visible/Near Infrared Spectroscopy. Food and Bioprocess Technology, 4(5), 731–736. https://doi.org/10.1007/s11947-009-0265-0
Aboonajmi, M., Saberi, A., Abbasian Najafabadi, T., & Kondo, N. (2015). Quality assessment of poultry egg based on Vis-NIR Spectroscopy and RBF networks. International Journal of Food Properties, 2912(December), 150831112742007. https://doi.org/10.1080/10942912.2015.1075215
Aboonajmi, M., Setarehdan, S. K., Akram, A., Nishizu, T., & Kondo, N. (2014). Prediction of Poultry Egg Freshness Using Ultrasound. International Journal of Food Properties, 17(9), 1889–1899. https://doi.org/10.1080/10942912.2013.770015
Abu-Mostafa, Y. S. (1989). The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning. Neural Computation, 1(3), 312–317. https://doi.org/10.1162/neco.1989.1.3.312
Akter, Y., Kasim, A., Omar, H., & Sazili, A. Q. (2014). Effect of storage time and temperature on the quality characteristics of chicken eggs. Journal of Food, Agriculture. Retrieved from https://www.researchgate.net/profile/Yeasmin_Akter/publication/284471186_Effect_of_storage_time_and_temperature_on_the_quality_characteristics_of_chicken_eggs/links/5653df8f08ae1ef929763361.pdf
Akyurek, H., & Okur, A. A. (2009). Effect of storage time, temperature and hen age on egg quality in free-range layer hens. Journal of Animal and Veterinary Advances: JAVA, 8(10), 1953–1958. Retrieved from http://docsdrive.com/pdfs/medwelljournals/javaa/2009/1953-1958.pdf
Armstrong, J. S., & Collopy, F. (1992). Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8(1), 69–80. https://doi.org/10.1016/0169-2070(92)90008-W
Aroca-Santos, R., Cancilla, J. C., Pariente, E. S., & Torrecilla, J. S. (2016). Neural networks applied to characterize blends containing refined and extra virgin olive oils. Talanta, 161, 304–308. https://doi.org/10.1016/j.talanta.2016.08.033
Barnes, R. J., Dhanoa, M. S., & Lister, S. J. (1989). Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra. Applied Spectroscopy, 43(5), 772–777. Retrieved from https://www.osapublishing.org/as/abstract.cfm?URI=as-43-5-772
Barton, F., II. (2016). Near infrared equipment through the ages and into the future. NIR News, 27(1), 41. https://doi.org/10.1255/nirn.1585
Bertran, E., Blanco, M., Maspoch, S., Ortiz, M. C., Sánchez, M. S., & Sarabia, L. A. (1999). Handling intrinsic non-linearity in near-infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems, 49(2), 215–224. https://doi.org/10.1016/S0169-7439(99)00043-X
Blanco, M., & Villarroya, I. (2002). NIR spectroscopy: a rapid-response analytical tool. Trends in Analytical Chemistry: TRAC, 21(4), 240–250. https://doi.org/10.1016/S0165-9936(02)00404-1
Brereton, R. G. (2015). Pattern recognition in chemometrics. Chemometrics and Intelligent Laboratory Systems, 149, Part B, 90–96. https://doi.org/10.1016/j.chemolab.2015.06.012
Cartwright, J. (2016). Technology: Smartphone science. Nature, 531(7596), 669–671. https://doi.org/10.1038/nj7596-669a
Cascardi, A., Micelli, F., & Aiello, M. A. (2017). An Artificial Neural Networks model for the prediction of the compressive strength of FRP-confined concrete circular columns. Engineering Structures, 140, 199–208. Retrieved from http://www.sciencedirect.com/science/article/pii/S0141029617305461
Chen, Q., Cai, J., Wan, X., & Zhao, J. (2011). Application of linear/non-linear classification algorithms in discrimination of pork storage time using Fourier transform near infrared (FT-NIR) spectroscopy. LWT - Food Science and Technology, 44(10), 2053–2058. Retrieved from http://www.sciencedirect.com/science/article/pii/S0023643811001630
Das, A., Swedish, T., Wahi, A., Moufarrej, M., Noland, M., Gurry, T., … Raskar, R. (2015). Mobile phone based mini-spectrometer for rapid screening of skin cancer. In SPIE Sensing Technology + Applications (p. 94820M–94820M–5). International Society for Optics and Photonics. https://doi.org/10.1117/12.2182191
dos Santos, C. A. T., Lopo, M., Páscoa, R. N. M. J., & Lopes, J. A. (2013). A review on the applications of portable near-infrared spectrometers in the agro-food industry. Applied Spectroscopy, 67(11), 1215–1233. https://doi.org/10.1366/13-07228
Florkowski, W. J., Prussia, S. E., Shewfelt, R. L., & Brueckner, B. (2009). Postharvest Handling: A Systems Approach. Elsevier Science. Retrieved from https://books.google.com.ec/books?id=_euakAoRNZEC
Galiş, A.-M., Dale, L. M., Boudry, C., & Théwis, A. (2012). The potential use of near-infrared spectroscopy for the quality assessment of eggs and egg products. Scientific Works C Series Veterinary Medicine, 58, 294–307. Retrieved from https://orbi.ulg.ac.be/bitstream/2268/156809/1/Art38.pdf
Gangying, Z., Yangyang, G., Wei, L., Jie, H., Minmin, W., & Guohua, H. (2015). Large Yellow Croaker (Pseudosciaena crocea) Storage Time Detecting Method by Visible-NIR Spectroscopy Combined with Aperiodic Stochastic Resonance. Journal of Chinese Institute of Food Science and Technology, 1, 050. Retrieved from http://en.cnki.com.cn/Article_en/CJFDTotal-ZGSP201501050.htm
Garcia, H., & Filzmoser, P. (2015). Multivariate Statistical Analysis using the R package chemometrics. Vienna, Austria. Retrieved from ftp://155.232.191.229/cran/web/packages/chemometrics/vignettes/chemometrics-vignette.pdf
Gardner, M. W., & Dorling, S. R. (1998). Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14–15), 2627–2636. https://doi.org/10.1016/S1352-2310(97)00447-0
Giovenzana, V., Beghi, R., Buratti, S., Civelli, R., & Guidetti, R. (2014). Monitoring of fresh-cut Valerianella locusta Laterr. shelf life by electronic nose and VIS–NIR spectroscopy. Talanta, 120, 368–375. Retrieved from http://www.sciencedirect.com/science/article/pii/S003991401300996X
Goldring, D., Sharon, D., Brodetzki, G., & Ruf, A. (2016). WITHDRAWN PATENT AS PER THE LATEST USPTO WITHDRAWN LIST. US Patent. Retrieved from http://www.freepatentsonline.com/9354117.html
Guillemain, A., Dégardin, K., & Roggo, Y. (2017). Performance of NIR handheld spectrometers for the detection of counterfeit tablets. Talanta, 165, 632–640. https://doi.org/10.1016/j.talanta.2016.12.063
Guo, Y., Ni, Y., & Kokot, S. (2016). Evaluation of chemical components and properties of the jujube fruit using near infrared spectroscopy and chemometrics. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy, 153, 79–86. https://doi.org/10.1016/j.saa.2015.08.006
Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2008). Feature Extraction: Foundations and Applications. Springer Berlin Heidelberg. Retrieved from https://books.google.com.ec/books?id=FOTzBwAAQBAJ
Hall, M. A. (1999). Correlation-based feature selection for machine learning. The University of Waikato. Retrieved from https://www.lri.fr/~pierres/donn%E9es/save/these/articles/lpr-queue/hall99correlationbased.pdf
Hattori, Y., & Otsuka, M. (2017). Modeling of feed-forward control using the partial least squares regression method in the tablet compression process. International Journal of Pharmaceutics, 524(1-2), 407–413. https://doi.org/10.1016/j.ijpharm.2017.04.004
Haughey, S. A., Galvin-King, P., Malechaux, A., & Elliott, C. T. (2014). The use of handheld near-infrared reflectance spectroscopy (NIRS) for the proximate analysis of poultry feed and to detect melamine adulteration of soya bean meal. Analytical Methods, 7(1), 181–186. https://doi.org/10.1039/C4AY02470B
Haugh, R. R. (1937). The Haugh unit for measuring egg quality. Retrieved from http://en.journals.sid.ir/ViewPaper.aspx?ID=128047
Herrera, F., Hervas, C., Otero, J., & Sánchez, L. (2004). Un estudio empírico preliminar sobre los tests estadísticos más habituales en el aprendizaje automático. Tendencias de La Minería de Datos En España, Red Española de Minería de Datos Y Aprendizaje (TIC2002-11124-E), 403–412. Retrieved from http://www.lsi.us.es/~riquelme/red/Capitulos/LMD35.pdf
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. https://doi.org/10.1016/j.ijforecast.2006.03.001
Karoui, R., Kemps, B., Bamelis, F., De Ketelaere, B., Decuypere, E., & De Baerdemaeker, J. (2006). Methods to evaluate egg freshness in research and industry: A review. European Food Research and Technology = Zeitschrift Fur Lebensmittel-Untersuchung Und -Forschung. A, 222(5-6), 727–732. https://doi.org/10.1007/s00217-005-0145-4
Kaur, H. (2015). Counterfeit Pharmaceuticals and Methods to Test Them. In M. Walport (Ed.), Annual Report of the Government Chief Scientific Adviser 2015: Forensic Science and Beyond: Authenticity, Provenance and Assurance Evidence and Case Studies (pp. 132–137). London: Government Office for Science. Retrieved from http://researchonline.lshtm.ac.uk/id/eprint/2528148
Kimiya, T., Sivertsen, A. H., & Heia, K. (2013). VIS/NIR spectroscopy for non-destructive freshness assessment of Atlantic salmon (Salmo salar L.) fillets. Journal of Food Engineering, 116(3), 758–764. Retrieved from http://www.sciencedirect.com/science/article/pii/S0260877413000216
Koch, P., Bischl, B., Flasch, O., Bartz-Beielstein, T., Weihs, C., & Konen, W. (2012). Tuning and evolution of support vector kernels. Evolutionary Intelligence, 5(3), 153–170. https://doi.org/10.1007/s12065-012-0073-8
Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M., & Held, P. (2013). Multi-Layer Perceptrons. In Computational Intelligence (pp. 47–81). Springer London. https://doi.org/10.1007/978-1-4471-5013-8_5
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. Springer New York. https://doi.org/10.1007/978-1-4614-6849-3
Kumaravelu, C., & Gopal, A. (2015). A review on the applications of Near-Infrared spectrometer and Chemometrics for the agro-food processing industries, 8–12. https://doi.org/10.1109/TIAR.2015.7358523
Leardi, R., Boggia, R., & Terrile, M. (1992). Genetic algorithms as a strategy for feature selection. Journal of Chemometrics, 6(5), 267–281. https://doi.org/10.1002/cem.1180060506
Lin, H., Zhao, J., Sun, L., Chen, Q., & Zhou, F. (2011). Freshness measurement of eggs using near infrared (NIR) spectroscopy and multivariate data analysis. Innovative Food Science & Emerging Technologies: IFSET: The Official Scientific Journal of the European Federation of Food Science and Technology, 12(2), 182–186. Retrieved from http://www.sciencedirect.com/science/article/pii/S1466856411000117
Lin, H., Zhao, J., Sun, L., kun Bi, X., & Cai, J. (2015). Effective Variables Selection in Eggs Freshness Graphically Oriented Local Multivariate Analysis using NIR Spectroscopy. In International Conference on Chemical, Material and Food Engineering (pp. 13–18).
Liu, F., & Tang, X. (2015). Fuji apple storage time rapid determination method using Vis/NIR spectroscopy. Bioengineered, 6(3), 166–169. https://doi.org/10.1080/21655979.2015.1038001
Liu, Y., Ying, Y., Ouyang, A., & Li, Y. (2007). Measurement of internal quality in chicken eggs using visible transmittance spectroscopy technology. Food Control, 18(1), 18–22. https://doi.org/10.1016/j.foodcont.2005.07.011
Lorentzen, G., Rotabakk, B. T., Olsen, S. H., Skuland, A. V., & Siikavuopio, S. I. (2016/1). Shelf life of snow crab clusters (Chionoecetes opilio) stored at 0 and 4 °C. Food Control, 59, 454–460. Retrieved from http://www.sciencedirect.com/science/article/pii/S0956713515300517
Martelo-Vidal, M. J., & Vázquez, M. (2015). Application of artificial neural networks coupled to UV–VIS–NIR spectroscopy for the rapid quantification of wine compounds in aqueous mixtures. CyTA - Journal of Food, 13(1), 32–39. https://doi.org/10.1080/19476337.2014.908955
Martens, H., Jensen, S. A., & Geladi, P. (1983). Multivariate linearity transformation for near-infrared reflectance spectrometry. In Proceedings of the Nordic symposium on applied statistics (pp. 205–234). Stokkand Forlag Publishers Stavanger, Norway.
Martens, H., & Naes, T. (1992). Multivariate Calibration. Wiley. Retrieved from https://books.google.com.ec/books?id=6lVcUeVDg9IC
Mathew, A. O., Olufemi, A. M., & Foluke, A. (2016). Relationship of temperature and length of storage on ph of internal contents of chicken table egg in humid tropics. Biotechnology in Animal. Retrieved from http://www.doiserbia.nb.rs/Article.aspx?id=1450-91561603285M
Ma, X., Zhang, Y., & Wang, Y. (2015). Performance evaluation of kernel functions based on grid search for support vector regression. In 2015 IEEE 7th International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM) (pp. 283–288). ieeexplore.ieee.org. https://doi.org/10.1109/ICCIS.2015.7274635
Moros, J., Garrigues, S., & Guardia, M. de la. (2010). Vibrational spectroscopy provides a green tool for multi-component analysis. Trends in Analytical Chemistry: TRAC, 29(7), 578–591. Retrieved from http://www.sciencedirect.com/science/article/pii/S0165993610000361
Mucherino, A., Papajorgji, P. J., & Pardalos, P. M. (2009). Data Mining in Agriculture (Vol. 34, pp. 143–160–160). New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-88615-2
Pügner, T., Knobbe, J., & Grüger, H. (2016). Near-Infrared Grating Spectrometer for Mobile Phone Applications. Applied Spectroscopy, 70(5), 734–745. https://doi.org/10.1177/0003702816638277
Rateni, G., Dario, P., & Cavallo, F. (2017). Smartphone-Based Food Diagnostic Technologies: A Review. Sensors, 17(6). https://doi.org/10.3390/s17061453
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-Validation. In L. Liu & M. Tamer Özsu (Eds.), Encyclopedia of Database Systems (pp. 532–538). Springer US. https://doi.org/10.1007/978-0-387-39940-9_565
Rinnan, Å., Berg, F. van D., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. Trends in Analytical Chemistry: TRAC, 28(10), 1201–1222. https://doi.org/10.1016/j.trac.2009.07.007
Rivero, D., Fernandez-Blanco, E., Dorado, J., & Pazos, A. (2011). Using recurrent ANNs for the detection of epileptic seizures in EEG signals. In 2011 IEEE Congress of Evolutionary Computation (CEC) (pp. 587–592). https://doi.org/10.1109/CEC.2011.5949672
Ruck, D. W., Rogers, S. K., Kabrisky, M., Oxley, M. E., & Suter, B. W. (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks / a Publication of the IEEE Neural Networks Council, 1(4), 296–298. https://doi.org/10.1109/72.80266
Saeys, Y., Inza, I., & Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507–2517. https://doi.org/10.1093/bioinformatics/btm344
Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627–1639.
Schulte, H., Brink, G., Gruna, R., Herzog, R., & Gruger, H. (2015). Utilization of spectral signatures of food for daily use. In OCM 2015-Optical Characterization of Materials-conference proceedings (p. 39). KIT Scientific Publishing. Retrieved from https://books.google.es/books?hl=en&lr=&id=nvCFBwAAQBAJ&oi=fnd&pg=PA39&dq=scio+sensor&ots=ZH32R6zb0W&sig=edTxfa80EEyMI2qtF85V4w2uxyQ
Silversides, F. G., & Scott, T. A. (2001). Effect of storage and layer age on quality of eggs from two lines of hens. Poultry Science, 80(8), 1240–1245. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/11495479
Stadelman, W. J., Newkirk, D., & Newby, L. (1995). Egg science and technology. Retrieved from https://books.google.es/books?hl=en&lr=&id=m20SqIXGKYYC&oi=fnd&pg=PR11&dq=Quality+identification+shell+egg&ots=CHHZkEKroK&sig=HSevsdjr9TIspj2nLllom0ucLek
Stark, E. (1996). Near infrared spectroscopy past and future. Near Infrared Spectroscopy The Future Waves, 701–713.
Stockwell, D. R. B., & Peterson, A. T. (2002). Effects of sample size on accuracy of species distribution models. Ecological Modelling, 148(1), 1–13. Retrieved from http://www.sciencedirect.com/science/article/pii/S030438000100388X
Sun, L., Yuan, L.-M., Cai, J.-R., Lin, H., & Zhao, J.-W. (2015). Egg Freshness on-Line Estimation Using Machine Vision and Dynamic Weighing. Food Analytical Methods, 8(4), 922–928. https://doi.org/10.1007/s12161-014-9944-1
Szymańska, E., Gerretzen, J., Engel, J., Geurts, B., Blanchet, L., & Buydens, L. M. C. (2015). Chemometrics and qualitative analysis have a vibrant relationship. Trends in Analytical Chemistry: TRAC, 69, 34–51. https://doi.org/10.1016/j.trac.2015.02.015
Teye, E., Huang, X.-Y., & Afoakwa, N. (2013). Review on the potential use of near infrared spectroscopy (NIRS) for the measurement of chemical residues in food. American Journal of Food Science and Technology, 1(1), 1–8. Retrieved from http://pubs.sciepub.com/ajfst/1/1/1/
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. https://doi.org/10.2307/3001913
Viscarra Rossel, R. A. (2008). ParLeS: Software for chemometric analysis of spectroscopic data. Chemometrics and Intelligent Laboratory Systems, 90(1), 72–83. https://doi.org/10.1016/j.chemolab.2007.06.006
Weesepoel, Y. J. A., & Ruth, S. M. van. (2016). Miniaturized NIRs for age and expiration date prediction of packaged chicken fillets. Retrieved from http://edepot.wur.nl/378295
Wilson, B. K., Kaur, H., Allan, E. L., Lozama, A., & Bell, D. (2017). A New Handheld Device for the Detection of Falsified Medicines: Demonstration on Falsified Artemisinin-Based Therapies from the Field. The American Journal of Tropical Medicine and Hygiene. https://doi.org/10.4269/ajtmh.16-0904
Wold, H. (1985). Partial least squares. Encyclopedia of Statistical Sciences. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/0471667196.ess1914.pub2/full
Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130. Retrieved from http://www.sciencedirect.com/science/article/pii/S0169743901001551
Xu, L., Zhou, Y.-P., Tang, L.-J., Wu, H.-L., Jiang, J.-H., Shen, G.-L., & Yu, R.-Q. (2008). Ensemble preprocessing of near-infrared (NIR) spectra for multivariate calibration. Analytica Chimica Acta, 616(2), 138–143. https://doi.org/10.1016/j.aca.2008.04.031
Yongwei, W., Wang, J., Zhou, B., & Lu, Q. (2009). Monitoring storage time and quality attribute of egg based on electronic nose. Analytica Chimica Acta, 650(2), 183–188. https://doi.org/10.1016/j.aca.2009.07.049
Yu, S., Xiao, X., Ding, H., Xu, G., Li, H., & Liu, J. (2017). Weighted partial least squares based on the error and variance of the recovery rate in calibration set. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy, 183, 138–143. https://doi.org/10.1016/j.saa.2017.04.029
Zhao, J., Lin, H., Chen, Q., Huang, X., Sun, Z., & Zhou, F. (2010). Identification of egg’s freshness using NIR and support vector data description. Journal of Food Engineering, 98(4), 408–414. https://doi.org/10.1016/j.jfoodeng.2010.01.018

Electrical Engineering and Systems Science arXiv (Cornell University)


ISSN: 0168-1699
eISSN: ARCH-3348
DOI: 10.1016/j.compag.2017.12.030

Abstract

Nowadays, consumers are more concerned about the freshness and quality of food. Poultry egg storage time is a freshness and quality indicator in industrial and consumer applications, even though egg marking is not always required outside the European Union. Other authors have already published works that use expensive laboratory equipment to determine the storage time and freshness of eggs. In contrast, this paper presents a novel method based on low-cost devices for rapid and non-destructive prediction of egg storage time at room temperature (23±1°C). An H&N brown flock with 49-week-old hens was used as the source of the sampled eggs. The samples were scanned daily with a low-cost smartphone-connected near infrared reflectance (NIR) spectrometer for a period of 22 days, starting from the day the egg was laid. The resulting dataset of 660 samples was randomly split according to a 10-fold cross-validation in order to be used in a comparison and optimization process of two machine learning algorithms. During the optimization, several models were tested to develop a robust calibration model. The best model uses a Savitzky-Golay preprocessing technique and an Artificial Neural Network with ten neurons in one hidden layer. Regressing the storage time of the eggs, tests achieved a coefficient of determination (R-squared) of 0.8319±0.0377 and a root mean squared error (RMSE) of 1.97. Although further work is needed, this technique has shown industrial potential and consumer utility for determining an egg's freshness using a low-cost spectrometer connected to a smartphone.

Keywords: Non-destructive, chemometrics, freshness, poultry, neural networks.

I. Introduction

Eggs are an affordable source of nutrients in the human diet. Their freshness and quality decline over time and are influenced by the storage conditions.
Degradation can reach the point of making them unfit for human consumption (Abdel-Nour, Ngadi, Prasher, & Karimi, 2011; Akter, Kasim, Omar, & Sazili, 2014; Akyurek & Okur, 2009; Mathew, Olufemi, & Foluke, 2016). The variability in freshness might be perceived by consumers as a lack of quality; for this reason, it is important to study methods to monitor and preserve eggs better (Akter et al., 2014; Karoui et al., 2006). Important and complex changes occur in eggs during storage. Predicting these changes is critical for monitoring egg freshness. These changes include thinning of the albumen, weakening of the vitelline membrane and an increase in the water content of the yolk. The foaming and emulsifying properties of the albumen and yolk, respectively, are affected by protein concentration, pH and ionic strength (Karoui et al., 2006). Storage time, temperature, humidity, air quality and handling are external factors that cause the degradation of eggs. In particular, storage time is related to two major issues: the reduction of the nutritional value of eggs (Stadelman, Newkirk, & Newby, 1995) and the decrease of freshness in a logarithmic relation (Silversides & Scott, 2001). Akter et al. (2014) demonstrated that egg weight, pH, oxidation and Haugh Units are also adversely affected by increasing storage time at room temperature. In the same work, the authors propose a maximum storage time of 14 days at room temperature. Freshness can be assessed by physical, biochemical, microbial and sensory parameters. The Haugh Unit (HU) method is a widely used destructive method to measure egg quality (Haugh, 1937). However, quality measurements based on HU are biased by the strain and age of the hen (Silversides & Scott, 2001). Liu et al. (2007) demonstrated a high correlation between HU and storage time, with an R-squared of 0.9868.
Sensor technologies are an attractive strategy for non-destructive determination of egg freshness, either at the production plant or in the food industry (Galiş, Dale, Boudry, & Théwis, 2012; Karoui et al., 2006). In recent years, non-destructive techniques for assessing freshness and storage time at room temperature have emerged. These techniques include electronic nose (Yongwei, Wang, Zhou, & Lu, 2009), ultrasound (M. Aboonajmi, Setarehdan, Akram, Nishizu, & Kondo, 2014), ultraviolet-visible spectroscopy (Y. Liu et al., 2007) and near infrared spectroscopy (Abdel-Nour et al., 2011; Mohammad Aboonajmi, Saberi, Abbasian Najafabadi, & Kondo, 2015; Lin, Zhao, Sun, kun Bi, & Cai, 2015; Zhao et al., 2010). The food industry has used NIR spectroscopy for a long time (Stark, 1996) because it is an accurate, rapid and non-destructive quality analysis technique (Kumaravelu & Gopal, 2015). Recent works have used NIR to predict storage time, as associated with freshness, in Atlantic salmon (Kimiya, Sivertsen, & Heia, 2013), large yellow croaker (Gangying et al., 2015), snow crab (Lorentzen, Rotabakk, Olsen, Skuland, & Siikavuopio, 2016/1), pork (Chen, Cai, Wan, & Zhao, 2011), apples (F. Liu & Tang, 2015), Valerianella locusta (Giovenzana, Beghi, Buratti, Civelli, & Guidetti, 2014), and eggs (Abdel-Nour et al., 2011; Mohammad Aboonajmi et al., 2015; Lin et al., 2015; Zhao et al., 2010). In the past ten years, small, hand-held instruments have seen considerable growth (Barton, 2016; Haughey, Galvin-King, Malechaux, & Elliott, 2014). Recently, some low-cost NIR devices have appeared on the market, making NIRS applications affordable and therefore accessible to a wider public (Haughey et al., 2014). NIR spectra are the result of vibrational transitions associated with chemical bonds present in most organic compounds (dos Santos, Lopo, Páscoa, & Lopes, 2013; Kumaravelu & Gopal, 2015; Teye, Huang, & Afoakwa, 2013).
The resulting spectrum is a consequence of simultaneous changes in all the properties of the sample, which makes the calibration process more complicated (Florkowski, Prussia, Shewfelt, & Brueckner, 2009; Martens & Naes, 1992). Chemometrics has become an essential technique for developing NIR calibration models. Using these techniques, it is possible to process numerous samples in a short time (Moros, Garrigues, & Guardia, 2010). Multivariate analysis techniques are commonly used to process spectral data; techniques such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) have been widely used (Kumaravelu & Gopal, 2015). Recently, some machine learning techniques have become popular as good alternatives to the classic techniques, since they are based on pattern recognition (Brereton, 2015). The aim of this study was to assess the potential of a low-cost NIR spectrometer as a non-destructive and rapid technique for egg storage time assessment. The more specific objective is to develop and evaluate a chemometric NIR calibration model based on machine learning techniques for the determination of egg storage time at room temperature.

II. Materials and methods

The overall methodology, as shown in Figure 1, consists of three stages: the acquisition of the data (Section 2.1), data partition using a cross-validation technique (Section 2.2) and the optimization of the chemometric model (Section 2.3). Several steps were followed at each stage. The following subsections describe the methodology in detail.

Figure 1. Diagram of stages and steps in the experimental methodology

2.1 Data acquisition

Samples were collected using a smartphone-connected NIR spectrometer; each sample was uploaded to a cloud application over the phone's internet connection and stored in a dataset until the end of sample collection.
2.1.1 Smartphone-connected NIR spectrometer

The SCiO™ handheld NIR spectrometer is a device with a built-in light source and a silicon sensor covering the short-wavelength NIR range from 740 nm to 1070 nm (Goldring, Sharon, Brodetzki, & Ruf, 2016). The spectral data are transferred to a smartphone via Bluetooth and then to a cloud application (Goldring et al., 2016). Previous works have already pointed to the potential of this low-cost device (Haughey et al., 2014; Schulte, Brink, Gruna, Herzog, & Gruger, 2015). This hardware makes it possible to perform rapid tests that could be available on a smartphone (Cartwright, 2016; Das et al., 2015; Pügner, Knobbe, & Grüger, 2016). The use of this device has also been reported in research related to the detection of counterfeit medicines (Guillemain, Dégardin, & Roggo, 2017; Kaur, 2015; Wilson, Kaur, Allan, Lozama, & Bell, 2017) and for predicting the storage time and expiration of packaged chicken fillets (Weesepoel & Ruth, 2016).

2.1.2 Sample collection

A total of 660 spectral signals between 740 nm and 1070 nm were recorded with a spectral resolution of 1 nm. Spectral data were stored in a cloud-based dataset with their corresponding reference values of storage time. The spectral curves correspond to 22 days of continuous monitoring of 30 shell-intact brown poultry eggs, with weights between 55 g and 65 g. Eggs were collected from a flock of 20,000 hens of the H&N strain, which were between 49 and 52 weeks old. Hens were housed in a stacked cage system and were fed a standard ration without the use of egg-laying promoters. The spectral data used for experimentation were obtained by averaging two repeated measurements taken successively in the same place. Eggs were scanned in the poultry house immediately after being laid (day 0) and then transported to the laboratory in a thermally insulated container. Measurements from day 1 to day 21 were obtained in laboratory conditions monitored hourly at 23±1°C and 90±2% relative humidity.
The interval between measurements was strictly 24 hours. The procedure employed is simple, and the time required for each measurement is short. A non-destructive technique was used in this experiment, since the intention was to understand how the spectrum of each egg changes over time. Using a research license of SCiO Lab, egg spectral signals were downloaded and imported into Matlab (The MathWorks Inc., Natick, MA) in order to develop and optimize the chemometric models. The dataset used for these experiments will be publicly available after the manuscript is accepted. During the peer review process, the dataset is available for download at the following private link: https://data.mendeley.com/datasets/6hn67h2trb/draft?a=76022a6d-e2f2-454a-b507-4c1b59d5d0c5

2.2 Data partition

The raw dataset was downloaded and then partitioned using a repeated 10-fold cross-validation technique in order to obtain training, validation and test subsets for the optimization of the calibration model. Model performance should be evaluated on a set of new data that has not been used for training the model. A good model should be able to make accurate estimations on this test data (Mucherino, Papajorgji, & Pardalos, 2009). Cross-validation is a common technique in machine learning for maximizing the use of available data. In this technique, the dataset is randomly divided into multiple subsets for training and testing the model. Cross-validation is used to help avoid overfitting of the model (Kuhn & Johnson, 2013; Refaeilzadeh, Tang, & Liu, 2009). In this work, spectral data were divided into training (calibration), validation and test subsets using a variation known as repeated cross-validation (Garcia & Filzmoser, 2015; Kuhn & Johnson, 2013). A repeated 10-fold cross-validation technique was chosen. Therefore, data were split into 10 groups, of which 9 were used as calibration and validation sets and the remaining one as a test set. This process was repeated 50 times.
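The repeated 10-fold partition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Matlab procedure: the function name `repeated_kfold`, the seed, and the way the validation subset is carved out of the nine non-test folds (a fixed fraction of 2/9, which reproduces the 462/132/66 split reported in this section for 660 samples) are all hypothetical choices.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=50, val_fraction=2 / 9, seed=0):
    """Yield (train, validation, test) index lists for repeated k-fold CV.

    Assumption: the validation subset is taken at random from the folds
    that are not used for testing; the paper does not state how it was drawn.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Split the shuffled indices into n_folds groups of (nearly) equal size.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
            # The remaining samples are divided into validation and training.
            n_val = round(len(rest) * val_fraction)
            validation, train = rest[:n_val], rest[n_val:]
            yield train, validation, test


splits = list(repeated_kfold(660))
train, validation, test = splits[0]
print(len(splits), len(train), len(validation), len(test))
```

Each of the 50 repetitions reshuffles the data, so every sample is used for testing exactly once per repetition.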
The training and test sets were rotated until all folds had been tested. For each fold, the dataset was divided randomly into 462 samples (70%) for training, 132 samples (20%) for validation and 66 samples (10%) for test.

2.3 Optimization of calibration model

The relationship between the response in the NIR spectral region and the target is often nonlinear (Bertran et al., 1999). The origin of these nonlinearities is difficult to identify; for this reason, calibration is often performed using multivariate analysis (Martens & Naes, 1992). Developing a chemometric model requires the NIR spectra, the reference values for calibration and an algorithm to link them (Barton, 2016). The parameters of the model were optimized in three consecutive phases. With the parameters of the best model, an evaluation was then performed on unseen data (test set) in order to estimate future performance on new data. In Phase 1, two modelling algorithms (Sections 2.3.1 and 2.3.2) were tested simultaneously with seven preprocessing techniques (Section 2.3.3). In Phase 2, the parameters of the selected model were tuned in order to optimize its performance. In Phase 3, the feature selection threshold (Section 2.3.4) was fine-tuned.

2.3.1 Partial Least Squares (PLS)

PLS was introduced by the Swedish statistician Herman Wold (H. Wold, 1985). In chemometrics, PLS regression is used as a basic method for relating two data matrices by a linear multivariate model. However, this method goes beyond traditional regression since it is able to analyze data with incomplete, noisy and collinear variables (S. Wold, Sjöström, & Eriksson, 2001). This method is widely applied in NIR spectroscopy, where multiple input variables are required.
In PLS regression, the accuracy of the model improves as the number of relevant variables and the number of observations increase (Hattori & Otsuka, 2017). The PLS model aims to find, in a multidimensional space, the direction in X that explains the maximum variance in Y. PLS regression is suitable when the problem has more predictor variables than observations, and when there may be multicollinearity among X values (Yu et al., 2017).

2.3.2 Artificial neural networks (ANN)

Artificial neural networks (ANN) are data-modeling tools designed to analyze complex relationships between inputs and outputs. In recent years, ANN have become highly relevant in scientific research. They are inspired by the human central nervous system, which has a large number of cells that work quickly and help in decision making (Cascardi, Micelli, & Aiello, 2017). The Multilayer Perceptron (MLP) is a type of layered neural network with feed-forward connections between consecutive layers. Figure 2 shows the general scheme of an MLP: one input layer, one or more hidden layers and one output layer. The transfer function of the neurons is commonly a sigmoid function, but other functions can also be used (Kruse et al., 2013; Ruck, Rogers, Kabrisky, Oxley, & Suter, 1990).

Figure 2. Multilayer perceptron representation.

Each neuron receives the output signals of the neurons in the previous layer and provides an output for the next layer. The output layer receives as input the output of the last hidden layer and returns the output of the network (Gardner & Dorling, 1998; Kruse et al., 2013). The number of neurons, layers and their connections is commonly known as the architecture of the neural network, and it is one of the key parameters to be optimized. The architecture depends on the complexity of the problem, and there is no general method for choosing the best one.
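The layer-by-layer computation described above can be made concrete with a small forward pass. This is a minimal sketch, assuming a sigmoid hidden layer and a single linear output neuron (a common choice for regression, but an assumption here); the weights are random and untrained, and the helper names `init_layer` and `mlp_forward` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_neurons, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def mlp_forward(x, hidden, output):
    """Forward pass: sigmoid hidden layer followed by one linear output neuron."""
    w_h, b_h = hidden
    # Each hidden neuron receives every input value and emits one activation.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    w_o, b_o = output
    # The output neuron receives the activations of the last hidden layer.
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]


rng = random.Random(0)
n_wavelengths = 331  # 740 nm to 1070 nm at 1 nm resolution, before feature selection
hidden = init_layer(n_wavelengths, 10, rng)  # ten neurons in one hidden layer
output = init_layer(10, 1, rng)
spectrum = [rng.random() for _ in range(n_wavelengths)]  # synthetic input
prediction = mlp_forward(spectrum, hidden, output)
```

Changing the architecture amounts to changing the number of neurons passed to `init_layer` or chaining more hidden layers, which is why it is treated as a tunable parameter.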
Choosing a good architecture is an empirical process in which multiple architectures are tested in order to find one that offers satisfactory results (Herrera, Hervas, Otero, & Sánchez, 2004; Rivero, Fernandez-Blanco, Dorado, & Pazos, 2011). Artificial neural networks have been successfully applied to NIR spectrometry for the rapid quantification of wine compounds (Martelo-Vidal & Vázquez, 2015), the evaluation of chemical components and properties of the jujube fruit (Guo, Ni, & Kokot, 2016), and the characterization of blends containing refined and extra virgin olive oils (Aroca-Santos, Cancilla, Pariente, & Torrecilla, 2016).

2.3.3 Preprocessing techniques

NIR spectra can be preprocessed to eliminate physical phenomena that alter the raw spectra. This practice is an integral part of chemometric modeling (Blanco & Villarroya, 2002) and is considered one of the most important steps. The data contained in NIR spectra are a composition of several signals with overlapping information (Blanco & Villarroya, 2002; dos Santos et al., 2013; Rinnan, Berg, & Engelsen, 2009). The proper choice of preprocessing technique is difficult to assess before the validation of the model; therefore, NIR spectra preprocessing is performed by trial and error (Rinnan et al., 2009; Xu et al., 2008). In this work, the raw spectral signal and six other common preprocessing techniques are analyzed:

1) Raw spectra: the NIR reflectance measurement of a sample includes both the diffusely reflected and the specularly reflected radiation. The latter does not contain relevant information and is commonly minimized by instrument design and sampling geometry. The diffusely reflected light is the primary source of information in NIR spectra (Rinnan et al., 2009).

2) Savitzky Golay: this preprocessing technique was proposed in 1964 by the authors after whom it is named (Savitzky & Golay, 1964). The authors popularized a method that includes a smoothing step for the numerical derivation of a vector.
In this method, a polynomial of order p is fitted in a symmetric window of width w on the raw data, and the derivative of order d is then calculated at the centre point i. This operation is applied sequentially to all spectral points. The width of the window and the degree of the fitted polynomial are decisions that need to be made. The highest derivative that can be determined is the degree of the polynomial (Rinnan et al., 2009). For the initial experiments, the width was set to five, and the orders of both the polynomial and the derivative were set to two. In subsequent experiments, the width was varied from 3 to 101 (odd numbers only), the polynomial order from one to five, and the derivative order from one to five.

3) Beer-Lambert law: suggests a linear relationship between the absorbance of the spectra, computed from the reflectance, and the concentration of components. The law can be expressed as shown in Equation 1 (Rinnan et al., 2009):

$$A_{\lambda} = -\log(R) \cong \epsilon_{\lambda} \times l \times c \quad (1)$$

where $A_{\lambda}$ is the wavelength-dependent absorbance, $R$ is the reflectance, $\epsilon_{\lambda}$ is the wavelength-dependent molar absorptivity, $l$ is the path length of the light through the sample matrix and $c$ is the concentration of the components of interest.

4) Standard Normal Variate (SNV): a method for scatter correction of NIR data (Barnes, Dhanoa, & Lister, 1989). The formula is expressed in Equation 2.

$$X_{i,snv} = \frac{X_i - \bar{x}}{S} \quad (2)$$

where $X_{i,snv}$ is the SNV value at wavelength i, $X_i$ is the spectral value at wavelength i, $\bar{x}$ is the mean of the spectrum to be corrected, and $S$ is the standard deviation of that spectrum.

5) Multiplicative Scatter Correction (MSC): a widely used preprocessing technique for NIR, first introduced by Martens et al. (1983). In this technique, undesirable scatter effects are removed from the data matrix prior to data modeling. MSC comprises a first step, the estimation of the correction coefficients, and a second step, the correction of the recorded spectrum. Equations 3 and 4 show these two steps respectively.
$$X_o = b_o + b_{r,1} \times X_r + e \quad (3)$$

$$X_c = \frac{X_o - b_o}{b_{r,1}} = X_r + \frac{e}{b_{r,1}} \quad (4)$$

where $X_c$ is the corrected spectrum, $X_o$ is one original sample spectrum measured by the NIR instrument, and $X_r$ is a reference spectrum used for preprocessing of the entire dataset. In most applications, the average spectrum of the calibration set is used as the reference spectrum. $e$ is the unmodeled part of $X_o$. $b_o$ and $b_{r,1}$ are scalar parameters, which differ for each sample (Rinnan et al., 2009).

6) First Spectral Derivative (FSD): this technique has been used in analytical spectroscopy for decades due to its ability to remove additive and multiplicative effects from the spectra (dos Santos et al., 2013). Finite difference is the basic method for spectral derivation (Rinnan et al., 2009); thus, the first derivative is calculated as the difference between two subsequent spectral points, as shown in Equation 5.

$$X_{i,fsd} = X_i - X_{i-1} \quad (5)$$

where $X_{i,fsd}$ denotes the first derivative at wavelength i and $X_i$ is the spectral value at wavelength i. The first derivative removes the baseline of the spectra.

7) Second Spectral Derivative (SSD): this technique is also based on finite differences (Rinnan et al., 2009); thus, the second derivative is calculated as the difference between two subsequent points of the processed FSD signal. The formula is shown in Equation 6:

$$X_{i,ssd} = X_{i,fsd} - X_{i-1,fsd} \quad (6)$$

where $X_{i,ssd}$ represents the second derivative at wavelength i. Besides removing the baseline, the second derivative removes the linear trend of the spectra. According to Rinnan et al. (2009), this technique should be avoided in practice, since it is not feasible for most real measurements due to noise inflation.

2.3.4 Feature selection threshold

NIR spectra commonly contain a large amount of information along the wavelength range. For this reason, it is important to apply a technique aimed at reducing this amount of data (Blanco & Villarroya, 2002).
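Returning briefly to the preprocessing techniques of Section 2.3.3, the closed-form transforms (Equations 1 to 6) are short enough to sketch directly. The following is an illustrative Python version, not the authors' Matlab code. It assumes a base-10 logarithm for the absorbance, the sample (n-1) standard deviation for SNV, and an ordinary least-squares fit of each spectrum against the reference for MSC; the function names are hypothetical.

```python
import math
import statistics


def absorbance(reflectance):
    """Equation 1: A = -log(R) per wavelength (base 10 is an assumption)."""
    return [-math.log10(r) for r in reflectance]


def snv(spectrum):
    """Equation 2: centre by the spectrum mean, scale by its standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample (n-1) standard deviation assumed
    return [(x - mean) / std for x in spectrum]


def msc(spectrum, reference):
    """Equations 3-4: fit X_o = b_o + b_r1 * X_r, then remove offset and slope."""
    mean_r = statistics.fmean(reference)
    mean_o = statistics.fmean(spectrum)
    sxx = sum((r - mean_r) ** 2 for r in reference)
    sxy = sum((r - mean_r) * (o - mean_o) for r, o in zip(reference, spectrum))
    slope = sxy / sxx                      # b_r1
    intercept = mean_o - slope * mean_r    # b_o
    return [(o - intercept) / slope for o in spectrum]


def first_derivative(spectrum):
    """Equation 5: difference between consecutive spectral points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_derivative(spectrum):
    """Equation 6: finite difference applied to the first-derivative signal."""
    return first_derivative(first_derivative(spectrum))


reference = [1.0, 2.0, 3.0, 4.0]
scattered = [2.0 * r + 3.0 for r in reference]  # same shape, different offset and slope
corrected = msc(scattered, reference)           # recovers the reference shape
```

Note that each finite difference shortens the spectrum by one point, so the first and second derivatives have one and two fewer values than the input.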
These techniques have now become a necessity in chemometrics (Guyon, Gunn, Nikravesh, & Zadeh, 2008; Saeys, Inza, & Larrañaga, 2007). In machine learning, selecting a small subset of features with lower sensitivity to non-linearities is usually effective for improving model performance (Leardi, Boggia, & Terrile, 1992; Saeys et al., 2007). Selecting relevant features related to the compound of interest, while avoiding interference from others, should be a goal when building a robust predictive model. In this study, a feature selection method was applied to select the informative wavelengths. Feature selection does not alter the original representation of the variables; it simply chooses a subset of the best wavelengths for the model (Saeys et al., 2007). A threshold is normally used together with a filter technique, either univariate or multivariate, to evaluate which are the best wavelengths (Szymańska et al., 2015). The present work used a multivariate filter described by Hall (1999) called Correlation-based Feature Selection (CFS). This filter is a simple algorithm that ranks feature importance according to its correlation with the predicted variable. By using this method, the relevant informative wavelengths are selected, and an improvement in the model's performance is therefore expected.

2.3.5 Model evaluation

All models were evaluated using the coefficient of determination (R-squared) in the validation set of the cross-validation (dos Santos et al., 2013; Viscarra Rossel, 2008). The mean and standard deviation of the R-squared obtained from 500 models (50 repetitions of 10-fold cross-validation) were calculated for each model configuration in order to decide the best parameters of the model. Final results are presented for the test set of the cross-validation.
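The performance measures used for model evaluation (R-squared, MAE and RMSE) can be computed in a few lines. The sketch below uses their standard textbook definitions, which is an assumption about the exact form applied in this work; the storage times in the example are made-up values for illustration only, not data from the study.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error, in the units of the target (days)."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (days)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


# Hypothetical storage times in days, for illustration only.
actual = [0, 7, 14, 21]
predicted = [1, 6, 15, 19]
print(r_squared(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalizes a single large miss more heavily than MAE does, which is why both are usually reported together.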
R-squared, mean absolute error (MAE) and root mean square error (RMSE) were obtained according to Equations 7-9, respectively. MAE and RMSE represent the difference between predicted values and actual values (Armstrong & Collopy, 1992; Hyndman & Koehler, 2006).

$$\text{R-squared} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (7)$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (8)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (9)$$

where $y_i$ is the real value of the i-th observation, $\hat{y}_i$ is the predicted value of the i-th observation, $\bar{y}$ is the average of the real values and $n$ is the number of observations.

III. Results and Discussion

This study presents a method to estimate the storage time of eggs at room temperature using a smartphone-connected NIR spectrometer and machine learning techniques. Data acquisition and data partition were covered in Sections 2.1 and 2.2. This section therefore presents the results obtained in the three phases of the calibration model optimization.

3.1 Phase 1: Selection of modelling algorithm and preprocessing technique

This phase aimed to choose the modelling algorithm and the preprocessing technique simultaneously. A grid search technique (Koch et al., 2012; Ma, Zhang, & Wang, 2015) was used to evaluate the models. The PLS and ANN algorithms were trained using the same partition scheme to ensure that both models received exactly the same input data, making the results comparable. Because the feature selection threshold influences the results, the preprocessing techniques were evaluated at all thresholds using correlation-based feature selection (Hall, 1999).

Figure 3. R-squared in CV validation set obtained with the seven preprocessing techniques at different values of feature threshold. a) Using PLS. b) Using ANN.

The mean R-squared of 50 repeated 10-fold cross-validations achieved using PLS with 10 latent variables can be seen in Figure 3a.
The best results are obtained with preprocessing technique 4 (Standard Normal Variate) at a 70% feature selection threshold and preprocessing technique 6 (First Spectral Derivative) at a 50% feature selection threshold. In both cases, the results are below 0.8. Figure 3b shows the mean R-squared of 50 repeated 10-fold cross-validations achieved using ANN with 10 neurons in one hidden layer. The best results are obtained with preprocessing technique 2 (Savitzky Golay) at a 90% feature selection threshold and preprocessing technique 7 (Second Spectral Derivative) at a 60% feature selection threshold. In both cases, the R-squared results are above 0.8. The ANN models outperform the PLS models; therefore, ANN was selected for optimization. Although Savitzky Golay and Second Spectral Derivative give similar results, the latter should be avoided in practice because of the noise it adds (Rinnan et al., 2009). Additionally, the Savitzky Golay technique has tuning parameters that can be used to optimize the model (Savitzky & Golay, 1964). Because these tuning parameters can transform the input pattern, the selected wavelengths must be re-evaluated once the preprocessing has been tuned. The feature selection threshold is therefore set to 100% for now and optimized again afterwards. Savitzky Golay widths (odd numbers between 3 and 101) were tested for different configurations of polynomial order and derivative order (between first and fifth). Table 1 shows the mean, standard deviation, minimum and maximum of the R-squared of 50 repeated 10-fold cross-validations achieved at the best width of each configuration, together with the result of a Tukey Honest Significant Difference test (Tukey, 1949). There is an intrinsic redundancy in the hierarchy of Savitzky Golay derivation.
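The redundancy mentioned above can be checked numerically by computing the Savitzky Golay filter weights themselves. The sketch below is a generic least-squares derivation written for illustration, not the routine used in this work: it solves the normal equations of the polynomial fit with exact fractions, and the function names `solve` and `savgol_coefficients` are hypothetical. With a five-point window, a first- and a second-degree polynomial yield identical first-derivative weights, whereas a third-degree polynomial does not.

```python
from fractions import Fraction
from math import factorial


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def savgol_coefficients(width, poly_order, deriv=0):
    """Weights applied to a window of `width` points to estimate the
    derivative of order `deriv` at the centre point (unit spacing)."""
    if width % 2 == 0 or width <= poly_order or deriv > poly_order:
        raise ValueError("need odd width > poly_order >= deriv")
    half = width // 2
    xs = [Fraction(x) for x in range(-half, half + 1)]
    size = poly_order + 1
    # Normal equations of the least-squares polynomial fit: N[j][k] = sum(x^(j+k)).
    normal = [[sum(x ** (j + k) for x in xs) for k in range(size)] for j in range(size)]
    unit = [Fraction(int(j == deriv)) for j in range(size)]
    u = solve(normal, unit)
    # The derivative at the centre is deriv! times the fitted coefficient of x^deriv.
    return [factorial(deriv) * sum(u[k] * x ** k for k in range(size)) for x in xs]


smoothing = savgol_coefficients(5, 2, 0)      # classic 5-point quadratic smoother
linear_d1 = savgol_coefficients(5, 1, 1)
quadratic_d1 = savgol_coefficients(5, 2, 1)
cubic_d1 = savgol_coefficients(5, 3, 1)
print(linear_d1 == quadratic_d1, quadratic_d1 == cubic_d1)
```

Applying the filter to a spectrum is then a sliding dot product of these weights with each window of consecutive points.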
For each polynomial order, the subsequent derivative order gave the same estimate of the coefficients. For example, for the first-degree polynomial, a first derivative and a second derivative will give the same answer. The results of redundant configurations are not presented.

Table 1. Results of Savitzky Golay parameters at the best width, using odd numbers from 3 to 101

| Polynomial degree | Derivative order | Width | Mean ± std | Min | Max | Tukey HSD |
|---|---|---|---|---|---|---|
| First | First | 7 | 0.7644 ± 0.069 | 0.360 | 0.861 | c |
| Second | First | 7 | 0.7654 ± 0.067 | 0.445 | 0.894 | c |
| Second | Second | 29 | 0.8204 ± 0.047 | 0.610 | 0.909 | ab |
| Third | First | 19 | 0.7678 ± 0.065 | 0.472 | 0.880 | c |
| Third | Second | 21 | 0.8214 ± 0.045 | 0.675 | 0.910 | ab |
| Third | Third | 53 | 0.8249 ± 0.044 | 0.690 | 0.926 | ab |
| Fourth | First | 13 | 0.7679 ± 0.073 | 0.267 | 0.899 | c |
| Fourth | Second | 41 | 0.8222 ± 0.045 | 0.561 | 0.926 | ab |
| Fourth | Third | 51 | 0.8248 ± 0.043 | 0.618 | 0.914 | ab |
| Fourth | Fourth | 61 | 0.8199 ± 0.042 | 0.651 | 0.909 | ab |
| Fifth | First | 5 | 0.7684 ± 0.07 | 0.123 | 0.889 | c |
| Fifth | Second | 39 | 0.8252 ± 0.045 | 0.624 | 0.931 | a |
| Fifth | Third | 67 | 0.8276 ± 0.041 | 0.590 | 0.910 | a |
| Fifth | Fourth | 61 | 0.8189 ± 0.042 | 0.660 | 0.907 | ab |
| Fifth | Fifth | 67 | 0.8107 ± 0.045 | 0.607 | 0.913 | b |

Rows with different letters differ significantly according to Tukey HSD at p < 0.01.

According to the Tukey HSD test, the best Savitzky Golay configuration is a fifth polynomial degree with a second or third derivative order. The latter was chosen since it has a greater mean and a smaller standard deviation in test data. Therefore, the optimized Savitzky Golay technique for this experiment used 67 smoothing points (width), a fifth polynomial degree and a third-order derivative. There were evident differences in the spectra of the eggs as a function of storage time. Figure 4a shows the result of applying the Savitzky Golay preprocessing technique. As the eggs were stored for longer, the obtained spectra took on different characteristic values.
These differences can also be seen in Figure 4b, which shows the average of the signals corresponding to eggs stored for 0, 7, 14 and 21 days. Visual exploration of the dataset reveals patterns indicating that the spectra of the eggs change as storage time increases. Notice in Figure 4b that, at wavelengths of 860-940 nm, fresh eggs with zero days of storage have a Savitzky Golay reflectance close to zero, while stored eggs oscillate between 1 and -1 in the same range. After this pattern, fresh eggs have a peak at 933 nm, while stored eggs show a similar peak earlier, at 913 nm. At 983 nm and 999 nm, the difference between eggs stored for 7 days and those stored for longer is evident. In a visual exploration of the data, it is difficult to find differences between eggs stored for more than 14 days. According to Akter et al. (2014), eggs should be stored for a maximum of 14 days at room temperature. The small differences between the spectra of eggs stored for more than 14 days may be indicative of a deterioration in freshness.

Figure 4. Spectral signal processed with optimized Savitzky Golay at different storage times. 4a) All spectral signals. 4b) Mean spectra at 0, 7, 14 and 21 days of storage.

3.2 Phase 2: Optimization of selected algorithm parameters

During Phase 1, the model using the ANN algorithm obtained better results; for this reason, the ANN algorithm was selected. In this phase, the architecture of the neural network was optimized. Architectures of one and two hidden layers were evaluated using a grid search (Koch et al., 2012; Ma et al., 2015). Values ranging from 0 to 200 neurons in each layer, in steps of 10, were evaluated.

Figure 6. R-squared in 50 repeated 10-fold CV validation set. a) Mean. b) Standard deviation.

Figure 6b shows the standard deviation of the 50 repeated 10-fold cross-validated models for each architecture; this measure indicates the stability and repeatability of the proposed model.
The best results are achieved with architectures of one hidden layer with 10 or 20 neurons, and with architectures of two hidden layers in which the second layer has 10 or 20 neurons. Although the best results are achieved with an architecture of two hidden layers with 180 and 10 neurons respectively, an architecture of 10 neurons in one hidden layer obtains statistically equal results. Following Abu-Mostafa (1989), the least complex architecture with equivalent results was selected, that is, 10 neurons in one hidden layer.

3.3 Phase 3: Fine tuning of feature selection threshold

The following experiment aims to find the best threshold value for feature selection. Figure 5 shows the influence of the feature selection threshold on the R-squared results.

Figure 5. Results of R-squared at different values of feature threshold

Results above 0.8 are obtained with feature selection thresholds from 30 onwards. Figure 5 shows the importance of an appropriate selection of features as inputs to the model. A feature selection threshold of 48 was selected, since the R-squared obtained in test data is 0.8319 ± 0.0377. Although the evaluation of the storage time of poultry eggs involves a complex process, this study shows that egg quality can be predicted using an ANN model with relevant features as input patterns. Our results confirm the statement by Guyon et al. (2008) that the appropriate selection of relevant wavelengths is very important and should be used as a strategy to avoid including uninformative and redundant wavelengths in the predictive model.

3.4 Evaluation of the model in unseen data (test set)

Once all parameters were optimized, the final configuration of the model is a Savitzky-Golay preprocessing technique with a fifth polynomial degree, a third derivative order and a width of 67. The ANN model has 10 neurons in one hidden layer and the feature selection threshold is 48.
The calibration model using the parameters mentioned above was tested on unseen data to evaluate its performance and generalization ability. Figure 6a shows a regression plot of the egg storage times predicted by the model against their corresponding actual values. It can be seen that most predictions present an absolute error of about 2 days, some predictions show an error of about 4 days, and one value has the worst error of 6 days. Figure 6b shows the histogram of absolute errors.

Figure 6. Model performance in test set data. a) Regression fit plot, actual vs predicted. b) Absolute error histogram.

In order to evaluate the performance of the model on unseen data, R-squared and RMSE were calculated for the model and can be seen in Table 2. The proposed model is highly significant in all subsets.

Table 2. Performance metrics in training, validation and test sets

| | Training | Validation | Test |
|---|---|---|---|
| n samples | 462 | 132 | 66 |
| R-squared | 0.865 | 0.862 | 0.873 |
| Mean Absolute Error | 1.834 | 1.998 | 1.810 |
| Root Mean Squared Error | 2.15 | 2.19 | 1.97 |
| F-statistic vs. constant model | 3.09e+03 | 673 | 449 |
| p-value | <0.01 | <0.01 | <0.01 |

Differences between test and training set performance could indicate the degree of overfitting (Stockwell & Peterson, 2002). However, Table 2 shows that these metrics are similar across the training, validation and test sets. Therefore, we can affirm that the models are well fitted within the context of the calibration set. Although the R-squared of the model is below 0.9, the RMSE is about two days, which gives safety margins to the user. Some authors have evaluated predictive models for freshness assessment. Lin et al. (2011) obtained an R-squared of 0.879 and an RMSE of 2.443 using NIR spectroscopy for Haugh unit. Sun et al. (2015) used artificial vision and dynamic weighing to obtain an R-squared of 0.8653 and an RMSE of 3.745. Abdel-Nour et al.
(2011) achieved an R-squared of 0.89 and an RMSE of 1.65 in the validation dataset for the prediction of storage days, using a lab-grade VIS/NIR spectroradiometer. Our results are similar to those mentioned above, which clearly indicates that there is potential for portable NIR instruments as an alternative to desktop lab instrumentation, in agreement with Haughey et al. (2014). Moreover, given recent advances in sensor technology, such as the Changhong device (Rateni, Dario, & Cavallo, 2017), which has an embedded NIR sensor, it is possible to create applications using the proposed model. This approach could enable consumers to predict the storage time of poultry eggs.

IV. Conclusions and future work

Our findings show the potential of smartphone-connected devices based on short-wave NIR as an effective method for the evaluation of storage time in poultry eggs. Spectral analysis of eggs is a rapid and non-destructive method for egg storage time determination. Reflectance spectral data of the egg contain relevant information about its storage time, a parameter of freshness. There were noticeable differences in the processed spectral values of the eggs as a function of storage time. Suitable predictive models were built with PLS and ANN regression techniques; however, the latter performed better, achieving an R-squared of 0.873 and an RMSE of 1.97 in test set data. This suggests that the spectra obtained with the smartphone-connected NIR spectrometer can be used as a non-destructive method for the assessment of egg storage time, a parameter of quality and freshness. The use of a smartphone-connected NIR spectrometer is recommended for egg storage time assessment. However, further work is needed to assess the long-term reliability of the system. A combination of this method with destructive techniques is recommended. Future work will focus on exhaustive experimentation at room and refrigeration temperatures using this low-cost spectrometer on eggs from hens of diverse ages and strains.
The use of hyperspectral imaging for storage time prediction is also a potential technique to be studied.

Acknowledgements

The authors wish to thank Agrolomas CL for providing samples directly from their poultry houses. Special acknowledgement goes to the CEDIA National Research and Education Network. Iván Ramírez-Morales and Enrique Fernández-Blanco would also like to thank the NVIDIA Research Grants Program for its support. This work is part of the DINTA-UTMACH and RNASA-UDC research groups.

References

Abdel-Nour, N., Ngadi, M., Prasher, S., & Karimi, Y. (2011). Prediction of Egg Freshness and Albumen Quality Using Visible/Near Infrared Spectroscopy. Food and Bioprocess Technology, 4(5), 731–736. https://doi.org/10.1007/s11947-009-0265-0
Aboonajmi, M., Saberi, A., Abbasian Najafabadi, T., & Kondo, N. (2015). Quality assessment of poultry egg based on Vis-NIR Spectroscopy and RBF networks. International Journal of Food Properties, 2912(December), 150831112742007. https://doi.org/10.1080/10942912.2015.1075215
Aboonajmi, M., Setarehdan, S. K., Akram, A., Nishizu, T., & Kondo, N. (2014). Prediction of Poultry Egg Freshness Using Ultrasound. International Journal of Food Properties, 17(9), 1889–1899. https://doi.org/10.1080/10942912.2013.770015
Abu-Mostafa, Y. S. (1989). The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning. Neural Computation, 1(3), 312–317. https://doi.org/10.1162/neco.1989.1.3.312
Akter, Y., Kasim, A., Omar, H., & Sazili, A. Q. (2014). Effect of storage time and temperature on the quality characteristics of chicken eggs. Journal of Food, Agriculture. Retrieved from https://www.researchgate.net/profile/Yeasmin_Akter/publication/284471186_Effect_of_storage_time_and_temperature_on_the_quality_characteristics_of_chicken_eggs/links/5653df8f08ae1ef929763361.pdf
Akyurek, H., & Okur, A. A. (2009). Effect of storage time, temperature and hen age on egg quality in free-range layer hens.
Journal of Animal and Veterinary Advances: JAVA, 8(10), 1953–1958. Retrieved from http://docsdrive.com/pdfs/medwelljournals/javaa/2009/1953-1958.pdf
Armstrong, J. S., & Collopy, F. (1992). Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8(1), 69–80. https://doi.org/10.1016/0169-2070(92)90008-W
Aroca-Santos, R., Cancilla, J. C., Pariente, E. S., & Torrecilla, J. S. (2016). Neural networks applied to characterize blends containing refined and extra virgin olive oils. Talanta, 161, 304–308. https://doi.org/10.1016/j.talanta.2016.08.033
Barnes, R. J., Dhanoa, M. S., & Lister, S. J. (1989). Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra. Applied Spectroscopy, 43(5), 772–777. Retrieved from https://www.osapublishing.org/as/abstract.cfm?URI=as-43-5-772
Barton, F., II. (2016). Near infrared equipment through the ages and into the future. NIR News, 27(1), 41. https://doi.org/10.1255/nirn.1585
Bertran, E., Blanco, M., Maspoch, S., Ortiz, M. C., Sánchez, M. S., & Sarabia, L. A. (1999). Handling intrinsic non-linearity in near-infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems, 49(2), 215–224. https://doi.org/10.1016/S0169-7439(99)00043-X
Blanco, M., & Villarroya, I. (2002). NIR spectroscopy: a rapid-response analytical tool. Trends in Analytical Chemistry: TRAC, 21(4), 240–250. https://doi.org/10.1016/S0165-9936(02)00404-1
Brereton, R. G. (2015). Pattern recognition in chemometrics. Chemometrics and Intelligent Laboratory Systems, 149, Part B, 90–96. https://doi.org/10.1016/j.chemolab.2015.06.012
Cartwright, J. (2016). Technology: Smartphone science. Nature, 531(7596), 669–671. https://doi.org/10.1038/nj7596-669a
Cascardi, A., Micelli, F., & Aiello, M. A. (2017). An Artificial Neural Networks model for the prediction of the compressive strength of FRP-confined concrete circular columns.
Engineering Structures, 140, 199–208. Retrieved from http://www.sciencedirect.com/science/article/pii/S0141029617305461
Chen, Q., Cai, J., Wan, X., & Zhao, J. (2011). Application of linear/non-linear classification algorithms in discrimination of pork storage time using Fourier transform near infrared (FT-NIR) spectroscopy. LWT - Food Science and Technology, 44(10), 2053–2058. Retrieved from http://www.sciencedirect.com/science/article/pii/S0023643811001630
Das, A., Swedish, T., Wahi, A., Moufarrej, M., Noland, M., Gurry, T., … Raskar, R. (2015). Mobile phone based mini-spectrometer for rapid screening of skin cancer. In SPIE Sensing Technology + Applications (p. 94820M–94820M–5). International Society for Optics and Photonics. https://doi.org/10.1117/12.2182191
dos Santos, C. A. T., Lopo, M., Páscoa, R. N. M. J., & Lopes, J. A. (2013). A review on the applications of portable near-infrared spectrometers in the agro-food industry. Applied Spectroscopy, 67(11), 1215–1233. https://doi.org/10.1366/13-07228
Florkowski, W. J., Prussia, S. E., Shewfelt, R. L., & Brueckner, B. (2009). Postharvest Handling: A Systems Approach. Elsevier Science. Retrieved from https://books.google.com.ec/books?id=_euakAoRNZEC
Galiş, A.-M., Dale, L. M., Boudry, C., & Théwis, A. (2012). The potential use of near-infrared spectroscopy for the quality assessment of eggs and egg products. Scientific Works C Series Veterinary Medicine, 58, 294–307. Retrieved from https://orbi.ulg.ac.be/bitstream/2268/156809/1/Art38.pdf
Gangying, Z., Yangyang, G., Wei, L., Jie, H., Minmin, W., & Guohua, H. (2015). Large Yellow Croaker (Pseudosciaena crocea) Storage Time Detecting Method by Visible-NIR Spectroscopy Combined with Aperiodic Stochastic Resonance. Journal of Chinese Institute of Food Science and Technology, 1, 050. Retrieved from http://en.cnki.com.cn/Article_en/CJFDTotal-ZGSP201501050.htm
Garcia, H., & Filzmoser, P. (2015). Multivariate Statistical Analysis using the R package chemometrics.
Vienna, Austria. Retrieved from ftp://155.232.191.229/cran/web/packages/chemometrics/vignettes/chemometrics- vignette.pdf Gardner, M. W., & Dorling, S. R. (1998). Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14–15), 2627–2636. https://doi.org/10.1016/S1352- 2310(97)00447-0 Giovenzana, V., Beghi, R., Buratti, S., Civelli, R., & Guidetti, R. (2014). Monitoring of fresh-cut Valerianella locusta Laterr. shelf life by electronic nose and VIS–NIR spectroscopy. Talanta, 120, 368–375. Retrieved from http://www.sciencedirect.com/science/article/pii/S003991401300996X Goldring, D., Sharon, D., Brodetzki, G., & Ruf, A. (2016). WITHDRAWN PATENT AS PER THE LATEST USPTO WITHDRAWN LIST. US Patent. Retrieved from http:// www.freepatentsonline.com/9354117.html Guillemain, A., Dégardin, K., & Roggo, Y. (2017). Performance of NIR handheld spectrometers for the detection of counterfeit tablets. Talanta, 165, 632–640. https://doi.org/10.1016/j.talanta.2016.12.063 Guo, Y., Ni, Y., & Kokot, S. (2016). Evaluation of chemical components and properties of the jujube fruit using near infrared spectroscopy and chemometrics. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy, 153, 79– 86. https://doi.org/10.1016/j.saa.2015.08.006 Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2008). Feature Extraction: Foundations and Applications. Springer Berlin Heidelberg. Retrieved from https://books.google.com.ec/books?id=FOTzBwAAQBAJ Hall, M. A. (1999). Correlation-based feature selection for machine learning. The University of Waikato. Retrieved from https://www.lri.fr/~pierres/donn%E9es/save/these/articles/lpr-queue/ hall99correlationbased.pdf Hattori, Y., & Otsuka, M. (2017). Modeling of feed-forward control using the partial least squares regression method in the tablet compression process. International Journal of Pharmaceutics, 524(1-2), 407–413. 
https://doi.org/10.1016/j.ijpharm.2017.04.004 Haughey, S. A., Galvin-King, P., Malechaux, A., & Elliott, C. T. (2014). The use of handheld near-infrared reflectance spectroscopy (NIRS) for the proximate analysis of poultry feed and to detect melamine adulteration of soya bean meal. Analytical Methods, 7(1), 181–186. https://doi.org/10.1039/C4AY02470B Haugh, R. R. (1937). The Haugh unit for measuring egg quality. Retrieved from http://en.journals.sid.ir/ViewPaper.aspx?ID=128047 Herrera, F., Hervas, C., Otero, J., & Sánchez, L. (2004). Un estudio empırico preliminar sobre los tests estadısticos más habituales en el aprendizaje automático. Tendencias de La Minerıa de Datos En Espana, Red Espanola de Minerıa de Datos Y Aprendizaje (TIC2002-11124-E), 403–412. Retrieved from http://www.lsi.us.es/~riquelme/red/Capitulos/LMD35.pdf Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. https://doi.org/10.1016/j.ijforecast.2006.03.001 Karoui, R., Kemps, B., Bamelis, F., De Ketelaere, B., Decuypere, E., & De Baerdemaeker, J. (2006). Methods to evaluate egg freshness in research and industry: A review. European Food Research and Technology = Zeitschrift Fur Lebensmittel-Untersuchung Und -Forschung. A, 222(5-6), 727–732. https://doi.org/ 10.1007/s00217-005-0145-4 Kaur, H. (2015). Counterfeit Pharmaceuticals and Methods to Test Them. In M. Walport (Ed.), Annual Report of the Government Chief Scientific Adviser 2015: Forensic Science and Beyond: Authenticity, Provenance and Assurance Evidence and Case Studies (pp. 132–137). London: Government Office for Science. Retrieved from http://researchonline.lshtm.ac.uk/id/eprint/2528148 Kimiya, T., Sivertsen, A. H., & Heia, K. (2013). VIS/NIR spectroscopy for non- destructive freshness assessment of Atlantic salmon (Salmo salar L.) fillets. Journal of Food Engineering, 116(3), 758–764. 
Retrieved from http://www.sciencedirect.com/science/article/pii/S0260877413000216 Koch, P., Bischl, B., Flasch, O., Bartz-Beielstein, T., Weihs, C., & Konen, W. (2012). Tuning and evolution of support vector kernels. Evolutionary Intelligence, 5(3), 153–170. https://doi.org/10.1007/s12065-012-0073-8 Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M., & Held, P. (2013). Multi-Layer Perceptrons. In Computational Intelligence (pp. 47–81). Springer London. https://doi.org/10.1007/978-1-4471-5013-8_5 Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling: Springer New York. https://doi.org/10.1007/978-1-4614-6849-3 Kumaravelu, C., & Gopal, A. (2015). A review on the applications of Near-Infrared spectrometer and Chemometrics for the agro-food processing industries, 8–12. https://doi.org/10.1109/TIAR.2015.7358523 Leardi, R., Boggia, R., & Terrile, M. (1992). Genetic algorithms as a strategy for feature selection. Journal of Chemometrics, 6(5), 267–281. https://doi.org/10.1002/cem.1180060506 Lin, H., Zhao, J., Sun, L., Chen, Q., & Zhou, F. (2011). Freshness measurement of eggs using near infrared (NIR) spectroscopy and multivariate data analysis. Innovative Food Science & Emerging Technologies: IFSET: The Official Scientific Journal of the European Federation of Food Science and Technology, 12(2), 182– 186. Retrieved from http://www.sciencedirect.com/science/article/pii/S1466856411000117 Lin, H., Zhao, J., Sun, L., kun Bi, X., & Cai, J. (2015). Effective Variables Selection in Eggs Freshness Graphically Oriented Local Multivariate Analysis using NIR Spectroscopy. In International Conference on Chemical, Material and Food Engineering (pp. 13–18). Liu, F., & Tang, X. (2015). Fuji apple storage time rapid determination method using Vis/NIR spectroscopy. Bioengineered, 6(3), 166–169. https://doi.org/10.1080/21655979.2015.1038001 Liu, Y., Ying, Y., Ouyang, A., & Li, Y. (2007). 
Measurement of internal quality in chicken eggs using visible transmittance spectroscopy technology. Food Control, 18(1), 18–22. https://doi.org/10.1016/j.foodcont.2005.07.011 Lorentzen, G., Rotabakk, B. T., Olsen, S. H., Skuland, A. V., & Siikavuopio, S. I. (2016/1). Shelf life of snow crab clusters (Chionoecetes opilio) stored at 0 and 4 °C. Food Control, 59, 454–460. Retrieved from http://www.sciencedirect.com/science/article/pii/S0956713515300517 Martelo-Vidal, M. J., & Vázquez, M. (2015). Application of artificial neural networks coupled to UV–VIS–NIR spectroscopy for the rapid quantification of wine compounds in aqueous mixtures. CyTA - Journal of Food, 13(1), 32–39. https://doi.org/10.1080/19476337.2014.908955 Martens, H., Jensen, S. A., & Geladi, P. (1983). Multivariate linearity transformation for near-infrared reflectance spectrometry. In Proceedings of the Nordic symposium on applied statistics (pp. 205–234). Stokkand Forlag Publishers Stavanger, Norway. Martens, H., & Naes, T. (1992). Multivariate Calibration. Wiley. Retrieved from https://books.google.com.ec/books?id=6lVcUeVDg9IC Mathew, A. O., Olufemi, A. M., & Foluke, A. (2016). Relationship of temperature and length of storage on ph of internal contents of chicken table egg in humid tropics. Biotechnology in Animal. Retrieved from http://www.doiserbia.nb.rs/Article.aspx? id=1450-91561603285M Ma, X., Zhang, Y., & Wang, Y. (2015). Performance evaluation of kernel functions based on grid search for support vector regression. In 2015 IEEE 7th International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM) (pp. 283–288). ieeexplore.ieee.org. https://doi.org/10.1109/ICCIS.2015.7274635 Moros, J., Garrigues, S., & Guardia, M. de la. (2010). Vibrational spectroscopy provides a green tool for multi-component analysis. Trends in Analytical Chemistry: TRAC, 29(7), 578–591. 
Retrieved from http://www.sciencedirect.com/science/article/pii/S0165993610000361 Mucherino, A., Papajorgji, P. J., & Pardalos, P. M. (2009). Data Mining in Agriculture (Vol. 34, pp. 143–160–160). New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-88615-2 Pügner, T., Knobbe, J., & Grüger, H. (2016). Near-Infrared Grating Spectrometer for Mobile Phone Applications. Applied Spectroscopy, 70(5), 734–745. https://doi.org/ 10.1177/0003702816638277 Rateni, G., Dario, P., & Cavallo, F. (2017). Smartphone-Based Food Diagnostic Technologies: A Review. Sensors , 17(6). https://doi.org/10.3390/s17061453 Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-Validation. In L. Liu & M. Tamer Özsu (Eds.), Encyclopedia of Database Systems (pp. 532–538). Springer US. https://doi.org/10.1007/978-0-387-39940-9_565 Rinnan, Å., Berg, F. van D., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. Trends in Analytical Chemistry: TRAC, 28(10), 1201–1222. https://doi.org/10.1016/j.trac.2009.07.007 Rivero, D., Fernandez-Blanco, E., Dorado, J., & Pazos, A. (2011). Using recurrent ANNs for the detection of epileptic seizures in EEG signals. In 2011 IEEE Congress of Evolutionary Computation (CEC) (pp. 587–592). https://doi.org/10.1109/CEC.2011.5949672 Ruck, D. W., Rogers, S. K., Kabrisky, M., Oxley, M. E., & Suter, B. W. (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks / a Publication of the IEEE Neural Networks Council, 1(4), 296–298. https://doi.org/10.1109/72.80266 Saeys, Y., Inza, I., & Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics , 23(19), 2507–2517. https://doi.org/10.1093/bioinformatics/btm344 Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627–1639. 
Schulte, H., Brink, G., Gruna, R., Herzog, R., & Gruger, H. (2015). Utilization of spectral signatures of food for daily use. In OCM 2015-Optical Characterization of Materials-conference proceedings (p. 39). KIT Scientific Publishing. Retrieved from https://books.google.es/books? hl=en&lr=&id=nvCFBwAAQBAJ&oi=fnd&pg=PA39&dq=scio+sensor&ots=ZH32R6 zb0W&sig=edTxfa80EEyMI2qtF85V4w2uxyQ Silversides, F. G., & Scott, T. A. (2001). Effect of storage and layer age on quality of eggs from two lines of hens. Poultry Science, 80(8), 1240–1245. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/11495479 Stadelman, W. J., Newkirk, D., & Newby, L. (1995). Egg science and technology. Retrieved from https://books.google.es/books? hl=en&lr=&id=m20SqIXGKYYC&oi=fnd&pg=PR11&dq=Quality+identification+shell +egg&ots=CHHZkEKroK&sig=HSevsdjr9TIspj2nLllom0ucLek Stark, E. (1996). Near infrared spectroscopy past and future. Near Infrared Spectroscopy The Future Waves, 701–713. Stockwell, D. R. B., & Peterson, A. T. (2002). Effects of sample size on accuracy of species distribution models. Ecological Modelling, 148(1), 1–13. Retrieved from http://www.sciencedirect.com/science/article/pii/S030438000100388X Sun, L., Yuan, L.-M., Cai, J.-R., Lin, H., & Zhao, J.-W. (2015). Egg Freshness on-Line Estimation Using Machine Vision and Dynamic Weighing. Food Analytical Methods, 8(4), 922–928. https://doi.org/10.1007/s12161-014-9944-1 Szymańska, E., Gerretzen, J., Engel, J., Geurts, B., Blanchet, L., & Buydens, L. M. C. (2015). Chemometrics and qualitative analysis have a vibrant relationship. Trends in Analytical Chemistry: TRAC, 69, 34–51. https://doi.org/10.1016/j.trac.2015.02.015 Teye, E., Huang, X.-Y., & Afoakwa, N. (2013). Review on the potential use of near infrared spectroscopy (NIRS) for the measurement of chemical residues in food. American Journal of Food Science and Technology, 1(1), 1–8. Retrieved from http://pubs.sciepub.com/ajfst/1/1/1/ Tukey, J. W. (1949). 
Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. https://doi.org/10.2307/3001913 Viscarra Rossel, R. A. (2008). ParLeS: Software for chemometric analysis of spectroscopic data. Chemometrics and Intelligent Laboratory Systems, 90(1), 72– 83. https://doi.org/10.1016/j.chemolab.2007.06.006 Weesepoel, Y. J. A., & Ruth, S. M. van. (2016). Miniaturized NIRs for age and expiration date prediction of packaged chicken fillets. Retrieved from http://edepot.wur.nl/378295 Wilson, B. K., Kaur, H., Allan, E. L., Lozama, A., & Bell, D. (2017). A New Handheld Device for the Detection of Falsified Medicines: Demonstration on Falsified Artemisinin-Based Therapies from the Field. The American Journal of Tropical Medicine and Hygiene. https://doi.org/10.4269/ajtmh.16-0904 Wold, H. (1985). Partial least squares. Encyclopedia of Statistical Sciences. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/0471667196.ess1914.pub2/full Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130. Retrieved from http://www.sciencedirect.com/science/article/pii/S0169743901001551 Xu, L., Zhou, Y.-P., Tang, L.-J., Wu, H.-L., Jiang, J.-H., Shen, G.-L., & Yu, R.-Q. (2008). Ensemble preprocessing of near-infrared (NIR) spectra for multivariate calibration. Analytica Chimica Acta, 616(2), 138–143. https://doi.org/10.1016/j.aca.2008.04.031 Yongwei, W., Wang, J., Zhou, B., & Lu, Q. (2009). Monitoring storage time and quality attribute of egg based on electronic nose. Analytica Chimica Acta, 650(2), 183– 188. https://doi.org/10.1016/j.aca.2009.07.049 Yu, S., Xiao, X., Ding, H., Xu, G., Li, H., & Liu, J. (2017). Weighted partial least squares based on the error and variance of the recovery rate in calibration set. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy, 183, 138–143. 
https://doi.org/10.1016/j.saa.2017.04.029 Zhao, J., Lin, H., Chen, Q., Huang, X., Sun, Z., & Zhou, F. (2010). Identification of egg’s freshness using NIR and support vector data description. Journal of Food Engineering, 98(4), 408–414. https://doi.org/10.1016/j.jfoodeng.2010.01.018

Journal

Electrical Engineering and Systems Science, arXiv (Cornell University)

Published: Nov 26, 2020
