Evaluation of predictive models to determine total morbidity outcome of feedlot cattle based on cohort-level feed delivery data during the first 15 days on feed

Translational Animal Science, 2022, 6, 1–5
https://doi.org/10.1093/tas/txac121
Advance access publication 29 August 2022
Animal Health and Well Being

L. Heinen†, P. A. Lancaster†,1, B. J. White†, and E. Zwiefel‡
†Beef Cattle Institute, Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, USA
‡Machine Learning Global Black Belt Team, Microsoft Corporation, Edina, MN 55424, USA
1Corresponding author: palancaster@vet.k-state.edu

ABSTRACT

Changes in feeding behavior and intake have been used to predict the onset of bovine respiratory disease in individual animals but have not been applied to cohort-level data. Correctly identifying high morbidity cohorts of cattle early in the feeding period could facilitate the administration of interventions to improve health and economic outcomes. The study objective was to determine the ability of feed delivery data from the first 15 days of feed to predict total feeding period morbidity. Data consisted of 518 cohorts (10 feedlots, 56,796 animals) of cattle of varying sex, age, arrival weight, and arrival time of year over a 2-year period. Overall cohort-level morbidity was classified into high (≥15% total morbidity) or low categories with 18.5% of cohorts having high morbidity. Five predictive models (advanced perceptron, decision forest, logistic regression, neural network, and boosted decision tree) were created to predict overall morbidity given cattle characteristics at arrival and feeding characteristics from the first 15 days. The dataset was split into training and testing subsets (75% and 25% of original, respectively), stratified by the outcome of interest. Predictive models were generated in Microsoft Azure using the training set and overall predictive performance was evaluated using the testing set. Performance in the testing set (n = 130) was measured based on final accuracy, sensitivity (Sn, the ability to accurately detect high morbidity cohorts), and specificity (Sp, the ability to accurately detect low morbidity cohorts). The decision forest had the highest Sp (97%) with the greatest ability to accurately identify low morbidity lots (103 of 106 identified correctly), but this model had low Sn (33%). The logistic regression and neural network had similar Sn (both 63%) and Sp (69% and 72%, respectively) with the best ability to correctly identify high morbidity cohorts (15 of 24 correctly identified). Predictor variables with the greatest importance in the predictive models included percent change in feed delivery between days and 4-day moving averages. The most frequent variable with a high level of importance among models was the percent change in feed delivered from d 2 to 3 after arrival. In conclusion, feed delivery data during the first 15 days on feed was a significant predictor of total cohort-level morbidity over the entire feeding period with changes in feed delivery providing important information.

Key words: feeding patterns, machine learning, predictive analytics

INTRODUCTION

Bovine respiratory disease (BRD) is one of the costliest diseases in the feedlot industry (Salman et al., 1991). Early identification and prompt treatment of the disease can reduce the costs associated with BRD (Booker et al., 2004). Additionally, gastrointestinal diseases, lameness, and other adverse health events negatively impact animal performance, but the prevalence of these other diseases is relatively low. Machine learning has been applied to various fields in the agricultural industry to decrease costs and increase outputs. Predictive modeling techniques have not been heavily studied for use in the beef industry (White et al., 2018). Feedlots collect large volumes of data daily, including feed calls, antimicrobials, and other treatments administered, and various arrival characteristics. These data and more are potential inputs for predictive models that could be used to predict the health outcome of cohorts of cattle. Therefore, the study objective was to evaluate the diagnostic ability of predictive models to determine whether a cohort would have a health outcome of ≥15% total morbidity during the feeding period based on data collected at arrival and feed delivery data for the first 15 days on feed. The hypothesis was that predictive models can accurately predict total feeding period morbidity, and that feed delivery patterns during the first 15 days on feed are important predictors. The objective was to determine the accuracy, sensitivity, and specificity of five predictive models using arrival characteristics and feed delivery data during the first 15 days on feed to correctly identify high (≥15%) morbidity cohorts.

MATERIALS AND METHODS

Animal Care and Use Committee approval was unnecessary as data were obtained from an existing database of feedlot operational data.

Data Collection

Daily records for 12,657 cohorts of cattle (1,005,320 animals) were obtained from 10 U.S. feedlots spanning 2018 to 2020. Health event records on an individual animal basis were tied to cohort-level data. A cohort was defined as a group of cattle purchased and managed together but not necessarily housed in the same pen for the entirety of the feeding phase.

Received May 30, 2022. Accepted August 24, 2022.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Society of Animal Science.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Data Transformation

Data were transformed into the appropriate format and new variables generated before use in predictive models. First, inclusion criteria were applied to the data. Cohorts needed to have complete arrival and feeding data. Feed delivery data beyond 15 days on feed were not included in the dataset, and remaining feed delivery data were adjusted for arrival body weight (percentage of arrival body weight) on a dry matter basis. Any cohorts with missing data were removed. This resulted in removal of many rows of data as several cohorts were rearranged early in the feeding period. Cohorts with extreme values for some arrival characteristics or dry matter intake were also removed to reduce data entry errors. Cohorts were removed if the cohort was designated as a hospital pen, the average arrival weight was less than 182 kg or greater than 545 kg, days on feed was less than 0 or greater than 300, the cohort had not yet been closed out, or dry matter intake as a percentage of average arrival body weight was greater than 5% or less than 0.1% on any given day within the first 15 days on feed. After applying these exclusion criteria, 12,139 cohorts were removed from the dataset. Secondly, data wrangling techniques applied in R software (R Core Team, 2021) were used to create a dataset that consisted of one row per cohort.

Several additional variables were created using the existing data to describe feed delivery characteristics such as day-to-day changes in feed delivery and rolling averages of feed delivery using 2-to-7-day time spans. Figure 1 outlines the feeding variables and which days on feed were accounted for in each feeding characteristic variable. Although each variable indicated in the figure is noted only once, feeding variables were calculated for their respective increments throughout the 15-day feeding period of interest. The outcome of interest was captured in a variable called total morbidity category. This variable described the total morbidity (all diseases) of a cohort of cattle during the entire feeding period. It was expressed as a categorical variable in which high morbidity indicated a total morbidity percentage of greater than or equal to 15%. Low morbidity was a total morbidity percentage of less than 15%. Table 1 offers a complete overview of all the variables that populated the dataset. Following data transformation and variable creation, the final dataset consisted of 518 cohorts (56,796 animals) with 18.5% of cohorts having high morbidity.

Before the model building step occurred, data were split into separate training and testing sets, stratified by the prevalence of the outcome of interest. The training set represented 75% of the original dataset while the testing set represented the other 25%. Using the Pipeline Designer function in Azure Machine Learning Studio (Microsoft, 2022), the datasets were used to create five predictive models. The models trained and tested in this study were advanced perceptron, neural network, boosted decision tree, decision forest, and logistic regression. A previous description has been given by Rojas et al. (2022) from our laboratory. Briefly, neural networks and advanced perceptron models can be useful for identifying patterns in operational data, but are difficult to utilize for describing model structure or importance of predictive variables (Rosenblatt, 1957; Zhang, 2010). Boosted decision trees and decision forests are classification models creating a series of splits in data based on attributes to minimize entropy in resulting data subsets (Breiman, 2001; Roe et al., 2005). Logistic regression models are often used for statistical analysis and these algorithms can be created, then used to estimate a probability of event occurrence in a predictive manner (Dreiseitl and Ohno-Machado, 2002). The models selected for investigation in this study are based on previous work conducted by Rojas et al. (2022) and Amrine et al. (2014), as well as the resources available on Azure.

Model Evaluation

Finally, test data were used to evaluate model performance. Adjustment of the threshold probability was done manually for each model to maximize F1 score to balance sensitivity and positive predictive value. The metrics used to evaluate the models were accuracy, sensitivity (Sn), specificity (Sp), positive and negative predictive values (PPV and NPV, respectively), and area under the receiver operating characteristic (ROC) curve (AUC). These values were calculated using the confusion matrices produced by each model run in Azure. Using these metrics, we were able to compare models based on their ability to accurately predict the total feeding period morbidity of a cohort. Figure 2 demonstrates the training and testing process of the predictive models.

Figure 1. Timeline schematic demonstrating the feed delivery data corresponding to various feeding predictor variables. Triangles indicate data on dry matter delivered as percentage of arrival body weight (DMI-BW). Arrows indicate data on percent change in DMI-BW from day to day.

Table 1. Complete overview of variables included in the final dataset to train and test the five predictive models

| Variable category | Variable name | Description of variable |
| --- | --- | --- |
| Feeding variables | DMI-BW for day 0 through 15 | Feed intake measured by DMI-BW for each day starting on the day of arrival (0) to day 15 (16 total measurements) |
| | Percent change in DMI-BW from one day to the next for days 0 through 15 | Percent change in feed intake measured by DMI-BW between sequential days (15 total measurements) |
| | 2-day increment rolling averages of percent change in DMI-BW | Rolling averages in 2-day increments of percent change in DMI-BW (14 total measurements) |
| | 3-day increment rolling averages of percent change in DMI-BW | Rolling averages in 3-day increments of percent change in DMI-BW (13 total measurements) |
| | 4-day increment rolling averages of percent change in DMI-BW | Rolling averages in 4-day increments of percent change in DMI-BW (12 total measurements) |
| | 5-day increment rolling averages of percent change in DMI-BW | Rolling averages in 5-day increments of percent change in DMI-BW (11 total measurements) |
| | 6-day increment rolling averages of percent change in DMI-BW | Rolling averages in 6-day increments of percent change in DMI-BW (10 total measurements) |
| | 7-day increment rolling averages of percent change in DMI-BW | Rolling averages in 7-day increments of percent change in DMI-BW (9 total measurements) |
| Arrival characteristics | Arrival date | Date of arrival for the cohort, format: MM/DD/YYYY |
| | Average arrival weight | Average weight at arrival of the cohort in pounds |
| | Sex | Sex of the cohort, could be heifer, steer, mixed |
| | Arrival animal count | Number of animals in the cohort upon arrival |
| Outcome variable | Total morbidity category | High (≥ 15%) or low (< 15%) based on morbidity for any diagnosis as a percentage of arrival animal count |

DMI-BW indicates the feed intake on a dry matter basis given as a percentage of the average arrival weight of the cohort.

Figure 3.
Receiver operating characteristic (ROC) curves for five predictive models trained to predict high (≥15%) morbidity cohorts of feedlot cattle. The five predictive models are Advanced Perceptron, Logistic Regression, Neural Network, Decision Tree, and Decision Forest. Perf. Pred. represents the perfect predictive model.

Figure 2. Illustration of data management, and model training and evaluation process.

RESULTS AND DISCUSSION

The ROC curves for the five predictive models are presented in Figure 3. A line closer to a true positive rate of 1.0 and a false positive rate of 0.0 (i.e., top left corner) has greater sensitivity (correctly identifying true positives, i.e., high morbidity cohorts) and specificity (correctly identifying true negatives, i.e., low morbidity cohorts). Logistic regression, neural network, and decision forest models have similar ROC curves, whereas the decision tree model appears to be slightly less predictive, and the advanced perceptron model has very little predictive ability.

The overall accuracy and the AUC value both support the conclusion that the advanced perceptron model has very poor performance (Table 2). The advanced perceptron model identified all cohorts as high morbidity based on the sensitivity of 100.0%. The logistic regression and neural network models had similar overall accuracy, as well as similar sensitivity, specificity, and positive and negative predictive values. The decision tree and decision forest models had the greatest overall accuracy and specificity, indicating good ability to identify low morbidity cohorts, but had low sensitivity.

Table 2. Model evaluation of five predictive models trained to predict high (≥15%) morbidity cohorts of feedlot cattle

| Model | AUC | Acc (%) | Sn (%) | Sp (%) | PPV | NPV |
| --- | --- | --- | --- | --- | --- | --- |
| Advanced perceptron | 0.653 | 18.5 | 100.0 | 0.0 | 0.18 | - |
| Logistic regression | 0.675 | 67.7 | 62.5 | 68.9 | 0.31 | 0.89 |
| Neural network | 0.691 | 70.0 | 62.5 | 71.7 | 0.33 | 0.89 |
| Decision tree | 0.691 | 78.5 | 29.2 | 89.6 | 0.39 | 0.85 |
| Decision forest | 0.671 | 85.4 | 33.3 | 97.2 | 0.73 | 0.87 |

AUC, area under the receiver operator characteristic (ROC) curve; Acc, overall accuracy; Sn, sensitivity, ability to predict high morbidity cohorts; Sp, specificity, ability to predict low morbidity cohorts; PPV, positive predictive value, probability that predicted high morbidity cohorts are truly high morbidity cohorts; NPV, negative predictive value, probability that predicted low morbidity cohorts are truly low morbidity cohorts.

Previous work from our lab (Amrine et al., 2019) reported that predictive models with greater specificity had lesser sensitivity when using sale barn and arrival characteristics to predict BRD in the first 14 days on feed. The "best" model depends upon the goals of the feedlot manager. If the goal is to accurately identify low morbidity cohorts, then a model with the greatest specificity would be deemed the "best" model. Conversely, if the goal is to accurately identify high morbidity cohorts, then a model with the greatest sensitivity would be deemed the "best" model. Accurate identification of high morbidity cohorts could lead to actions that allow changes in risk management, frequency of disease monitoring, and other interventions. However, if the model misclassifies low morbidity cohorts as high morbidity cohorts, then valuable resources would be wasted. The decision forest model had a low sensitivity, only accurately classifying 33% of high morbidity cohorts, but based on the high PPV, this model had a high probability of being correct when it did classify a cohort as high morbidity. Thus, the decision forest model would not identify all of the high morbidity cohorts, but if potential interventions were implemented based on this model, few resources would be wasted.

Several previous studies (Sowell et al., 1999; Buhman et al., 2000; Quimby et al., 2001; Moya et al., 2015; Wolfger et al., 2015; Jackson et al., 2016) have indicated that feed intake, feeding behavior, and drinking behavior are predictive of onset of BRD in individual cattle. The onset of BRD can be predicted 4 to 7 days before clinical signs can be observed using feeding and drinking data (Buhman et al., 2000; Moya et al., 2015; Wolfger et al., 2015; Jackson et al., 2016). Feeding behavior has predicted BRD with an overall accuracy of 84% to 89% and positive predictive value of 85% to 96% (Quimby et al., 2001). Wolfger et al. (2015) indicated that feed behavior correctly predicted 81% of BRD cases and 77% of healthy animals 3 days prior to clinical signs; adding feed intake did not improve these predictions. Similarly, Moya et al. (2015), using pattern recognition techniques, reported that feeding behavior predicted BRD with good sensitivity (58% to 83%) and specificity (67% to 100%) for some of the models with the best overall accuracy.

From the Microsoft Azure platform, feature importance could be obtained for the decision tree, decision forest, and logistic regression models. In each of these models, feeding data were among the top five predictors, and the percent change in feed delivery from days on feed (DOF) 2 to 3 was one of the most important predictors. Other predictors were percent change in feed delivery from DOF 4 to 5 and from DOF 8 to 9, and rolling average in percent change in feed delivery from DOF 7 to 10. These results indicate that alterations in feeding patterns very early in the feeding period are predictive of total morbidity, which may allow interventions to mitigate disease progression in the cohort.

In commercial feedlots, cattle are managed as pens except for individual animal treatment of disease. Tracking individual animal feeding behavior is neither practical nor cost-effective in a commercial feedlot. However, commercial feedlots collect real-time data that can be used to make pen-level management decisions. The results of this study and the previous discussion indicate that feeding data can be used to predict morbidity in cattle. In our study, we looked at total morbidity, but BRD accounted for 65% of the total treatments. The predictive models in this study were somewhat predictive of total cohort-level morbidity, indicating that feeding data can be used to predict morbidity in pens of cattle, which could lead to improved management of disease in feedlots. However, in the current study, feed delivery data were used as predictors in the models, and these data do not account for feed refusals. Based on previous data (Jackson et al., 2016), feed intake decreases prior to clinical signs of BRD and thus, in our case, a decrease in the feed delivered is likely indicative of significant feed refusals and decreased feed intake for the day prior. However, an increase in feed refusals could also be due to several other factors such as weather, removal of cattle from the pen, etc., which cannot be ascertained from our current dataset.

CONCLUSION

Feed intake and feeding behavior data are predictive of BRD in individual animals, and feed delivery data are predictive of total morbidity in commercial feedlot pens. Predictive analytics is a valuable tool that can be used to convert feedlot operational data into animal health management decisions. Future research should evaluate the ability of feed delivery to predict specific diseases (BRD, bloat, etc.), and combine feed delivery data with other data types to improve the prediction of morbidity in feedlot cattle.

Conflict of Interest Statement

The authors declare no actual or potential conflicts of interest.

LITERATURE CITED

Amrine, D. E., J. G. McLellan, B. J. White, R. L. Larson, D. G. Renter, and M. Sanderson. 2019. Evaluation of three classification models to predict risk class of cattle cohorts developing bovine respiratory disease within the first 14 days on feed using on-arrival and/or pre-arrival information. Comp Electron Agric. 156:439–446. doi:10.1016/j.compag.2018.11.035.

Amrine, D. E., B. J. White, and R. L. Larson. 2014. Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for bovine respiratory disease. Comp Electron Agric. 105:9–19. doi:10.1016/j.compag.2014.04.009.

Booker, C. W., G. H. Loneragan, P. T. Guichon, G. K. Jim, O. C. Schunicht, B. K. Wildman, T. J. Pittman, R. K. Fenton, E. D. Janzen, and T. Perrett. 2004. Practical application of epidemiology in veterinary herd health/production medicine. In: American Association of Bovine Practitioners Proceedings of the Annual Conference; September 23–25; 2004; Fort Worth, TX; p. 59–62. (vol. 37). doi:10.21423/aabppro20044902.

Breiman, L. 2001. Random forests. Mach. Learn. 45:5–32. doi:10.1023/A:1010933404324.

Buhman, M. J., L. J. Perino, M. L. Galyean, T. E. Wittum, T. H. Montgomery, and R. S. Swingle. 2000. Association between changes in eating and drinking behaviors and respiratory tract disease in newly arrived calves at a feedlot. Am. J. Vet. Res. 61:1163–1168. doi:10.2460/ajvr.2000.61.1163.

Dreiseitl, S., and L. Ohno-Machado. 2002. Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inform. 35:352–359. doi:10.1016/s1532-0464(03)00034-0.

Jackson, K. S., G. E. Carstens, L. O. Tedeschi, and W. E. Pinchak. 2016. Changes in feeding behavior patterns and dry matter intake before clinical symptoms associated with bovine respiratory disease in growing bulls. J. Anim. Sci. 94:1644–1652. doi:10.2527/jas.2015-9993.

Microsoft. 2022. Azure machine learning - ML as a service. Microsoft Azure. Available from https://azure.microsoft.com/en-us/services/machine-learning/

Moya, D., R. Silasi, T. A. McAllister, B. Genswein, T. Crowe, S. Marti, and K. S. Schwartzkopf-Genswein. 2015. Use of pattern recognition techniques for early detection of morbidity in receiving feedlot cattle. J. Anim. Sci. 93:3623–3638. doi:10.2527/jas.2015-8907.

Quimby, W. F., B. F. Sowell, J. G. P. Bowman, M. E. Branine, M. E. Hubbert, and H. W. Sherwood. 2001. Application of feeding behaviour to predict morbidity of newly received calves in a commercial feedlot. Can. J. Anim. Sci. 81:315–320. doi:10.4141/A00-098.

R Core Team. 2021. R: A language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. Available from https://www.R-project.org/

Roe, B. P., H.-J. Yang, J. Zhu, Y. Liu, I. Stancu, and G. McGregor. 2005. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl. Instrum. Methods Phys. Res., Sect. A 543:577–584. doi:10.1016/j.nima.2004.12.018.

Rojas, H. A., B. J. White, D. E. Amrine, and R. L. Larson. 2022. Predicting bovine respiratory disease risk in feedlot cattle in the first 45 days post arrival. Pathogens. 11:442. doi:10.3390/pathogens11040442.

Rosenblatt, F. 1957. The perceptron: a perceiving and recognizing automaton. Buffalo (NY): Cornell Aeronautical Laboratory, Inc. Available from https://blogs.umass.edu/brain-wars/files/2016/03/rosenblatt-1957.pdf

Salman, M. D., M. E. King, K. G. Odde, and R. G. Mortimer. 1991. Costs of veterinary services and vaccines/drugs used for prevention and treatment of diseases in 86 Colorado cow-calf operations participating in the National Animal Health Monitoring System (1986–1988). J. Am. Vet. Med. Assoc. 198:1739–1744.

Sowell, B. F., M. E. Branine, J. G. P. Bowman, M. E. Hubbert, H. E. Sherwood, and W. Quimby. 1999. Feeding and watering behavior of healthy and morbid steers in a commercial feedlot. J. Anim. Sci. 77:1105–1112. doi:10.2527/1999.7751105x.

White, B. J., D. E. Amrine, and R. L. Larson. 2018. Big data analytics and precision animal agriculture symposium: data to decisions. J. Anim. Sci. 96:1531–1539. doi:10.1093/jas/skx065.

Wolfger, B., K. S. Schwartzkopf-Genswein, H. W. Barkema, E. A. Pajor, M. Levy, and K. Orsel. 2015. Feeding behavior as an early predictor of bovine respiratory disease in North American feedlot systems. J. Anim. Sci. 93:377–385. doi:10.2527/jas.2013-8030.

Zhang, G. P. 2010. Neural networks for data mining. In: Maimon, O. and L. Rokach, editors. Data mining and knowledge discovery handbook. Boston (MA): Springer US; p. 419–444. doi:10.1007/978-0-387-09823-4_21
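The feeding variables listed in Table 1 (day-to-day percent change in DMI-BW and its 2-to-7-day rolling averages) can be illustrated with a short sketch. This is not the authors' R code. The function names `percent_change` and `rolling_mean` and the sample DMI-BW values are hypothetical, and two definitions are assumed because the article does not spell them out: percent change is taken relative to the earlier day, and a rolling average is a simple mean over every full window.

```python
from typing import Dict, List


def percent_change(dmi_bw: List[float]) -> List[float]:
    """Percent change between sequential days, relative to the earlier day.

    Assumed definition: (today - yesterday) / yesterday * 100.
    """
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(dmi_bw, dmi_bw[1:])]


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Simple moving average over every full window of the given length."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical DMI-BW values (% of arrival body weight) for days 0 through 15.
dmi_bw = [0.5 + 0.1 * day for day in range(16)]

# 16 daily values give 15 day-to-day changes.
changes = percent_change(dmi_bw)

# Number of rolling-average features produced for each window length (2 to 7 days).
feature_counts: Dict[int, int] = {
    window: len(rolling_mean(changes, window)) for window in range(2, 8)
}

print(len(changes))
print(feature_counts)
```

Under these assumptions the number of values per variable agrees with the measurement counts given in Table 1 (15 percent changes, then 14 down to 9 rolling averages). Division by zero is not guarded against here because the exclusion criteria remove cohorts with DMI-BW below 0.1% on any day.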
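The metrics in Table 2 follow directly from each model's confusion matrix, and the sketch below recomputes two rows as a worked check. The function name `confusion_metrics` is hypothetical. Some counts are stated in the abstract (103 of 106 low morbidity cohorts correct for the decision forest, 15 of 24 high morbidity cohorts correct for the neural network); the remaining counts are assumptions inferred from the reported Sn and Sp, namely 8 true positives for the decision forest (33.3% of 24) and 76 true negatives for the neural network (71.7% of 106).

```python
from typing import Dict, Optional


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Optional[float]]:
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    PPV or NPV is None when the model made no positive or no negative predictions.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


# Test set: 130 cohorts, 24 high morbidity and 106 low morbidity (from the abstract).
# tp=8 and tn=76 are inferred from the reported Sn and Sp, not stated in the article.
decision_forest = confusion_metrics(tp=8, fn=16, tn=103, fp=3)
neural_network = confusion_metrics(tp=15, fn=9, tn=76, fp=30)

# A model that labels every cohort as high morbidity, as described for the
# advanced perceptron: no negative predictions, so NPV is undefined.
all_high = confusion_metrics(tp=24, fn=0, tn=0, fp=106)

for name, metrics in [
    ("decision forest", decision_forest),
    ("neural network", neural_network),
    ("all high", all_high),
]:
    print(name, metrics)
```

With these counts the decision forest flags only 11 cohorts as high morbidity, 8 of them correctly, which is why its PPV is high even though its sensitivity is low.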

Translational Animal Science , Volume 6 (3): 1 – Aug 29, 2022

Copyright
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Society of Animal Science.
eISSN
2573-2102
DOI
10.1093/tas/txac121

models were accuracy, sensitivity (Sn), specificity (Sp), pos - This variable described the total morbidity (all diseases) of itive and negative predictive values (PPV and NPV, respec- a cohort of cattle during the entire feeding period. It was tively, and area under the receiver operating characteristics expressed as a categorical variable in which high morbidity [ROC] curve [AUC]). These values were calculated using the indicated a total morbidity percentage of greater than or equal confusion matrices produced by each model run in Azure. to 15%. Low morbidity was a total morbidity percentage of Using these metrics, we were able to compare models based less than 15%. Table 1 offers a complete overview of all the on their ability to accurately predict the total feeding period variables that populated the dataset. Following data transfor- morbidity of a cohort. Figure 2 demonstrates the training and mation and variable creation, the final dataset consisted of testing process of the predictive models. Figure 1. Timeline schematic demonstrating the feed delivery data corresponding to various feeding predictor variables. Triangles indicate data on dry matter delivered as percentage of arrival body weight (DMI-BW). Arrows indicate data on percent change in DMI-BW from day to day. Feed intake predictive models 3 Table 1. 
Complete overview of variables included in the final dataset to train and test the five predictive models

| Variable category | Variable name | Description of variable |
| --- | --- | --- |
| Feeding variables | DMI-BW for day 0 through 15 | Feed intake measured by DMI-BW for each day starting on the day of arrival (0) to day 15 (16 total measurements) |
| | Percent change in DMI-BW from one day to the next for days 0 through 15 | Percent change in feed intake measured by DMI-BW between sequential days (15 total measurements) |
| | 2-day increment rolling averages of percent change in DMI-BW | Rolling averages in 2-day increments of percent change in DMI-BW (14 total measurements) |
| | 3-day increment rolling averages of percent change in DMI-BW | Rolling averages in 3-day increments of percent change in DMI-BW (13 total measurements) |
| | 4-day increment rolling averages of percent change in DMI-BW | Rolling averages in 4-day increments of percent change in DMI-BW (12 total measurements) |
| | 5-day increment rolling averages of percent change in DMI-BW | Rolling averages in 5-day increments of percent change in DMI-BW (11 total measurements) |
| | 6-day increment rolling averages of percent change in DMI-BW | Rolling averages in 6-day increments of percent change in DMI-BW (10 total measurements) |
| | 7-day increment rolling averages of percent change in DMI-BW | Rolling averages in 7-day increments of percent change in DMI-BW (9 total measurements) |
| Arrival characteristics | Arrival date | Date of arrival for the cohort, format: MM/DD/YYYY |
| | Average arrival weight | Average weight at arrival of the cohort in pounds |
| | Sex | Sex of the cohort, could be heifer, steer, mixed |
| | Arrival animal count | Number of animals in the cohort upon arrival |
| Outcome variable | Total morbidity category | High (≥ 15%) or low (< 15%) based on morbidity for any diagnosis as a percentage of arrival animal count |

DMI-BW indicates the feed intake on a dry matter basis given as a percentage of the average arrival weight of the cohort.

Figure 3.
Receiver operating characteristic (ROC) curves for five predictive models trained to predict high (≥15%) morbidity cohorts of feedlot cattle. The five predictive models are Advanced Perceptron, Logistic Regression, Neural Network, Decision Tree, and Decision Forest. Perf. Pred. represents the perfect predictive model.

Figure 2. Illustration of data management, and model training and evaluation process.

RESULTS AND DISCUSSION

The ROC curves for the five predictive models are presented in Figure 3. A line closer to a true positive rate of 1.0 and a false positive rate of 0.0 (i.e., top left corner) has greater sensitivity (correctly identifying true positives, i.e., high morbidity cohorts) and specificity (correctly identifying true negatives, i.e., low morbidity cohorts). Logistic regression, neural network, and decision forest models have similar ROC curves, whereas the decision tree model appears to be slightly less predictive, and the advanced perceptron model has very little predictive ability. Additionally, the overall accuracy and the AUC value support the conclusion that the advanced perceptron model has very poor performance (Table 2). The advanced perceptron model classified all cohorts as high morbidity, which produced a sensitivity of 100.0% and a specificity of 0.0%. The logistic regression and neural network models had similar overall accuracy, as well as similar sensitivity, specificity, and positive and negative predictive values. The decision tree and decision forest models had the greatest overall accuracy and specificity, indicating good ability to identify low morbidity cohorts, but had low sensitivity.

Table 2. Model evaluation of five predictive models trained to predict high (≥15%) morbidity cohorts of feedlot cattle

| Model | AUC | Acc (%) | Sn (%) | Sp (%) | PPV | NPV |
| --- | --- | --- | --- | --- | --- | --- |
| Advanced perceptron | 0.653 | 18.5 | 100.0 | 0.0 | 0.18 | n/a |
| Logistic regression | 0.675 | 67.7 | 62.5 | 68.9 | 0.31 | 0.89 |
| Neural network | 0.691 | 70.0 | 62.5 | 71.7 | 0.33 | 0.89 |
| Decision tree | 0.691 | 78.5 | 29.2 | 89.6 | 0.39 | 0.85 |
| Decision forest | 0.671 | 85.4 | 33.3 | 97.2 | 0.73 | 0.87 |

AUC, area under the receiver operating characteristic (ROC) curve; Acc, overall accuracy; Sn, sensitivity, ability to predict high morbidity cohorts; Sp, specificity, ability to predict low morbidity cohorts; PPV, positive predictive value, probability that predicted high morbidity cohorts are truly high morbidity cohorts; NPV, negative predictive value, probability that predicted low morbidity cohorts are truly low morbidity cohorts.

Previous work from our lab (Amrine et al., 2019) reported that predictive models with greater specificity had lesser sensitivity when using sale barn and arrival characteristics to predict BRD in the first 14 days on feed. The "best" model depends upon the goals of the feedlot manager. If the goal is to accurately identify low morbidity cohorts, then a model with the greatest specificity would be deemed the "best" model. Conversely, if the goal is to accurately identify high morbidity cohorts, then a model with the greatest sensitivity would be deemed the "best" model. Accurate identification of high morbidity cohorts could lead to actions that allow changes in risk management, frequency of disease monitoring, and other interventions. However, if the model misclassifies low morbidity cohorts as high morbidity cohorts, then valuable resources would be wasted. The decision forest model had a low sensitivity, only accurately classifying 33% of high morbidity cohorts, but based on the high PPV, this model had a high probability of being correct when it did classify a cohort as high morbidity. Thus, the decision forest model would not identify all of the high morbidity cohorts, but if potential interventions were implemented based on this model, few resources would be wasted.

Several previous studies (Sowell et al., 1999; Buhman et al., 2000; Quimby et al., 2001; Moya et al., 2015; Wolfger et al., 2015; Jackson et al., 2016) have indicated that feed intake, feeding behavior, and drinking behavior are predictive of onset of BRD in individual cattle. The onset of BRD can be predicted 4 to 7 days before clinical signs can be observed using feeding and drinking data (Buhman et al., 2000; Moya et al., 2015; Wolfger et al., 2015; Jackson et al., 2016). Feeding behavior has predicted BRD with an overall accuracy of 84% to 89% and positive predictive value of 85% to 96% (Quimby et al., 2001). Wolfger et al. (2015) indicated that feeding behavior correctly predicted 81% of BRD cases and 77% of healthy animals 3 days prior to clinical signs; adding feed intake did not improve these predictions. Similarly, Moya et al. (2015), using pattern recognition techniques, reported that feeding behavior predicted BRD with good sensitivity (58% to 83%) and specificity (67% to 100%) for some of the models with the best overall accuracy.

From the Microsoft Azure platform, feature importance could be obtained for the decision tree, decision forest, and logistic regression models. In each of these models, feeding data were among the top five predictors, and the percent change in feed delivery from days on feed (DOF) 2 to 3 was one of the most important predictors. Other predictors were percent change in feed delivery from DOF 4 to 5 and from DOF 8 to 9, and rolling average in percent change in feed delivery from DOF 7 to 10. These results indicate that alterations in feeding patterns very early in the feeding period are predictive of total morbidity, which may allow interventions to mitigate disease progression in the cohort.

In commercial feedlots, cattle are managed as pens except for individual animal treatment of disease. Tracking individual animal feeding behavior is neither practical nor cost-effective in a commercial feedlot. However, commercial feedlots collect real-time data that can be used to make pen-level management decisions. The results of this study and the previous discussion indicate that feeding data can be used to predict morbidity in cattle. In our study, we looked at total morbidity, but BRD accounted for 65% of the total treatments. The predictive models in this study were somewhat predictive of total cohort-level morbidity, indicating that feeding data can be used to predict morbidity in pens of cattle, which could lead to improved management of disease in feedlots. However, in the current study, feed delivery data were used as predictors in the models, and these data do not account for feed refusals. Based on previous data (Jackson et al., 2016), feed intake decreases prior to clinical signs of BRD, and thus in our case a decrease in the feed delivered is likely indicative of significant feed refusals and decreased feed intake for the day prior. However, an increase in feed refusals could also be due to several other factors such as weather or removal of cattle from the pen, which cannot be ascertained from our current dataset.

CONCLUSION

Feed intake and feeding behavior data are predictive of BRD in individual animals, and feed delivery data are predictive of total morbidity in commercial feedlot pens. Predictive analytics is a valuable tool that can be used to convert feedlot operational data into animal health management decisions. Future research should evaluate the ability of feed delivery to predict specific diseases (BRD, bloat, etc.), and combine feed delivery data with other data types to improve the prediction of morbidity in feedlot cattle.

Conflict of Interest Statement

The authors declare no actual or potential conflicts of interest.

LITERATURE CITED

Amrine, D. E., J. G. McLellan, B. J. White, R. L. Larson, D. G. Renter, and M. Sanderson. 2019. Evaluation of three classification models to predict risk class of cattle cohorts developing bovine respiratory disease within the first 14 days on feed using on-arrival and/or pre-arrival information. Comp Electron Agric. 156:439–446. doi:10.1016/j.compag.2018.11.035.
Amrine, D. E., B. J. White, and R. L. Larson. 2014. Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for bovine respiratory disease. Comp Electron Agric. 105:9–19. doi:10.1016/j.compag.2014.04.009.
Booker, C. W., G. H. Loneragan, P. T. Guichon, G. K. Jim, O. C. Schunicht, B. K. Wildman, T. J. Pittman, R. K. Fenton, E. D. Janzen, and T. Perrett. 2004. Practical application of epidemiology in veterinary herd health/production medicine. In: American Association of Bovine Practitioners Proceedings of the Annual Conference; September 23–25; 2004; Fort Worth, TX; p. 59–62. (vol. 37). doi:10.21423/aabppro20044902.
Breiman, L. 2001. Random forests. Mach. Learn. 45:5–32. doi:10.1023/A:1010933404324.
Buhman, M. J., L. J. Perino, M. L. Galyean, T. E. Wittum, T. H. Montgomery, and R. S. Swingle. 2000. Association between changes in eating and drinking behaviors and respiratory tract disease in newly arrived calves at a feedlot. Am. J. Vet. Res. 61:1163–1168. doi:10.2460/ajvr.2000.61.1163.
Dreiseitl, S., and L. Ohno-Machado. 2002. Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inform. 35:352–359. doi:10.1016/s1532-0464(03)00034-0.
Jackson, K. S., G. E. Carstens, L. O. Tedeschi, and W. E. Pinchak. 2016. Changes in feeding behavior patterns and dry matter intake before clinical symptoms associated with bovine respiratory disease in growing bulls. J. Anim. Sci. 94:1644–1652. doi:10.2527/jas.2015-9993.
Microsoft. 2022. Azure machine learning—ML as a service. Microsoft Azure. Available from https://azure.microsoft.com/en-us/services/machine-learning/
Moya, D., R. Silasi, T. A. McAllister, B. Genswein, T. Crowe, S. Marti, and K. S. Schwartzkopf-Genswein. 2015. Use of pattern recognition techniques for early detection of morbidity in receiving feedlot cattle. J. Anim. Sci. 93:3623–3638. doi:10.2527/jas.2015-8907.
Quimby, W. F., B. F. Sowell, J. G. P. Bowman, M. E. Branine, M. E. Hubbert, and H. W. Sherwood. 2001. Application of feeding behaviour to predict morbidity of newly received calves in a commercial feedlot. Can. J. Anim. Sci. 81:315–320. doi:10.4141/A00-098.
R Core Team. 2021. R: A language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. Available from https://www.R-project.org/
Roe, B. P., H.-J. Yang, J. Zhu, Y. Liu, I. Stancu, and G. McGregor. 2005. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl. Instrum. Methods Phys. Res., Sect. A 543:577–584. doi:10.1016/j.nima.2004.12.018.
Rojas, H. A., B. J. White, D. E. Amrine, and R. L. Larson. 2022. Predicting bovine respiratory disease risk in feedlot cattle in the first 45 days post arrival. Pathogens. 11:442. doi:10.3390/pathogens11040442.
Rosenblatt, F. 1957. The perceptron: a perceiving and recognizing automaton. Buffalo (NY): Cornell Aeronautical Laboratory, Inc. Available from https://blogs.umass.edu/brain-wars/files/2016/03/rosenblatt-1957.pdf
Salman, M. D., M. E. King, K. G. Odde, and R. G. Mortimer. 1991. Costs of veterinary services and vaccines/drugs used for prevention and treatment of diseases in 86 Colorado cow-calf operations participating in the National Animal Health Monitoring System (1986–1988). J. Am. Vet. Med. Assoc. 198:1739–1744.
Sowell, B. F., M. E. Branine, J. G. P. Bowman, M. E. Hubbert, H. E. Sherwood, and W. Quimby. 1999. Feeding and watering behavior of healthy and morbid steers in a commercial feedlot. J. Anim. Sci. 77:1105–1112. doi:10.2527/1999.7751105x.
White, B. J., D. E. Amrine, and R. L. Larson. 2018. Big data analytics and precision animal agriculture symposium: data to decisions. J. Anim. Sci. 96:1531–1539. doi:10.1093/jas/skx065.
Wolfger, B., K. S. Schwartzkopf-Genswein, H. W. Barkema, E. A. Pajor, M. Levy, and K. Orsel. 2015. Feeding behavior as an early predictor of bovine respiratory disease in North American feedlot systems. J. Anim. Sci. 93:377–385. doi:10.2527/jas.2013-8030.
Zhang, G. P. 2010. Neural networks for data mining. In: Maimon, O. and L. Rokach, editors. Data mining and knowledge discovery handbook. Boston (MA): Springer US; p. 419–444. doi:10.1007/978-0-387-09823-4_21
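As a worked check on the evaluation metrics reported in Table 2, the sketch below recomputes accuracy, Sn, Sp, PPV, NPV, and F1 from a confusion matrix. The class name `ConfusionMatrix` and the helper `_ratio` are hypothetical. The decision forest counts are taken or derived from the text (103 of 106 low morbidity cohorts correct; 33.3% of 24 high morbidity cohorts, i.e., 8, correct); the advanced perceptron counts follow from it labeling all 130 test cohorts as high morbidity.

```python
from dataclasses import dataclass
from typing import Optional


def _ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the metric is undefined (zero denominator)."""
    return num / den if den else None


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # high morbidity cohorts predicted high
    fn: int  # high morbidity cohorts predicted low
    tn: int  # low morbidity cohorts predicted low
    fp: int  # low morbidity cohorts predicted high

    @property
    def accuracy(self) -> Optional[float]:
        return _ratio(self.tp + self.tn, self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> Optional[float]:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> Optional[float]:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> Optional[float]:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def npv(self) -> Optional[float]:
        return _ratio(self.tn, self.tn + self.fn)

    @property
    def f1(self) -> Optional[float]:
        # Harmonic mean of sensitivity and PPV, written in count form.
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


# Counts derived from the reported test set (24 high and 106 low morbidity cohorts).
decision_forest = ConfusionMatrix(tp=8, fn=16, tn=103, fp=3)
advanced_perceptron = ConfusionMatrix(tp=24, fn=0, tn=0, fp=106)

if __name__ == "__main__":
    for name, cm in [("decision forest", decision_forest),
                     ("advanced perceptron", advanced_perceptron)]:
        print(name, cm.accuracy, cm.sensitivity, cm.specificity, cm.ppv, cm.npv)
```

With these counts the decision forest row of Table 2 is reproduced after rounding, and the NPV of the advanced perceptron is undefined because no cohort was predicted to be low morbidity.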
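The feeding variables listed in Table 1 can be illustrated with a short sketch. The study built these variables in R; the Python below is only an illustration, and the function names, the feature keys, and the exact percent change formula (change relative to the previous day's DMI-BW) are assumptions rather than the authors' code.

```python
from typing import Dict, List


def percent_changes(dmi_bw: List[float]) -> List[float]:
    """Percent change in DMI-BW between sequential days (assumed: relative to the earlier day)."""
    return [100.0 * (nxt - prev) / prev for prev, nxt in zip(dmi_bw, dmi_bw[1:])]


def rolling_means(values: List[float], window: int) -> List[float]:
    """Mean of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def build_feeding_features(dmi_bw: List[float]) -> Dict[str, List[float]]:
    """Build daily, percent change, and 2- to 7-day rolling average variables for one cohort."""
    changes = percent_changes(dmi_bw)
    features = {"dmi_bw": list(dmi_bw), "pct_change": changes}
    for window in range(2, 8):
        features[f"pct_change_roll{window}"] = rolling_means(changes, window)
    return features


if __name__ == "__main__":
    # Hypothetical cohort: DMI-BW (% of arrival body weight) for days 0 through 15.
    example = [0.5 + 0.1 * day for day in range(16)]
    for name, values in build_feeding_features(example).items():
        print(name, len(values))
```

Under these assumptions, 16 daily values yield 15 percent changes and 14, 13, 12, 11, 10, and 9 rolling averages for the 2- to 7-day windows, which matches the measurement counts given in Table 1.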
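The Model Evaluation section states that the threshold probability was adjusted manually for each model to maximize F1 score. A programmatic version of that idea might look like the sketch below; it is not the procedure used in Azure, and the function names and the toy probabilities are hypothetical.

```python
from typing import List, Optional, Sequence


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when cohorts with predicted probability >= threshold are called high morbidity."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(probs: Sequence[float], labels: Sequence[int],
                      candidates: Optional[List[float]] = None) -> float:
    """Return the candidate threshold with the highest F1 (lowest threshold wins ties)."""
    if candidates is None:
        # Each distinct predicted probability is a natural cut point.
        candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))


if __name__ == "__main__":
    # Toy example: 1 = high morbidity cohort, 0 = low morbidity cohort.
    toy_probs = [0.10, 0.40, 0.35, 0.80]
    toy_labels = [0, 0, 1, 1]
    thr = best_f1_threshold(toy_probs, toy_labels)
    print(thr, f1_at_threshold(toy_probs, toy_labels, thr))
```

Because F1 ignores true negatives, a threshold chosen this way can trade specificity for sensitivity, which is one reason the models in Table 2 land at different points on the sensitivity and specificity scale.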

Journal

Translational Animal Science, Oxford University Press

Published: Aug 29, 2022

Keywords: feeding patterns; machine learning; predictive analytics

References