Purpose – An individual’s driving style significantly affects overall traffic safety. However, driving style is difficult to identify because of temporal and spatial differences and the scene heterogeneity of driving behavior data. As such, the study of real-time driving style identification methods is of great significance for formulating personalized driving strategies, improving traffic safety and reducing fuel consumption. This study aims to establish a driving style recognition framework based on longitudinal driving operation conditions (DOCs) using a machine learning model and naturalistic driving data collected by a vehicle equipped with an advanced driving assistance system (ADAS). Design/methodology/approach – Specifically, a driving style recognition framework based on longitudinal DOCs was established. To train the model, a real-world driving experiment was conducted. First, the driving styles of 44 drivers were preliminarily identified through naturalistic driving data and video data; drivers were categorized through a subjective evaluation as conservative, moderate or aggressive. Second, based on the ADAS driving data, a criterion for extracting longitudinal DOCs was developed. Third, taking the ADAS data from 47 km of the two test expressways as the research object, six DOCs were calibrated, and the characteristic data sets of the different DOCs were extracted and constructed. Finally, four machine learning classification (MLC) models were used to classify and predict driving style based on the naturalistic driving data. Findings – The results showed that six longitudinal DOCs were calibrated according to the proposed calibration criterion. Cautious drivers spent the largest proportion of their driving in the free cruise condition (FCC), while aggressive drivers mainly drove in the FCC, the following steady condition (FSC) and the relative approximation condition (RAC). Compared with cautious and moderate drivers, aggressive drivers adopted a smaller time headway (THW) and distance headway (DHW).
THW, time-to-collision (TTC) and DHW showed highly signiﬁcant differences in driving style identiﬁcation, while longitudinal acceleration (LA) showed no signiﬁcant difference in driving style identiﬁcation. Speed and TTC showed no signiﬁcant difference between moderate and aggressive drivers. In consideration of the cross-validation results and model prediction results, the overall hierarchical prediction performance ranking of the four studied machine learning models under the current sample data set was extreme gradient boosting > multi-layer perceptron > logistic regression > support vector machine. Originality/value – The contribution of this research is to propose a criterion and solution for using longitudinal driving behavior data to label longitudinal DOCs and rapidly identify driving styles based on those DOCs and MLC models. This study provides a reference for real-time online driving style identiﬁcation in vehicles equipped with onboard data acquisition equipment, such as ADAS. Keywords Machine learning, Advanced driver assistant systems, Driver behaviors and assistance, Sensor data processing Paper type Research paper 1. Introduction © Nengchao Lyu, Yugang Wang, Chaozhong Wu, Lingfeng Peng and Alieu Freddie Thomas. Published in Journal of Intelligent and Connected Driving style can be deﬁned as an individual’s habitual manner Vehicles. Published by Emerald Publishing Limited. This article is of driving (Elander et al.,1993; Lajunen and Özkan, 2011; published under the Creative Commons Attribution (CC BY 4.0) licence. Sagberg et al.,2015) (i.e. a person’s preference of velocity Anyone may reproduce, distribute, translate and create derivative works of distribution), which is formed over time as that person this article (for both commercial and non-commercial purposes), subject to accumulates driving experience (Suzdaleva and Nagy., 2018). full attribution to the original publication and authors. 
The full terms of this licence maybe seen at http://creativecommons.org/licences/by/4.0/ legalcode The current issue and full text archive of this journal is available on Emerald This research was funded by the National Nature Science Foundation Insight at: https://www.emerald.com/insight/2399-9802.htm of China (No.52072290), Hubei Province Science Fund for Distinguished Young Scholars (No.2020CFA081) and the Fundamental Research Funds for the Central Universities (No.191044003, No. 2020-YB-028). Journal of Intelligent and Connected Vehicles Received 5 July 2021 5/1 (2022) 17–35 Revised 2 October 2021 Emerald Publishing Limited [ISSN 2399-9802] [DOI 10.1108/JICV-07-2021-0008] Accepted 5 November 2021 17 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Studies also explicitly describe the importance of acceleration 2. Literature review behavior as a key indicator of driving style because individuals In recent years, to discover and present driving style have different preferences for speed (Müller et al.,2013; Reiser, information in a scientiﬁc method, many models have been 2008). To differentiate between driving skill and driving style developed that assess driving style from different aspects. Since (Elander et al., 1993; Taubman-Ben-Ari et al.,2004), “skill” is its publication, the multidimensional driving style inventory deﬁned as the driver’s ability to maintain control of the vehicle (MDSI) (Taubman-Ben-Ari et al., 2004) has been the subject and adapt to complex trafﬁc conditions, and driving skill is of research around the world. It deﬁnes an individual’s driving expected to improve with practice or training. On the other style as a driving-speciﬁc factor that can contribute to both hand, “style” is deﬁned as the manner in which a driver chooses crashes and trafﬁc violations directly and in terms of more to drive or habitually drives (i.e. 
his/her choice of driving speed general socio-demographic and personal factors. The MDSI and headway). can increase driver awareness of his/her own and others’ driving A number of studies have shown that driving style has a styles and be used to identify baseline driving styles prior to the signiﬁcant impact on trafﬁc safety (Evans, 1996), vehicle implementation of road safety interventions as well as inform dynamics control (Plöchl et al.,2007) and the economic and post-intervention assessments (Taubman-Ben-Ari et al.,2016). ecological efﬁciency of driving (Mensing et al.,2014). However, To determine whether the MDSI is consistent with actual driving style information cannot be directly measured nor driving behavior, Van Huysduynen et al. (2018) conducted a detected. Existing studies have categorized driving behavior into simulation experiment with 88 participants. The objective data driving maneuvers (e.g. following, hard braking, lane changing, retrieved from the simulator was compared with the scores etc.) (Bellem et al.,2016). These studies estimate driving style obtained from questionnaire data. The analysis showed that in terms of the durations or frequencies of individual maneuver there is a moderate correlation between self-reported driving states. However, driving style is easily affected by and ﬂuctuates style and driving behavior in the simulator. This suggests that with the road trafﬁc environment. Additionally, relatively static MDSI can be used as a diagnostic tool to identify typical and singular driving data does not fully reﬂect the true driving driving behaviors of individuals in driving simulators. Ishibashi style. On the other hand, one of the main factors affecting the et al. (2007) developed a driving style questionnaire (DSQ) to identiﬁcation of driving style is the real-time ability and extract key indicators from self-reports and calibrate different effectiveness of data acquisition. Therefore, how to effectively driving styles. 
However, the DSQ focuses more on preferences use driving data to comprehensively and quantitatively analyze for driving behavior, which is limited by sample characteristics driving style has become a new ﬁeld to be further explored (Qi and structural validity (van Huysduynen et al., 2018). In other et al.,2019). words, the DSQ cannot fully describe an objective condition. In recent years, advanced driving assistance systems There have been many studies that classify driving style based (ADASs) have signiﬁcantly progressed, opening novel horizons on actual vehicle operating parameters, such as naturalistic in reducing trafﬁc accidents (Rezaei et al., 2021). Speciﬁcally, driving and ﬁeld operational tests (FOTs). For instance, Toledo with the rapid development of in-vehicle information systems et al. (2008) developed pattern recognition algorithms to and collision warning systems, a large amount of natural driving identify more than 20 maneuvers (such as lane change and data can be acquired through these types of ADASs (Bao et al., sudden braking) using naturalistic driving data on different 2020; Orlovska et al.,2020). In response to the great need of roads; this information was collected by onboard data loggers. driving style identiﬁcation for trafﬁc safety and fuel economy, On this basis, drivers were divided into three categories naturalistic data collection is becoming ever more feasible as combined with the weighted maneuvering frequency. The the penetration rate of ADASs increases in vehicles and on results showed that this method effectively predicts driving style. roadways around the world. Wang et al. (2015) extracted emergency braking maneuver Therefore, to explore the inﬂuence of different driving features from naturalistic driving data. 
On this basis, a behavior data on driving style identiﬁcation and realize the classiﬁcation regression tree model was established to estimate rapid and efﬁcient detection of driving style, this study obtained driving style, and drivers were divided into three risk groups a large amount of naturalistic driving data through an ADAS- according to nine rules. Xu et al. (2015) used naturalistic equipped vehicle and proposes a solution framework for rapid driving data from American highways and adopted a neural detection of driving style based on the driver’s longitudinal network (NN) model to divide driver styles into three types. In a driving operation conditions (DOCs). The proposed simulated scenario, Baer et al. (2011) rated ﬁve driving styles: framework calibrates the driver’s DOCs through naturalistic aggressive, anxious, economical, sensitive and calm. driving data and rapidly detects driving style through a machine Judging from the literature described above, it can be learning model according to the driving behavior parameter observed that driving style classiﬁcation methods and standards characteristics of different DOCs. To achieve the main goal of are not uniform. That said, previous studies have found that in this research, 44 subjects participated in naturalistic driving naturalistic driving, drivers generically categorized as high-risk experiments and data from the driver characteristics, vehicle drive faster, exhibit shorter time headways (THWs), brake motion attitude and micro driving operation was collected. The harder and change lanes more frequently than low-risk drivers framework for rapid identiﬁcation of driving styles proposed in (Sagberg et al., 2015; Xiong et al.,2012). It was also found from this research may be applied in intelligent connected and ﬁeld operation tests that low-risk drivers engage in fewer risky vehicle-road cooperative scenarios, providing a reference for maneuvers (Simons-Morton et al., 2015; Kusano et al.,2015). 
real-time and efﬁcient identiﬁcation of driving style to help While the aforementioned studies did identify differences drivers make real-time driving decisions. between driving styles, they did not establish evaluation models 18 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 to estimate driving style through different driving maneuvers. In 3. Data collection contrast, Guo and Fang (2013) classiﬁed drivers into three risk 3.1 Test equipment and test route groups by a K-means clustering method according to the To obtain real and reliable driving data, in this study, an maneuvers detected from naturalistic driving data on different automatic GAC Trumpchi passenger car equipped with FOT roads in the USA; the authors established a logistic model to data acquisition equipment, as shown in Figure 1, was used to predict driving style, which showed that the frequency of perform FOTs on various types of roads in Wuhan, China. emergency braking events was a valid indicator of high-risk The multi-functional road test vehicle platform is shown in drivers. Li et al. (2017) proposed a new method to identify Figure 1 and the data types and parameter descriptions driving style according to the transition patterns between collected by each experimental equipment are shown in maneuvering states. Driving behavior in highway trafﬁc Table 1. was divided into 12 maneuvering states. A conditional The installation of all instruments and equipment did not likelihood maximization method was used to extract typical hinder normal driving, such that the driver could maintain a maneuverability transfer patterns, which represented driving naturalistic driving state. The sampling frequency of the in- styles from 144 probabilities and the selected features were vehicle devices was 20100 Hz and the sampling interval of all classiﬁed by a random forest algorithm. The results showed that devices was set to 0.1 s. 
The naturalistic driving data was the transitions concerning ﬁve maneuver states – free driving, obtained in real time through the onboard laptop and the approach, near following, constrained left lane changes and driving video data was continuously stored in the memory card. constrained right lane changes – can reliably classify driving As shown in Figure 2, the experimental route consisted of style. Suzdaleva and Nagy (2018) proposed an online driving four sections. Detailed information for each section is provided style detection model based on both a normal component and in Table 2. classiﬁcation component mixed recursive Bayesian estimation. As can be observed from Table 1, Section 2 was a highway Seven driving styles associated with fuel economy were with dispersed trafﬁc volume. During the FOT drives, the identiﬁed using an online estimation algorithm. That algorithm trafﬁc ﬂow on this section was low and the trafﬁc density was can also be used to model and predict fuel consumption, speed, sparse, such that the experimental vehicle was in a free-driving throttle pedal position and gear selection. Lu et al. (2021) tried state for a long time. As can also be observed, Section 4 was an to understand the inﬂuence of different driving styles (such as arterial with congested trafﬁc volume. During the FOT drives, cautious, normal and aggressive) on key variables (such as this section of the road had a high trafﬁc ﬂow and density, such speed) in trafﬁc ﬂow theory and revealed the inﬂuence on that the experimental vehicle was in a car-following state for a network efﬁciency. The characteristics of different driving styles long time. Therefore, in both Sections 2 and 4, the motion were extracted from high-dimensional data clustering classes posture of the experimental vehicle was relatively stable; drivers and transformed into different vehicle-following models, which did not make any signiﬁcant operations that would make were simulated in a SUMO trafﬁcsimulator. 
driving style identiﬁable. The key to the modeling and analysis of driving style is the On the contrary, Sections 1 and 3 were both expressways extraction of driving maneuver features. Driving maneuvers are with moderate trafﬁc volume, and the road parameters were mainly divided into longitudinal or lateral. Longitudinal similar. During the FOT drives, the trafﬁc ﬂow was moderate maneuvers include free driving, approaching, following, and trafﬁc density was balanced, such that the experimental opening and emergency braking. Longitudinal maneuvers are vehicle made a variety of motion postures, and the driver’s classiﬁed according to the value of the THW, longitudinal operating characteristics were signiﬁcantly different, rendering acceleration (LA) and the perception of changes in the outward driving style easily identiﬁable. Therefore, 47 Kms of Sections size of the vehicle ahead (Toledo et al.,2007). More speciﬁcally, 1 and 3 were selected as the expressway test bed from which to the THW and LA are commonly used to describe the following observe the naturalistic driving data. maneuvers (Kondoh et al.,2008). Further, when rapid deceleration is not occurring, a 3-s THW or less is considered to be car-following (Kusano et al.,2015; Transportation Research 3.2 Participants This study mainly focused on model and data analysis. The Board, The Highway Capacity Manual, 2010). In other words, experiment was outdoor naturalistic driving, the experimental If the THW of the front and rear vehicles exceeds 3.0 s, it is considered a free drive operation. This study focuses on longitudinal driving behavior and Figure 1 Multi-functional road test vehicle platform simpliﬁes the impact of lateral driving behavior. The scope of this study was based on an urban expressway with high trafﬁc ﬂow and speed and the inﬂuence of acceleration and deceleration during the process of vehicle following was considered. 
According to the literature summary and the understanding and analysis of naturalistic driving data, this study took a 6.0-s THW as one of the criteria to indicate car- following. The longitudinal driving data was extracted from naturalistic driving data to classify different driving conditions, and different machine learning models were selected to construct driving style classiﬁcation models; the accuracy of the various models was then compared to ﬁnd the best ﬁt. 19 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Table 1 Original data collected by naturalistic driving experimental platform Data acquisition equipment Data type Parameter OBD-II Vehicle operation and kinematics data Speed, accelerator pedal opening, braking pressure, steering wheel angle, steering wheel angular speed Mobileye M630 Position information Distance from right and left lane line, THW INS RT2500 Vehicle’s movement and longitude and latitude Lateral acceleration, longitudinal acceleration, information longitude, latitude, yaw rate IBEO LUX-4 LiDAR Forward target and road edge information Lateral and longitudinal distance of forward target, lateral and longitudinal relative velocity of forward target MOVON camera Video information Driving video For this study, a total of 44 participants were recruited (female Figure 2 Naturalistic driving experiment data acquisition equipment = 19; male = 25). The participants’ age ranged from 22 to 55 years old (mean = 32.8, SD = 8.2). Their driving experience ranged from 2 to 18 years (mean = 6.9) and their total lifetime driving mileage ranged from 400 to 400,000 Kms (mean = 110,000). The distribution of gender, age and experience of the sample was consistent with the distribution of the general driving population in China. 
3.4 Test process In this study, naturalistic driving data was collected using a road environment was good, the trafﬁc volume was moderate single test vehicle and a continuous measurement method. and the weather was sunny. During the whole process of the Each subject drove the test vehicle one time along the test road experiment, an experimental assistant was arranged to monitor during a weekday. To avoid trafﬁc ﬂow disturbance caused by the risk factors and explain the experimental requirements in peak periods, the test was run between 09:00 to 16:00 (outside real time. The research plan was discussed with the research of rush hour). Each test provided subjects with route guidance group, and all participants were informed of the experimental only and did not interfere with their daily driving habits so as to requirements and impacts. Sample size selection is critical to obtaining sufﬁcient keep the subjects in a naturalistic driving state. The test data experimental data. If the sample size is too small, the reliability was preprocessed to facilitate statistical analysis. of the results will be reduced and if the sample size is too large, resources will be wasted. For this study, the correct sample size 3.5 Data processing was calculated based on expected variance, target conﬁdence The raw data collected by the natural driving experimental and error margin according to reference (Zhao et al.,2020)as platform and the other methods is shown in Table 2. Because follows: the original data collected by the onboard sensor inevitably 2 2 2 experienced defects, such as missing frames, discontinuity and N ¼ Z s =E (1) jump, it was necessary to clean and preprocess the original data to ensure quality. 
Therefore, this study used cubic spline where N is the sample size; Z is the standard normal interpolation to supplement the lost frames, ﬁltered the noise distribution statistic; s is the standard deviation; E is the and corrected the jump data based on the Savitzky-Golay ﬁlter maximum error. and ﬁnally obtained accurate vehicle motion attitude. Generally, a signiﬁcance level of 10% is chosen to reﬂect the The data collected in this study included driver attributes, 90% conﬁdence level of the unknown parameter. In this study, operation parameters and road characteristics, as shown in when the conﬁdence level was 90%, Z = 1.25, s was 0.250.5 (Chow, 2007)and E = 10%. Therefore, the minimum sample Table 3. Driver attributes included driver ID, age and gender. size required for calculation ranged from 10 to 39. Operation parameters included speed, LA, THW, time to Table 2 Detailed information for each section Length Section Road type Speed limit (km/h) Lanes in each direction (km) Traffic volume 1 Expressway 70 3 13 Moderate 2 Highway 100–120 3–4 45 Dispersed 3 Expressway 80 3–4 34 Moderate 4 Arterial 40–60 2–3 12 Congested 20 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Figure 3 Test route Score Table 3 Data collection E or E or E ; j if E ¼ E ¼ E A B C A B C Data structure Variable > > E or E ; j if E ¼ E 6¼ E ; jE E j 1 A B A B C A C Driver attribute data ID, age, gender, ¼ E or E ; j if E ¼ E 6¼ E ; jE E j 1 A C A C B A B Driving operation data Speed, THW, TTC, DHW, longitudinal > E or E ; j if E ¼ E 6¼ E ; jE E j 1 B C B C A A B acceleration rerate; j otherwise Road data Type, length (2) where E is the scoring value of the ﬁrst expert, E is the scoring A B collision (TTC) and distance headway (DHW). Road value of the second expert and E is the scoring value of the third expert. characteristics included road type and length. The results from the DSQ are shown in Figure 4. 
In total, 16 drivers were scored as cautious, 22 drivers were scored as 3.6 Subjective driving style evaluation moderate and 6 drivers were scored as aggressive. As the DSQ uses subjective responses for driving style calibration, the analysis results are not only limited by sample 4. Method characteristics and structural validity, but the data focuses more on driving behavior preferences and cannot fully describe 4.1 Research strategy a true objective driving condition. In this study, using the three- In a naturalistic driving environment, due to the inﬂuence of point scale method (Li et al., 2017), three drivers with rich road conditions, trafﬁc conditions, driver characteristics and driving experience (the actual driving mileage per person was other impactful factors, drivers will make myriad operations, more than 60,000 Kms and the driving experience per person such as accelerating, decelerating, parking, approaching, was more than eight years) were selected as the scoring experts. following and more. However, because different drivers have Driving style was scored according to the video data based on different driving styles, they make different operations under three points, namely, 1 indicated a conservative driving style, 2 the same conditions. Therefore, the driver’s operating indicated a moderate driving style and 3 indicated an aggressive performance under these different driving conditions can be driving style. The scoring rules were set as follows: used to identify that driver’s style. 21 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Figure 4 Driving style labeled results As shown in Figure 5, this research ﬁrstly identiﬁed different DOCs. The following sections discuss the DOC labels and the driving styles and then labeled DOCs based on naturalistic labeling process is shown in Figure 6. The labeling of driving data. 
Then, operating parameters were extracted under longitudinal driving behavior conditions was performed in two different DOCs and four machine learning classiﬁcation steps: (MLC) methods were used to predict driving style; the 4.2.1 Label acceleration and deceleration segments prediction performance of the models was then evaluated. Taking an acceleration segment as an example, a sliding time window was adopted. From the initial moment when the 4.2 Label method of driving operation conditions vehicle entered the expressway, a ﬁxed sampling threshold was Previous studies have shown that relative distance and relative set to 50 frames. speed are two important indicators of longitudinal driving; they As shown in Figure 7, the abscissa represents the number of can be used to simulate driver behavior by taking them as frames, and the ordinate represents the speed. Within the 50- elements of a regression function in longitudinal driving frames range of the sliding time window (t ,t 50), if the 2 1 scenarios and models (Itkonen et al., 2020). Therefore, in this speed increased, the driving segment of (t , t ) was temporarily 1 2 study, speed and THW were selected as the label basis of the marked as an accelerating segment, otherwise, the driving Figure 5 Research strategy of driving style identiﬁcation 22 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Figure 6 Label process of DOCs Figure 7 Schematic diagram of acceleration and deceleration segment label segment of (t , t ) was marked as a conventional driving If t , t > 5, t , t 50, then (t , t )and (t , t ) were marked 1 2 3 2 4 3 1 2 3 4 segment. If the speed decreased in the range of (t , t ), as accelerating segments and (t , t ) was marked as a 2 3 1 2 subsequent processing was required. 
The subsequent conventional driving segment for a further label; processing followed key principles: If t , t > 5, t , t < 50, then (t , t ) was marked as an 3 2 4 3 1 2 When the speed decreased at t but started to rise at t and accelerating segment and (t , t ) was marked as a conventional 2 3 2 4 the speed reaches its peak at t : driving segment for a further label. If t , t 5, v > v , then (t , t ) was marked as an Then, THW was used to determine whether the vehicle was 3 2 t t 1 4 4 2 accelerating segment; following a car in the time window and the driving segment was 23 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Figure 8 Acceleration and deceleration segment label Speed Acc. segment Dec. segment 0 0.5 11.5 22.5 33.5 Frames 10 Figure 9 The label result of longitudinal DOCs 100 10 Speed 90 9 DOC 80 8 70 7 60 6 50 5 40 4 30 3 20 2 10 1 0 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Frames 4 labeled as either a following acceleration condition (FAC) or a The FrAC and FrDC indicate that the speed of the host vehicle free acceleration condition (FrAC). Because of the detection increased or decreased, respectively, within the sliding window detection time of 50 frames and either no leading vehicle was in equipment, a 0 in the THW data meant that there was no front or the headway time between the front and rear vehicles leading vehicle and a non-zero meant that there was a leading was more than 6.0 s. vehicle detected ahead. In addition, the accelerating segment The FAC and FDC (following deceleration condition) with THW 6s was also marked as a FrAC because when indicate that the speed of the host vehicle increased or THW 6, the vehicle was in a relatively safe driving state. The decreased, respectively, within the sliding window detection label process for deceleration conditions was similar. 
time of 50 frames and a leading vehicle was in front and the headway between the front and rear cars was within 6.0 s. 4.2.2 Label other conventional driving segments The FCC indicates that the speed of the host vehicle changed The other conventional driving conditions included a free cruise repeatedly within the sliding window detection time of 50 condition (FCC), following steady condition (FSC), relatively frames and either a leading vehicle was not detected in front or distant condition (RDC) and a relative approximation the headway between the front and rear vehicles was more than condition (RAC). The sliding window was used to identify and 6.0 s. label these continuous driving segments – all except for the The RDC and RAC indicate that the speed of the host FCC, which was labeled based on a THW > 6s or THW = 0 – vehicle changed repeatedly and alternately within the sliding and the threshold and methods were similar to the acceleration window detection time of 50 frames and a leading vehicle was label process described above. Within the sampling threshold, in front and the headway of the front and rear vehicles was an increasing or decreasing THW was determined and the FSC, within 6.0 s. Within the 50-frame sliding window detection RDC and RAC were automatically labeled by MATLAB. time, the headway time showed an increasing RAC or a This study did not consider the impact of latitudinal vehicle decreasing RDC. operations (i.e. lane-changing). Only longitudinal driving The FSC indicates that the speed of the host vehicle changed conditions were considered. To sum up, the eight longitudinal repeatedly within the sliding window detection time of 50 driving conditions are deﬁned as follows. frames and there was a leading vehicle in front and the headway Speed Speed Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 between the front and rear vehicles was within 6.0 s. 
Within the sliding window detection time of 50 frames, the headway time showed repeated and alternate changes.

4.2.4 Measurement of index
Drawing on the 10 observable driving style indices described in the existing literature (Itkonen et al., 2020), the longitudinal driving behavior analysis indices, namely speed (V), LA, THW, the reciprocal of TTC (TTC⁻¹, written TTCi below) and DHW, were selected to characterize driving style. The index measurements differed between driving conditions. For example, for the FCC only the host vehicle's own parameters were relevant, so only speed and LA were calculated. The analysis indices and index measurements for each DOC are shown in Table 4.

The naturalistic driving data studied were captured from 44 participants driving on experimental road sections 1 and 3. The speed limit was 70 km/h on road 1 and 80 km/h on road 3. In addition, the two roads differed in length and, as observed in the video, in traffic flow. Therefore, to ensure high-quality analysis, the data from roads 1 and 3 were treated as independent analysis units, and each drive over a section, from the entrance to the exit, was divided into small units of equal duration (t = 600 frames). The statistical index values were then computed within the small units of each section, as shown in Table 3. Fragments shorter than 10 min were removed and excluded from further analysis. In this way, the naturalistic driving data from the 44 drivers on the two tested expressways were divided into 229 driving segments, and all the statistical analysis indicators were summarized into a 229 × 211 driving condition index analysis matrix.

4.3 Machine learning classification methods
This study aimed to test the feasibility of using longitudinal DOCs to identify driving styles through MLC algorithms. Previous literature shows that four MLC models, namely SVM, XGB, LR and MLP, have achieved relatively good predictive performance in practical applications. Therefore, this study evaluated the prediction performance of these four machine learning models based on the DOC labels:

(1) Support vector machines (SVMs):
Support vector machines (SVMs) are among the most widely used supervised classification methods in machine learning and artificial intelligence. The SVM proposed by Cortes and Vapnik makes full use of structural risk minimization theory, which gives the model strong generalization ability (Cortes and Vapnik, 1995). SVMs are a supervised learning method that predicts the labels of points in the test data set using a model learned from the training data set. The method is well known in computer science and has been widely used in transportation engineering, for example in traffic accident prediction (Tang et al., 2020; Zhang et al., 2018), road risk prediction (Basso et al., 2018), vehicle trajectory state recognition (Siddique et al., 2019), path selection (Sun et al., 2017), driving behavior prediction (Wang et al., 2017) and driving state recognition (Chai et al., 2019; Allahviranloo et al., 2013). SVMs generally show good predictive performance.

(2) Extreme gradient boosting (XGB):
XGB is an ensemble machine learning model built from many decision trees using an optimized gradient boosting system. Its advantages include parallel processing, approximate greedy search and fast learning while limiting overfitting. XGB has been shown to outperform the random forest model in both predictive performance and processing time (Chen and Guestrin, 2016). In recent years, XGB models have performed well in traffic flow prediction (Mahmoud et al., 2021), rail defect prediction (Mohammadi et al., 2019), driving behavior prediction (Ayoub et al., 2021) and road risk identification and prediction (Das et al., 2020).

(3) Logistic regression (LR):
LR is generally used to model the relationship between a categorical dependent variable and categorical, dichotomous or continuous independent variables. Such models predict the probability of occurrence of the dependent variable from a set of given independent variables (Venkata et al., 2020). LR is a generalized linear model and has been widely used in traffic safety research for accident prediction (Venkata et al., 2020; Dong et al., 2018) and conflict risk prediction (Costela et al., 2020). It has also been used in traffic system performance tests (Cafiso et al., 2020; Liu et al., 2018) and behavior prediction (Farooq et al., 2021; Ghasemzadeh et al., 2018).

(4) Multi-layer perceptron (MLP):
The use of NNs and deep learning optimization algorithms to enhance discrete choice models is an active research area that has shown encouraging results (Zargarnezhad et al., 2019). In recent years, deep learning methods in discrete choice settings have been explored in applications such as personal travel mode prediction (Omrani, 2015), path tracking prediction (Ge et al., 2021) and driving behavior feature recognition (Jasper et al., 2018). A basic three-layer back-propagation MLP was used to develop an early NN model for short-term traffic forecasting (Clark et al., 1993); since then, the MLP has been developed into a non-parametric approach that has proved successful in modeling complex behavioral data (Costa et al., 1997).

Table 4 The measures of driving style used in the analysis
| DOCs | Analysis index | Measurement of index |
|---|---|---|
| FrAC, FrDC, FAC, FDC, FSC, RDC, RAC | V, LA, THW, TTC⁻¹, DHW | Mean, standard deviation, percentiles (15%, 50%, 85%), mode (except for TTC⁻¹), maximum, minimum |
| FCC | V, LA | Mean, standard deviation, percentiles (15%, 50%, 85%), mode, maximum, minimum |

4.4 Model prediction performance evaluation
After parameter tuning and model training, the generalization ability of each model had to be evaluated on an independent test set. To evaluate the performance of the prediction models, a confusion matrix was introduced. Taking the binary (dichotomous) problem as an example, the confusion matrix is shown in Table 5.

True positive (TP) is the number of samples whose true value was positive and whose predicted value was positive. False negative (FN) is the number whose true value was positive but whose predicted value was negative. False positive (FP) is the number whose true value was negative but whose predicted value was positive. True negative (TN) is the number whose true value was negative and whose predicted value was negative.

The accuracy (ACC), precision (PPV), sensitivity or recall (TPR), false positive rate (FPR), specificity (TNR) and F1-score were used to evaluate the performance of the models. The formulas and meanings of these evaluation indices are shown in Table 6.

Table 5 Confusion matrix
| Real value | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |

Table 6 Model prediction result evaluation index
| Evaluation index | Formula | Meaning |
|---|---|---|
| Accuracy (ACC) | $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$ | The proportion of all correctly classified results among all observed values |
| Precision (PPV) | $\mathrm{PPV} = \frac{TP}{TP + FP}$ | Among all results the model predicted as positive, the proportion that were correct |
| Sensitivity (TPR) | $\mathrm{TPR} = \frac{TP}{TP + FN}$ | Among all results whose true value was positive, the proportion the model predicted correctly |
| False positive rate (FPR) | $\mathrm{FPR} = \frac{FP}{FP + TN}$ | Among all results whose true value was negative, the proportion incorrectly predicted as positive |
| Specificity (TNR) | $\mathrm{TNR} = \frac{TN}{TN + FP}$ | Among all results whose true value was negative, the proportion the model predicted correctly |
| F1-score | $F_1 = \frac{TP}{TP + (FN + FP)/2}$ | Combines precision and recall; ranges from 0 to 1, where 1 is the best output of the model and 0 the worst |

5. Results
5.1 Calibration results of longitudinal driving operation conditions
Naturalistic driving data from 47 km of expressway (test routes 1 and 3) were extracted, and the labeling method described in the previous section was used to identify the DOCs of the 44 drivers on the tested expressways. Figure 10 shows the DOC frequency distribution for the different drivers. Unlike the labeling results for the entire experimental section, only six DOCs (FAC, FDC, FCC, RDC, RAC and FSC) appeared on the expressway for all drivers, while FrAC and FrDC did not appear at all. By definition, FrAC and FrDC generally do not occur on expressways, and a review of the naturalistic driving video data confirmed that FrAC and FrDC were not present on the tested expressways.

Figure 10 Frequency cumulative distribution of different DOCs

It can be observed in Table 7 that the FCC occurred most frequently, indicating that, when driving on the expressway, drivers were most likely to adopt the FCC and least likely to adopt the FAC and FDC. The reason may be that the DOC labeling criteria were established from the naturalistic driving data of the entire experimental road section, which covered multiple road types, whereas only two road types were actually analyzed here. In addition, the overall pattern of the DOC distribution was roughly the same for all drivers, but the mean and variance of each DOC ratio differed, which reflects the heterogeneity of the frequency distribution of the different DOCs.

Table 7 Statistical analysis of different longitudinal DOCs
| DOC | FAC | FDC | FCC | FSC | RDC | RAC |
|---|---|---|---|---|---|---|
| Mean (%) | 6.27 | 3.78 | 44.61 | 13.50 | 12.61 | 19.24 |
| Median (%) | 6.00 | 3.66 | 42.30 | 13.30 | 12.61 | 19.47 |
| Standard deviation | 2.39 | 1.56 | 14.28 | 6.21 | 5.95 | 8.16 |
| Minimum (%) | 2.40 | 1.17 | 19.77 | 0.12 | 0.18 | 0.12 |
| Maximum (%) | 13.26 | 8.58 | 85.90 | 35.12 | 31.95 | 34.68 |

5.2 Driving style identification with different machine learning classification methods
With reference to the four machine learning models, a sample set was established to distinguish driving style. The difference in this study is that the samples were assigned driving style labels (conservative, moderate and aggressive) in the data aggregation stage. The sample set was divided into a 70% training set and a 30% test set. At the data level, the imbalance in the numbers of conservative, moderate and aggressive driver samples was addressed: the ENN method was used to undersample the majority samples in the training set. Five-fold cross-validation was then used to train and validate on the training set, and finally each model was tested on an independent test set. Table 8 shows the confusion matrices predicted by the established models on the independent test set; the values in Table 8 are numbers of driving segments in the test set.

Table 9 shows that the MLP model had the highest overall accuracy. The models with the highest precision for the aggressive, moderate and conservative driving styles were XGB, MLP and MLP, respectively (PPV = 1.000, 0.659 and 0.867). In terms of sensitivity (TPR), the detection rate of the moderate driving style was higher than that of the aggressive and conservative driving styles, which shows that these models were better at detecting the moderate driving style. However, the FPR of the moderate driving style was also higher than that of the aggressive and conservative driving styles. In terms of the F1-score, apart from the LR model, all results were at least 0.5, indicating that the overall output performance of the models was only moderate at this sample size. Because a moderate driving style is difficult to define clearly, many segments of the other two styles were classified as moderate, so the precision for the moderate style was low, which lowered the overall recognition level of all the models.

Table 9 also shows that, at the current sample size, a small number of extracted longitudinal driving conditions can be used to identify driving styles effectively through MLC models, and the identification accuracy is expected to improve as the sample size increases. However, the MLC models differ in their performance in identifying driving style. It was found that the four models
all showed good performance in the prediction of driving style. However, in terms of accuracy, precision, recall and F1-score, the MLP model had the best prediction results.

Table 8 Different machine learning model prediction results (rows: real value; columns: predictive value)
| Model | Real value | Aggressive | Moderate | Conservative |
|---|---|---|---|---|
| SVM | Aggressive | 5 | 4 | 0 |
| SVM | Moderate | 1 | 30 | 2 |
| SVM | Conservative | 0 | 17 | 10 |
| XGB | Aggressive | 3 | 6 | 0 |
| XGB | Moderate | 0 | 27 | 6 |
| XGB | Conservative | 0 | 13 | 14 |
| LR | Aggressive | 2 | 6 | 1 |
| LR | Moderate | 3 | 23 | 7 |
| LR | Conservative | 1 | 15 | 11 |
| MLP | Aggressive | 6 | 2 | 1 |
| MLP | Moderate | 3 | 29 | 1 |
| MLP | Conservative | 1 | 13 | 13 |

Table 9 The prediction results of different machine learning models
| Model | ACC | Driving style | PPV | TPR | FPR | TNR | F1-score |
|---|---|---|---|---|---|---|---|
| SVM | 0.652 | Aggressive | 0.833 | 0.556 | 0.017 | 0.983 | 0.667 |
| SVM | 0.652 | Moderate | 0.588 | 0.909 | 0.583 | 0.417 | 0.714 |
| SVM | 0.652 | Conservative | 0.833 | 0.370 | 0.048 | 0.952 | 0.513 |
| XGB | 0.638 | Aggressive | 1.000 | 0.333 | 0.000 | 1.000 | 0.500 |
| XGB | 0.638 | Moderate | 0.587 | 0.818 | 0.528 | 0.472 | 0.684 |
| XGB | 0.638 | Conservative | 0.700 | 0.519 | 0.143 | 0.857 | 0.596 |
| LR | 0.522 | Aggressive | 0.333 | 0.222 | 0.067 | 0.933 | 0.267 |
| LR | 0.522 | Moderate | 0.523 | 0.697 | 0.583 | 0.417 | 0.597 |
| LR | 0.522 | Conservative | 0.579 | 0.407 | 0.190 | 0.810 | 0.478 |
| MLP | 0.696 | Aggressive | 0.600 | 0.667 | 0.067 | 0.933 | 0.631 |
| MLP | 0.696 | Moderate | 0.659 | 0.879 | 0.417 | 0.583 | 0.753 |
| MLP | 0.696 | Conservative | 0.867 | 0.481 | 0.048 | 0.952 | 0.619 |

6. Discussion
6.1 Statistical analysis of parameters based on different driving styles
According to the DOC calibration results, a scatter diagram of the average longitudinal driving behavior parameters was drawn, as shown in Figure 11. The scatter distributions of THW and DHW clearly differed between driving styles.

Figure 11 Data scatter diagram of longitudinal DOCs of drivers with different driving styles

Compared with cautious and moderate drivers, aggressive drivers adopted a smaller THW and DHW during the naturalistic driving experiment, suggesting that THW and DHW are highly informative for identifying driving style. The significance of the other three parameters for driving style identification, however, required further analysis. Normality and lognormality tests showed that the longitudinal driving behavior parameters of the different driving styles did not follow a Gaussian distribution. Therefore, a non-parametric test was adopted to analyze the relationship between the longitudinal driving behavior parameters and driving style. Because the numbers of drivers exhibiting each driving style were imbalanced, as were the DOC parameters, the group sample sizes were unequal. Therefore, the Kruskal–Wallis test was used as a non-parametric one-way ANOVA of the population sample, and Dunn's multiple comparisons test was used for pairwise comparison of the driving data from drivers with different driving styles. The results are shown in Table 10.

Table 10 Non-parametric test of one-way ANOVA results (P-value summary)
| Non-parametric test method | Speed | Acceleration | THW | TTCi | DHW |
|---|---|---|---|---|---|
| Kruskal–Wallis test | 0.011 | 0.784 | <0.001 | 0.004 | <0.001 |
| Dunn's multiple comparisons test: cautious vs moderate | 0.048 | 0.980 | 0.001 | 0.020 | 0.006 |
| Dunn's multiple comparisons test: cautious vs aggressive | 0.014 | >0.999 | <0.001 | 0.006 | <0.001 |
| Dunn's multiple comparisons test: moderate vs aggressive | 0.450 | >0.999 | <0.001 | 0.422 | <0.001 |

The four longitudinal driving behavior parameters speed, THW, TTC and DHW differed significantly between driving styles, whereas LA did not. In particular, THW, TTC and DHW showed highly significant differences. This also suggests that drivers' subjective perception of LA during naturalistic driving played a far weaker role than the objective factors of speed, THW, TTC and DHW. This distinction is useful for ADAS-equipped vehicles, which can display THW, TTC and DHW in real time on the onboard intelligent display terminal, so that drivers can easily respond to these data and adjust their driving strategies in real time.

The multiple comparison analyses showed that LA did not differ significantly between the three driving styles. Speed and TTC also showed no significant difference between moderate and aggressive drivers, which indirectly indicates that moderate and aggressive drivers differed little in these two parameters.

6.2 Statistical analysis of parameters based on different longitudinal driving operation conditions
Based on the 229 segments of naturalistic driving data, box plots of the mean values of speed, LA, THW, TTCi and DHW were drawn for the six DOCs, as shown in Figure 12. Note that the FCC has no THW, TTCi or DHW statistics.

It can be observed in Figure 12 that, among all the DOCs, the mean speed of the FDC was the lowest (FAC = 54.3 km/h, FDC = 42.4 km/h, FCC = 47.3 km/h, FSC = 57.5 km/h, RDC = 58.3 km/h and RAC = 60.1 km/h). This shows that when drivers were in the FDC, most drove at a low following speed to maintain safety. In contrast, the average speed was higher in the FAC, which indicates that the following vehicle generally accelerated when the lead vehicle accelerated. The speed distributions of the FSC, RDC and RAC were relatively uniform. The FCC had the largest range of speed fluctuation, which may be because vehicles entered the expressway from an urban road with a relatively low average speed and had to accelerate during this process.

The average LA values of the FAC and FDC had similar distributions, and their average absolute LA values differed little (FAC = 0.42, FDC = 0.47). However, the absolute maximum LA of the FDC was slightly larger than that of the FAC (FAC = 0.78, FDC = 0.90) and considerably higher than that of the other DOCs. This shows that drivers accelerated or decelerated noticeably under these two DOCs, whereas their driving operations under the other DOCs were relatively smooth. The LA outliers also illustrate the operating behavior of aggressive drivers under the different DOCs.

Meanwhile, the interquartile range of the average THW for the FAC (Q3 = 4.40 s, Q1 = 2.92 s, IQR = 1.48 s, mean = 3.46 s) was larger than that for the FDC (Q3 = 3.58 s, Q1 = 2.22 s, IQR = 1.36 s, mean = 2.94 s), which indicates that drivers maintained a larger THW when following accelerating vehicles than when following decelerating vehicles. In other words, a rear vehicle following an accelerating front vehicle showed a delay effect, whereas a rear vehicle following a decelerating front vehicle behaved aggressively, resulting in a small THW; this can also be seen in the LA index of the FDC. The interquartile ranges of the average THW for the FSC (Q3 = 2.35 s, Q1 = 1.41 s, IQR = 0.94 s, mean = 1.86 s), RDC (Q3 = 1.98 s, Q1 = 1.49 s, IQR = 0.49 s, mean = 1.71 s) and RAC (Q3 = 2.00 s, Q1 = 1.55 s, IQR = 0.45 s, mean = 1.80 s) show that in these three DOCs the vehicle, although still following, did not rapidly accelerate or decelerate, yet the THW was already below 3.0 s, which is consistent with existing research (Xu et al., 2015; Suzdaleva and Nagy, 2018).

In general, although drivers exhibited different driving styles, they all maintained a large TTC when driving on the expressway. While the TTC index has been widely used for
potential risk assessment, the TTC outliers under the different DOCs reflect the behavior of different driving styles; in particular, the TTC of aggressive drivers fluctuated greatly.

Figure 12 Box plot for different DOCs

Apart from the FCC, the interquartile ranges of DHW for the FAC (Q3 = 70.6 m, Q1 = 31.2 m, IQR = 39.4 m, mean = 52.8 m) and FDC (Q3 = 51.5 m, Q1 = 14.8 m, IQR = 36.7 m, mean = 35.1 m) were much larger than those for the FSC (Q3 = 38.2 m, Q1 = 19.2 m, IQR = 19.0 m, mean = 30.3 m), RDC (Q3 = 33.5 m, Q1 = 22.4 m, IQR = 11.1 m, mean = 28.0 m) and RAC (Q3 = 35.1 m, Q1 = 25.4 m, IQR = 9.7 m, mean = 30.5 m). This indicates that, for all drivers regardless of their dominant style, DHW varied widely under the FAC and FDC but stayed within a narrow range under the FSC, RDC and RAC. In addition, the mean DHW of the FAC was higher than that of the FDC (FAC = 52.8 m, FDC = 35.1 m), which indicates that all drivers, regardless of their dominant style, tended to follow at a larger distance under the FAC.

Normality and lognormality tests showed that the longitudinal driving control data of the different DOCs did not follow a Gaussian distribution, so non-parametric tests were adopted. As the number of driving segments was the same for every DOC, the Friedman test was used as a non-parametric one-way ANOVA of the sample population, and Dunn's multiple comparisons test was used for pairwise comparison of the driving segment data from the different DOCs. The analysis results are presented in Table 11, which shows that the longitudinal driving behavior parameters differed highly significantly across the calibrated longitudinal DOCs (p < 0.001).

6.3 Frequency of longitudinal driving operation conditions based on different driving styles
As shown in Figure 13 and Table 12, the DOC calibration results were classified and statistically analyzed by driving style. In this naturalistic driving test, all drivers, regardless of their dominant style, preferred the FCC. Beyond the influence of road factors (such as less crowded traffic flow and better road alignment), this suggests that all drivers preferred free cruising conditions and tried to avoid complex following conditions.

It can be observed in Table 12 that cautious drivers had the largest proportion of FCC, and the one-way ANOVA showed no significant difference (P = 0.073), indicating that cautious drivers tended to maintain the FCC for long periods. In contrast, the differences were significant for moderate and aggressive drivers (P = 0.033 and P = 0.019, respectively), indicating that they changed their driving strategies as the driving environment changed during naturalistic driving. In particular, aggressive drivers had higher proportions of FSC and RAC, indicating that they tended to take on complex driving conditions.

6.4 Discussion of model recognition results
With reference to the four machine learning models, a sample set was established to distinguish driving styles. The difference
is that the samples were assigned driving style labels (conservative, moderate and aggressive) in the data aggregation stage. The sample set was divided into a 70% training set and a 30% test set. At the data level, the imbalance in the numbers of conservative, moderate and aggressive driving style samples was addressed: the ENN method was used to undersample the majority samples in the training set. Five-fold cross-validation was then used to train and validate on the training set, and finally each model was tested on an independent test set. Table 8 shows the confusion matrices predicted by the established models on the independent test set; the values in Table 8 are numbers of driving segments in the test set. Figure 14 shows how the prediction accuracy on the training and validation sets, and hence the degree of overfitting, changed as the number of training samples increased during cross-validation of the four models (SVM, XGB, LR and MLP). Table 9 compares the prediction results of these models on the test set.

For multi-class problems, the evaluation indices of the model must be redefined. Accuracy is computed as in the binary case, as the proportion of correctly classified samples among all samples. Because the confusion matrix of a three-class problem differs from that of a binary problem, the PPV, TPR, FPR, TNR and F1-score also differ. In this study, to reflect the prediction of each driving style directly, the evaluation indices for a given driving style were calculated by merging the other two driving styles into one class, which turns the problem into a binary (one-vs-rest) classification.

Table 11 Non-parametric test of one-way ANOVA results (P-value summary)
| Non-parametric test method | Average speed | Average longitudinal acceleration | Average THW | Average TTCi | Average DHW |
|---|---|---|---|---|---|
| Friedman test | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Dunn's: FAC vs FDC | <0.001 | <0.001 | <0.142 | <0.001 | <0.001 |
| Dunn's: FAC vs FCC | 0.817 | <0.001 | – | – | – |
| Dunn's: FAC vs FSC | >0.999 | <0.001 | <0.001 | 0.040 | <0.001 |
| Dunn's: FAC vs RDC | >0.999 | <0.001 | <0.001 | <0.001 | <0.001 |
| Dunn's: FAC vs RAC | 0.008 | <0.001 | <0.001 | <0.001 | <0.001 |
| Dunn's: FDC vs FCC | <0.001 | <0.001 | – | – | – |
| Dunn's: FDC vs FSC | <0.001 | <0.001 | <0.001 | <0.001 | >0.999 |
| Dunn's: FDC vs RDC | <0.001 | <0.001 | <0.001 | <0.001 | 0.288 |
| Dunn's: FDC vs RAC | <0.001 | <0.001 | <0.001 | >0.999 | >0.999 |
| Dunn's: FCC vs FSC | 0.054 | 0.003 | – | – | – |
| Dunn's: FCC vs RDC | 0.032 | >0.999 | – | – | – |
| Dunn's: FCC vs RAC | <0.001 | <0.001 | – | – | – |
| Dunn's: FSC vs RDC | >0.999 | 0.001 | 0.359 | <0.001 | 0.529 |
| Dunn's: FSC vs RAC | 0.194 | 0.094 | >0.999 | <0.001 | >0.999 |
| Dunn's: RDC vs RAC | 0.302 | <0.001 | 0.714 | <0.001 | 0.022 |

Figure 13 Frequency of different DOCs between different driving styles

Table 12 Statistical analysis of frequency of different longitudinal DOCs (frequency of each DOC, %)
| Driving style | FAC | FDC | FCC | FSC | RDC | RAC | Mean (%) | SD | Sig |
|---|---|---|---|---|---|---|---|---|---|
| Cautious | 6.25 | 3.55 | 52.57 | 10.08 | 14.23 | 13.32 | 16.67 | 18.05 | 0.073 |
| Moderate | 6.19 | 3.93 | 42.02 | 13.50 | 12.58 | 21.79 | 16.67 | 13.91 | 0.033 |
| Aggressive | 6.61 | 3.82 | 32.92 | 22.59 | 8.42 | 25.64 | 16.67 | 11.95 | 0.019 |

Figure 14 shows that the fitting accuracy of the SVM model on the training set was below 80%, while that of the other three models reached 100%. Moreover, as the number of samples increased, the performance of the SVM model on the training set worsened, suggesting that the SVM model is better suited to training on small data sets.

Figure 14 Cross-validation results of different machine learning models

Judging by the validation scores, all the models were overfitting. However, as the sample size increased, the validation scores of all the models trended upward, most obviously for the XGB model, and the overfitting of each classification model was gradually alleviated. The SVM model showed a smaller overfitting gap than the other models, but only because its performance increased on the validation set while decreasing on the training set. That is, the SVM model narrowed the gap through falling training accuracy combined with rising validation accuracy, which is completely different from the behavior of the other three models. Therefore, based on the cross-validation results of the different machine learning models, the hierarchical performance ranking of the four models on the test and training sets was XGB > MLP > LR > SVM. Considering both the cross-validation results and the prediction results, the overall hierarchical prediction performance ranking of the four machine learning models on the current sample data set was XGB > MLP > LR > SVM.

7. Conclusions
The driving style of each driver is not fixed; it is affected by the driving environment, traffic state, psychological state and many other factors, and therefore shows temporal and spatial instability and segment heterogeneity. A real-time method for evaluating driving style as driving segments change would be of great value for formulating personalized driving strategies, improving driving safety and reducing fuel consumption. The purpose of this research was to identify DOCs from longitudinal driving behavior data and to rapidly predict and label driving styles through MLC models. The main contributions of this research are as follows:

• Based on the longitudinal driving behavior parameters of naturalistic driving data, six DOCs of naturalistic driving on expressways were calibrated by formulating reasonable calibration rules, and the feasibility of the DOC calibrations was verified with naturalistic driving video data.
• Compared with cautious and moderate drivers, aggressive drivers adopted a smaller THW and DHW during naturalistic driving. THW, time-to-collision (TTC) and DHW, three well-established longitudinal driving behavior parameters, showed highly significant differences in driving style identification, while LA showed no significant difference. At the same time, speed and TTC showed no significant difference between moderate and aggressive drivers.
• Cautious drivers had the largest proportion of FCC, while aggressive drivers primarily adopted the FCC, FSC and RAC, which indicates that cautious drivers preferred free cruising whereas aggressive drivers tended to take on complex driving conditions.
• Four MLC methods, namely SVM, XGB, LR and MLP, were used to classify and predict driving style based on the six DOCs. Considering both the cross-validation results and the model prediction results, the overall hierarchical prediction performance ranking of the four machine learning models on the current sample data set was XGB > MLP > LR > SVM.

The contribution of this research is a criterion and solution for using longitudinal driving behavior data to label longitudinal DOCs and for rapidly identifying driving styles from those DOCs with MLC models. This study provides a reference for real-time online driving style identification in vehicles equipped with onboard data acquisition equipment, such as ADAS.

However, several directions require further study:

• Naturalistic driving data are heterogeneous across road types; as a result, DOC labeling thresholds derived from driving data on particular road types may not transfer or generalize to others. The DOC calibration criteria developed in this study may therefore not be fully applicable to driving style identification in all types of road scenes. In addition, the endogeneity among the various DOCs and their spatiotemporal correlation need further study.
• The influence of lateral driving behavior was simplified in this research, which may affect the training and test performance of the model. This research was an attempt to label driving style quickly; multi-dimensional data covering both longitudinal and lateral driving behavior are worth considering in future modeling.
• The sample size in this study was insufficient, as reflected by the overfitting that was common during model training and testing and by the large generalization error. Future research will collect more naturalistic driving data to verify the model. Multi-scenario testing is also needed to study the applicability of the model across scenarios.

References
Allahviranloo, M., et al. (2013), “Daily activity pattern recognition by using support vector machines with multiple classes”, Transportation Research Part B: Methodological, Vol. 58, pp. 16-43.
Ayoub, J., et al. (2021), “Modeling dispositional and initial learned trust in automated vehicles with predictability and explainability”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 77, pp. 102-116.
Baer, T., et al. (2011), “Probabilistic driving style determination by means of a situation-based analysis of the vehicle data”, IEEE International Conference on Intelligent Transportation Systems (ITSC), Washington, DC.
Bao, S., et al. (2020), “An examination of teen drivers’ car-following behavior under naturalistic driving conditions: with and without an advanced driving assistance system”, Accident Analysis & Prevention, Vol. 147, No. 105762.
Basso, F., et al. (2018), “Real-time crash prediction in an urban expressway using disaggregated data”, Transportation Research Part C: Emerging Technologies, Vol. 86, pp. 202-219.
Bellem, H., et al. (2016), “Objective metrics of comfort: developing a driving style for highly automated vehicles”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 41, pp. 45-54.
Cafiso, S., et al. (2020), “Safety effectiveness and performance of lane support systems for driving assistance and automation: experimental test and logistic regression for rare events”, Accident Analysis & Prevention, Vol. 148, No. 105791.
Chai, M., et al. (2019), “Drowsiness monitoring based on steering wheel status”, Transportation Research Part D: Transport and Environment, Vol. 66, pp. 95-103.
Chen, T. and Guestrin, C. (2016), “XGBoost: a scalable tree boosting system”, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794.
Chow, S. (2007), Sample Size Calculations in Clinical Research, Taylor & Francis, London.
Clark, et al. (1993), “The use of neural networks and time series models for short term traffic forecasting: a comparative study”, PTRC 21st Summer Annual Meeting, pp. 151-162.
Cortes, C. and Vapnik, V. (1995), “Support-vector networks”, Machine Learning, Vol. 20 No. 3, pp. 273-297.
Costa, Á., et al. (1997), “Evaluating public transport efficiency with neural network models”, Transportation Research Part C: Emerging Technologies, Vol. 5 No. 5, pp. 301-312.
Costela, F.M., et al. (2020), “Risk prediction model using eye movements during simulated driving with logistic regressions and neural networks”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 74, pp. 511-521.
Das, S., et al. (2020), “Vehicle involvements in hydroplaning crashes: applying interpretable machine learning”, Transportation Research Interdisciplinary Perspectives, Vol. 6, No. 100176.
Dong, C., et al. (2018), “An innovative approach for traffic crash estimation and prediction on accommodating unobserved heterogeneities”, Transportation Research Part B: Methodological, Vol. 118, pp. 407-428.
Elander, J., West, et al. (1993), “Behavioral correlates of individual differences in road-traffic crash risk: an examination of methods and findings”, Psychological Bulletin, Vol. 113 No. 2, pp. 279-294.
Evans, L. (1996), “The dominant role of driver behavior in traffic safety”, American Journal of Public Health, Vol. 86 No. 6, pp. 784-786.
Farooq, M.U., et al. (2021), “A statistical analysis of the correlates of compliance and defiance of seatbelt use”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 77, pp. 117-128.
Ge, J., et al. (2021), “A robust path tracking algorithm for connected and automated vehicles under i-VICS”, Transportation Research Interdisciplinary Perspectives, Vol. 9, No. 100314.
Ghasemzadeh, A., et al. (2018), “Utilizing naturalistic driving data for in-depth analysis of driver lane-keeping behavior in rain: non-parametric MARS and parametric logistic regression modeling approaches”, Transportation Research Part C: Emerging Technologies, Vol. 90, pp. 379-392.
Guo, F. and Fang, Y. (2013), “Individual driver risk assessment using naturalistic driving data”, Accident Analysis & Prevention, Vol. 61, pp. 3-9.
Ishibashi, M., et al. (2007), “Indices for characterizing driving style and their relevance to car following behavior”, 46th SICE Annual Conference, Kagawa University, Japan.
Itkonen, T.H., et al. (2020), “Characterisation of motorway driving style using naturalistic driving data”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 69, pp. 72-79.
Jasper, S.W., et al. (2018), “Identifying behavioural change among drivers using long short-term memory recurrent neural networks”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 53, pp. 34-49.
Kondoh, T., et al. (2008), “Identification of visual cues and quantification of drivers’ perception of proximity risk to the lead vehicle in car-following situations”, Journal of Mechanical Systems for Transportation and Logistics, Vol. 1 No. 2, pp. 170-180.
Kusano, K.D., et al.
Omrani, H. (2015), “Predicting travel mode of individuals by machine learning”, Transportation Research Procedia, Vol. 10, pp. 840-849.
Orlovska, J., et al. (2020), “Effects of the driving context on the usage of automated driver assistance systems (ADAS): naturalistic driving study for ADAS evaluation”, Transportation Research Interdisciplinary Perspectives, Vol. 4, No. 100093.
Plöchl, M., et al. (2007), “Driver models in automobile dynamics application”, Vehicle System Dynamics, Vol. 45 Nos 7/8, pp. 699-741.
Qi, G., et al. (2019), “Recognizing driving styles based on topic models”, Transportation Research Part D: Transport and Environment, Vol. 66, pp. 13-22.
Reiser, C., et al. (2008), “Kundenfahrverhalten im Fokus der Fahrzeugentwicklung” [Customer driving behaviour in the focus of vehicle development], ATZ - Automobiltechnische Zeitschrift, Vol. 110 No. 7-8, pp. 684-692.
Rezaei, M., Saadati, M., et al. (2021), “Gender differences in the use of ADAS technologies: a systematic review”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 78, pp. 1-15.
Sagberg, F., et al. (2015), “A review of research on driving styles and road safety”, Human Factors, Vol. 57 No. 7, pp. 1248-1275.
Siddique, C., et al. (2019), “State-dependent self-adaptive sampling (SAS) method for vehicle trajectory data”,
(2015), “Population distributions of time Transportation Research Part C: Emerging Technologies, to collision at brake application during car following from Vol. 100, pp. 224-237. naturalistic driving data”, Journal of Safety Research, Vol. 54, Simons-Morton, B.G., et al. (2015), “Naturalistic teenage pp. 95-104. driving study: ﬁndings and lessons learned”, Journal of Safety Lajunen, T. and Özkan, T. (2011), “Self-report instruments Research, Vol. 54, pp. 41-48. and methods”, In: Porter, B.E. (Ed.), Handbook of Trafﬁc Sun, B., et al. (2017), “Route choice modeling with support Psychology, Elsevier, London, pp. 43-59. vector machine”, Transportation Research Procedia,Vol. 25, Li, G., et al. (2017), “Estimation of driving style in naturalistic pp. 1806-1814. Suzdaleva, E. and Nagy, I. (2018), “An online estimation of highway trafﬁc using maneuver transition probabilities”, Transportation Research Part C: Emerging Technologies, driving style using data-dependent pointer model”, Vol. 74, pp. 113-125. Transportation Research Part C: Emerging Technologies, Liu, K., et al. (2018), “Heterogeneity in the effectiveness of Vol. 86, pp. 23-36. cooperative crossing collision prevention systems”, Suzdaleva, E. and Nagy, I. (2018), “An online estimation of Transportation Research Part C: Emerging Technologies, driving style using data-dependent pointer model”, Vol. 87, pp. 1-10. Transportation Research Part C: Emerging Technologies, Lu, Q.L., et al. (2021), “Exploring the inﬂuence of automated Vol. 86, pp. 23-36. driving styles on network efﬁciency”, Transportation Research Tang, J., et al. (2020), “Statistical and machine-learning Procedia, Vol. 52 No. 9, pp. 380-387. methods for clearance time prediction of road incidents: a Mahmoud, N., et al. (2021), “Predicting cycle-level trafﬁc methodology review”, Analytic Methods in Accident Research, movements at signalized intersections using machine Vol. 27 No. 100123. 
learning models”, Transportation Research Part C: Emerging Taubman-Ben-Ari, O., et al. (2004), “The multidimensional Technologies, Vol. 124 No. 102930. driving style inventory – scale construct and validation”, Mensing, F., et al. (2014), “Eco-driving: an economic or Accident Analysis & Prevention, Vol. 36 No. 3, pp. 323-332. ecologic driving style?”, Transportation Research Part C: Taubman-Ben-Ari, O., et al. (2016), “The multidimensional Emerging Technologies, Vol. 38, pp. 110-121. driving style inventory a decade later: review of the literature Mohammadi, R., et al. (2019), “Exploring the impact of foot- and re-evaluation of the scale”, Accident; Analysis and by-foot track geometry on the occurrence of rail defects”, Prevention, Vol. 93, pp. 179-188. Transportation Research Part C: Emerging Technologies, Toledo, T., et al. (2008), “In-vehicle data recorders for Vol. 102, pp. 153-172. monitoring and feedback on drivers’ behavior”, Müller, T., Hajek, H., Radic-Weissenfeld, L. and Bengler, K. Transportation Research Part C: Emerging Technologies, Vol. 16 (2013), “Can you feel the difference? The just noticeable No. 3, pp. 320-331. difference of longitudinal acceleration”, Proceedings of the Toledo, T., et al. (2007), “Integrated driving behavior Human Factors and Ergonomics Society Annual Meeting, modeling”, Transportation Research Part C: Emerging Vol. 57 No. 1, pp. 1219-1223. Technologies, Vol. 15 No. 2, pp. 96-112. 34 Naturalistic driving data Journal of Intelligent and Connected Vehicles Nengchao Lyu et al. Volume 5 · Number 1 · 2022 · 17–35 Transportation Research Board, The Highway Capacity Xiong, H., et al. (2012), “Use patterns among early adopters of Manual (2010), Transportation Research Board, The adaptive cruise control”, Human Factors: The Journal of the Highway Capacity Manual, Transportation Research Board Human Factors and Ergonomics Society, Vol. 54 No. 5, of the National Academies, Washington, DC. pp. 722-733. 
Van Huysduynen et al (2018), “The relation between self- Xu, L., et al. (2015), “Establishing Style-Oriented driver reported driving style and driving behaviour: a simulator models by imitating human driving behaviors”, IEEE study”, Transportation Research Part F: Trafﬁc Psychology & Transactions on Intelligent Transportation Systems,Vol. 16 Behaviour, Vol. 56, pp. 245-255. No. 5, pp. 2522-2530. van Huysduynen, H.H., Terken, J. and Eggen, B. (2018), Zargarnezhad, S., et al. (2019), “Predicting vehicle fuel “The relation between self-reported driving style and consumption in energy distribution companies using driving behaviour. A simulator study”, Transportation ANNs”, Transportation Research Part D: Transport and Research Part F: Trafﬁc Psychology and Behaviour, Vol. 56, Environment, Vol. 74, pp. 174-188. pp. 245-255. Zhang, Z., et al. (2018), “A deep learning approach for Venkata, R.D., et al. (2020), “Variable categories inﬂuencing detecting trafﬁc accidents from social media data”, single-vehicle run-off-road crashes and their severity”, Transportation Research Part C: Emerging Technologies, Transportation Engineering, Vol. 2 No. 100038. Vol. 86, pp. 580-596. Wang, E., et al. (2017), “Modeling the various merging Zhao, X., et al. (2020), “ Evaluation of the effect of RPMs in behaviors at expressway on-Ramp bottlenecks using support extra-long tunnels based on driving behavior and visual vector machine models”, Transportation Research Procedia, characteristics”, China Journal of Highway and Transport, Vol. 25, pp. 1327-1341. Vol. 33, pp. 29-41. Wang, J., et al. (2015), “Driving risk assessment using near- crash database through data mining of tree-based model”, Corresponding author Accident Analysis & Prevention, Vol. 84, pp. 54-64. 
Chaozhong Wu can be contacted at: email@example.com
Journal of Intelligent and Connected Vehicles – Emerald Publishing
Published: Feb 17, 2022
Keywords: Machine learning; Advanced driver assistant systems; Driver behaviors and assistance; Sensor data processing