Analyzing Spatial Community Pattern of Network Traffic Flow and Its Variations across Time Based on Taxi GPS Trajectories
Yu, Wenhao;Guan, Menglin;Chen, Zhanlong
2019-05-18 00:00:00
Applied Sciences

Article

Wenhao Yu 1,2, Menglin Guan 1,* and Zhanlong Chen 1

1 School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China; yuwh@cug.edu.cn (W.Y.); chenzl@cug.edu.cn (Z.C.)
2 State Key Laboratory of Resources and Environmental Information System, Beijing 100101, China
* Correspondence: gmlcug@gmail.com; Tel.: +86-27-67883728

Received: 10 April 2019; Accepted: 14 May 2019; Published: 18 May 2019

Abstract: The transport system is a critical component of the urban environment in terms of its connectivity, aggregation, and dynamic functions. The transport system can be considered a complex system due to the massive traffic flows generated by the spatial interactions between land uses. Benefiting from the recent development of location-aware sensing technologies, large volumes of traffic flow data (e.g., taxi trajectory data) have been increasingly collected in spatial databases, which provides new opportunities to interpret transport systems in cities. This paper aims to analyze network traffic flow from the perspective of the properties of spatial connectivity, spatial aggregation, and spatial dynamics. To this end, we propose a three-level framework to mine intra-city vehicle trajectory data. More specifically, the first step was to construct the network traffic flow, with nodes and edges representing the partitioned regions and associated traffic flows, respectively. We then detected community structures of network traffic flow based on their structural and traffic volume properties. Finally, we analyzed the variations of those communities across time for the dynamic transport system. Through experiments in Beijing city, we found that the method is effective in interpreting the mechanisms of urban space, and can provide references for administrative divisions.
Keywords: traffic flows; taxi trajectory; float cars; spatial community; transport system

Appl. Sci. 2019, 9, 2054; doi:10.3390/app9102054; www.mdpi.com/journal/applsci

1. Introduction

Transport systems are of great importance to urban environments for their connectivity, aggregation, and dynamic functions. Land uses are connected by the transport network to improve the accessibility of human activities. Considering the spatial heterogeneity of traffic flows, multiple land uses are also attracted to each other, showing an agglomeration (or aggregation) pattern in space. In addition, such characteristics of a transport system depend largely on the temporal dimension. Therefore, mining the connectivity, aggregation, and dynamic patterns of transport flows can help reveal traffic structures and the associated mechanisms of socioeconomic phenomena, e.g., logistics, neighborhood, living habitats, and urban function zones [1–3].

In reality, regions that belong together functionally are often not spatially adjacent. For example, working areas and residential areas belong to the same group in terms of their functions, while in physical space they are often distant from each other. Therefore, measuring the connectivity of land uses only with geometric indicators such as geographic distance is limited, and a potential solution could come from the function space of transport. Instead of the static condition of geometric space, a transport system implies the real interactions between land uses across space and time. For example, in the morning, the interaction between residential areas and working areas is more intense, while at lunch time, the interaction between working areas and catering service areas is more intense. In addition, the interactions of land uses in terms of traffic flows can reveal the traffic conditions on different routes in the urban space.
It is believed that different routes serve different roles in daily transportation. For example, the routes connecting residential areas and working areas are likely to be chosen by commuters in the morning. Besides the main roads, some minor roads, for example detours, could also be favored by residents. However, how to extract the functional associations between land uses remains a challenging task.

Considering the importance of transport systems, many studies have been conducted to discover the hidden regularities in urban transportation. However, due to limited data sources, it is rather difficult to identify the changes of traffic flows across space and time [1,4]. In addition, China’s transport infrastructures are developing very fast, and the associated transport systems are becoming more complex and dynamic. It is necessary to study the transport system in a more effective and timely way.

Benefiting from the recent development of location-aware sensing technologies, there is an unprecedented opportunity to obtain big traffic data with agent trajectory information [5,6]. For example, most vehicles are nowadays equipped with a global positioning system (GPS), which records the locations and other semantic properties (e.g., speed and direction) of agents. Analyzing these data within the context of a transport system could provide a detailed view of the traffic flow, and then reveal typical spatial interaction patterns in the urban space, e.g., popular routes for driving, associations among land uses, and the spatial structures of urban space. For example, Ahas et al. [7] used mobile phone positioning data to explore the movement patterns of suburban commuters in Tallinn, Estonia. They found that there is a remarkable temporal rhythm to respondents’ locations. Based on mobile phone data, Sevtsuk et al. [8] also discovered that there is significant temporal regularity in human mobility.
Other data sources can also be used to analyze traffic flows, e.g., location data of buses [9], smart card transaction data of subways [10], and taxi trajectory data [11,12]. Compared to other modes of transport, taxis are not restricted to fixed lines, and thus their trajectories reflect real traffic flows in an urban environment more flexibly. There are also many relevant studies analyzing taxi trajectory data under different application contexts. For example, Zheng et al. [13] analyzed taxi trajectory data to construct interaction relationships between local regions, and then applied the result to assist in city planning. Guo et al. [14] and Yuan et al. [15] tried to extract the operation status of a traffic system from taxi trajectory data. To identify the city structure, Zhou et al. [16] proposed a field-based data clustering analysis method to detect the changing patterns of constant hotspot areas and inconstant hotspot areas. Since the pick-up and drop-off points of taxi trajectories often imply facilities of interest, Yue et al. [17] used taxi trajectory data to discover the attractive areas that people often visit, e.g., hot shopping and leisure land uses or living and working areas. Recently, Liu et al. [18] proposed an approach to identify traffic congestion regions and their spatiotemporal distributions from taxi trajectory data. Liu et al. [19] viewed the trips of taxis as displacements in a random walk model, and found that the distribution of directions of taxi trajectories in Shanghai shows a characteristic dominant northeast-east to southwest-west direction. In addition, they implemented a Monte Carlo simulation and found that geographical heterogeneity leads to a faster observed decay of trips, and that the distance decay effect makes the spatial distribution of trips more concentrated in the urban area. Liu et al. [20] proposed the use of spatially embedded networks and network analysis techniques to model intra-city spatial interactions. Zhou et al. [21] allocated origin/destination points to land use parcels for describing regional activities, and then combined a series of relevant indicators to explore the land use patterns of Wuhan city. Pan et al. [22] analyzed the characteristics of the pick-up/drop-off points extracted from taxi GPS trajectories and, based on these features, extracted the regular patterns that correspond to the land-use classes within different regions in Hangzhou city. In addition to these aspects, taxi trajectory data can also be used for human mobility pattern mining [5,23,24] and environmental pollution analysis [25,26].

Most previous studies have focused on the connectivity and aggregation of land uses, and few studies have been conducted to comprehensively understand the spatio-temporal community structures of a transport system. With continuous transportation development, there is an increasingly urgent demand for analysis of the spatial structures of transport flows and their variations across time. In general, a traffic flow can be considered a link that connects two land uses, and thus, with multiple flows of this type, a transport system can be modeled as a weighted graph. In this regard, we can introduce community detection and graph matching methods to explore the complex organization of land uses in a transport flow system [27,28]. In this study, we propose a three-level framework to mine intra-city vehicle trajectory data and detect the spatio-temporal relationships between land uses in the traffic flow system. The main contributions of this study are the following:

Modeling the connectivity structure of traffic flows. We first constructed a spatially embedded network to model the connectivity of land uses, in which the node represents the partitioned region, the edge represents the linkage between adjacent nodes, and the weight of the edge depends on the volume of traffic flows between the corresponding nodes.
Extracting the aggregation patterns of land uses (i.e., nodes). Based on the network traffic flow, we then employed a community detection technique (i.e., K-Medoids clustering) to classify all the nodes. Instead of simple geographic distance, our community detection method takes into account the real traffic volume and graph structure properties. In this way, the land uses that have a strong relationship can be aggregated in the same group.

Analyzing the dynamic patterns of transportation communities. Since the transport system is highly dynamic, we propose a graph matching method to detect the change of network traffic flows across time. In this way, we can identify not only the structure of traffic flows across space, but also its variation across time.

The rest of this paper is organized as follows. Section 2 introduces the related definitions, and how to construct the network traffic flows and generate the communities. To explore the variation of communities across time, this section then introduces an indicator to measure the similarity of two communities. Section 3 describes our extensive experiments based on the taxi trajectory data in Beijing, showing the potential of the proposed approach for transport system analysis and urban applications. Finally, Section 4 concludes the paper.

2. Community Detection across Space and Its Variation across Time

Our method extends the traditional K-Medoids method with traffic flow volume and network structure properties. Firstly, we partition the study region into equally sized square cells, and then model each cell as a node and the connectivity between each pair of cells as a link. Based on the network traffic flow, we propose that the similarity of nodes can be calculated based on the attraction degree and structure similarity. In this way, community clustering can be implemented. Finally, we use the graph matching technique to calculate the similarity between the community structures within different time periods.
Therefore, our method can be considered a spatio-temporal analysis tool.

2.1. Network Construction and Its Variation with Different Cell Sizes

2.1.1. Network Construction

Due to signal loss or degradation, taxi trajectories are usually recorded with spatial uncertainty. Even if a set of trajectory flows comes from (or drives to) the same regions, the recorded trajectory points are unlikely to share the same coordinates. Therefore, in order to extract the collective regularities from these massive trajectory points, our method proposes that the study region be partitioned into equally sized square cells, each of which represents a place in the urban space. In this way, each trajectory point could be assigned to its nearest cell, and a traffic flow consisting of multiple trajectory points could be represented as a set of partitioned cells. More specifically, by specifying the cell size k, the whole region would be transformed into a grid of size k × k. There are also other ways to partition the region, e.g., traffic analysis zones (TAZs). However, TAZs are constrained to a fixed scale, and our method can be used for multi-scale spatial analysis with different cell sizes (please see Section 3). In general, for fine scale applications, a small cell can be used, while for coarse scale applications, a large cell would be used.
With the tessellation of space, we can then define the nodes and edges of network traffic flow as follows.

Definition 1. (Node): Assuming the range of Euclidean coordinates of a cell $C$ is $\{[x_{min}, x_{max}], [y_{min}, y_{max}]\}$, this cell can be modeled as a node only when there is at least one trajectory point (with coordinate $(x, y)$) falling into the cell. Specifically, this constraint can be formalized as follows.

$$\begin{cases} x_{min} < x \le x_{max} \\ y_{min} < y \le y_{max} \end{cases} \tag{1}$$

Definition 2. (Edge): Assuming there are $n$ trajectory flows between nodes $C_1$ and $C_2$, the connectivity between $C_1$ and $C_2$ can be modeled as an edge with weight $n$. As presented in Figure 1, the weighted edge is used here to represent the flow transitions between cells (i.e., sub-regions).

Figure 1. Example of the traffic flow and network: (a) taxi trajectory data, (b) flow network with edge weight equal to corresponding probabilities of movement between sub-regions.

The movement of vehicles implies the complex interaction between land uses, and connects distant regions into an integrated system.
Since the basic characteristic of this system is connectivity, we propose the construction of a spatially embedded network consisting of nodes (Definition 1) and edges (Definition 2) to represent traffic flow. Based on such a network, we can then employ graph analysis techniques (see Sections 2.2 and 2.3) to discover the hidden regularities in a transport system.

2.1.2. Variation of Network with Different Cell Sizes

Dividing the whole study area with different cell sizes changes the distribution patterns of trajectory points, and also leads to different results in the detection of communities. Figure 2 shows the effect of different cell sizes on the construction of network flows. Although Figure 2a,c have the same trajectory flow, they are divided by different cell sizes. As a result, the constructed networks have different granularities (Figure 2b,d).
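The construction in Definitions 1 and 2, and its dependence on the cell size, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the planar coordinates, the grid origin, and the choice to count one unit of weight for every transition between consecutive distinct cells of a trip are all assumptions made for the example.

```python
from collections import Counter
from math import ceil


def cell_index(x, y, x_origin, y_origin, cell_size):
    """Map a point to a (column, row) cell using the half-open rule of
    Equation (1): x_min < x <= x_max and y_min < y <= y_max."""
    col = ceil((x - x_origin) / cell_size) - 1
    row = ceil((y - y_origin) / cell_size) - 1
    return (col, row)


def build_flow_network(trips, x_origin, y_origin, cell_size):
    """Return (nodes, edges) for trips given as lists of (x, y) points.

    A cell becomes a node once at least one point falls into it.
    Assumption: every move between two consecutive, distinct cells of a
    trip adds 1 to the weight of the undirected edge between them.
    """
    nodes = set()
    edges = Counter()
    for trip in trips:
        cells = [cell_index(x, y, x_origin, y_origin, cell_size) for x, y in trip]
        nodes.update(cells)
        for a, b in zip(cells, cells[1:]):
            if a != b:
                edges[tuple(sorted((a, b)))] += 1
    return nodes, edges


# Two toy trips (hypothetical coordinates), tessellated at two cell sizes.
trips = [
    [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],
    [(0.6, 0.4), (1.4, 0.6)],
]
nodes_fine, edges_fine = build_flow_network(trips, 0.0, 0.0, cell_size=1.0)
nodes_coarse, edges_coarse = build_flow_network(trips, 0.0, 0.0, cell_size=2.0)
```

With the smaller cell size the toy trips produce three nodes and two weighted edges, whereas the larger cell size merges cells and leaves two nodes and a single edge, which mirrors the change in granularity described for Figure 2.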
Therefore, with different cell sizes, we can observe the variation of the network from different scales of space.

Figure 2. Variation of network with different cell sizes. (a,c) represent the variation of tessellation using different cell sizes, and (b,d) are the corresponding networks.

2.2. Community Detection across Space

2.2.1. Similarity of Nodes

Besides the connectivity property, a transport system has a spatial heterogeneity in urban space. In other words, some land uses are more attractive to each other in terms of transportation, and in the function space they form aggregation patterns, i.e., communities. Such a community could imply a popular route at a specific time, or an agglomeration of living areas and work areas. Generally, the more intense the traffic flow interaction between land uses is, the higher the probability that the land uses are grouped together. Since the transport system is modeled as a spatially embedded network, we can then use the community detection method to extract the clustering patterns of land uses.
It should be noted that, compared to the classic graph measures, the concept of community in our context has its own characteristics. Specifically, community detection in a network traffic flow should not only take into account the graph structure factor, but should also consider the traffic flow volumes among different regions. Therefore, we propose that the measures of attraction degree and structure similarity be integrated to group the nodes of network traffic flow.

Before defining the attraction degree, we first introduce the related concept, as follows.

Definition 3. (Attraction factor): In a network traffic flow, the traffic volume characteristics between a pair of directly connected nodes reveal their closeness relationship, and we term this connection the attraction factor. For the directly connected nodes $N_1$ and $N_2$, their degrees $d_{N_1}$ and $d_{N_2}$ are defined as the number of their connected edges, respectively.
Assuming the edge between $N_1$ and $N_2$ is $E_{(N_1,N_2)}$ with associated weight $W_{(N_1,N_2)}$, the attraction factor $f_{(N_1,N_2)}$ between $N_1$ and $N_2$ is calculated as follows:

$$f_{(N_1,N_2)} = \ln\left(1 + \frac{d_{N_1}}{\sum_{j=1}^{d_{N_1}} W_{(N_1,N_{1j})}} \cdot W_{(N_1,N_2)}\right) \tag{2}$$

where $N_{1j}$ denotes the $j$-th neighbor of $N_1$.

In Figure 3, there are six nodes, including $N_1$, $N_2$, $N_3$, $N_4$, $N_5$ and $N_6$, with their associated edges. In this community, $d_{N_1} = 3$, $W_{(N_1,N_5)} = 1$, $\sum_{j=1}^{d_{N_1}} W_{(N_1,N_{1j})} = 3 + 2 + 1 = 6$, and $f_{(N_1,N_5)} = \ln\left(1 + \frac{3}{6} \times 1\right) = 0.4055$; likewise, $d_{N_5} = 3$, $W_{(N_5,N_1)} = 1$, $\sum_{j=1}^{d_{N_5}} W_{(N_5,N_{5j})} = 3 + 1 + 1 = 5$, and $f_{(N_5,N_1)} = \ln\left(1 + \frac{3}{5} \times 1\right) = 0.4700$.

Figure 3. A sample of a community.

Definition 4. (Attraction degree): Assuming nodes $N_1$ and $N_2$ are directly connected, their attraction degree can be measured as follows.

$$Attr_{(N_1,N_2)} = \frac{f_{(N_1,N_2)} + f_{(N_2,N_1)}}{2} \tag{3}$$

In Figure 3, the attraction degree between $N_1$ and $N_5$ is $Attr_{(N_1,N_5)} = \frac{f_{(N_1,N_5)} + f_{(N_5,N_1)}}{2} = \frac{0.4055 + 0.4700}{2} = 0.43775$.

Definitions 3 and 4 model the force of attraction between any two directly connected nodes (i.e., cells).
However, in reality, a node is attracted not only by its directly connected nodes, but also by its indirectly connected nodes. Therefore, we propose the extension of Definition 4 to take into account the attraction degrees from both the directly connected nodes and the indirectly connected nodes. Assuming nodes $N_i$ and $N_j$ are indirectly connected by the path $PT$ ($N_i, N_{i+1}, N_{i+2}, \ldots, N_j$), their attraction degree can be calculated as the product of the attraction degrees of all the pairs of directly connected nodes in $PT$, i.e., $Attr_{(N_k,N_{k+1})}$ ($i \le k < j$). The detail is as follows:

$$Attr_{(N_i,N_j)} = \prod_{k=i}^{j-1} Attr_{(N_k,N_{k+1})} \tag{4}$$

It should be noted that there might be multiple paths between nodes $N_i$ and $N_j$, and thus there could be more than one attraction degree value for our analysis. In this regard, we choose the largest-weight path between $N_i$ and $N_j$ to calculate the attraction degree. In general, the nodes directly or indirectly connected by a larger-weight path have a stronger mutual relationship. Compared to the indirectly connected nodes, the directly connected nodes have a higher probability of being attracted to each other.

In Figure 3, there are three paths from $N_1$ to $N_6$, including path 1 (PT1: $N_1 \to N_2 \to N_6$), path 2 (PT2: $N_1 \to N_5 \to N_6$), and path 3 (PT3: $N_1 \to N_2 \to N_5 \to N_6$).
The weight of PT1 is $W_{PT1} = 2 + 1 = 3$, the weight of PT2 is $W_{PT2} = 1 + 3 = 4$, and the weight of PT3 is $W_{PT3} = 2 + 1 + 3 = 6$. The largest-weight path between $N_1$ and $N_6$ is PT3. Thus, according to Equation (2) and Equation (3), $Attr_{(N_1,N_2)} = 0.8047$, $Attr_{(N_2,N_5)} = 0.5148$, $Attr_{(N_5,N_6)} = 0.9730$, and thus $Attr_{(N_1,N_6)} = 0.8047 \times 0.5148 \times 0.9730 = 0.4031$ (Equation (4)).

Besides the strength of connectivity between nodes, the local structure of a graph is also critical to cluster nodes. More specifically, for any pair of nodes connected by a path, the greater the proportion of their path weight in the total weight of their neighbors, the more similar the two nodes are. In a local structure, the nodes with a relatively stronger linkage tend to be grouped together. By contrast, two nodes with a strong connection could still be separated into different groups if one of them has an even stronger linkage to other nodes. To this end, we introduce structure similarity into our method, as follows.

Our structure similarity indicator is inspired by the Jaccard similarity coefficient, which has been widely applied to describe the relevance among objects. Assuming $X$ and $Y$ are two sets, the Jaccard similarity coefficient is defined as follows:

$$sim_{(X,Y)} = \frac{|X \cap Y|}{|X \cup Y|} \tag{5}$$

In addition, in graph theory, it is believed that the critical structural factors of a graph are the links that have relatively larger weight [29]. In this regard, we define structure similarity based on local edges and associated weights, as presented in Definition 5.

Definition 5. (Structure similarity): For two directly connected nodes $N_1$ and $N_2$, their structure similarity is as follows:

$$sim_{(N_1,N_2)} = \frac{W_{(N_1,N_2)}}{\sum_{c=1}^{d_{N_1}} W_{(N_1,N_{1c})} + \sum_{c=1}^{d_{N_2}} W_{(N_2,N_{2c})} - W_{(N_1,N_2)}} \tag{6}$$

where $W_{(N_1,N_{1c})}$ is the weight of the edge connecting node $N_1$ and its neighbor $N_{1c}$, and $W_{(N_2,N_{2c})}$ is the weight of the edge connecting node $N_2$ and its neighbor $N_{2c}$.
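The direct-pair measures of Equations (2), (3), (5), and (6) are simple enough to check numerically. The sketch below is illustrative only: the function names and argument layout are assumptions, and the inputs are the degree and weight sums quoted in the text for nodes $N_1$ and $N_5$ of Figure 3 (degree 3 for both, incident weight sums 6 and 5, connecting edge weight 1).

```python
from math import log


def attraction_factor(degree, weight_sum, edge_weight):
    """Equation (2): f = ln(1 + degree / weight_sum * edge_weight),
    where weight_sum is the total weight of the node's incident edges."""
    return log(1.0 + degree / weight_sum * edge_weight)


def attraction_degree(f_forward, f_backward):
    """Equation (3): mean of the two directed attraction factors."""
    return (f_forward + f_backward) / 2.0


def jaccard(x, y):
    """Equation (5) for two Python sets (assumes a non-empty union)."""
    return len(x & y) / len(x | y)


def structure_similarity(edge_weight, weight_sum_1, weight_sum_2):
    """Equation (6): the shared edge weight divided by the combined
    incident weight of both nodes, counting the shared edge once."""
    return edge_weight / (weight_sum_1 + weight_sum_2 - edge_weight)


# Values quoted in the text for N1 and N5 in Figure 3.
f_15 = attraction_factor(degree=3, weight_sum=6, edge_weight=1)
f_51 = attraction_factor(degree=3, weight_sum=5, edge_weight=1)
attr_15 = attraction_degree(f_15, f_51)
sim_15 = structure_similarity(edge_weight=1, weight_sum_1=6, weight_sum_2=5)
```

Equation (6) has the same intersection-over-union shape as the Jaccard coefficient: the shared edge plays the role of the intersection, and subtracting it once in the denominator avoids counting it twice in the union of incident weights.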
Equation (6) can only be used to measure the structure similarity of directly connected nodes, and in order to analyze the relationships between indirectly connected nodes, we extend Equation (6), as follows:

$$sim_{(N_i,N_j)} = \prod_{k=i}^{j-1} sim_{(N_k,N_{k+1})} \tag{7}$$

where $N_k$ and $N_{k+1}$ are two directly connected nodes on the path connecting $N_i$ and $N_j$.

In addition, we choose the largest-weight path between $N_i$ and $N_j$ to calculate the structure similarity. Generally, the directly connected nodes have a larger structure similarity than the indirectly connected nodes do.

In Figure 3, $\sum_{j=1}^{d_{N_1}} W_{(N_1,N_{1j})} = 6$, $\sum_{j=1}^{d_{N_5}} W_{(N_5,N_{5j})} = 5$, $W_{(N_1,N_5)} = 1$, and $sim_{(N_1,N_5)} = \frac{1}{6 + 5 - 1} = 0.1$. The largest-weight path between $N_1$ and $N_6$ is $N_1 \to N_2 \to N_5 \to N_6$, and, thus, $sim_{(N_1,N_2)} = 0.2$, $sim_{(N_2,N_5)} = 0.1111$, $sim_{(N_5,N_6)} = 0.2727$. Therefore, according to Equation (7), $sim_{(N_1,N_6)} = 0.2 \times 0.1111 \times 0.2727 = 0.0061$.

Finally, since both the attraction degree (Equation (4)) and the structure similarity (Equation (7)) have been normalized, we can integrate them into a single measure, as follows:

$$\widetilde{sim}_{(N_i,N_j)} = sim_{(N_i,N_j)} + Attr_{(N_i,N_j)} \tag{8}$$

2.2.2. Algorithm

Based on the integrated similarity measure, we then calculated the final distance for each pair of directly or indirectly connected nodes (i.e., cells) in the network traffic flow.

$$\widetilde{dis}_{(N_i,N_j)} = \frac{1}{\widetilde{sim}_{(N_i,N_j)}} \tag{9}$$

In the process of detecting communities, the dissimilarity index for each pair of nodes is adopted, with which one can measure the extent of proximity between the nodes of a network and signify to what extent two nodes would ‘like’ to be in the same community [30]. This proximity reflects the connectivity property of nodes in a diffusion process. The final minimization problem under this distance can also be solved by a k-means algorithm [31].
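For indirectly connected nodes, the path-based measures can be sketched as follows. Several things here are assumptions rather than facts from the paper: the edge list contains only the five edges that appear on the three paths listed for Figure 3 (the full figure has more edges), the exhaustive search over simple paths is only practical for very small graphs, and the distance is taken to be the reciprocal of the integrated similarity. The per-edge attraction degrees and structure similarities are the values quoted in the text.

```python
from math import prod


def largest_weight_path(edges, source, target):
    """Return (weight, path) of the simple path with the largest total weight.

    Exhaustive depth-first search; fine for a toy graph, exponential in general.
    """
    adjacency = {}
    for (a, b), w in edges.items():
        adjacency.setdefault(a, {})[b] = w
        adjacency.setdefault(b, {})[a] = w
    best = (float("-inf"), None)
    stack = [(source, [source], 0)]
    while stack:
        node, path, weight = stack.pop()
        if node == target:
            if weight > best[0]:
                best = (weight, path)
            continue
        for neighbor, w in adjacency.get(node, {}).items():
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor], weight + w))
    return best


# Partial edge list inferred from the path weights given in the text (assumption).
edges = {
    ("N1", "N2"): 2, ("N2", "N6"): 1, ("N1", "N5"): 1,
    ("N5", "N6"): 3, ("N2", "N5"): 1,
}
weight, path = largest_weight_path(edges, "N1", "N6")

# Per-edge values along N1 -> N2 -> N5 -> N6, as quoted in the text.
attr_on_path = [0.8047, 0.5148, 0.9730]
sim_on_path = [0.2, 0.1111, 0.2727]

attr_16 = prod(attr_on_path)          # path product of attraction degrees
sim_16 = prod(sim_on_path)            # Equation (7)
integrated_16 = sim_16 + attr_16      # Equation (8)
distance_16 = 1.0 / integrated_16     # assumed reciprocal form of Equation (9)
```

Because each factor is below 1, both products shrink with every extra hop, so distant pairs end up with a small integrated similarity and a correspondingly large distance.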
For our community detection algorithm, we adopted the K-Medoids algorithm, which belongs to the family of k-means clustering. More specifically, we first calculate the distances between all the pairs of nodes, and then select k nodes (i.e., the initial k medoids) which have the largest distance to each other. Secondly, we assign each node (except the nodes that have already been labeled) to its nearest cluster according to the distances measured on the network traffic flow. This process is iterated until the medoids no longer change or the number of iterations reaches the threshold. In addition, at the end of each iteration, the node that has the minimum sum of distances within the cluster is selected as the medoid. Our community detection algorithm is as follows (Algorithm 1):

Algorithm 1. Community Clustering
Input: A spatially embedded network consisting of nodes and weighted edges, the number of communities k, the maximum number of iterations MaxI.
Output: A set of communities: $C = \{C_1, C_2, \ldots, C_k\}$
1. Initialization:
   $\widetilde{dis}_{(N_i,N_j)} = 0$, iteration = 0, ClusterCentroid[] C = null;
2. Node distance calculating: // calculate the distance between each pair of nodes
   for each pair of nodes $N_i$ and $N_j$ ($i \ne j$)
      $\widetilde{sim}_{(N_i,N_j)} = sim_{(N_i,N_j)} + Attr_{(N_i,N_j)}$;
      $\widetilde{dis}_{(N_i,N_j)} = 1 / \widetilde{sim}_{(N_i,N_j)}$;
   end for
   for each pair of nodes $N_i$ and $N_j$ ($i = j$)
      $\widetilde{dis}_{(N_i,N_j)} = 1$;
   end for
3. Community detection based on the K-Medoids framework:
   Select k nodes that have the largest distance to each other as the initial k medoids, i.e., $\{C_1, C_2, \ldots, C_k\}$;
   Assign each node to the closest medoid;
   while (the medoids change and iteration < MaxI)
      for each node $N_i$
         Assign $N_i$ to the closest medoid $C_m$ with $\min\{\widetilde{dis}_{(N_i,C_m)}\}$;
      end for
      for each cluster $C_j$ ($j \le k$)
         Update the medoid of each community by detecting the node that has the minimum sum of distances within the cluster;
      end for
      iteration++;
   end while
   Return the structure consisting of k communities: $C = \{C_1, C_2, \ldots, C_k\}$.

2.3. Variation of Community across Time

Besides spatial heterogeneity, a transport system also has a dynamic property. In order to analyze the variation of a transport system across time, we propose a graph structure matching measurement (GSMM) between two network traffic flows sharing the node set. Specifically, as presented in previous sections, a network traffic flow in a specific time slice can be divided into several communities. In other words, the variation of a transport system across time can be represented as the change of the corresponding community structures. Hence, the GSMM measures the degree of matching between two community structures, i.e., two node sets.

Definition 6. (Similarity of two node sets): Let $S_1$ and $S_2$ be two node sets. The similarity between $S_1$ and $S_2$ is defined as follows:

$$Ssim_{(S_1,S_2)} = \frac{2|S_1 \cap S_2|}{|S_1| + |S_2|} \tag{10}$$

where $|S|$ is the number of the nodes of set $S$ and $|S_1 \cap S_2|$ is the number of the nodes that $S_1$ and $S_2$ share. For example, if $S_1 = S_2$, $Ssim_{(S_1,S_2)} = 1$; if $S_1 \cap S_2 = \emptyset$, $Ssim_{(S_1,S_2)} = 0$.

Equation (10) only measures the similarity between two node sets (i.e., two communities), each of which plays a different role in the corresponding graph structure.
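As a small illustration of Equation (10), the node-set similarity can be computed directly on Python sets. The function name, the cell labels, and the convention of returning 1.0 for two empty sets are assumptions made for this sketch; the paper does not discuss the empty case.

```python
def set_similarity(s1, s2):
    """Equation (10): twice the number of shared nodes divided by the
    total size of both sets."""
    total = len(s1) + len(s2)
    if total == 0:
        # Assumption: two empty communities are treated as identical.
        return 1.0
    return 2 * len(s1 & s2) / total


# Hypothetical communities (sets of cell identifiers) from two time slices.
morning = {"c1", "c2", "c3"}
evening = {"c2", "c3", "c4", "c5"}

same = set_similarity(morning, morning)
disjoint = set_similarity({"c1"}, {"c9"})
partial = set_similarity(morning, evening)
```

Here the two hypothetical communities share two of their seven memberships, so the similarity is 4/7, while identical and disjoint sets give the boundary values 1 and 0 stated after Equation (10).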
Specifically, for a graph, some communities are more important than others. In order to measure the global similarity between two graphs (i.e., two sets of communities), we propose calculating the sum of the weighted similarities between the two sets of communities. In this process, we define the weight of a community as its contribution rate in the corresponding graph. In general, the more nodes a community has, the larger its contribution rate to the whole system. Assuming the graph $C$ consists of the communities $\{C_1, C_2, \ldots, C_k\}$ and $S_1, S_2, \ldots, S_k$ are the node sets of $C_1, C_2, \ldots, C_k$, respectively, the contribution rate of $C_i$ ($i = 1, 2, \ldots, k$) is defined as follows:

\[
comR_i = \frac{|S_i|}{\sum_{j=1}^{k} |S_j|} \tag{11}
\]

Based on the contribution rate of each community, we then calculate the weighted similarity between two network traffic flows as follows.

Definition 7. (Similarity of spatially embedded networks): Let $C^p$ and $C^q$ be two spatially embedded networks with k-size community structures $\{C_1^p, C_2^p, \ldots, C_k^p\}$ and $\{C_1^q, C_2^q, \ldots, C_k^q\}$, respectively. $comR_1^p, comR_2^p, \ldots, comR_k^p$ are the contribution rates of the communities $C_1^p, C_2^p, \ldots, C_k^p$, respectively, and $comR_1^q, comR_2^q, \ldots, comR_k^q$ are the contribution rates of the communities $C_1^q, C_2^q, \ldots, C_k^q$, respectively. $S_1^p, S_2^p, \ldots, S_k^p$ are the node sets of the communities $C_1^p, C_2^p, \ldots, C_k^p$, respectively, and $S_1^q, S_2^q, \ldots, S_k^q$ are the node sets of the communities $C_1^q, C_2^q, \ldots, C_k^q$, respectively.
Then, the similarity between the two graphs $C^p$ and $C^q$ is defined as:

\[
FsimC = \max_{(i_1, i_2, \ldots, i_k)} \frac{2\, PsimC_{(i_1, i_2, \ldots, i_k)}}{\sum_{j=1}^{k} \left( comR_j^p + comR_j^q \right)} \tag{12}
\]

where $PsimC_{(i_1, i_2, \ldots, i_k)}$ is calculated as follows:

\[
PsimC_{(i_1, i_2, \ldots, i_k)} = \frac{1}{2}\left( comR_1^p + comR_{i_1}^q \right) Ssim_{(S_1^p, S_{i_1}^q)} + \frac{1}{2}\left( comR_2^p + comR_{i_2}^q \right) Ssim_{(S_2^p, S_{i_2}^q)} + \cdots + \frac{1}{2}\left( comR_k^p + comR_{i_k}^q \right) Ssim_{(S_k^p, S_{i_k}^q)} \tag{13}
\]

where $(i_1, i_2, \ldots, i_k)$ is a full permutation of the set $I = \{1, 2, \ldots, k\}$.

The community structure is a partition of all the land uses (i.e., nodes) of the network traffic flow for a specific time slice. Hence, the variation of networks across time can be analyzed by measuring the similarity between the corresponding community structures. The better the matching between two community structures, the more similar the corresponding traffic conditions in different time slices.

3. Data Sets and Settings

We conducted a series of experiments to explore the transport system of Beijing city using taxi trajectory points. As the capital of China, Beijing is the national political, economic, and administrative center. By the end of 2016, Beijing had 71,600 taxis and a permanent population of 21.729 million. More than 55% of residents take taxis every week [32]. In this paper, we chose the central zone of the city (i.e., the range of latitude 39°49′41″–39°59′17″ N and longitude 116°15′47″–116°29′09″ E) as the study region (Figure 4). The research area is within the third ring area of Beijing. The total number of valid records was more than 20,000,000, covering a time period of 24 h (from 0:00 to 24:00) (Figure 4). Because of signal loss or degradation, the geospatial locations of a trajectory may have been recorded with spatial and temporal uncertainties. Considering these factors, our data were preprocessed by the provider to indicate whether each record was valid or invalid with respect to the GPS signal. In our research, only the valid records were used.
Nevertheless, there were some small deviations in position information relative to the actual position. In this respect, we adopted the technique of region tessellation, which is used to model collective behaviors between regions. The internal variation of the region is not considered. Hence, it can handle small deviations of location information and can be used to reveal the collective travel patterns between regions.

In order to have a macroscopic understanding of people's travel patterns, we extracted the taxi pick-up and drop-off points (Figure 5). It can be observed that the hotspots of pick-up and drop-off points are distributed within the second ring road of Beijing.

Figure 4. Study area: the core area of Beijing.

Figure 5. Passengers' pick-up/drop-off points: (a) passengers' pick-up locations; (b) passengers' drop-off locations.

4. Experiment and Result

4.1. Result and Analysis

We first partitioned the study area into a set of cells with size 1 km × 1 km. This scale was determined on the basis of relevant studies suggesting that the cell size is fine enough to depict urban structure [33]. We then used these cells to construct spatially embedded networks to analyze the interactions between land uses (i.e., sub-regions). Moreover, we obtained the results in different time periods to discover the dynamic patterns of the transport system.

In the spatially embedded network, the nodes represent the regions of the city and the edges represent the traffic linkage between different regions. Furthermore, the intensity of the connections between different regions varies with time. In order to clearly reveal the travel patterns, we constructed networks for typical time periods (Figure 6), i.e., morning rush hour, noon rush hour, evening rush hour, and midnight.

Figure 6. Spatially embedded networks for traffic flows in different time periods with cell size k = 1000 m: (a) morning rush hour; (b) noon rush hour; (c) evening rush hour; (d) midnight.
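The tessellation and network construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `cell_of` and `build_flow_network` are hypothetical, the grid uses a simple flat-earth approximation anchored at the south-west corner of the study region, edges are treated as directed, and trips that start and end in the same cell are dropped (one possible reading of "the internal variation of the region is not considered").

```python
import math
from collections import Counter

def cell_of(lat, lon, lat0, lon0, cell_m=1000.0):
    """Map a GPS point to a (row, col) grid cell.

    Assumption: a local flat-earth approximation relative to the
    south-west corner (lat0, lon0) is accurate enough at city scale.
    """
    m_per_deg_lat = 111_320.0  # approximate metres per degree of latitude
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat0))
    row = int((lat - lat0) * m_per_deg_lat // cell_m)
    col = int((lon - lon0) * m_per_deg_lon // cell_m)
    return row, col

def build_flow_network(trips, lat0, lon0, cell_m=1000.0):
    """Aggregate trips into weighted edges between grid cells.

    trips: iterable of (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon).
    Returns a Counter mapping (origin_cell, destination_cell) -> trip count.
    """
    flows = Counter()
    for p_lat, p_lon, d_lat, d_lon in trips:
        origin = cell_of(p_lat, p_lon, lat0, lon0, cell_m)
        dest = cell_of(d_lat, d_lon, lat0, lon0, cell_m)
        if origin != dest:  # assumed: intra-cell trips carry no inter-region flow
            flows[(origin, dest)] += 1
    return flows
```

Running `build_flow_network` once per time slice (for example, one call per hour of pick-up time) would give one weighted network per period, which is the input that the community detection step expects.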
During different time periods of the day, the traffic flows showed different spatial connectivity patterns to meet the varying travel demands of people. In the morning rush hour, the interactions between residential areas, working areas, and schools were more intense than those between the other areas. Later, entering the period of the noon rush hour, the traffic flow volume and the associated connectivity patterns became more significant in the central areas and main roads of the city. Most of the central areas belong to the working zone and commercial zone, and thus the strong connectivity indicates the frequent interactions between working and lunch activities. In addition, although the volume of traffic flows in the evening rush hour was less than that in the morning rush hour, their network structures were similar. In Beijing, in order to avoid traffic congestion, many people choose to get off work and go home or go to recreational areas after 19:00. This may be a reason that the traffic volume in the evening rush hour was less than that in the morning rush hour. Additionally, the routes that people choose to go to work and get off work are similar, and thus the networks in the morning rush hour and evening rush hour had a similar structure. Furthermore, the networks at midnight had a multi-center structure, which depended on the hubs of recreational areas and commercial areas. In general, except for the morning rush hour, the time periods depended more on the eastern areas than the western areas.

Among these hours, the noon rush hour had the largest volume of traffic flows. From the morning rush hour to the noon rush hour, the volume of trajectory flow increased substantially and some new links emerged in local regions. This implies that the interactions between land uses in the noon rush hour are more intense than in the morning rush hour.
From the noon rush hour to the evening rush hour, the interactions decreased not only through the main roads but also across the western areas. The obvious feature of interactions at midnight is that there were significant connections to or from recreational land (i.e., the eastern area).

Besides the connectivity property, the spatially embedded network can also imply the aggregation patterns of land uses in the functional space. The land uses that have a strong connectivity relationship in the network traffic flow tend to be grouped together (Figure 7), and, in this way, we can explore the city structure and transport system using the resulting clustering patterns of land uses.

Figure 7. Communities in different time periods with cell size k = 1000 m: (a) morning rush hour; (b) noon rush hour; (c) evening rush hour; (d) midnight.

As presented in Figure 7, we classified the land uses into eight communities and there were some cells with no nodes. This is because there were no trajectory flows traversing across these cells. In addition, the land uses in the same community do not have to be contiguous in space.
The reason is that our study aimed to cluster land uses from the perspective of their transport function, and the final similarity of the nodes was decided by both their attractive degree and structure similarity, which were calculated based on the spatially embedded network. Some neighboring land uses could have a low similarity due to their weak connectivity in the spatially embedded network, and some distant land uses could be grouped together if they are connected by a route with large traffic flows. More specifically, as presented in Figure 7a, the study region in the morning rush hour was partitioned into five main communities, in each of which the land uses had a strong transport connection. Such relations also imply the actual aggregation of human activities and urban functions, e.g., the business district in the eastern part, the residential area in the southern part, and the universities in the northern area. In addition, there were four main communities in the noon rush hour. The most significant feature in this time period was that many communities were non-contiguous, or spanned multiple regions. For example, the community labeled by blue dots spanned the northern area (i.e., universities and high-technology regions) and the southern area (i.e., business districts and railway stations). In this time period, the traffic connection between distant land uses became stronger and cross-regional human activities were more frequent. In the period of the evening rush hour, the city was partitioned into two main communities, which correspond to interactions among the residential areas, business districts, and universities. The small part (pink dots) corresponded to the connection between residential areas and train station areas. At midnight, there were four main communities, in which the large volume of traffic flows was directed toward entertainment venues (e.g., bars). For example, the central Hohai entertainment area (green dots) attracted most of the neighboring land uses.
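The communities discussed above are produced by the K-Medoids procedure of Algorithm 1. The following is a minimal sketch of that procedure on a precomputed distance matrix, not the authors' code. Two points are assumptions: the initial medoids are chosen greedily (the farthest pair first, then the node farthest from the medoids already chosen), which is one possible reading of "k nodes that have the largest distance to each other", and a medoid is always assigned to its own cluster regardless of the diagonal of the matrix. The function name `k_medoids` is hypothetical.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster nodes 0..n-1 from a symmetric distance matrix (list of lists).

    Assumes n >= 2 and strictly positive distances between distinct nodes.
    Returns (medoids, labels), where labels[v] is the cluster index of node v.
    """
    n = len(dist)

    # Assumed initialisation: farthest pair, then greedy farthest-point picks.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: dist[p[0]][p[1]])
    medoids = [i0, j0][:k]
    while len(medoids) < k:
        candidates = (v for v in range(n) if v not in medoids)
        medoids.append(max(candidates,
                           key=lambda v: min(dist[v][m] for m in medoids)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: every node joins its closest medoid
        # (a medoid is treated as being at distance 0 from itself).
        labels = [min(range(k),
                      key=lambda c: 0.0 if medoids[c] == v else dist[v][medoids[c]])
                  for v in range(n)]

        # Update step: the new medoid minimises the sum of distances
        # to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [v for v in range(n) if labels[v] == c]
            new_medoids.append(min(
                members,
                key=lambda u: sum(dist[u][v] for v in members if v != u)))

        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids
    return medoids, labels
```

Because the sketch only needs pairwise distances, it can be applied directly to the reciprocal-similarity distances of Algorithm 1, and communities formed this way are free to be spatially non-contiguous, as observed in Figure 7.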
We then used the GSMM method to quantify the similarity between community structures in different time periods. In this way, we were able to find out the degree to which the transport system changed across time. As presented in Figure 8, the GSMM measure values were calculated at the macro level rather than the micro level. First, it can be observed that community structures in successive time periods usually had a high similarity. For example, the community similarities in the successive time slices of [4:00–8:00], [11:00–14:00], and [15:00–18:00] were higher than those in non-successive time slices. Secondly, most of the community structures in the rush hours (e.g., the similarity between [7:00–8:00] and [8:00–9:00]) had a high similarity. Hence, the distributions of traffic flows are so regular in these periods that urban planners could estimate the associated travel behavior patterns. Note that the community structures between [12:00–14:00] and [4:00–7:00] were similar at the macro scale. Considering the routines and habits of residents, there are relatively few traffic flows in these periods and the arterial roads provide the main functions of transportation in the city. The travel origins and destinations are concentrated in a few business districts and railway stations. Hence, the traffic conditions in these time periods showed a similar characteristic. In addition, it can be observed that the community structure in [21:00–22:00] was very different from most of the structures in the other time periods. The reason may be that there are many different activities happening (e.g., working and entertainment) in [21:00–22:00], and thus the connectivity between local regions is much more complex than that in other time periods. Furthermore, the community structure in the evening rush hour of [18:00–19:00] was also very different from most of the other structures.
This could be because in this time period the travel activities become increasingly active, and most of the residents in Beijing choose to travel along different routes. The land uses were also aggregated into different communities in this period compared to those in the successive time periods.

Figure 8. Similarity of the community structures in different time periods with 1000 m cell size.

4.2. Variation in Different Spatial Scales

Our proposed method is adaptive to applications with different spatial scales. Hence, the next experiment refined the tessellation of space using a 500 m grid and analyzed the associated spatial community patterns across space and time.

As the size of cells becomes smaller, there are more cells with no nodes, which correspond to the buildings (e.g., Imperial Palace) or lakes (e.g., Beihai Park) (Figure 9). We can easily observe the distributions of these land uses from the spatially embedded networks. In addition, compared to networks with 1000 m cell size, networks with 500 m cell size can present more detailed structures of street network infrastructure and associated traffic flows in the city. With the refined tessellation, we could observe more detailed interactions from the results. In the morning rush hour, strong connectivity existed mainly among the regions of residential areas, business districts, and high-tech areas.
Entering the noon rush hour, the ring-like structure of the transport system became most significant. In addition, in the periods of the evening rush hour and midnight, the traffic flows were concentrated in the eastern and northern parts, which are the business cores of the city. Therefore, the transport system of Beijing depends largely on the loop lines, with a significant temporal pattern.

Figure 9. Spatially embedded networks for traffic flows in different time periods with cell size k = 500 m: (a) morning rush hour; (b) noon rush hour; (c) evening rush hour; (d) midnight.
In order to compare the results under the two scales, we also classified the land uses into eight communities. As presented in Figure 10a, the study region was partitioned into three main communities in the morning rush hour. The eastern region was divided into two parts, and the community represented by green dots implies the aggregation of residential area (i.e., western part) and commercial business districts (i.e., eastern part). Another part in the eastern region was merged with the northern region, and the resulting community implies the aggregation of residential areas, universities, and high-technology regions.
Later, entering the noon rush hour and the evening rush hour, the study region was partitioned into three communities and four communities, respectively. In addition, the aggregation of regions in both of the two time periods seems to be more significant than that in the corresponding time periods with cell size 1000 m. At midnight, the study area was partitioned into three communities. Compared to the result with cell size 1000 m, the interaction between land uses was weakened at midnight with cell size 500 m, and the region was divided into two main parts: one part was merged into the community of the business district (green dots), and the other one was merged into the community of business district and residential areas (blue dots). In general, the communities with cell size 500 m can reveal more detailed information about the aggregation of land uses.

Figure 10. Communities in different time periods with cell size k = 500 m: (a) morning rush hour; (b) noon rush hour; (c) evening rush hour; (d) midnight.
Using the GSMM method, we also calculated the similarity matrix, as presented in Figure 11. It can be observed that in the rush hours, the measured values with 500 m cell size were similar to those with 1000 m cell size. In addition, during the non-rush hours, the measured values with 500 m cell size were higher than the corresponding values with 1000 m cell size. The reason could be that, with the refining of space tessellation, more cells had traffic flows and the number of the matching cells across time increased. Specifically, the small size grid can capture the local interactions between regions, which were more regular in the non-rush hours than in the rush hours (see Figure 9).
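Each entry of a similarity matrix such as those in Figures 8 and 11 is one GSMM value between the community structures of two time slices. A minimal sketch of Definitions 6 and 7 follows; it is an illustration rather than the authors' implementation. The function names are hypothetical, communities are represented as Python sets of node identifiers, and the contribution rate is assumed here to be each community's share of all nodes. The maximum in Equation (12) is found by brute force over all permutations.

```python
from itertools import permutations

def ssim(s1, s2):
    """Equation (10): 2|S1 ∩ S2| / (|S1| + |S2|) for two node sets."""
    total = len(s1) + len(s2)
    return 2 * len(s1 & s2) / total if total else 0.0

def contribution_rates(communities):
    """Assumed form of the contribution rate: share of all nodes per community."""
    n = sum(len(s) for s in communities)
    return [len(s) / n for s in communities]

def gsmm(comm_p, comm_q):
    """Equations (12)-(13): best weighted matching between two k-community structures."""
    k = len(comm_p)
    rate_p = contribution_rates(comm_p)
    rate_q = contribution_rates(comm_q)
    denom = sum(rate_p) + sum(rate_q)
    best = 0.0
    for perm in permutations(range(k)):  # every matching of p-communities to q-communities
        psim = sum(0.5 * (rate_p[j] + rate_q[perm[j]]) * ssim(comm_p[j], comm_q[perm[j]])
                   for j in range(k))
        best = max(best, 2 * psim / denom)
    return best
```

With eight communities, as used in the experiments, the exhaustive search visits 8! = 40,320 permutations per pair of time slices, which is still cheap. Since the objective is a sum of independent pairwise terms, an assignment solver could replace the permutation loop for larger k.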
Figure 11. Similarity of the community structures in different time periods with 500 m cell size.

4.3. Algorithm Efficiency with Different Cell Sizes

The results above show the effect of different cell sizes on community detection. In order to further explore the algorithm efficiency with different cell sizes, we implemented the method with cell sizes of 500 m, 600 m, 700 m, 800 m, 900 m, and 1000 m.

With the increase of cell size, the total running time decreased (Figure 12a), and when the cell size changed from 500 m to 700 m, the algorithm efficiency increased sharply. When the cell size changed from 700 m to 1000 m, the running time remained stable. The reason could be that the number of nodes decreased with the increase of cell size, and thus the algorithm cost less time during the clustering of nodes.
Furthermore, when the cell size was larger than 700 m, the number of iterations in the K-Medoids algorithm did not change much, and thus the running time of the algorithm remained stable. In addition, as presented in Figure 12b, the running time of the algorithm changed across the time periods. In general, the running time of the algorithm in the rush hours (e.g., 7:00–9:00, 11:00–13:00, 18:00–20:00, 22:00–23:00) was longer than that in the other hours. The reason could be that the flow structures are more complex in rush hours.

Figure 12. The algorithm efficiency with different cell sizes: (a) the total running time of the algorithm with different cell sizes; (b) the running time of the algorithm in different time periods.

5. Conclusion and Future Directions

Based on the collective intra-city trips extracted from the emerging taxi GPS trajectory data, this paper explored network traffic flows towards a deep understanding of city structure. We introduced network science techniques (e.g., community detection) to reveal the regular patterns of traffic flows across space and time. More specifically, aiming at the connectivity, aggregation, and dynamic properties of the transport system, we proposed a three-level framework to explore the complex traffic network.
It first partitions the study region and constructs a spatially embedded network for representing the connectivity relationships between local regions. In order to extract the aggregation patterns of land uses, the method then uses community detection techniques based on the volume of traffic flows and structural properties of the network. Furthermore, our method employs a graph structure matching measure to uncover the regularities of the transport system across time. The proposed method is also adaptive to multi-scale applications in space and time.

Through the case study, we found that the interactions of land uses show different characteristics in different time periods, and the aggregation patterns of functional areas are dynamic across time. This result is highly associated with the travel behaviors of residents in the city, and thus can be used further in social science research. In addition, the result can provide references for the dispatching of the traffic system. For example, we can plan for the prevention of traffic jams in regions which have intense interactions of traffic flows. Moreover, it can be used to assist urban structure analysis. As presented in our case study, Beijing has a polycentric form with a significant loop structure.

In this paper, we took the taxi trajectory data into consideration because taxis account for a large proportion of public transport in Beijing city. Taxi drivers are very familiar with the city of concern, and thus there are increasingly more studies focusing on the use of taxi trajectory data for urban analysis [3,12,33]. In addition, since the taxi is a common mode of transport, our method could be adaptable to other cities. We would like to regard this research as a beginning of detecting spatial interaction communities based on vehicle datasets.
With the rapid development of big data, more traffic data (e.g., bus trajectories, passenger car data, and biking trajectories) can be introduced into our framework for exploring city structures comprehensively. Further study can also use more methods (e.g., complex system) to understand the mechanisms of the traffic flow space. It would be interesting to analyze the associations between the physical space and virtual space of a city using social media data.

Author Contributions: Methodology Development & Implementation, Data Acquisition, and Analysis: W.Y.; M.G.; Z.C.; Writing & Revision: led by W.Y. with contributions from all other co-authors.

Funding: The project was supported by the National Natural Science Foundation of China (No. 41701440, 41871305), the Natural Science Foundation of Hubei Province (No. 2018CFB513), the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG170640), and a grant from State Key Laboratory of Resources and Environmental Information System.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Zheng, L.; Xia, D.; Zhao, X.; Tan, L.Y.; Li, H. Spatial–temporal travel pattern mining using massive taxi trajectory data. Phys. A Stat. Mech. Appl. 2018, 501, 24–41.
2. Zhu, X.; Guo, D. Mapping large spatial flow data with hierarchical clustering. Trans. GIS 2014, 18, 421–435.
3. Yu, W. Discovering frequent movement paths from taxi trajectory data using spatially embedded networks and association rules. IEEE Trans. Intell. Transp. Syst. 2019, 23, 855–866.
4. Yuan, Y.; Raubal, M.; Liu, Y. Correlating mobile phone usage and travel behavior—A case study of Harbin, China. Comput. Environ. Urban Syst. 2012, 36, 118–130.
5. Jiang, B.; Yin, J.; Zhao, S. Characterizing the human mobility pattern in a large street network. Phys. Rev. E 2009, 80, 021136.
6. Song, C.; Koren, T.; Wang, P.; Barabási, A.-L.
Modelling the scaling properties of human mobility. Nat. Phys. 2010, 6, 818–823.
7. Ahas, R.; Aasa, A.; Silm, S.; Tiru, M. Daily rhythms of suburban commuter's movements in the Tallinn metropolitan area: Case study with mobile positioning data. Transp. Res. Part C 2010, 18, 45–54.
8. Sevtsuk, A.; Ratti, C. Does urban mobility have a daily routine? Learning from the aggregate data of mobile networks. J. Urban Technol. 2010, 17, 41–60.
9. Jiang, S.; Guan, W.; Zhang, W.; Chen, X.; Yang, L. Human mobility in space from three modes of public transportation. Phys. A Stat. Mech. Appl. 2017, 483, 227–238.
10. Wang, Z.; Hu, Y.; Zhu, P.; Qin, Y.; Jia, L. Ring aggregation pattern of metro passenger trips: A study using smart card data. Phys. A Stat. Mech. Appl. 2018, 491, 471–479.
11. Wang, W.; Pan, L.; Yuan, N.; Zhang, S.; Liu, D. A comparative analysis of intra-city human mobility by taxi. Phys. A Stat. Mech. Appl. 2015, 420, 134–147.
12. Cai, H.; Zhan, X.; Zhu, J.; Jia, X.; Chiu, A.S.F.; Xu, M. Understanding taxi travel patterns. Phys. A Stat. Mech. Appl. 2016, 457, 590–597.
13. Zheng, Y.; Liu, Y.; Yuan, J.; Xie, X. Urban computing with taxicabs. In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; ACM: New York, NY, USA, 2011; pp. 89–98.
14. Guo, D.; Zhu, X.; Jin, H.; Gao, P.; Andris, C. Discovering spatial patterns in origin-destination mobility data. Trans. GIS 2012, 16, 411–429.
15. Yuan, J.; Zheng, Y.; Zhang, C.; Xie, W.; Xie, X.; Sun, G.; Huang, Y. T-drive: Driving directions based on taxi trajectories. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 16, 1682–1750.
16. Zhou, Q.; Qin, K.; Chen, Y.X.; Li, Z.X. Hotspots detection from taxi trajectory data based on data field clustering. Geogr. Geo-Inf. Sci. 2016, 32, 51–56.
17. Yue, Y.; Zhuang, Y.; Li, Q.; Mao, Q.
Mining time-dependent attractive areas and movement patterns from taxi trajectory data. In Proceedings of the International Conference on Geoinformatics, Fairfax, VA, USA, 12–14 August 2009; IEEE: Piscataway, NJ, USA, 2009.
18. Liu, C.; Qin, K.; Kang, C.G. Exploring time-dependent traffic congestion patterns from taxi trajectory data. In Proceedings of the 2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), Fuzhou, China, 8–10 July 2015; pp. 39–44.
19. Liu, Y.; Kang, C.; Gao, S.; Xiao, Y.; Tian, Y. Understanding intra-urban trip patterns from taxi trajectory data. J. Geogr. Syst. 2012, 14, 463–483.
20. Liu, X.; Gong, L.; Gong, Y. Revealing travel patterns and city structure with taxi trip data. J. Transp. Geogr. 2015, 43, 78–90.
21. Zhou, Y.; Fang, Z. Labeling residential community characteristics from collective activity patterns using taxi trip data. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, XLII-2/W7, 1481–1486.
22. Pan, G.; Qi, G.; Wu, Z.; Zhang, D.; Li, S. Land-use classification using taxi gps traces. IEEE Trans. Intell. Transp. Syst. 2013, 14, 113–123.
23. Gonzalez, M.C.; Hidalgo, C.A.; Barabasi, A.L. Understanding individual human mobility patterns. Nature 2008, 453, 779–782.
24. Liu, L.; Andris, C.; Ratti, C. Uncovering cabdrivers’ behavior patterns from their digital traces. Comput. Environ. Urban Syst. 2010, 34, 541–548.
25. Prapat, P.; Nguyen, T.K.O. Assessment of potential long-range transport of particulate air pollution using trajectory modeling and monitoring data. Atmos. Res. 2007, 85, 3–17.
26. Dias, D.; Tchepel, O. Modelling of human exposure to air pollution in the urban environment: A GPS-based approach. Environ. Sci. Pollut. Res. 2014, 21, 3558–3571.
27. Cho, E.; Myers, S.A.; Leskovec, J. Friendship and mobility: User movement in location-based social networks.
In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; ACM: New York, NY, USA, 2011; pp. 1082–1090.
28. Ratti, C.; Sobolevsky, S.; Calabrese, F.; Andris, C.; Reades, J.; Martino, M.; Claxton, R.; Strogatz, S.H. Redrawing the map of Great Britain from a network of human interactions. PLoS ONE 2010, 5, e14248.
29. Kaufman, L.; Rousseeuw, P.J. Finding groups in data: An introduction to cluster analysis. J. R. Stat. Soc. Ser. C 1991, 40, 401–423.
30. Liu, J.; Liu, T. Detecting community structure in complex networks using simulated annealing with k-means algorithms. Phys. A Stat. Mech. Appl. 2010, 389, 2300–2309.
31. Lafon, S.; Lee, A. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1393–1403.
32. Wang, B. Research on Taxi Travel Demand Based on Beijing Passenger Hot Spot Area. Master's Thesis, Beijing Jiaotong University, Beijing, China, 2018.
33. Liu, Y.; Wang, F.; Xiao, Y.; Gao, S. Urban land uses and traffic ‘source-sink areas’: Evidence from GPS-enabled taxi data in Shanghai. Landsc. Urban Plan. 2012, 106, 73–87.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Applied Sciences
Multidisciplinary Digital Publishing Institute
http://www.deepdyve.com/lp/multidisciplinary-digital-publishing-institute/analyzing-spatial-community-pattern-of-network-traffic-flow-and-its-5krVSP9OtT