In this paper, we consider the use of graph and network theory in modeling and simulating the dynamics of infectious diseases. We describe the basic principles and tools and show how they can be used to counter the spread of disease. We also present our software solutions, which can be used to support decision-making.

Keywords: infectious diseases; simulation; social networks.

*Corresponding author: Rafal Kasprzyk, Military University of Technology, Faculty of Cybernetics, Gen. Sylwestra Kaliskiego Str. 2, 00-908 Warsaw, Poland, E-mail: rkasprzyk@wat.edu.pl
Cezary Bartosiak and Andrzej Najgebauer: Military University of Technology, Faculty of Cybernetics, Warsaw, Poland

Introduction

The use of graph and network theory in epidemiology has a very practical purpose, particularly today: countering infectious diseases such as HIV/AIDS, malaria and SARS. We demonstrate how this theory can help in tackling epidemics. This has enormous practical potential in regions such as Africa, where there are too few medicines to treat everyone who may be at risk.

A standard approach is based on the simplified assumption that infectious diseases spread over populations like the ripples from a stone dropped in water. This approach does not explain the real dynamics of infectious diseases, in particular: why diseases can persist in populations over the long term; how to choose individuals to immunize in order to minimize the range of an epidemic; and what mechanisms give rise to outbreaks. This is because it does not take the topology of populations into consideration. The fundamental question is who is connected to whom, and graph and network theory allows us to answer it.

We should also note that every type of disease spreading over a population has its own unique properties. We have to model these, and the notion of a probabilistic finite state machine is useful here. With this tool, we are able to model any type of disease with any set of states (e.g., susceptible, infected, carrier, immunized, deceased) and the transitions between them. Thus, the dynamics of an infectious disease depend both on the underlying topology, which has a huge impact, and on the unique properties of the disease, which can be modeled by probabilistic finite state machines.

Definitions and notations

Let us define the network as follows:

$$Net(t) = \left\langle G(t) = \langle V(t), E(t) \rangle,\ \{ f_i(v,t) \}_{i \in \{1,\dots,NF\},\ v \in V(t)},\ \{ h_j(e,t) \}_{j \in \{1,\dots,NH\},\ e \in E(t)} \right\rangle$$

where: $G(t) = \langle V(t), E(t) \rangle$ is a simple dynamic graph; $V(t)$ and $E(t)$ are the sets of vertices and edges of the graph, $E(t) \subseteq \{\{v, v'\} : v, v' \in V(t)\}$ ("dynamic" means that $V(t)$ and $E(t)$ can change over time); $f_i : V(t) \to Val_i$ is the i-th function defined on the vertices of the graph, $i = 1, \dots, NF$ (the number of vertex functions), and $Val_i$ is the set of values of $f_i$; $h_j : E(t) \to Val_j$ is the j-th function defined on the edges of the graph, $j = 1, \dots, NH$ (the number of edge functions), and $Val_j$ is the set of values of $h_j$.

We assume that the values of the functions $f_i$ and $h_j$ can also change over time. For the purpose of this paper, we omit these functions and focus on the characteristics of the graph $G(t)$. Simple dynamic graphs are often represented by a matrix $A(t)$, called the adjacency matrix, which is a $|V(t)| \times |V(t)|$ symmetric matrix. The element $a_{ij}(t)$ of the adjacency matrix equals 1 if there is an edge between vertices $i$ and $j$, and 0 otherwise.
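The adjacency-matrix representation can be illustrated with a minimal Python sketch for a single time step. The function names and the five-vertex example graph below are our own assumptions, introduced only for illustration; they are not part of the software described in this paper. The sketch also computes the vertex degree and the local clustering coefficient, which are defined formally in the passage that follows.

```python
from itertools import combinations


def adjacency_matrix(n, edges):
    """Build the n x n symmetric 0/1 adjacency matrix of a simple graph."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("a simple graph has no self-loops")
        a[i][j] = a[j][i] = 1  # undirected edge, so the matrix is symmetric
    return a


def first_neighborhood(a, i):
    """Vertices directly connected to vertex i."""
    return {j for j, a_ij in enumerate(a[i]) if a_ij == 1}


def degree(a, i):
    """Degree of vertex i: the row sum of the adjacency matrix."""
    return sum(a[i])


def local_clustering(a, i):
    """Edges among the neighbors of i divided by the number possible."""
    nbrs = sorted(first_neighborhood(a, i))
    k = len(nbrs)
    if k <= 1:
        return 0.0
    n_i = sum(1 for l, m in combinations(nbrs, 2) if a[l][m] == 1)
    return 2.0 * n_i / (k * (k - 1))


# Hypothetical example: a triangle 0-1-2 with a path 2-3-4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = adjacency_matrix(5, edges)
degrees = [degree(A, i) for i in range(5)]
clustering = [local_clustering(A, i) for i in range(5)]
print(degrees)
print(clustering)
```

A dense list-of-lists matrix keeps the sketch close to the notation $a_{ij}(t)$; for networks with thousands of nodes an adjacency-list or sparse representation would normally be preferred.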
The first neighborhood of vertex $v_i$, denoted as $\Gamma_i^1(t)$, is defined as the set of vertices directly connected to $v_i$, that is:

$$\Gamma_i^1(t) = \{ v_j \in V(t) : \{v_i, v_j\} \in E(t) \}$$

The degree $k_i(t)$ of a vertex $v_i$ is the number of vertices in its first neighborhood, that is:

$$k_i(t) = |\Gamma_i^1(t)|$$

A path starting in vertex $v_i$ and ending in vertex $v_j$ is a sequence of vertices $\langle v_0, v_1, \dots, v_{k-1}, v_k \rangle$, where $\{v_{m-1}, v_m\} \in E(t)$ for all $m = 1, \dots, k$. The length of a path is the number of links in it. The length of the shortest path starting in vertex $v_i$ and ending in vertex $v_j$ is denoted as $d_{ij}(t)$. The diameter $D$ is the longest shortest path, that is:

$$D(t) = \max_{v_i, v_j \in V(t)} \{ d_{ij}(t) \}$$

The number of existing edges in the first neighborhood of a vertex $v_i$ is denoted as:

$$N_i(t) = \left| \{ \{v_l, v_k\} : v_l, v_k \in \Gamma_i^1(t) \wedge \{v_l, v_k\} \in E(t) \} \right|$$

Let us define a very important concept called the local clustering coefficient $C_i$ of a vertex $v_i$. It is given by $N_i(t)$ divided by the number of edges that could possibly exist in the first neighborhood of $v_i$ (when every neighbor of $v_i$ is connected to every other neighbor of $v_i$). Formally:

$$C_i(t) = \begin{cases} \dfrac{2 N_i(t)}{k_i(t)\,(k_i(t) - 1)}, & |\Gamma_i^1(t)| > 1 \\ 0, & |\Gamma_i^1(t)| \le 1 \end{cases}$$

The clustering coefficient $C$ of the whole network is defined as the average of $C_i$ over all $v_i \in V$, that is:

$$C(t) = \frac{1}{|V(t)|} \sum_{v_i \in V(t)} C_i(t)$$

The degree distribution $P(k,t)$ of a network is defined as the fraction of nodes in the network with degree $k$. Formally:

$$P(k,t) = \frac{|V_k(t)|}{|V(t)|}$$

where $|V_k(t)|$ is the number of nodes with degree $k$ and $|V(t)|$ is the total number of nodes.

Bartosiak et al.: The graph and network theory as a tool for infectious diseases

Social networks

Social networks are specific structures in the context of the dynamics of infectious diseases. These are networks in which nodes are individuals or organizations and the connections between them reflect some type of relation between these objects (e.g., contacts). Identifying and measuring the properties of social networks is the first step towards understanding their topology. In the literature, two basic models are considered: small world [1] and scale free [2–4] (Figure 1). Naturally, they have numerous modifications, extensions and generalizations. However, all of them address several properties occurring in real-life networks: a small average shortest path length, a relatively small diameter, a high clustering coefficient [1] and degree distributions following a power law $P(k) \sim k^{-\gamma}$, where $\gamma$ is some constant between 2 and 3. These properties affect the dynamics of the diffusion of diseases within networks.

Figure 1 Classic network models (from the left): random graph, small world, scale free.

The small world concept emerges as a natural, very realistic case between two extreme models: a regular network based on a ring graph and a random network. In a regular network, every node has the same degree and connections are created according to a defined pattern. In practice, regular networks idealize reality, which makes them less useful as models of real-life networks. By contrast, random networks do not show any of the regularity that is present in real-life networks. Watts and Strogatz [1] noted that regular networks can be modified in a way that makes them more useful for modeling real-life networks. The resulting networks are neither perfectly regular nor fully random, and we can "build" them by rewiring some of the connections at random. This approach allows us to answer the question of what causes the propagation of outbreaks of diseases.

Scale-free networks can be characterized using notions from the field of computer networks: these are networks in which the key role is played by hubs. These nodes are called "super-spreaders" because they can quickly spread a disease to their numerous neighbors. Barabási and Albert [2] remarked that such networks evolve by adding new nodes in each time step (this feature is called constant growth). New nodes are connected to existing ones with a probability depending on the degrees of the existing nodes. Naturally, nodes with the highest degrees are "preferred". Therefore, we call this feature preferential attachment. There are many models based on these simple ideas which take some additional facts into consideration; for example, some networks evolve by removing existing nodes and/or rewiring existing connections. In fact, scale-free models are widely believed to be the most suitable ones.

Centrality measures

One of the basic questions is how to identify the most important nodes in a given network. This knowledge would help us to minimize the spread of diseases within the network. There are tools, called centrality measures, which address this problem. They are the most fundamental and frequently used measures for networks and allow us to find nodes that may be considered critical (the most "central"). Of course, no single measure is suited to all applications. The measures are based on different formulas and should be chosen depending on the context.

Figure 2 The importance of nodes according to degree centrality.

Immunizing carefully chosen individuals, instead of randomly selected ones, may be a very effective way to limit the time and funds lost to a disease. Immunizing the whole population would obviously eradicate the disease, but this is very costly and not always possible. Thus, identifying critical individuals should be the first step in reducing the consequences of the disease. In this paper, we describe five measures which we consider the most important: degree centrality, radius centrality, closeness centrality, betweenness/load centrality and eigenvector centrality.
Owing to these, we are able to show, among other things, how to disintegrate a given network in a minimum number of steps, which would minimize the number of individuals affected by the disease.

Degree centrality

Degree centrality (Figure 2) is the simplest and most intuitive centrality measure. It gives the highest rank to the node that has the highest degree (the largest neighborhood). Naturally, it should be normalized, which is done by dividing by the highest possible degree:

$$dc_i(t) = \frac{k_i(t)}{|V(t)| - 1}$$

Radius centrality

Radius centrality (Figure 3) makes it possible to find the nodes which are most influential in some area. It is based on a pessimistic criterion. The highest rank is given to the node which is nearest to the furthest nodes of a given network (the shortest path to the furthest node is minimized):

$$rc_i(t) = \frac{1}{\max_{v_j \in V(t)} d_{ij}(t)}$$

Figure 3 The importance of nodes according to radius centrality.

Closeness centrality

According to closeness centrality (Figure 4), a node is more central the closer it is to the other nodes of a given network. This measure allows us to indicate which of any two nodes requires fewer steps to "communicate" with the other nodes:

$$cc_i(t) = \frac{|V(t)| - 1}{\sum_{v_j \in V(t)} d_{ij}(t)}$$

Figure 4 The importance of nodes according to closeness centrality.

Betweenness/load centrality

Betweenness centrality, sometimes called load centrality (Figure 5), is defined as a fraction of all the shortest paths in the network: the numerator counts only those shortest paths which contain the given node. It is normalized by the maximal possible number of shortest paths (in a network which would be a complete graph).
If $p_{l,i,k}(t)$ is the number of shortest paths between vertices $v_l$ and $v_k$ passing through vertex $v_i$, and $p_{l,k}(t)$ is the number of all shortest paths between vertices $v_l$ and $v_k$, then:

$$bc_i(t) = \frac{\sum_{l<k} \dfrac{p_{l,i,k}(t)}{p_{l,k}(t)}}{(|V(t)| - 2)(|V(t)| - 1)}$$

Figure 5 The importance of nodes according to betweenness/load centrality.

Eigenvector centrality

Eigenvector centrality (Figure 6) acknowledges that not all connections are equal. If we denote the centrality of vertex $v_i$ by $ec_i(t)$, then we capture this effect by making $ec_i(t)$ proportional to the centralities of the first neighbors of $v_i$:

$$ec_i(t) = \frac{1}{\lambda} \sum_{j=1}^{|V(t)|} a_{ij}(t)\, ec_j(t)$$

Using matrix notation:

$$ec(t) = \frac{1}{\lambda} A(t)\, ec(t)$$

Thus, we have $A(t)\,ec(t) - \lambda I\,ec(t) = 0$ and we can calculate the value $\lambda$ using $\det(A(t) - \lambda I) = 0$. Therefore, $ec(t)$ is an eigenvector of the adjacency matrix associated with its largest eigenvalue $\lambda$.

Figure 6 The importance of nodes according to eigenvector centrality.

Global connection efficiency coefficient

Using centrality measures it is possible to optimize vaccination strategies (Figure 7). At first sight, it seems that the most central nodes should be considered as the individuals to be vaccinated. However, the structures of real-life networks are not fully known. To evaluate how well a network $G$ is connected before and after the removal of a set of nodes, we use the global connection efficiency (GCE) [5]. We assume that the connection efficiency between vertices $v_i$ and $v_j$ is inversely proportional to the shortest distance between them:

$$\text{connection efficiency}_{ij}(G) = \frac{1}{d_{ij}}$$

The global connection efficiency is defined as the average connection efficiency over all pairs of nodes:

$$GCE(G) = \frac{2}{n(n-1)} \sum_{i<j} \frac{1}{d_{ij}}$$

where $n$ is the number of nodes. Unlike the average path length, the global connection efficiency is well defined even for non-connected graphs, because an unreachable pair has $d_{ij} = \infty$ and contributes zero to the sum.
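The distance-based measures above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the graph is unweighted and stored as a dictionary of neighbor lists, shortest distances come from breadth-first search, and all function names are hypothetical rather than taken from our software. Radius and closeness centrality are computed here only for connected graphs with at least two nodes, whereas the global connection efficiency simply skips unreachable pairs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(adj, i):
    return len(adj[i]) / (len(adj) - 1)


def radius_centrality(adj, i):
    # Assumes a connected graph with more than one node.
    return 1.0 / max(bfs_distances(adj, i).values())


def closeness_centrality(adj, i):
    # Assumes a connected graph; the distance from i to itself is 0.
    return (len(adj) - 1) / sum(bfs_distances(adj, i).values())


def global_connection_efficiency(adj):
    """Average of 1/d_ij over all pairs; unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        dist = bfs_distances(adj, i)
        total += sum(1.0 / d for j, d in dist.items() if j != i)
    # Summing over ordered pairs and dividing by n(n-1) is equivalent to
    # 2/(n(n-1)) times the sum over unordered pairs i < j.
    return total / (n * (n - 1))


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(degree_centrality(path, 1))
print(radius_centrality(path, 1))
print(closeness_centrality(path, 1))
print(global_connection_efficiency(path))
```

Running one breadth-first search per node costs O(n(n + m)) for n nodes and m edges, which is adequate for networks of about a thousand nodes such as those used in the case studies.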
Let $G^{-}(y, rs)$ describe graph $G$ after the removal of $y \in \mathbb{N}$ nodes using the strategy $rs \in RS$, where $RS = \{rn, rrn, dc, rc, cc, bc, ec\}$. The elements of $RS$ describe the removal strategy. The simplest strategy is $rn$ (random nodes), which means that random nodes are removed. A simple modification of the random strategy is $rrn$ (random-random nodes), in which removal is a two-step process: first, nodes are chosen randomly and, second, random nodes are chosen again from among all the neighbors of these nodes. This strategy is often called "vaccinate thy neighbor" [6–8]. All other strategies are based on centrality measures. Under these strategies the nodes with the greatest value of the following measures are removed: $dc$, degree centrality; $rc$, radius centrality; $cc$, closeness centrality; $bc$, betweenness centrality; $ec$, eigenvector centrality.

Figure 7 The designation of vaccination strategies (panels: random, target, random-random).

The network durability measure is represented by the function:

$$GCE_{coef}(G, y, rs) = \frac{GCE(G^{-}(y, rs))}{GCE(G)}$$

The lower the value of this function, the higher the effectiveness of the removal/vaccination strategy for a particular graph $G$.

Diffusion model

Network structures are crucial for the diffusion processes that take place in them, but we need to be aware that different diseases have different unique properties, and we should take these into consideration. To this end, we have defined a general model of diffusion in networks [9].
It is a vector consisting of three elements:

$$Diff(t) = \langle Net(t),\ PSM_{x = 1, 2, \dots, n},\ Gen(v, t) \rangle$$

where $Net(t)$ is a network model of some diffusion environment (e.g., the population of some city); $PSM_x$ is a probabilistic finite state machine model of the considered behavior (in this case a disease); $Gen : V(t) \to SIG$ is a specific function, called the generator of signals, which assigns a set of signals to each node in each simulation step, based on the first neighborhood of the node and the states of the nodes in it. These signals are received and processed by $PSM_x$ on each node.

Simple case studies

When vaccine supplies for a deadly disease are limited, whom should health workers target? Many studies of social networks show that human contacts have the scale-free feature mentioned earlier. Thus, we decide to analyze the effectiveness of different vaccination strategies for different scale-free networks. We take into account three main strategies: random, target and random-random. As we can see in Figures 8–11, the random strategy is very ineffective. Thus, we should use the target strategy, which is based on centrality measures. In this particular case, we take advantage of betweenness centrality, which seems to be the most effective. The problem with the target strategy is that it requires knowledge of the exact topology of the network. Our knowledge about most real networks is incomplete and uncertain, which is why the target strategy is very often unusable. In that case, as the experiments show, the random-random strategy can be used; it is much more effective than the random strategy.

Figure 8 The effectiveness of vaccination strategies (scale-free network with 1000 nodes, $\gamma = 3$, $\langle k \rangle \approx 6$). Axes: x, % of removed nodes (isolated, vaccinated); y, GCEcoef.

Figure 9 The effectiveness of vaccination strategies (scale-free network with 1000 nodes, $\gamma = 3$, $\langle k \rangle \approx 4$). Axes: x, % of removed nodes (isolated, vaccinated); y, GCEcoef.

Figure 10 The effectiveness of vaccination strategies (scale-free network with 1000 nodes, $\gamma = 3$, $\langle k \rangle \approx 2$). Axes: x, % of removed nodes (isolated, vaccinated); y, GCEcoef.

Let us now analyze a very simple case study of an infectious disease. One of the most extensively studied epidemic models is SIS (susceptible-infected-susceptible) (Figure 12). In each time step, "susceptible" individuals are infected by each "infected" neighbor with probability $\beta$, and "infected" individuals recover and become "susceptible" again at rate $\delta$. The parameter $\lambda$ is known in the literature as the speed of spreading, or virulence, of the disease and is defined as $\lambda = \beta / \delta$. Figure 12 presents the $PSM_1$ diagram of the SIS model of a disease, prepared in one of our simulation environments (presented hereinafter), with $\lambda = 0.5 / 0.1 = 5$.

Figure 12 SIS model of an infectious disease.

Figure 13 The SIS model of a disease ($\lambda = 5$) in the scale-free network and the random graph. Axes: x, simulation step; y, % of nodes infected. Series, for the scale-free network and for the random graph: at the start time 1% of nodes are infected, chosen as the nodes with the lowest degree centrality, the nodes with the highest degree centrality, or randomly.
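The SIS dynamics described above can be sketched as a simple synchronous simulation in Python. This is an illustrative sketch under our own assumptions, not the implementation used in our simulation environment: the function names, the dictionary-of-neighbor-lists graph format and the fixed random seed are all hypothetical. Each infected neighbor plays the role of one signal delivered to a susceptible node, which then changes state with probability $\beta$ per signal, while each infected node returns to the susceptible state with probability $\delta$.

```python
import random


def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    new_infected = set()
    for v in adj:
        if v in infected:
            # Infected -> susceptible with probability delta.
            if rng.random() >= delta:
                new_infected.add(v)
        else:
            # One infection attempt (signal) per infected neighbor,
            # each succeeding independently with probability beta.
            attempts = sum(1 for w in adj[v] if w in infected)
            if any(rng.random() < beta for _ in range(attempts)):
                new_infected.add(v)
    return new_infected


def simulate_sis(adj, seeds, beta, delta, steps, seed=0):
    """Return the number of infected nodes at step 0, 1, ..., steps."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history


# Hypothetical example: a ring of 20 nodes with one initially infected
# node and the parameters beta = 0.5, delta = 0.1 (lambda = 5).
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
history = simulate_sis(ring, seeds=[0], beta=0.5, delta=0.1, steps=20)
print(history)
```

The step is synchronous: all transitions in a step are decided from the states at the start of that step, so the order in which nodes are visited does not matter. Averaging `history` over many seeds would correspond to repeating an experiment several times.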
Figure 11 The effectiveness of vaccination strategies (scale-free network with 1000 nodes, $\gamma = 2$, $\langle k \rangle \approx 2$). Axes: x, % of removed nodes (isolated, vaccinated); y, GCEcoef.

We use two networks: a scale-free network and a random graph. Each network consists of 1000 nodes and 2000 edges, so the average degrees of the nodes are similar and close to 4. At time 0 a small number of nodes (1%) are infected. The nodes are chosen in one of three ways: those with the lowest degree centrality; those with the highest degree centrality; or randomly. Then the simulation of an epidemic is started. The experiment was repeated 100 times. The dynamics of an infectious disease in the different networks are presented in Figure 13. We can see that the topology of the network, as well as the way in which the initially infected nodes are chosen, has a great impact on the dynamics of the infectious disease.

Applications

We have prepared a simulation environment based on the well-known Gephi platform [10, 11] for interactive visualization and exploration of networks. We have implemented it as a set of plugins because this is the most effective way to extend this application. Its architecture is based on the MVC (model-view-controller) and service locator patterns. The first pattern allows us to separate the algorithms and the data layer from the GUI (presented in Figure 14), which makes it possible to develop, test and maintain each one independently. The second pattern is an implementation of the IoC (inversion of control) concept. This technique allows us to remove hard-coded dependencies from the code.

Figure 14 GUI of the simulation environment.
We have extended Gephi by adding new functionality: new network generators, scenarios for using centrality measures in simulations of diffusion, and finally the ability to simulate the diffusion of any behavior (especially infectious diseases) in any network. Our code is very extensible and scalable owing to the fact that the Gephi architecture follows the SOLID principles (single responsibility, open-closed, Liskov substitution, interface segregation and dependency inversion), five basic principles of object-oriented programming and design. This environment is used mainly for research purposes, that is, for testing algorithms and new concepts.

We have also put our investigations to practical use in the form of the CARE system (creative application to remedy epidemics) (Figure 15) [12]. It is a DSS (decision support system) which can help people to fight epidemics. CARE contains five crucial modules which address several topics in the field of epidemic spreading: disease modeling, social networks, simulation, vaccination and polls.

Figure 15 The CARE graphical user interface (main screen).

In the first module (disease modeling; Figure 16), using the state machine approach, we can model any type of disease based on knowledge from the field of epidemiology. The editor we have proposed allows models of diseases to be built with any set of states. We are able to define essential parameters such as the transition probabilities between the states, the minimum/maximum time that an individual spends in each state, the maximum number of neighbors that can be infected by an individual in a given time period, etc.

Figure 16 The GUI of the disease modeling module.

Figure 17 The GUI of the social network modeling module.
The second module (social networks; Figure 17) is similar to the first one, but it is used in the context of diffusion environments, that is, social networks. We can model and generate social networks using complex network theory. Using the proposed generators, we obtain synthetic networks with the same statistical properties as real social networks. The algorithms generate networks that are random graphs, small world networks, scale-free networks or modifications thereof.

The third module (simulation), the heart of the system, makes it possible to visualize and simulate how a given infectious disease would spread in a given population. The system offers two ways of visualizing information. The first is called "layout" (Figure 18) and helps the user to manipulate the networks and some parameters of a simulation. The alternative is the "geo-contextual" visualization (Figure 19), which displays networks on the world map.

Figure 18 The GUI of the simulation module with the "layout" visualization.

Figure 19 The GUI of the simulation module with the "geo-contextual" visualization.

The system estimates the expected outcomes of different simulation scenarios and generates detailed reports (Figure 20). The user can assess the results and the effectiveness of the chosen vaccination strategy. A report chart is created on the basis of the simulation. The x-axis represents the simulation steps and the y-axis represents the number of individuals in each state at each step.

Figure 20 The GUI of reports on the simulation outcomes of a disease in a network.

The fourth module (vaccination; Figure 21) allows us to identify "super-spreaders" and comes up with the most effective vaccination strategy for a given population.
Naturally, it helps decision makers to reduce the consequences of epidemics or even stop them early. We use a number of centrality measures to address the question: "Who is the most important person in a given social network from an epidemic point of view?" We demonstrate how to discover the critical elements of any network, the so-called "super-spreaders" of a disease.

Figure 21 The GUI of the vaccination module.

The crucial step in fighting a disease is to obtain information about the social network subject to that disease. The last module (polls; Figure 22) helps in building special polls, based on sociological knowledge, that help to discover the network topology. Questionnaires designed in this way are deployed to mobile devices to gather social data in the field.

Figure 22 The GUI of the polls module.

Summary

In this paper, we have presented some basic principles and ideas that can provide a deeper understanding of the dynamics of infectious diseases. These tools can be used in a variety of applications: simulating the spread of any disease in any population; identifying the individuals that are most important from the point of view of the spreading epidemic; estimating the effectiveness of proposed vaccination strategies; and estimating the amount of medicine needed to fight a disease.

The solutions presented in this paper have been implemented in practice as the CARE system, which is now a subsystem of the SARNA monitoring, early warning and forecasting system, built at MUT (Military University of Technology) and put into practice in the Government Safety Centre in Poland [13].

Received November 20, 2012; revised January 19, 2013; accepted January 20, 2013
Bio-Algorithms and Med-Systems – de Gruyter
Published: Mar 1, 2013