Distribution of Base Pair Alternations in a Periodic DNA Chain: Application of Polya Counting to a Physical System

ISSN 1560-3547, Regular and Chaotic Dynamics, 2018, Vol. 23, No. 2, pp. 1–16. © Pleiades Publishing, Ltd., 2018.
arXiv:1805.06245v1 [math.CO] 16 May 2018

Malcolm Hillebrand¹*, Guy Paterson-Jones¹, George Kalosakas², and Charalampos Skokos¹

¹ Department of Mathematics and Applied Mathematics, University of Cape Town, Rondebosch, Cape Town 7701, South Africa
² Department of Materials Science, University of Patras, Rio GR-26504, Greece
* E-mail: malcolm.hillebrand@gmail.com

Received October 13, 2017; accepted December 11, 2017

Abstract. In modeling DNA chains, the number of alternations between Adenine-Thymine (AT) and Guanine-Cytosine (GC) base pairs can be considered as a measure of the heterogeneity of the chain, which in turn could affect its dynamics. A probability distribution function of the number of these alternations is derived for circular or periodic DNA. Since there are several symmetries to account for in the periodic chain, necklace counting methods are used. In particular, Pólya's Enumeration Theorem is extended for the case of a group action that preserves partitioned necklaces. This, along with the treatment of generating functions as formal power series, allows for the direct calculation of the number of possible necklaces with a given number of AT base pairs, GC base pairs and alternations. The theoretically obtained probability distribution functions of the number of alternations are accurately reproduced by Monte Carlo simulations and fitted by Gaussians. The effect of the number of base pairs on the characteristics of these distributions is also discussed, as well as the effect of the ratios of the numbers of AT and GC base pairs.

MSC2010 numbers: 05A15, 92D20
DOI: 10.0000/S1560354718000013
Keywords: DNA models, Pólya's Counting Theorem, Heterogeneity, Necklace Combinatorics

1. Introduction

Single circular DNA molecules are abundant in nature. The whole genome in a typical bacterium is usually contained in a closed DNA molecule, while in eukaryotes the organelle DNA, inside the mitochondria and chloroplasts, is also found in the same form [1, 23]. Plasmids, whether naturally found in bacteria or used as vectors in gene cloning, are smaller circular DNA segments. Apart from these cases, in considering the dynamics and other properties of DNA chains, it is often useful to model the chain using periodic boundary conditions in order to avoid finite size or edge effects. For example, periodic boundary conditions have been used to study denaturation bubbles and the melting behavior of DNA [2, 6, 13, 37, 39, 43], probability distributions of thermal openings in the double strand [7, 18], bubble opening profiles in promoter regions which regulate gene transcription [3–5, 11, 12, 16, 20], binding sites of DNA-associated proteins [26, 38], various dynamical and nonlinear properties of DNA [21, 27, 28, 40, 41, 44], as well as charge transport in DNA [10, 14, 17, 19, 33].

A DNA chain consists of a series of base pairs, where each base pair is either Adenine-Thymine (AT) or Guanine-Cytosine (GC). Currently, we are investigating the influence of different factors on the chaoticity of periodic DNA chains [36]. One of the examined quantities is the number of base pair alternations, which can be considered as a quantifier of the system's heterogeneity. In this work we focus on the rigorous mathematical treatment of alternation counting in periodic DNA sequences.

To study periodic DNA, we will consider the DNA necklace associated to a DNA chain, where the first and the last base pairs in the chain will become neighbors.
This periodicity presents some modeling challenges: if one considers two distinct chains of DNA, it may still be the case that their corresponding necklaces are the same, as one may be merely a rotation or reflection of the other. Such symmetries need to be addressed if any conclusions are to be made about the structure and the dynamics of DNA necklaces.

In particular, we are concerned with the number $\alpha$ of base pair alternations in the necklace, where an alternation is defined to be a point at which an AT base pair neighbors a GC base pair or vice versa. Consider, for instance, the DNA chain shown in Fig. 1. Representing a GC base pair (black bead) with a 0 and an AT base pair (white bead) with a 1, the chain can be written in the form

\[ (1)\,000\bar{0}\,\bar{1}\,\bar{0}\,1\bar{1}\,0\bar{0}\,\bar{1}\,(0). \]

Here, we have given the leftmost base pair at each alternation point an overbar, and used brackets to denote the fact that in the corresponding DNA necklace the first and last base pairs are neighbors. This necklace is illustrated in Fig. 2, and counting the number of overbars we see that there are $\alpha = 6$ alternations.

Fig. 1. An example of a DNA chain, 0 0 0 0 1 0 1 1 0 0 1. GC base pairs are represented by black beads and the number 0, while AT base pairs are represented by white beads and the number 1. In the DNA necklace corresponding to this chain, the AT base pair at the far right neighbors the GC base pair at the far left.

Fig. 2. The DNA necklace corresponding to the chain of Fig. 1. This necklace has $\alpha = 6$ alternations.

It is worth noting that a base pair alternation corresponds to the appearance of the particular sequences (often referred to as "words") 01 or 10 in a DNA chain. Word occurrence probabilities have already been studied in the literature (see e.g. [22, 24, 30–32, 34, 35] and references therein), with emphasis on the appearance of patterns with unexpectedly high or low frequencies, as well as on repeating sequences.
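The definition of an alternation in a periodic chain can be made concrete with a short Python sketch. The function name `count_alternations` is hypothetical (it is not taken from the paper's appendix); the encoding 0 = GC, 1 = AT follows Fig. 1, and the wrap-around comparison implements the periodic boundary condition.

```python
def count_alternations(chain):
    """Count AT/GC alternations in a periodic chain.

    `chain` is a sequence of 0s (GC) and 1s (AT). Each position is compared
    with its right-hand neighbor, and the last position is compared with the
    first one, so the chain is treated as a necklace.
    """
    n = len(chain)
    return sum(chain[i] != chain[(i + 1) % n] for i in range(n))


# The chain of Fig. 1, written with 0 = GC and 1 = AT.
fig1_chain = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]
print(count_alternations(fig1_chain))  # prints 6
```

Because every switch from one base pair type to the other must eventually be undone when going around the ring, the returned value is always even.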
However, these studies concern linear DNA segments, in other words DNA chains with fixed boundary conditions. The periodic boundary conditions we consider in our study make the problem of counting alternations (or, more generally, the appearance of specific words) in circular DNA segments much more complicated than in the linear case, because rotations and reflections impose additional symmetries on the DNA structures.

Each base pair in a DNA necklace can contribute at most 2 alternations, depending on which neighbors it differs from. Supposing that the numbers of AT and GC base pairs in the necklace are given by $N_{AT}$ and $N_{GC}$ respectively, this yields the restriction $0 \le \alpha \le \min\{2N_{AT}, 2N_{GC}\}$. We note that in the extreme case of a homogeneous chain composed of base pairs of the same kind $\alpha = 0$, while if both types of base pairs are present in the DNA chain the smallest possible number of alternations is $\alpha = 2$. The latter corresponds to a chain having all AT (and consequently all GC) base pairs grouped together. Furthermore, if we traverse the necklace pair by pair until we end up where we started, we must necessarily switch between AT and GC base pairs an even number of times. Thus $\alpha = 2M$ for some $M \in \mathbb{N}$.

Now the natural question is: what is the probability that a random DNA necklace with a specified number of AT and GC base pairs, $N_{AT}$ and $N_{GC}$ respectively, has a specified number of alternations $\alpha$? Or in other words, how many possible combinations of such base pairs are there that yield $\alpha$ alternations once the cyclic and reflective symmetries are taken into account? In what follows we answer these questions and provide an algorithm for computing the number of distinct DNA necklaces satisfying these constraints.

The paper is organized in the following way: In Sect. 2, the mathematical background is laid out, leading into a Pólya Enumeration Theorem for bipartite sets. In Sect. 3 an explicit algorithm for calculating the number of distinct DNA necklaces with given values of $\alpha$, $N_{AT}$ and $N_{GC}$ is described, while in Sect. 4 we compare the theoretical results to those obtained from Monte-Carlo simulations and investigate the effect of the $N_{AT}$ and $N_{GC}$ values on the characteristics of the probability distribution function (pdf) of $\alpha$. Finally, in Sect. 5 we summarize our results, while in the Appendix we provide a Python computer code implementing the algorithm of Sect. 3.

2. Theoretical Treatment

Our problem can be neatly related to the combinatorics of necklaces. Effectively, we are interested in the number of distinct necklaces with $N = N_{AT} + N_{GC}$ beads, where $N_{AT}$ of the beads are white, $N_{GC}$ of the beads are black, and there are $\alpha$ alternations between the colors. We consider necklaces to be the same if they can be reflected or rotated into one another, and beads of the same color are treated as indistinguishable. Because of this, we can equivalently think of a necklace with $\alpha$ alternations as a necklace of $\alpha$ containers, where each container carries some number of beads of a single color (black or white), and adjacent containers have different colors. This idea is illustrated in Fig. 3.

Fig. 3. The necklace of containers corresponding to the DNA necklace of Fig. 2. The numbers in each container represent the number of consecutive black or white beads in that segment of the necklace.

We will refer to containers carrying black beads as black containers, and similarly for white containers. Counting the number of distinct necklaces with the given constraints can thus be reformulated as the problem of assigning numbers of beads to $\alpha$ containers, such that the totals of the numbers in the black and white containers are equal to $N_{GC}$ and $N_{AT}$ respectively. Two such assignments will be considered equivalent if the containers can be rotated or reflected into one another in such a way as to preserve both the colors and numbers of beads they contain. Enumerating such assignments is simpler than enumerating necklaces, as we have one fewer constraint: the number of alternations is now implicit in the formulation of the problem.

To perform this enumeration we will require some tools from Pólya counting theory. In particular, we will need a version of the Pólya Enumeration Theorem for sets partitioned into two parts, which we will refer to as bipartite sets. For completeness' sake, we present this material below.

2.1. Group Actions

Let $A$ be a set. Then we define the symmetric group on $A$ to be the set of permutations of $A$:

\[ S_A = \{\varphi : A \to A \mid \varphi \text{ is a bijection}\}. \tag{2.1} \]

A cycle is a permutation $\varphi \in S_A$ such that there exist distinct elements $x_1, x_2, \dots, x_k \in A$ and:

\[ \varphi(x) = \begin{cases} x_{i+1} & \text{if } x = x_i \text{ for some } 1 \le i < k, \\ x_1 & \text{if } x = x_k, \\ x & \text{otherwise.} \end{cases} \tag{2.2} \]

We denote such a cycle suggestively as $(x_1\, x_2 \dots x_k)$, and say that $\varphi \in S_A$ is a $k$-cycle if $\varphi = (x_1\, x_2 \dots x_k)$ for some $x_i \in A$. Two cycles $(x_1\, x_2 \dots x_k)$ and $(y_1\, y_2 \dots y_l)$ are said to be disjoint if the sets $\{x_1, x_2, \dots, x_k\}$ and $\{y_1, y_2, \dots, y_l\}$ are disjoint.

If $A$ is a finite set, every element of $S_A$ can be written as a composition of cycles; in general, however, this cannot be done uniquely. On the other hand, we have the following fundamental structure theorem for elements of finite symmetric groups (see for example [15]):

Theorem (Cycle Decomposition Theorem). If $A$ is a finite set, then every element $\varphi \in S_A$ can be written as a product of pairwise disjoint cycles, unique up to order of the cycles:

\[ \varphi = (x_{11}\, x_{12} \dots x_{1k_1}) \cdots (x_{n1}\, x_{n2} \dots x_{nk_n}). \]

Given a group $G$ and a set $A$, a group action of $G$ on $A$ is a homomorphism $\Gamma_G : G \to S_A$.
In other words, elements of $G$ are identified with permutations of $A$ in a manner that preserves the group structure. To simplify the notation, we will write $gx$ instead of $\Gamma_G(g)(x)$ for the action of $g \in G$ on some $x \in A$. The orbit of an element $x \in A$ under the group action $\Gamma_G$ is defined to be the set $\mathrm{Orb}_x = \{gx \mid g \in G\}$, and its stabilizer is given by the subgroup $\mathrm{Stab}_x = \{g \in G \mid gx = x\}$. Given some $g \in G$, we denote its set of fixed points by $\mathrm{Fix}_g = \{x \in A \mid gx = x\}$.

2.2. Pólya's Counting Theory

One can often rephrase counting problems in terms of computing the number of distinct orbits of some group action. Pólya's counting theory can be thought of as a tool for making these computations systematic and expedient. A fundamental lemma on which this theory is built is the following [9]:

Lemma 1 (Burnside's Lemma). The number of distinct orbits in a group action of a finite group $G$ on $A$ is given by the average number of fixed points of elements of $G$:

\[ \#\mathrm{Orbits} = \frac{1}{|G|} \sum_{g \in G} |\mathrm{Fix}_g|. \tag{2.3} \]

A basic problem in combinatorics is the following. Suppose one has a finite set of objects $A$, and one wishes to color them with colors from another set $\Omega$. How many distinct ways are there of coloring the objects up to some kind of symmetry? This can be recast in the language of group actions. The set of possible colorings is given by $\Omega^A = \{\varphi : A \to \Omega \mid \varphi \text{ a function}\}$, and the symmetry is given by a group action $\Gamma_G$ on $A$. This group action passes naturally to a group action $\tilde{\Gamma}_G$ on $\Omega^A$, defined by $g\varphi : x \mapsto \varphi(gx)$. The question now reduces to counting the number of distinct orbits of this latter action. In this simplified case, Burnside's lemma is often sufficient to answer the question.

We can generalize this problem slightly, however. Suppose that each color has an associated weight, given by a function $\omega : \Omega \to \mathbb{N}$. Given a coloring $\varphi : A \to \Omega$ of the objects, we define its total weight to be the sum:

\[ |\varphi| = \sum_{x \in A} \omega \circ \varphi(x). \tag{2.4} \]

How many distinct colorings of $A$ with a given total weight are there, up to symmetries given by some group action $\Gamma_G$? Note that the total weight of any coloring in a given orbit is the same, as elements of $G$ merely permute the set $A$. Thus, the problem boils down to calculating the number of distinct orbits with a given total weight. Pólya identified two necessary ingredients for a systematic answer to this question: generating functions, and an understanding of the cycle structure of elements of $G$ [29].

Definition (Generating Function). Let $\omega : \Omega \to \mathbb{N}$ be an assignment of weights to some set $\Omega$. Suppose further that there are at most a finite number of elements of any given weight, that is, $|\omega^{-1}(n)|$ is finite for every $n \in \mathbb{N}$. Then the generating function of $\omega$ is given by the power series:

\[ f_\omega(x) = \sum_{i=0}^{\infty} |\omega^{-1}(i)|\, x^i. \tag{2.5} \]

Generating functions are useful as they encode combinatorial data (in this case the number of colors of a given weight) as algebraic objects. In particular, we will need the following lemma:

Lemma 2. Let $\omega_1 : \Omega_1 \to \mathbb{N}$ and $\omega_2 : \Omega_2 \to \mathbb{N}$ be assignments of weights to the sets $\Omega_1$ and $\Omega_2$ respectively. Define an assignment of weights to the set $\Omega_1 \times \Omega_2$ by $\omega : (x_1, x_2) \mapsto \omega_1(x_1) + \omega_2(x_2)$. Then $f_\omega(x) = f_{\omega_1}(x) \cdot f_{\omega_2}(x)$.

Given a group action $\Gamma_G$ and an element $g \in G$, we denote by $C_k(g)$ the number of $k$-cycles in the unique disjoint cycle decomposition of $\Gamma_G(g)$. We can now encode information about the cycle structure of elements of $G$ in the following multivariate polynomial:

Definition (Cycle Index). Let $G$ be a finite group. Then the cycle index of a group action $\Gamma_G$ on a finite set $A$ of cardinality $n$ is given by the polynomial [8]:

\[ Z_G(x_1, x_2, \dots, x_n) = \frac{1}{|G|} \sum_{g \in G} x_1^{C_1(g)} x_2^{C_2(g)} \cdots x_n^{C_n(g)}. \tag{2.6} \]

This cycle index will allow us to efficiently compute the number of distinct orbits of the group action.
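Burnside's lemma (2.3) and the role of the cycle structure can be illustrated on the simplest case of unweighted colorings. The following Python sketch is only an illustration under stated assumptions: the helper names (`cycle_count`, `rotations`, `burnside_count`, `brute_force_count`) are hypothetical, permutations are stored as tuples with `perm[i]` the image of `i`, and the group is taken to be the cyclic group of rotations of an n-bead ring. A coloring is fixed by a permutation exactly when it is constant on each cycle, so a permutation with c cycles fixes q^c colorings with q colors; this is the cycle index (2.6) evaluated at x_1 = ... = x_n = q.

```python
from itertools import product


def cycle_count(perm):
    """Number of cycles (fixed points included) in the disjoint cycle
    decomposition of a permutation given as a tuple, perm[i] = image of i."""
    seen = set()
    cycles = 0
    for start in range(len(perm)):
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
    return cycles


def rotations(n):
    """The n rotations of a ring of n positions, as permutation tuples."""
    return [tuple((i + k) % n for i in range(n)) for k in range(n)]


def burnside_count(perms, n_colors):
    """Orbit count from Burnside's lemma: average of |Fix_g| = q^(#cycles)."""
    total = sum(n_colors ** cycle_count(p) for p in perms)
    return total // len(perms)


def brute_force_count(perms, n_colors):
    """Orbit count by explicitly generating the orbit of every coloring."""
    n = len(perms[0])
    seen = set()
    orbits = 0
    for coloring in product(range(n_colors), repeat=n):
        if coloring in seen:
            continue
        orbits += 1
        for p in perms:
            seen.add(tuple(coloring[p[i]] for i in range(n)))
    return orbits


group = rotations(6)
print(burnside_count(group, 2), brute_force_count(group, 2))  # prints 14 14
```

Both routes give the same number of two-color rotational necklaces with six beads; the Burnside route never enumerates colorings, which is what makes the cycle index practical for larger sets.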
With this in mind, we are now in a position to state a version of the Pólya counting theorem, answering the generalized problem given earlier:

Theorem (Pólya Enumeration Theorem). Let $A$ be a finite set of objects of cardinality $n$, $\Omega$ a set of colors, $\omega : \Omega \to \mathbb{N}$ an assignment of weights to the colors with generating function $f_\omega$, and $\Gamma_G$ a group action of a finite group $G$ on $A$. Then $\Gamma_G$ passes naturally to a group action $\tilde{\Gamma}_G$ on $\Omega^A$, and a generating function by total weight for the number of distinct orbits of $\tilde{\Gamma}_G$ is given by:

\[ \mathrm{Orbits}_{\tilde{\Gamma}_G}(x) = Z_G\big(f_\omega(x), f_\omega(x^2), \dots, f_\omega(x^n)\big). \tag{2.7} \]

2.3. Pólya Enumeration Theorem for Bipartite Sets

By considering multivariate generating functions, the Pólya enumeration theorem can be generalized to the case where the colors take vector-valued weights (tuples of natural numbers). We will generalize the theorem in a different direction, however. Suppose we have a partition of $A$ into two parts, $A = X \sqcup Y$, and a group action $\Gamma_G$ on $A$. We would like to consider the problem of counting distinct colorings of $A$ under this symmetry, with the additional constraint that we color elements of $X$ from a set $\Omega_X$, and elements of $Y$ from a set $\Omega_Y$. To this end, we will say that a coloring $\varphi : A \to \Omega_X \sqcup \Omega_Y$ is valid if $\varphi(x) \in \Omega_X \iff x \in X$ and $\varphi(x) \in \Omega_Y \iff x \in Y$.

There is an obstruction to this, however: the group action may map elements in $X$ to elements in $Y$ or vice versa. In this case, the extension of $\Gamma_G$ to the set of possible colorings is no longer well-defined, as there is no natural way to compare the sets of colors $\Omega_X$ and $\Omega_Y$. Fortunately, this is the only obstruction to proving a Pólya-type theorem for this problem. This motivates the following definition:

Definition (Partition-Preserving Group Action). Let $A = X \sqcup Y$, and let $\Gamma_G$ be a group action on $A$. Then we say that $\Gamma_G$ is partition-preserving if for every $g \in G$, $gx \in X \iff x \in X$ and $gx \in Y \iff x \in Y$.

The importance of this property is as follows. Suppose we have a group action $\Gamma_G$ on $A = X \sqcup Y$, and some element $g \in G$. Then $\Gamma_G(g)$ has a unique disjoint cycle decomposition given by $\Gamma_G(g) = C_1 \cdot C_2 \cdots C_k$. If $\Gamma_G$ is partition-preserving then each cycle $C_i$ is contained entirely in either $X$ or $Y$, and $\Gamma_G$ is in fact partition-preserving if and only if this is the case for every $g \in G$. If $\Gamma_G$ is partition-preserving, then we define $C^X_k(g)$ to be the number of $k$-cycles in the disjoint cycle decomposition of $\Gamma_G(g)$ that are contained in $X$, and we define $C^Y_k(g)$ analogously. We will now define an analogue of the cycle index polynomial for the case of partition-preserving group actions. This will allow us to keep track of the cycle structure of elements of the group as well as which part of the partition each cycle acts on:

Definition (Bipartite Cycle Index). Let $G$ be a finite group and $A = X \sqcup Y$ a finite set of cardinality $n$. Then the bipartite cycle index of a partition-preserving group action $\Gamma_G$ on $A$ is defined to be the polynomial:

\[ Z_G(x_1, \dots, x_n, y_1, \dots, y_n) = \frac{1}{|G|} \sum_{g \in G} x_1^{C^X_1(g)} \cdots x_n^{C^X_n(g)}\, y_1^{C^Y_1(g)} \cdots y_n^{C^Y_n(g)}. \tag{2.8} \]

We can now generalize Pólya's theorem to the case of partition-preserving group actions. We note that this theorem is used implicitly in [29] without proof.

Theorem 1 (Bipartite Pólya Enumeration Theorem). Let $\Gamma_G$ be a partition-preserving group action of a finite group $G$ on a finite set $A = X \sqcup Y$ of cardinality $n$. Let $\Omega = \Omega_X \sqcup \Omega_Y$ be a set of colors, and let $\omega_X : \Omega_X \to \mathbb{N}^+$ and $\omega_Y : \Omega_Y \to \mathbb{N}^+$ be their assigned weights with respective generating functions $f_X$ and $f_Y$. If $\Phi$ is the set of valid colorings of $A$, then $\Gamma_G$ passes naturally to a group action $\tilde{\Gamma}_G$ on $\Phi$, and a generating function by total weight for the number of orbits of $\tilde{\Gamma}_G$ is given by:

\[ \mathrm{Orbits}_{\tilde{\Gamma}_G}(x) = Z_G\big(f_X(x), \dots, f_X(x^n), f_Y(x), \dots, f_Y(x^n)\big). \tag{2.9} \]

Proof. We pass to a group action $\tilde{\Gamma}_G$ on $\Phi$ as follows. Given a valid coloring $\varphi \in \Phi$ and an element $g \in G$, we define the action of $g$ on $\varphi$ by $g\varphi : x \mapsto \varphi(gx)$.
To compute a generating function for the number of orbits of $\tilde{\Gamma}_G$ by total weight, we will determine the generating functions for the number of fixed points of each $g \in G$ by total weight.

Consider some $g \in G$. As $A$ is finite, there exists a unique disjoint cycle decomposition $\Gamma_G(g) = C_1 \cdot C_2 \cdots C_k$, where each $C_i$ is a cycle in the symmetric group $S_A$. Now suppose that $g$ fixes some valid coloring $\varphi \in \Phi$; that is, $g\varphi = \varphi$. Then, writing the cycle as $C_i = (x_1\, x_2 \dots x_{k_i})$ with $x_j \in A$, we have by definition that $\varphi(x_j) = (g\varphi)(x_j) = \varphi(gx_j) = \varphi(x_{j+1})$, and hence every element in the cycle must have the same color under $\varphi$. The number of colorings of $C_i$ that are fixed by $g$ is thus given by the generating function $f_X(x^{k_i})$ if $C_i$ lies in $X$, and $f_Y(x^{k_i})$ if $C_i$ lies in $Y$. We note that one of these two cases must occur for every cycle as $\Gamma_G$ is partition-preserving. By Lemma 2, then, the number of valid colorings of $A$ that are fixed by $g$ is given by the generating function:

\[ \mathrm{Fix}_g(x) = f_X(x)^{C^X_1(g)} \cdots f_X(x^n)^{C^X_n(g)}\, f_Y(x)^{C^Y_1(g)} \cdots f_Y(x^n)^{C^Y_n(g)}. \tag{2.10} \]

By Burnside's lemma, the number of orbits of $\tilde{\Gamma}_G$ of a particular weight is given by the average number of fixed colorings of that weight by elements $g \in G$. Applying Burnside's lemma for each possible weight, the number of orbits of $\tilde{\Gamma}_G$ is thus given by the generating function:

\[ \begin{aligned} \mathrm{Orbits}_{\tilde{\Gamma}_G}(x) &= \frac{1}{|G|} \sum_{g \in G} \mathrm{Fix}_g(x) \\ &= \frac{1}{|G|} \sum_{g \in G} f_X(x)^{C^X_1(g)} \cdots f_X(x^n)^{C^X_n(g)}\, f_Y(x)^{C^Y_1(g)} \cdots f_Y(x^n)^{C^Y_n(g)} \\ &= Z_G\big(f_X(x), \dots, f_X(x^n), f_Y(x), \dots, f_Y(x^n)\big). \end{aligned} \tag{2.11} \]

We note that as a corollary of this proof, we can recover a bivariate generating function from this expression, where the coefficient of $x^a y^b$ represents the number of distinct colorings with total weight $a$ in $\Omega_X$, and total weight $b$ in $\Omega_Y$:

Corollary. A bivariate generating function by total weight in $\Omega_X$ and $\Omega_Y$, for the number of distinct colorings of $A$, is given by:

\[ \mathrm{Orbits}_{\tilde{\Gamma}_G}(x, y) = Z_G\big(f_X(x), \dots, f_X(x^n), f_Y(y), \dots, f_Y(y^n)\big). \tag{2.12} \]

2.4. The Dihedral Group, its Cycle Index and its Extension

To apply these results to the problem of counting distinct DNA necklaces, we will need to describe the relevant group action and compute its (bipartite) cycle index. The set of elements acted on by the group is given by the $\alpha$ containers in the DNA necklace, and this set can be partitioned into two parts: containers of black beads and containers of white beads. We consider two DNA necklaces to be the same if one can be rotated or reflected into the other. These symmetries can be described by an action of the dihedral group, which we will denote by $D_{2M}$, where we have $\alpha = 2M$. The rotational and reflective symmetries are what distinguish the case of periodic DNA chains from the linear, fixed boundary condition chains studied in [31] and elsewhere.

A fundamental fact about $D_{2M}$ is that it is generated by two elements $r$ and $s$, where $r$ is a reflection satisfying $r^2 = 1$, and $s$ is a rotation of order $M$. Therefore, to describe a group action of $D_{2M}$ on a DNA necklace it suffices to give the action of $r$ and $s$. In Fig. 4 the action of such a rotation on the necklace is illustrated, while in Figs. 5 and 6 the action of a reflection is illustrated for the cases where $M$ is odd and even respectively. It is clear that the resulting group action is partition-preserving.

Fig. 4. The action of a rotation $s \in D_{2M}$ on the DNA necklace.

To compute the bipartite cycle index of this group action, we will treat reflections and rotations separately. To begin with, we can see from Fig. 4 that rotations act symmetrically on the black and white containers in the DNA necklace. Thus, the terms of the cycle index polynomial corresponding to rotations will be symmetric in the $x_i$ and $y_i$. The cycle index of the natural action of the cyclic group $C_M$ on the $M$ containers in one part of the partition is given by [25]:

\[ Z_{C_M}(x_1, \dots, x_M) = \frac{1}{M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d}, \tag{2.13} \]

where $\varphi(d)$ is defined to be the number of natural numbers not exceeding $d$ that are coprime to it (the Euler totient function). Note that 1 is considered to be coprime to all natural numbers, and so in particular $\varphi(d) > 0$. Exactly half of the elements of $D_{2M}$ are rotations, and thus the rotational part of the bipartite cycle index $Z_{D_{2M}}$ is given by $\frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d}$.

The reflective part of the group $D_{2M}$, on the other hand, acts differently depending on the parity of $M$. Suppose first that $M$ is odd, in which case a typical reflection is illustrated in Fig. 5. Each of the $M$ possible reflections occurs across an axis passing through one black container and one white container, both of which are fixed by the reflection. The rest of the containers are split into 2-cycles, and thus the bipartite cycle index $Z_{D_{2M}}$ for odd $M$ is given by:

\[ Z_{D_{2M}}(x_1, \dots, x_M, y_1, \dots, y_M) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d} + \frac{1}{2}\, x_1 y_1\, x_2^{(M-1)/2} y_2^{(M-1)/2}. \tag{2.14} \]

If $M$ is even, a typical reflection is illustrated in Fig. 6. In this case, each possible reflection occurs across an axis passing through either two white containers or two black containers. The rest of the containers again split into 2-cycles. Thus the bipartite cycle index $Z_{D_{2M}}$ for even $M$ is given by:

\[ Z_{D_{2M}}(x_1, \dots, x_M, y_1, \dots, y_M) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d} + \frac{1}{4}\, x_1^2\, x_2^{(M-2)/2} y_2^{M/2} + \frac{1}{4}\, y_1^2\, y_2^{(M-2)/2} x_2^{M/2}. \tag{2.15} \]

Fig. 5. The action of a reflection $r \in D_{2M}$ on the DNA necklace, for the case where $M$ is odd.

Fig. 6. The action of a reflection $r \in D_{2M}$ on the DNA necklace, for the case where $M$ is even.

2.5. Generating Functions as Formal Power Series

In our particular application of Pólya theory, the elements we are coloring are the $\alpha$ containers in the DNA necklace and the color of a particular container is defined to be the number of black or white beads it contains.
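The color-preserving dihedral action of Sect. 2.4 on containers filled with beads can be checked by brute force for small parameters. The Python sketch below is an illustration only, not the authors' implementation: the function names are hypothetical, containers are indexed 0, ..., 2M-1 with white containers at even and black containers at odd positions (an assumed convention), and orbits are found by explicitly applying the M rotations i -> i + 2k and the M reflections i -> 2k - i, all of which preserve the parity of the index and hence the color.

```python
def compositions(total, parts):
    """Yield all tuples of `parts` positive integers that sum to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def color_preserving_symmetries(M):
    """Rotations and reflections of the 2M-container ring that map even
    (white) positions to even positions and odd (black) to odd."""
    n = 2 * M
    perms = []
    for k in range(M):
        perms.append(tuple((i + 2 * k) % n for i in range(n)))  # rotation
        perms.append(tuple((2 * k - i) % n for i in range(n)))  # reflection
    return perms


def count_container_orbits(n_white, n_black, M):
    """Number of distinct necklaces with n_white white beads, n_black black
    beads and 2M alternations, counted up to rotation and reflection."""
    n = 2 * M
    perms = color_preserving_symmetries(M)
    seen = set()
    orbits = 0
    for white in compositions(n_white, M):
        for black in compositions(n_black, M):
            ring = [0] * n
            ring[0::2] = white  # bead numbers in the white containers
            ring[1::2] = black  # bead numbers in the black containers
            ring = tuple(ring)
            if ring in seen:
                continue
            orbits += 1
            for p in perms:
                seen.add(tuple(ring[p[i]] for i in range(n)))
    return orbits


# 8 white (AT) beads, 6 black (GC) beads, alpha = 2M = 10 alternations.
print(count_container_orbits(8, 6, 5))  # prints 19
```

Such an enumeration grows quickly with the number of beads, so it is useful only as a cross-check of the cycle-index approach on small cases.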
As each container must contain at least one bead, the set of colors is given by $\mathbb{N}^+$. We are interested in the total number of black and white beads, so the weight of each color will be given quite simply by $\omega(n) = n$ for each $n \in \mathbb{N}^+$. This weighting corresponds to the generating function (2.5) $f_\omega(x) = x + x^2 + x^3 + \cdots$.

To compute the number of distinct DNA necklaces with $N_{AT}$ white beads and $N_{GC}$ black beads, we need to calculate the coefficient of $x^{N_{AT}} y^{N_{GC}}$ in (2.12), where the bipartite cycle index is given by the appropriate $Z_{D_{2M}}$ from Sect. 2.4 and the weight generating function is given by $f_\omega(x)$. This requires us to calculate the coefficients of specific terms in $f_\omega^n(x) = (x + x^2 + x^3 + \dots)^n$ for potentially large $n$. However, doing this expansion naively requires many computing steps, whose number grows exponentially fast as $n$ increases. Thus, this approach is impractical. Fortunately, there exists a way to bypass this problem: treating $f_\omega(x)$ as a formal power series, we can manipulate it into a form that makes such computations significantly faster.

An introduction to the theory of formal power series can be found, for instance, in [42]. For our purposes, we will only need the fact that a form of the binomial theorem holds in this setting:

Lemma 3. Letting $(1 - x)^{-n}$ denote the formal inverse of $(1 - x)^n$, we have:

\[ (1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n + k - 1}{n - 1} x^k. \tag{2.16} \]

This implies the following useful lemma regarding powers of $f_\omega(x)$:

Lemma 4. As a formal power series $f_\omega^n(x)$ can be written as $f_\omega^n(x) = \sum_{k=0}^{\infty} \binom{n + k - 1}{n - 1} x^{n+k}$.

Proof. Note that $x f_\omega(x) = x^2 + x^3 + \cdots = f_\omega(x) - x$. Rearranging this for $f_\omega(x)$, we see that $f_\omega(x) = x(1 - x)^{-1}$, and hence $f_\omega^n(x) = x^n (1 - x)^{-n}$. The result now follows from Lemma 3.

In contrast to naively expanding powers of $f_\omega(x)$, computing binomial coefficients is computationally inexpensive, taking at most a linear number of steps in $n$. We now list a few results that will come in handy later, when we describe an explicit algorithm for computing the number of distinct DNA necklaces with the given constraints.

Lemma 5. The coefficient of $x^r$ in $f_\omega^b(x^a)$ is given by:

\[ \big[f_\omega^b(x^a)\big]_r = \begin{cases} 1 & \text{if } b = 0 \text{ and } r = 0, \\ 0 & \text{if } b = 0 \text{ and } r > 0, \\ 0 & \text{if } b > 0 \text{ and } a \nmid r \text{ or } r < ab, \\ \binom{r/a - 1}{b - 1} & \text{otherwise.} \end{cases} \tag{2.17} \]

Lemma 6. The coefficient of $x^r$ in $f_\omega^{b_1}(x^{a_1}) \cdot f_\omega^{b_2}(x^{a_2})$ is given by:

\[ \big[f_\omega^{b_1}(x^{a_1}) \cdot f_\omega^{b_2}(x^{a_2})\big]_r = \sum_{k=0}^{r} \big[f_\omega^{b_1}(x^{a_1})\big]_k \big[f_\omega^{b_2}(x^{a_2})\big]_{r-k}. \tag{2.18} \]

3. The Algorithm for Computing the Number of Distinct Valid Necklaces

Now we are able to evaluate the number of distinct necklaces which correspond to a particular number of alternations $\alpha$. The algorithm is fairly straightforward and efficient. Its implementation requires the following steps:

a) Set constraint parameters, $N_{AT}$, $N_{GC}$, and $\alpha = 2M$.

b) Choose the bipartite cycle index polynomial of the dihedral group based on the parity of $M$. If $M$ is odd, use (2.14), while for $M$ even use (2.15).

c) By the corollary to Pólya's Enumeration Theorem (2.12), we know that the number of necklaces, up to symmetry, is given by

\[ \mathrm{Orbits}_{\tilde{\Gamma}_G}(x, y) = Z_G\big(f_X(x), \dots, f_X(x^n), f_Y(y), \dots, f_Y(y^n)\big). \tag{3.1} \]

If $M$ is odd, using the outcome of the previous step we get

\[ \mathrm{Orbits}(x, y) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, f_\omega^{M/d}(x^d)\, f_\omega^{M/d}(y^d) + \frac{1}{2}\, f_\omega(x) f_\omega(y)\, f_\omega^{(M-1)/2}(x^2)\, f_\omega^{(M-1)/2}(y^2). \tag{3.2} \]

If $M$ is even, then we have

\[ \mathrm{Orbits}(x, y) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, f_\omega^{M/d}(x^d)\, f_\omega^{M/d}(y^d) + \frac{1}{4}\, f_\omega^2(x)\, f_\omega^{(M-2)/2}(x^2)\, f_\omega^{M/2}(y^2) + \frac{1}{4}\, f_\omega^2(y)\, f_\omega^{(M-2)/2}(y^2)\, f_\omega^{M/2}(x^2). \tag{3.3} \]

d) Every term in the polynomial produced by (3.1) will be of the form in (2.17) or (2.18).
The number of necklaces with $N_{AT}$ white beads and $N_{GC}$ black beads is given by the coefficient of the term $x^{N_{AT}} y^{N_{GC}}$. To calculate the total number of necklaces, simply sum the contributions to this coefficient from each of the terms appearing in the polynomial.

A Python computer code implementing this algorithm is presented in the Appendix.

In order to illustrate the application of this algorithm let us consider a simple, but not trivial case: We set $\alpha = 2M = 10$, $N_{AT} = 8$, $N_{GC} = 6$. Clearly $M = 5$ is odd, so identifying white beads with AT base pairs and black beads with GC base pairs, we have the cycle index

\[ \begin{aligned} Z_{D_{10}} &= \frac{1}{2} Z_{C_5} + \frac{1}{2}\, x_1 y_1 (x_2)^2 (y_2)^2 \\ &= \frac{1}{2 \cdot 5} \sum_{d \mid 5} \varphi(d)\, (x_d)^{5/d} (y_d)^{5/d} + \frac{1}{2}\, x_1 y_1 (x_2)^2 (y_2)^2, \end{aligned} \tag{3.4} \]

where $Z_{C_5}$ here denotes the bipartite cycle index of the rotations alone. Now the bipartite Pólya Enumeration Theorem tells us that we can put the generating functions $f_W(x^d)$ and $f_B(y^d)$ in place of the $x_d$ and $y_d$ respectively to find the generating function of the orbits. So we have

\[ \begin{aligned} \mathrm{Orbits}(x, y) = {} & \frac{1}{2 \cdot 5} \Big[ 1 \cdot (x + x^2 + x^3 + \dots)^5 (y + y^2 + y^3 + \dots)^5 + 4\, (x^5 + x^{10} + x^{15} + \dots)(y^5 + y^{10} + y^{15} + \dots) \Big] \\ & + \frac{1}{2}\, (x + x^2 + \dots)(x^2 + x^4 + \dots)^2\, (y + y^2 + \dots)(y^2 + y^4 + \dots)^2. \end{aligned} \tag{3.5} \]

Let us first look at the cyclic part. Since 5 is prime, the only two integers that divide it are 1 and 5, so this polynomial will be

\[ \frac{1}{2 \cdot 5} \Big[ 1 \cdot (x + x^2 + x^3 + \dots)^5 (y + y^2 + y^3 + \dots)^5 + 4\, (x^5 + x^{10} + x^{15} + \dots)(y^5 + y^{10} + y^{15} + \dots) \Big]. \]

Now we try to extract the coefficients of the terms that are allowed. These are the terms in $x^{N_{AT}}$ and $y^{N_{GC}}$, and we can use (2.17) in order to calculate these coefficients directly. In this case, there will be no contribution from the second term, as it contains no terms in $x^8$ and $y^6$. So the total cyclic contribution will be (with $r = 8$ and $r = 6$ for the respective cases and $a = 1$, $b = 5$ for both)

\[ \frac{1}{10} \binom{N_{GC} - 1}{5 - 1} \binom{N_{AT} - 1}{5 - 1} = \frac{1}{10} \binom{5}{4} \binom{7}{4} = \frac{175}{10}. \]

Then the same coefficient identifying process can be followed for the reflective part. Now the polynomial is given by

\[ \frac{1}{2}\, (x + x^2 + \dots)(x^2 + x^4 + \dots)^2\, (y + y^2 + \dots)(y^2 + y^4 + \dots)^2. \]

So for both $x$ and $y$ the coefficients will come from the product of two series, one of them squared. Thus, the relevant terms will come in a series of products given in (2.18). In $y$ the sum of coefficients contracts to a single element, coming from $y^2 \cdot y^4$. That contribution is simply $\binom{1}{0} \binom{1}{1} = 1$. In $x$ however, there will be terms from $x^2 \cdot x^6$ as well as $x^4 \cdot x^4$. So then, the sum will be

\[ \binom{1}{0} \binom{2}{1} + \binom{3}{0} \binom{1}{1} = 3, \]

giving a total contribution of $\frac{1}{2} (1 \cdot 3) + \frac{175}{10} = 19$. Thus there are 19 distinct DNA necklaces with 8 AT base pairs, 6 GC base pairs and 10 alternations.

4. Numerical Results

The developed algorithm for calculating the number of distinct DNA chains having $\alpha$ alternations can be used to produce the pdf of $\alpha$, $P(\alpha)$, which afterwards can be compared to pdfs numerically obtained from Monte-Carlo (MC) simulations. In Figs. 7(a) and (b) we present such pdfs for a DNA chain containing $N = 100$ base pairs. In particular, we consider the case of $N_{AT} = 40$, $N_{GC} = 60$ in Fig. 7(a) and the case of $N_{AT} = 50$, $N_{GC} = 50$ in Fig. 7(b). From Figs. 7(a) and (b) we clearly see that the results obtained by the algorithm presented in Sect. 3 (empty circles) and by MC simulations of DNA chains with $N = 100$ base pairs (filled stars) agree very well. The slight differences between them are to be expected, as the number of possible chains is generally very large. For instance, in the case of $N_{AT} = 50$, $N_{GC} = 50$ and $\alpha = 50$, the formulas of Sect. 3 give a number of possible necklaces of the order of $10^{25}$. Thus, in general, the number of performed MC simulations cannot get close to the actual total number of possible chains. Nevertheless, although the results of Figs. 7(a) and (b) were obtained by only $N_{MC} = 20000$ MC simulations, they manage to capture the theoretically obtained pdf quite accurately.

Fig. 7. Comparison of the pdf $P(\alpha)$ of the number of alternations $\alpha$, obtained by the algorithm presented in Sect. 3 [empty circles in panels (a) and (b)] and by randomly created DNA chains of $N = 100$ base pairs through MC simulations [filled stars in panels (a) and (b)]. The pdfs for $N_{AT} = 40$, $N_{GC} = 60$ and $N_{AT} = 50$, $N_{GC} = 50$ are presented in panels (a) and (b) respectively. The number of MC simulations used in (a) and (b) is $N_{MC} = 20000$. (c) The evolution of the average total absolute difference $\langle d \rangle$ between the theoretically and the numerically obtained pdfs as a function of $N_{MC}$ for the case of $N_{AT} = 50$, $N_{GC} = 50$. The values of $\langle d \rangle$ are obtained as the average of the quantity (4.1) evaluated for 5 different sets of $N_{MC}$ runs. The error bars denote the corresponding standard deviations.

Of course it is expected that increasing the number of MC simulations will improve the accuracy of the numerical results. As a measure of this accuracy we can consider the total absolute difference

\[ d(N_{MC}) = \sum_{\alpha} \big| P_{MC}(N_{MC}, \alpha) - P(\alpha) \big|, \tag{4.1} \]

between the two distributions. In (4.1) $P_{MC}(N_{MC}, \alpha)$ is the probability of $\alpha$ alternations obtained by $N_{MC}$ MC simulations, $P(\alpha)$ is the one obtained theoretically, while the sum is performed over all possible values of $\alpha$. From the results of Fig. 7(c), where we plot the value of $d(N_{MC})$ averaged over 5 sets of $N_{MC}$ MC simulations as a function of $N_{MC}$, we see that as the number of simulations increases, the numerical results get closer to the theoretical ones.

The results of Fig. 7 clearly show that in order to study the dynamical properties of DNA chains, statistical analysis performed over a few thousand MC generated random chains (even of the order of 5000) would suffice, as such numbers of MC simulations are enough for capturing quite accurately the influence of alternations on the system's dynamics.

The shape of the pdfs in Figs. 7(a) and (b) suggests that they could possibly be fitted by Gaussian distributions. This is actually true, as we can see from the results of Fig. 8, where we performed such a fit for the theoretically obtained pdf of Fig. 7(b).

Fig. 8. Fitting by a Gaussian of the theoretical pdf of Fig. 7(b) (empty circles) with $N_{AT} = 50$, $N_{GC} = 50$. The mean of the Gaussian is $\alpha_0 = 50.5$ and the standard deviation is $\sigma_\alpha = 5.1$.

The Gaussian approximation of the pdfs has several advantages as it allows us to easily quantify the influence of different variables on the number of alternations. Let us first look at the effect of increasing the number of only one type of base pair, keeping constant the number of the other type of base pair. In Fig. 9 we present some pdfs of $\alpha$ for $N_{AT} = 100$ and increasing values of $N_{GC}$ from 25 up to 2500. Starting from

Fig. 9. Pdfs of $\alpha$ for fixed number of AT base pairs ($N_{AT} = 100$) and increasing values of $N_{GC}$ (curves shown for $N_{GC}$ = 25, 50, 75, 100, 500 and 2500).
Points AT GC correspond to the theoretically obtained values of the pdfs, while curves correspond to the Gaussian fits of these points. Note that even for long DNA chains the value of α cannot exceed α = 200. REGULAR AND CHAOTIC DYNAMICS Vol. 23 No. 2 2018 P(α) P(α) Po´lya Counting in Periodic DNA Chains 13 small values of N , we find a very “lopsided” and narrow distribution which as N increases GC GC becomes gradually more symmetric and spreads out, up to a value of N = 200. Then, increasing GC N further, as the numbers of different types of base pairs become more dissimilar we again find GC gradually more unbalanced pdfs with sharp peaks. The very “lopsided” base pair distributions are obtained when the minority base pairs are significantly less than the majority ones and therefore are spread out and isolated among the others. In this case the distribution is sharply peaked around the corresponding maximum possible number of alternations. For the N = 100, N = 25 case AT GC this number is α = 50, while for the N = 100, N = 2500 case it is α = 200. AT GC 250 8 0.35 (b) (c) (a) 7 0.30 6 0.25 5 0.20 α σ 4 0.15 3 0.10 0 2 0.05 0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500 N N N GC GC GC Fig. 10. The effect of increasing the number N of the GC base pairs for a fixed number of AT base pairs GC (N = 100) on the Gaussian fit P (α) of the pdf values of α, and in particular on (a) the mean value α , AT G 0 (b) the standard deviation σ and (c) the maximum probability max [P (α)]. Some of these pdfs are shown α G in Fig. 9. These changes of the distributions are quantitatively presented in Fig. 10 through the variations of the fitted Gaussian characteristics. The increase of the mean value α of the Gaussian fits as the number N increases is shown in Fig. 10(a). The upper limit of α is 200, when N becomes GC 0 GC much larger than N . The dependence of the width (standard deviation) σ of the Gaussian fits AT α on N is depicted in Fig. 10(b). 
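The Monte Carlo side of the comparison in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the names `count_alternations`, `mc_alternation_pdf` and `total_abs_difference` are hypothetical, random chains are assumed to be produced by uniformly shuffling a fixed composition of AT (1) and GC (0) base pairs, and the seed is arbitrary.

```python
import random
from collections import Counter


def count_alternations(chain):
    """Number of AT/GC alternations in a periodic chain of 0s and 1s."""
    n = len(chain)
    return sum(chain[i] != chain[(i + 1) % n] for i in range(n))


def mc_alternation_pdf(n_at, n_gc, n_mc, seed=0):
    """Estimate P(alpha) from n_mc randomly shuffled periodic chains."""
    rng = random.Random(seed)
    chain = [1] * n_at + [0] * n_gc
    counts = Counter()
    for _ in range(n_mc):
        rng.shuffle(chain)  # uniform random arrangement of the base pairs
        counts[count_alternations(chain)] += 1
    return {alpha: c / n_mc for alpha, c in sorted(counts.items())}


def total_abs_difference(pdf_a, pdf_b):
    """Sum over alpha of |P_a(alpha) - P_b(alpha)|, in the spirit of (4.1)."""
    alphas = set(pdf_a) | set(pdf_b)
    return sum(abs(pdf_a.get(a, 0.0) - pdf_b.get(a, 0.0)) for a in alphas)


if __name__ == "__main__":
    pdf_small = mc_alternation_pdf(50, 50, 2000, seed=1)
    pdf_large = mc_alternation_pdf(50, 50, 20000, seed=2)
    print(total_abs_difference(pdf_small, pdf_large))
```

Here the second argument of `total_abs_difference` would normally be the theoretical pdf computed with the algorithm of Sect. 3; a second, larger MC estimate is used in the usage example only to keep the sketch self-contained.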
The initial increase of $\sigma_\alpha$ with $N_{GC}$ corresponds to the spreading out of the distributions when the numbers of base pairs become more similar. Further increase of the $N_{GC}$ values pushes the pdfs to the other extreme and the lopsidedness comes through again, resulting in narrower distributions (see Fig. 9). This results in the decrease of $\sigma_\alpha$ for large values of $N_{GC}$. Finally in Fig. 10(c) we observe that as $N_{GC}$ increases the maximum probability of the pdfs initially decreases rapidly and then increases slowly, in accordance with the results of Fig. 9 and of course with the fact that it is inversely proportional to the standard deviation of the Gaussian fit.

Let us now focus our attention on the effect of the increment of the total number of base pairs $N = N_{AT} + N_{GC}$, i.e. the total "length" of the DNA chain, when the ratio $N_{GC} : N_{AT}$ is kept constant. Such cases are presented in Fig. 11, where we plot several pdfs for different values of $N$ but for fixed ratios $N_{GC} : N_{AT}$. In particular, the values of the ratios $N_{GC} : N_{AT}$ are 1 : 1 in panel (a), 2 : 1 in (b) and 6 : 1 in (c). In all cases the pdfs are fitted by appropriate Gaussian distributions whose characteristics are plotted in Fig. 12 as a function of $N$.

Fig. 11. Pdfs of $\alpha$ for fixed ratios $N_{GC} : N_{AT}$ = 1 : 1 (a), 2 : 1 (b) and 6 : 1 (c). The legends list $N = 200$, 400, 1000 in (a), $N = 150$, 450, 900 in (b) and $N = 350$, 700, 1050 in (c). Points correspond to the theoretically obtained values of the pdfs, while curves correspond to the Gaussian fits of these points.

Fig. 12. The effect of increasing the total number of base pairs $N$ for fixed ratios $N_{GC} : N_{AT}$ (6 : 1, 2 : 1 and 1 : 1) on the parameters of the Gaussian fit $P_G(\alpha)$ of the pdf for $\alpha$: (a) the mean value $\alpha_0$, (b) the standard deviation $\sigma_\alpha$ and (c) the maximum probability $\max[P_G(\alpha)]$. Some of these pdfs are shown in Fig. 11.

From the results of Figs. 11 and 12 we see that as the total number $N$ of base pairs increases the pdfs become broader, and consequently their maximum value decreases. This means that for large $N$ more $\alpha$ values have a relatively high probability to appear in a randomly created DNA chain. In addition, increasing the ratio $N_{GC} : N_{AT}$ results in a decrease of the spreading, as evidenced by the lower standard deviation in Fig. 12(b) and the higher maximum probability in Fig. 12(c). A linear relationship between $N$ and the mean $\alpha_0$ is observed for all ratios, with the slope of the line influenced by the ratio. The slope $m$ for each case is: $m = 0.25$ for ratio 6 : 1, $m = 0.45$ for 2 : 1 and $m = 0.5$ for 1 : 1.

5. Conclusions

Motivated by the possibility that the number $\alpha$ of base pair alternations in a circular or periodic DNA chain might affect the dynamics of the system, we have found a probability distribution for this number. Algorithms for such distributions are known for linear DNA sequences with fixed boundary conditions [31]. The introduction of the periodic boundary conditions we consider in our study makes the counting of alternations a much more complicated problem due to the appearance of additional rotational and reflectional symmetries. To account for the additional complexity arising from these symmetries we have implemented Pólya counting theory.
In particular, extending Pólya's Enumeration Theorem for a partition-preserving group action on a partitioned set, we have constructed a well defined algorithm for calculating the number of DNA chains having a given number of alternations for particular values of the number of AT ($N_{AT}$) and GC ($N_{GC}$) base pairs.

The obtained theoretical results were compared with numerically constructed pdfs through MC simulations. We found that, in general, by creating a few thousand random DNA chains (around 5000) through MC simulations we can approximate quite accurately the theoretical pdf of $\alpha$. This means that a statistical analysis of these DNA chains will suffice to uncover the potential influence of heterogeneity on the dynamic behavior of the considered DNA model. In addition, approximating the obtained pdfs by Gaussians we investigated the effect of the number of the two base pairs, as well as their ratio, on various characteristics of the pdfs, like their mean value, their standard deviation and their maximum.

APPENDIX

Here we present a Python computer code implementing the algorithm of Sect. 3. The function necklace_count(n, B, W) returns the total number of possible necklaces under the symmetry constraints with 2n alternations, B black beads and W white beads.

    from math import gcd


    # Compute binomial coefficients in linear time.
    def binomial(n, k):
        if k > n or k < 0:
            return 0
        if k == 0:
            return 1
        if k > n // 2:
            return binomial(n, n - k)
        return (n * binomial(n - 1, k - 1)) // k


    # Compute the Euler totient function \phi(n), which
    # gives the number of integers 0 < d <= n that are
    # relatively prime to n.
    def totient(n):
        count = 0
        for d in range(1, n + 1):
            if gcd(d, n) == 1:
                count += 1
        return count


    # Get the x^r coefficient of our weight generating functions f(x^m)^n,
    # where:
    #   f(x) = x + x^2 + x^3 + ...
    def weight_gf(r, m, n):
        if n == 0:
            if r == 0:
                return 1
            return 0
        if r % m != 0:
            return 0
        if (r // m) < n:
            return 0
        return binomial((r // m) - 1, n - 1)


    # Get the x^r coefficient of a binary product of weight generating
    # functions f(x^m1)^n1 * f(x^m2)^n2, where:
    #   f(x) = x + x^2 + x^3 + ...
    def binary_weight_gf(r, m1, n1, m2, n2):
        total = 0
        # i runs over 0..r inclusive, so that the split (r, 0) is counted
        # when n2 == 0 (this case occurs for n = 1 and n = 2).
        for i in range(r + 1):
            total += weight_gf(i, m1, n1) * weight_gf(r - i, m2, n2)
        return total


    # Compute the number of necklaces up to dihedral symmetry with
    # 2n alternations, B black beads and W white beads.
    def necklace_count(n, B, W):
        # First we count the contributions from the cyclic part
        # of the cycle index.
        count = 0
        for d in range(1, n + 1):
            if n % d != 0:
                continue
            count += totient(d) * weight_gf(B, d, n // d) * weight_gf(W, d, n // d)

        # Next we count the contributions from the dihedral part
        # of the cycle index.
        if n % 2 == 0:
            count += (weight_gf(B, 2, n // 2)
                      * binary_weight_gf(W, 1, 2, 2, (n - 2) // 2) * (n // 2))
            count += (weight_gf(W, 2, n // 2)
                      * binary_weight_gf(B, 1, 2, 2, (n - 2) // 2) * (n // 2))
        else:
            count += (binary_weight_gf(B, 1, 1, 2, (n - 1) // 2)
                      * binary_weight_gf(W, 1, 1, 2, (n - 1) // 2) * n)

        return count // (2 * n)

Acknowledgements

M.H. and G.P-J. acknowledge financial assistance from the National Research Foundation (NRF) of South Africa towards this research. G.K. and Ch.S. were supported by the Erasmus+/International Credit Mobility KA107 program. Ch.S. acknowledges support by the NRF of South Africa (IFRR and CPRR Programmes), the UCT (URC Conference Travel Grant) and thanks Hans-Peter Kunzi for useful discussions.

REFERENCES

1. Alberts, B., Bray, D., Hopkin, K., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P., Essential Cell Biology, 2nd ed., Garland Science, 2004.
2. Alexandrov, B.S., Gelev, V., Monisova, Y., Alexandrov, L.B., Bishop, A.R., Rasmussen, K.Ø., Usheva, A., Nucleic Acids Res. 37, 2405 (2009).
3. Alexandrov, B.S., Gelev, V., Yoo, S.W., Bishop, A.R., Rasmussen, K.Ø., Usheva, A., PLoS Comput. Biol. 5, e1000313 (2009).
4. Alexandrov, A.S., Gelev, V., Yoo, S.W., Alexandrov, L.B., Fukuyo, Y., Bishop, A.R., Rasmussen, K.Ø., Usheva, A., Nucleic Acids Res. 38, 1790 (2010).
5. Apostolaki, A., Kalosakas, G., Phys. Biol. 8, 026006 (2011).
6. Ares, S., Voulgarakis, N.K., Rasmussen, K.Ø., Bishop, A.R., Phys. Rev. Lett. 94, 035504 (2005).
7. Ares, S., Kalosakas, G., Nano Lett. 7, 307 (2007).
8. Brualdi, R.A., Pólya Counting. In: Introductory Combinatorics, 5th ed., Upper Saddle River, NJ: Prentice Hall, 2010.
9. Burnside, W., Theory of Groups of Finite Order, Cambridge: Cambridge University Press, 1897.
10. Chetverikov, A.P., Ebeling, W., Lakhno, V.D., Shigaev, A.S., Velarde, M.G., Eur. Phys. J. B 89, 101 (2016).
11. Choi, C.H., Kalosakas, G., Rasmussen, K.Ø., Hiromura, M., Bishop, A.R., Usheva, A., Nucleic Acids Res. 32, 1584 (2004).
12. Choi, C.H., Rapti, Z., Gelev, V., Hacker, M.R., Alexandrov, B.S., Park, E.J., Park, J.S., Horikoshi, N., Smerzi, A., Rasmussen, K.Ø., Bishop, A.R., Usheva, A., Biophys. J. 95, 597 (2008).
13. Dauxois, T., Peyrard, M., Bishop, A.R., Phys. Rev. E 47, 684 (1993).
14. Hennig, D., Eur. Phys. J. B 30, 211 (2002).
15. Herstein, I.N., Abstract Algebra, 3rd ed., Wiley, 1999.
16. Huang, H.-H., Lindblad, P., J. Biol. Eng. 7, 10 (2013).
17. Kalosakas, G., Phys. Rev. E 84, 051905 (2011).
18. Kalosakas, G., Ares, S., J. Chem. Phys. 130, 235104 (2009).
19. Kalosakas, G., Ngai, K.L., Flach, S., Phys. Rev. E 71, 061901 (2005).
20. Kalosakas, G., Rasmussen, K.Ø., Bishop, A.R., Choi, C.H., Usheva, A., Europhys. Lett. 68, 127 (2004).
21. Kalosakas, G., Rasmussen, K.Ø., Bishop, A.R., Chem. Phys. Lett. 432, 291 (2006).
22. Kolpakov, R., Bana, G., Kucherov, G., Nucleic Acids Res. 31, 3672 (2003).
23. Lewin, B., Genes VIII, Pearson Prentice Hall, 2004.
24. Li, W., Computers Chem. 21, 257 (1997).
25. van Lint, J.H., Wilson, R.M., Pólya theory of counting. In: A Course in Combinatorics, Cambridge: Cambridge University Press, 1992.
26. Nowak-Lovato, K., Alexandrov, L.B., Banisadr, A., Bauer, A.L., Bishop, A.R., Usheva, A., Mu, F., Hong-Geller, E., Rasmussen, K.Ø., Hlavacek, W.S., Alexandrov, B.S., PLoS Comput. Biol. 9, e1002881 (2013).
27. Peyrard, M., Nonlinearity 17, R1 (2004).
28. Peyrard, M., Fargo, J., Physica A 288, 199 (2000).
29. Pólya, G., Read, R.C., Chemical Compounds. In: Combinatorial Enumeration of Groups, Graphs, and Chemical Compounds, New York: Springer, 1987.
30. Régnier, M., Disc. App. Math. 104, 259 (2000).
31. Robin, S., Daudin, J.J., Journ. Appl. Prob. 36, 179 (1999).
32. Robin, S., Schbath, S., Journ. Comp. Biol. 8, 349 (2001).
33. Tabi, C.B., Dang Koko, A., Oumarou Doko, R., Ekobena Fouda, H.P., Kofane, T.C., Physica A 442, 498 (2016).
34. Schbath, S., ESAIM: Probability and Statistics 1, 1 (1995).
35. Schbath, S., Prum, B., de Turckheim, E., Journ. Comp. Biol. 2, 417 (1995).
36. Skokos, Ch., Hillebrand, M., Schwellnus, A., Kalosakas, G., in preparation (2018).
37. Tapia-Rojo, R., Mazo, J.J., Falo, F., Phys. Rev. E 82, 031916 (2010).
38. Tapia-Rojo, R., Mazo, J.J., Hernandez, J.A., Peleato, M.L., Fillat, M.F., Falo, F., PLoS Comput. Biol. 10, e1003835 (2014).
39. Theodorakopoulos, N., Phys. Rev. E 77, 031919 (2008).
40. Voulgarakis, N.K., Kalosakas, G., Rasmussen, K.Ø., Bishop, A.R., Nano Lett. 4, 629 (2004).
41. Yakushevich, L.V., Nonlinear Physics of DNA, 2nd ed., Wiley-VCH, 2004.
42. Zariski, O., Samuel, P., Polynomial and Power Series Rings. In: Commutative Algebra, Graduate Texts in Mathematics, vol. 29, Berlin: Springer, 1960.
43. Zoli, M., J. Phys.: Condens. Matter 24, 195103 (2012).
44. Zoli, M., J. Theor. Biol. 354, 95 (2014).
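As an independent cross-check of the counting algorithm of Sect. 3 and of the Appendix code, small cases can be enumerated by brute force. The sketch below is an assumption-laden illustration rather than part of the authors' method: the function names are hypothetical, AT and GC base pairs are encoded as 1 and 0, and every arrangement is reduced to a canonical representative under all rotations and reflections. It is only practical for short chains, since it visits every arrangement of the base pairs.

```python
from itertools import combinations


def count_alternations(chain):
    """Number of AT/GC alternations in a periodic chain of 0s and 1s."""
    n = len(chain)
    return sum(chain[i] != chain[(i + 1) % n] for i in range(n))


def canonical_form(chain):
    """Smallest image of the chain under all rotations and reflections."""
    n = len(chain)
    images = []
    for seq in (chain, chain[::-1]):
        for s in range(n):
            images.append(seq[s:] + seq[:s])
    return min(images)


def brute_force_necklaces(n_at, n_gc, alpha):
    """Count distinct necklaces with n_at ones, n_gc zeros and alpha alternations."""
    n = n_at + n_gc
    seen = set()
    for ones in combinations(range(n), n_at):
        chain = [0] * n
        for i in ones:
            chain[i] = 1
        chain = tuple(chain)
        if count_alternations(chain) == alpha:
            seen.add(canonical_form(chain))
    return len(seen)


if __name__ == "__main__":
    # Parameters of the worked example of Sect. 3.
    print(brute_force_necklaces(8, 6, 10))
```

The result of `brute_force_necklaces(n_at, n_gc, alpha)` can be compared directly with `necklace_count(alpha // 2, n_gc, n_at)` from the Appendix.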


ISSN
1560-3547
eISSN
ARCH-3347
DOI
10.1134/S1560354718020016

Abstract

ISSN 1560-3547, Regular and Chaotic Dynamics, 2018, Vol. 23, No. 2, pp. 1–16. © Pleiades Publishing, Ltd., 2018.

Distribution of Base Pair Alternations in a Periodic DNA Chain: Application of Pólya Counting to a Physical System

Malcolm Hillebrand (1,*), Guy Paterson-Jones (1), George Kalosakas (2), and Charalampos Skokos (1)

(1) Department of Mathematics and Applied Mathematics, University of Cape Town, Rondebosch, Cape Town 7701, South Africa
(2) Department of Materials Science, University of Patras, Rio GR-26504, Greece
(*) E-mail: malcolm.hillebrand@gmail.com

Received October 13, 2017; accepted December 11, 2017

arXiv:1805.06245v1 [math.CO] 16 May 2018

Abstract. In modeling DNA chains, the number of alternations between Adenine-Thymine (AT) and Guanine-Cytosine (GC) base pairs can be considered as a measure of the heterogeneity of the chain, which in turn could affect its dynamics. A probability distribution function of the number of these alternations is derived for circular or periodic DNA. Since there are several symmetries to account for in the periodic chain, necklace counting methods are used. In particular, Pólya's Enumeration Theorem is extended for the case of a group action that preserves partitioned necklaces. This, along with the treatment of generating functions as formal power series, allows for the direct calculation of the number of possible necklaces with a given number of AT base pairs, GC base pairs and alternations. The theoretically obtained probability distribution functions of the number of alternations are accurately reproduced by Monte Carlo simulations and fitted by Gaussians. The effect of the number of base pairs on the characteristics of these distributions is also discussed, as well as the effect of the ratios of the numbers of AT and GC base pairs.

MSC2010 numbers: 05A15, 92D20
DOI: 10.0000/S1560354718000013
Keywords: DNA models, Pólya's Counting Theorem, Heterogeneity, Necklace Combinatorics

1. Introduction

Single circular DNA molecules are abundant in nature. The whole genome in a typical bacterium is usually contained in a closed DNA molecule, while in eukaryotes the organelle DNA, inside the mitochondria and chloroplasts, is also found in the same form [1, 23]. Also plasmids, either naturally found in bacteria or used as vectors in gene cloning, are smaller circular DNA segments. Apart from these cases, in considering the dynamics and other properties of DNA chains, it is often useful to model the chain using periodic boundary conditions in order to avoid finite size or edge effects. For example, periodic boundary conditions have been used to study denaturation bubbles and the melting behavior of DNA [2, 6, 13, 37, 39, 43], probability distributions of thermal openings in the double strand [7, 18], bubble opening profiles in promoter regions which regulate gene transcription [3–5, 11, 12, 16, 20], binding sites of DNA-associated proteins [26, 38], various dynamical and nonlinear properties of DNA [21, 27, 28, 40, 41, 44], as well as charge transport in DNA [10, 14, 17, 19, 33].

A DNA chain consists of a series of base pairs, where each base pair is either Adenine-Thymine (AT) or Guanine-Cytosine (GC). Currently, we are investigating the influence of different factors on the chaoticity of periodic DNA chains [36]. One of the examined quantities is the number of base pair alternations, which can be considered as a quantifier of the system's heterogeneity. In this work we focus on the rigorous mathematical treatment of alternation counting in periodic DNA sequences. To study periodic DNA, we will consider the DNA necklace associated to a DNA chain, where the first and the last base pairs in the chain will become neighbors. This periodicity presents some modeling challenges: if one considers two distinct chains of DNA, it may still be the case that their corresponding necklaces are the same, as one may be merely a rotation or reflection of the other. Such symmetries need to be addressed if any conclusions are to be made about the structure and the dynamics of DNA necklaces.

In particular, we are concerned with the number $\alpha$ of base pair alternations in the necklace, where an alternation is defined to be a point at which an AT base pair neighbors a GC base pair or vice versa. Consider, for instance, the DNA chain shown in Fig. 1. Representing a GC base pair (black bead) with a 0 and an AT base pair (white bead) with a 1, the chain can be written in the form $(1)\,000\bar{0}\,\bar{1}\,\bar{0}\,1\,\bar{1}\,0\,\bar{0}\,\bar{1}\,(0)$. Here, we have given the leftmost base pair at each alternation point an overbar, and used brackets to denote the fact that in the corresponding DNA necklace the first and last base pairs are neighbors. This necklace is illustrated in Fig. 2, and counting the number of overbars we see that there are $\alpha = 6$ alternations.

Fig. 1. An example of a DNA chain (beads from left to right: 0 0 0 0 1 0 1 1 0 0 1). GC base pairs are represented by black beads and the number 0, while AT base pairs are represented by white beads and the number 1. In the DNA necklace corresponding to this chain, the AT base pair at the far right neighbors the GC base pair at the far left.

Fig. 2. The DNA necklace corresponding to the chain of Fig. 1. This necklace has $\alpha = 6$ alternations.

It is worth noting that a base pair alternation corresponds to the appearance of the particular sequences (often referred to as "words") 01 or 10 in a DNA chain. Word occurrence probabilities have already been studied in the literature (see e.g. [22, 24, 30–32, 34, 35] and references therein), with emphasis on the appearance of patterns with unexpectedly high or low frequencies, as well as on repeating sequences.
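The alternation count of the example chain of Fig. 1 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `alternations` and the encoding of the chain as a string of the characters 0 (GC) and 1 (AT) are assumptions made for illustration only.

```python
def alternations(chain):
    """Count alternations in a periodic chain given as a string of '0' and '1'.

    The last base pair is treated as a neighbor of the first one, so the
    pair (chain[-1], chain[0]) is included in the count.
    """
    n = len(chain)
    return sum(chain[i] != chain[(i + 1) % n] for i in range(n))


if __name__ == "__main__":
    # The chain of Fig. 1: GC base pairs are '0', AT base pairs are '1'.
    print(alternations("00001011001"))
```

Because the chain is closed, the comparison index wraps around with `(i + 1) % n`; dropping the wrap-around would give the count for a linear chain with open ends instead.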
These studies, however, concern the case of linear DNA segments, or in other words DNA chains with fixed boundary conditions. The periodic boundary conditions we consider in our study make the problem of counting alternations (or more generally the appearance of specific words) in circular DNA segments much more complicated than in the case of linear DNA segments, due to the appearance of additional symmetries in the DNA structures imposed by rotations and/or reflections.

Each base pair in a DNA necklace can contribute at most 2 alternations, depending on which neighbors it differs from. Supposing that the number of AT and GC base pairs in the necklace is given by $N_{AT}$ and $N_{GC}$ respectively, this yields the restriction $0 \le \alpha \le \min\{2N_{AT}, 2N_{GC}\}$. We note that in the extreme case of a homogeneous chain composed of base pairs of the same kind $\alpha = 0$, while if both types of base pairs are present in the DNA chain the smallest possible value of alternations is $\alpha = 2$. The latter corresponds to a chain having all AT (and consequently GC) base pairs grouped together. Furthermore, if we traverse the necklace pair by pair until we end up where we started, we must necessarily switch between AT and GC base pairs an even number of times. Thus $\alpha = 2M$ for some $M \in \mathbb{N}$.

Now the natural question is: what is the probability that a random DNA necklace with a specified number of AT and GC base pairs, $N_{AT}$ and $N_{GC}$ respectively, has a specified number of alternations $\alpha$? Or in other words, how many possible combinations of such base pairs are there that yield $\alpha$ alternations once the cyclic and reflective symmetries are taken into account? In what follows we answer these questions and provide an algorithm for computing the number of distinct DNA necklaces satisfying these constraints.

The paper is organized in the following way: In Sect. 2, the mathematical background is laid out, leading into a Pólya Enumeration Theorem for bipartite sets. In Sect. 3 an explicit algorithm for calculating the number of distinct DNA necklaces with given values of $\alpha$, $N_{AT}$ and $N_{GC}$ is described, while in Sect. 4 we compare the theoretical results to those obtained from Monte Carlo simulations and investigate the effect of the $N_{AT}$ and $N_{GC}$ values on the characteristics of the probability distribution function (pdf) of $\alpha$. Finally, in Sect. 5 we summarize our results, while in the Appendix we provide a Python computer code implementing the algorithm of Sect. 3.

2. Theoretical Treatment

Our problem can be neatly related to the combinatorics of necklaces. Effectively, we are interested in the number of distinct necklaces with $N = N_{AT} + N_{GC}$ beads, where $N_{AT}$ of the beads are white, $N_{GC}$ of the beads are black, and there are $\alpha$ alternations between the colors. We consider necklaces to be the same if they can be reflected or rotated into one another, and beads of the same color are treated as indistinguishable. Because of this, we can equivalently think of a necklace with $\alpha$ alternations as a necklace of $\alpha$ containers, where each container carries some number of black or white beads of the same color, and adjacent containers have different colors. This idea is illustrated in Fig. 3.

Fig. 3. The necklace of containers corresponding to the DNA necklace of Fig. 2. The numbers in each container represent the number of consecutive black or white beads in that segment of the necklace.

We will refer to containers carrying black beads as black containers, and similarly for white containers. Counting the number of distinct necklaces with the given constraints can thus be reformulated as the problem of assigning numbers of beads to $\alpha$ containers, such that the total of the numbers in the black and white containers is equal to $N_{GC}$ and $N_{AT}$ respectively. Two such assignments will be considered equivalent if the containers can be rotated or reflected into one another in such a way as to preserve both the colors and numbers of beads they contain. Enumerating such assignments is simpler than enumerating necklaces, as we have one less constraint: the number of alternations is now implicit in the formulation of the problem. To perform this enumeration we will require some tools from Pólya counting theory. In particular, we will need a version of the Pólya Enumeration Theorem for sets partitioned into two parts, which we will refer to as bipartite sets. For completeness' sake, we present this material below.

2.1. Group Actions

Let $A$ be a set. Then we define the symmetric group on $A$ to be the set of permutations of $A$:

\[
S_A = \{\varphi : A \to A \mid \varphi \text{ is a bijection}\}. \tag{2.1}
\]

A cycle is a permutation $\varphi \in S_A$ such that there exist distinct elements $x_1, x_2, \ldots, x_k \in A$ and:

\[
\varphi(x) =
\begin{cases}
x_{i+1} & \text{if } x = x_i \text{ for some } 1 \le i < k, \\
x_1 & \text{if } x = x_k, \\
x & \text{otherwise.}
\end{cases} \tag{2.2}
\]

We denote such a cycle suggestively as $(x_1\, x_2 \ldots x_k)$, and say that $\varphi \in S_A$ is a $k$-cycle if $\varphi = (x_1\, x_2 \ldots x_k)$ for some $x_i \in A$. Two cycles $(x_1\, x_2 \ldots x_k)$ and $(y_1\, y_2 \ldots y_l)$ are said to be disjoint if the sets $\{x_1, x_2, \ldots, x_k\}$ and $\{y_1, y_2, \ldots, y_l\}$ are disjoint.

If $A$ is a finite set, every element of $S_A$ can be written as a composition of cycles; in general, however, this cannot be done uniquely. On the other hand, we have the following fundamental structure theorem for elements of finite symmetric groups (see for example [15]):

Theorem (Cycle Decomposition Theorem). If $A$ is a finite set, then every element $\varphi \in S_A$ can be written as a product of pairwise disjoint cycles, unique up to order of the cycles:

\[
\varphi = (x_{11}\, x_{12} \ldots x_{1k_1}) \cdots (x_{n1}\, x_{n2} \ldots x_{nk_n}).
\]

Given a group $G$ and a set $A$, a group action of $G$ on $A$ is a homomorphism $\Gamma_G : G \to S_A$.
In other words, elements of $G$ are identified with permutations of $A$ in a manner that preserves the group structure. To simplify the notation, we will write $gx$ instead of $\Gamma_G(g)(x)$ for the action of $g \in G$ on some $x \in A$.

The orbit of an element $x \in A$ under the group action $\Gamma_G$ is defined to be the set $\mathrm{Orb}_x = \{gx \mid g \in G\}$, and its stabilizer is given by the subgroup $\mathrm{Stab}_x = \{g \in G \mid gx = x\}$. Given some $g \in G$, we denote its set of fixed points by $\mathrm{Fix}_g = \{x \in A \mid gx = x\}$.

2.2. Pólya's Counting Theory

One can often rephrase counting problems in terms of computing the number of distinct orbits of some group action. Pólya's counting theory can be thought of as a tool for making these computations systematic and expedient. A fundamental lemma on which this theory is built is the following [9]:

Lemma 1 (Burnside's Lemma). The number of distinct orbits in a group action of a finite group $G$ on $A$ is given by the average number of fixed points of elements of $G$:

\[
\#\mathrm{Orbits} = \frac{1}{|G|} \sum_{g \in G} |\mathrm{Fix}_g|. \tag{2.3}
\]

A basic problem in combinatorics is the following. Suppose one has a finite set of objects $A$, and one wishes to color them with colors from another set $\Omega$. How many distinct ways are there of coloring the objects up to some kind of symmetry? This can be recast in the language of group actions. The set of possible colorings is given by $\Omega^A = \{\varphi : A \to \Omega \mid \varphi \text{ a function}\}$, and the symmetry is given by a group action $\Gamma_G$ on $A$. This group action passes naturally to a group action $\tilde{\Gamma}_G$ on $\Omega^A$, defined by $g\varphi : x \mapsto \varphi(gx)$. The question now reduces to counting the number of distinct orbits of this latter action.

In this simplified case, Burnside's lemma is often sufficient to answer the question. We can generalize this problem slightly, however. Suppose that each color has an associated weight, given by a function $\omega : \Omega \to \mathbb{N}$. Given a coloring $\varphi : A \to \Omega$ of the objects, we define its total weight to be the sum:

\[
|\varphi| = \sum_{x \in A} \omega \circ \varphi(x). \tag{2.4}
\]

How many distinct colorings of $A$ with a given total weight are there, up to symmetries given by some group action $\Gamma_G$? Note that the total weight of any coloring in a given orbit is the same, as elements of $G$ merely permute the set $A$. Thus, the problem boils down to calculating the number of distinct orbits with a given total weight. Pólya identified two necessary ingredients for a systematic answer to this question: generating functions, and an understanding of the cycle structure of elements of $G$ [29].

Definition (Generating Function). Let $\omega : \Omega \to \mathbb{N}$ be an assignment of weights to some set $\Omega$. Suppose further that there are at most a finite number of elements of any given weight, that is, $|\omega^{-1}(n)|$ is finite for every $n \in \mathbb{N}$. Then the generating function of $\omega$ is given by the formal power series:

\[
f_\omega(x) = \sum_{i=0}^{\infty} |\omega^{-1}(i)|\, x^i. \tag{2.5}
\]

Generating functions are useful as they encode combinatorial data, in this case the number of colors of a given weight, as algebraic objects. In particular, we will need the following lemma:

Lemma 2. Let $\omega_1 : \Omega_1 \to \mathbb{N}$ and $\omega_2 : \Omega_2 \to \mathbb{N}$ be assignments of weights to the sets $\Omega_1$ and $\Omega_2$ respectively. Define an assignment of weights to the set $\Omega_1 \times \Omega_2$ by $\omega : (x_1, x_2) \mapsto \omega_1(x_1) + \omega_2(x_2)$. Then $f_\omega(x) = f_{\omega_1}(x) \cdot f_{\omega_2}(x)$.

Given a group action $\Gamma_G$ and an element $g \in G$, we denote by $C_k(g)$ the number of $k$-cycles in the unique disjoint cycle decomposition of $\Gamma_G(g)$. We can now encode information about the cycle structure of elements of $G$ in the following multivariate polynomial:

Definition (Cycle Index). Let $G$ be a finite group. Then the cycle index of a group action $\Gamma_G$ on a finite set $A$ of cardinality $n$ is given by the polynomial [8]:

\[
Z_{\Gamma_G}(x_1, x_2, \ldots, x_n) = \frac{1}{|G|} \sum_{g \in G} x_1^{C_1(g)} x_2^{C_2(g)} \cdots x_n^{C_n(g)}. \tag{2.6}
\]

This cycle index will allow us to efficiently compute the number of distinct orbits of the group action.
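To make the cycle index concrete, the following Python sketch builds it for the simplest relevant case, the cyclic group of rotations acting on $n$ positions, and evaluates it at $x_1 = \ldots = x_n = k$, which by Burnside's lemma gives the number of $k$-colorings up to rotation. This is an illustration under stated assumptions and not the authors' code: the function names are hypothetical, and the cycle index is stored as a dictionary keyed by (cycle length, number of cycles), which suffices here because every rotation has cycles of a single length.

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def rotation_cycle_type(n, s):
    """Rotation by s on n positions: gcd(n, s) cycles, each of length n // gcd(n, s)."""
    g = gcd(n, s)
    return (n // g, g)  # (cycle length, number of cycles)


def cyclic_cycle_index(n):
    """Cycle index of the rotation group as {(length, count): coefficient}."""
    terms = Counter()
    for s in range(n):
        terms[rotation_cycle_type(n, s)] += Fraction(1, n)
    return dict(terms)


def count_colorings(n, k):
    """Number of k-colorings of n positions up to rotation, Z(k, ..., k)."""
    total = sum(coeff * k ** count
                for (_length, count), coeff in cyclic_cycle_index(n).items())
    return int(total)


if __name__ == "__main__":
    print(cyclic_cycle_index(4))
    print(count_colorings(4, 2))
```

Each rotation fixes exactly those colorings that are constant on its cycles, so a term with `count` cycles contributes `k ** count` fixed colorings; exact `Fraction` coefficients avoid rounding before the final conversion to an integer.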
With this in mind, we are now in a position to state a version of the Pólya counting theorem, answering the generalized problem given earlier:

Theorem (Pólya Enumeration Theorem). Let $A$ be a finite set of objects, $\Omega$ a set of colors, $\omega : \Omega \to \mathbb{N}$ an assignment of weights to the colors with generating function $f_\omega$, and $\Gamma_G$ a group action of a finite group $G$ on $A$. Then $\Gamma_G$ passes naturally to a group action $\tilde{\Gamma}_G$ on $\Omega^A$, and a generating function by total weight for the number of distinct orbits of $\tilde{\Gamma}_G$ is given by:

$$\mathrm{Orbits}_{\tilde{\Gamma}_G}(x) = Z_G\big(f_\omega(x), f_\omega(x^2), \dots, f_\omega(x^n)\big). \tag{2.7}$$

2.3. Pólya Enumeration Theorem for Bipartite Sets

By considering multivariate generating functions, the Pólya enumeration theorem can be generalized to the case where the colors take vector-valued weights (tuples of natural numbers). We will generalize the theorem in a different direction, however. Suppose we have a partition of $A$ into two parts, $A = X \sqcup Y$, and a group action $\Gamma_G$ on $A$. We would like to consider the problem of counting distinct colorings of $A$ under this symmetry, with the additional constraint that we color elements of $X$ from a set $\Omega_X$, and elements of $Y$ from a set $\Omega_Y$. To this end, we will say that a coloring $\phi : A \to \Omega_X \sqcup \Omega_Y$ is valid if $\phi(x) \in \Omega_X \iff x \in X$ and $\phi(x) \in \Omega_Y \iff x \in Y$.

There is an obstruction to this, however: the group action may map elements in $X$ to elements in $Y$ or vice versa. In this case, the extension of $\Gamma_G$ to the set of possible colorings is no longer well-defined, as there is no natural way to compare the sets of colors $\Omega_X$ and $\Omega_Y$. Fortunately, this is the only obstruction to proving a Pólya-type theorem for this problem. This motivates the following definition:

Definition (Partition-Preserving Group Action). Let $A = X \sqcup Y$, and let $\Gamma_G$ be a group action on $A$. Then we say that $\Gamma_G$ is partition-preserving if for every $g \in G$, $gx \in X \iff x \in X$ and $gx \in Y \iff x \in Y$.

The importance of this property is as follows. Suppose we have a group action $\Gamma_G$ on $A = X \sqcup Y$, and some element $g \in G$.
Then $\Gamma_G(g)$ has a unique disjoint cycle decomposition given by $\Gamma_G(g) = C_1 \cdot C_2 \cdot \ldots \cdot C_k$. If $\Gamma_G$ is partition-preserving then each cycle $C_i$ is contained entirely in either $X$ or $Y$, and $\Gamma_G$ is in fact partition-preserving if and only if this is the case for every $g \in G$. If $\Gamma_G$ is partition-preserving, then we define $C_k^X(g)$ to be the number of $k$-cycles in the disjoint cycle decomposition of $\Gamma_G(g)$ that are contained in $X$, and we define $C_k^Y(g)$ analogously.

We will now define an analogue of the cycle index polynomial for the case of partition-preserving group actions. This will allow us to keep track of the cycle structure of elements of the group as well as which partition part each cycle acts on:

Definition (Bipartite Cycle Index). Let $G$ be a finite group and $A = X \sqcup Y$ a finite set of cardinality $n$. Then the bipartite cycle index of a partition-preserving group action $\Gamma_G$ on $A$ is defined to be the polynomial:

$$Z_G(x_1, \dots, x_n, y_1, \dots, y_n) = \frac{1}{|G|} \sum_{g \in G} x_1^{C_1^X(g)} \cdots x_n^{C_n^X(g)}\, y_1^{C_1^Y(g)} \cdots y_n^{C_n^Y(g)}. \tag{2.8}$$

We can now generalize Pólya's theorem to the case of partition-preserving group actions. We note that this theorem is used implicitly in [29] without proof.

Theorem 1 (Bipartite Pólya Enumeration Theorem). Let $\Gamma_G$ be a partition-preserving group action of a finite group $G$ on a finite set $A = X \sqcup Y$. Let $\Omega = \Omega_X \sqcup \Omega_Y$ be a set of colors, and let $\omega_X : \Omega_X \to \mathbb{N}^+$ and $\omega_Y : \Omega_Y \to \mathbb{N}^+$ be their assigned weights with respective generating functions $f_X$ and $f_Y$. If $\Phi$ is the set of valid colorings of $A$, then $\Gamma_G$ passes naturally to a group action $\tilde{\Gamma}_G$ on $\Phi$, and a generating function by total weight for the number of orbits of $\tilde{\Gamma}_G$ is given by:

$$\mathrm{Orbits}_{\tilde{\Gamma}_G}(x) = Z_G\big(f_X(x), \dots, f_X(x^k), f_Y(x), \dots, f_Y(x^k)\big). \tag{2.9}$$

Proof. We pass to a group action $\tilde{\Gamma}_G$ on $\Phi$ as follows. Given a valid coloring $\phi \in \Phi$ and an element $g \in G$, we define the action of $g$ on $\phi$ by $g\phi : x \mapsto \phi(gx)$.
To compute a generating function for the number of orbits of $\tilde{\Gamma}_G$ by total weight, we will determine the generating functions for the number of fixed points of each $g \in G$ by total weight.

Consider some $g \in G$. As $A$ is finite, there exists a unique disjoint cycle decomposition $\Gamma_G(g) = C_1 \cdot C_2 \cdot \ldots \cdot C_k$, where each $C_i$ is a cycle in the symmetric group $S_A$. Now suppose that $g$ fixes some valid coloring $\phi \in \Phi$; that is, $g\phi = \phi$. Then, writing the cycle as $C_i = (x_1\, x_2 \dots x_{k_i})$ with $x_j \in A$ and $k_i$ the length of $C_i$, we have by definition that $\phi(x_j) = (g\phi)(x_j) = \phi(gx_j) = \phi(x_{j+1})$, and hence every element in the cycle must have the same color under $\phi$. The number of colorings of $C_i$ that are fixed by $g$ is thus given by the generating function $f_X(x^{k_i})$ if $C_i$ lies in $X$, and $f_Y(x^{k_i})$ if $C_i$ lies in $Y$. We note that one of these two cases must occur for every cycle as $\Gamma_G$ is partition-preserving. By Lemma 2, then, the number of valid colorings of $A$ that are fixed by $g$ is given by the generating function:

$$\mathrm{Fix}_g(x) = f_X(x)^{C_1^X(g)} \cdots f_X(x^k)^{C_k^X(g)}\, f_Y(x)^{C_1^Y(g)} \cdots f_Y(x^k)^{C_k^Y(g)}. \tag{2.10}$$

By Burnside's lemma, the number of orbits of $\tilde{\Gamma}_G$ of a particular weight is given by the average number of fixed colorings of that weight by elements $g \in G$. Applying Burnside's lemma for each possible weight, the number of orbits of $\tilde{\Gamma}_G$ is thus given by the generating function:

$$\begin{aligned}
\mathrm{Orbits}_{\tilde{\Gamma}_G}(x) &= \frac{1}{|G|} \sum_{g \in G} \mathrm{Fix}_g(x) \\
&= \frac{1}{|G|} \sum_{g \in G} f_X(x)^{C_1^X(g)} \cdots f_X(x^k)^{C_k^X(g)}\, f_Y(x)^{C_1^Y(g)} \cdots f_Y(x^k)^{C_k^Y(g)} \\
&= Z_G\big(f_X(x), \dots, f_X(x^k), f_Y(x), \dots, f_Y(x^k)\big).
\end{aligned} \tag{2.11}$$

We note that as a corollary of this proof, we can recover a bivariate generating function from this expression, where the coefficient of $x^a y^b$ represents the number of distinct colorings with total weight $a$ in $\Omega_X$, and total weight $b$ in $\Omega_Y$:

Corollary. A bivariate generating function by total weight in $\Omega_X$ and $\Omega_Y$, for the number of distinct colorings of $A$, is given by:

$$\mathrm{Orbits}_{\tilde{\Gamma}_G}(x, y) = Z_G\big(f_X(x), \dots, f_X(x^k), f_Y(y), \dots, f_Y(y^k)\big). \tag{2.12}$$

2.4. The Dihedral Group, its Cycle Index and its Extension

To apply these results to the problem of counting distinct DNA necklaces, we will need to describe the relevant group action and compute its (bipartite) cycle index. The set of elements acted on by the group is given by the $\alpha$ containers in the DNA necklace and this set can be partitioned into two groups: containers of black beads and containers of white beads. We consider two DNA necklaces to be the same if one can be rotated or reflected into the other. These symmetries can be described by an action of the dihedral group, which we will denote by $D_{2M}$, where we have $\alpha = 2M$. The rotational and reflective symmetries are what distinguish the case of periodic DNA chains from the linear chains with fixed boundary conditions studied in [31] and elsewhere.

A fundamental fact about $D_{2M}$ is that it is generated by two elements $r$ and $s$, where $r$ is a reflection satisfying $r^2 = 1$, and $s$ is a rotation of order $M$. Therefore, to describe a group action of $D_{2M}$ on a DNA necklace it suffices to give the action of $r$ and $s$. In Fig. 4 the action of such a rotation on the necklace is illustrated, while in Figs. 5 and 6 the action of a reflection is illustrated for the cases where $M$ is odd and even respectively. It is clear that the resulting group action is partition-preserving.

Fig. 4. The action of a rotation $s \in D_{2M}$ on the DNA necklace.

To compute the bipartite cycle index of this group action, we will treat reflections and rotations separately. To begin with, we can see from Fig. 4 that rotations act symmetrically on the black and white containers in the DNA necklace. Thus, the terms of the cycle index polynomial corresponding to rotations will be symmetric in the $x_i$ and $y_i$. The cycle index of the natural action of the cyclic group $C_M$ on the $M$ containers in one part of the partition is given by [25]:

$$Z_{C_M}(x_1, \dots, x_M) = \frac{1}{M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d}, \tag{2.13}$$

where $\varphi(d)$ is defined to be the number of natural numbers less than or equal to $d$ that are coprime to it (the Euler totient function). Note that 1 is considered to be coprime to all natural numbers, and so in particular $\varphi(d) > 0$. Exactly half of the elements of $D_{2M}$ are rotations, and thus the rotational part of the bipartite cycle index $Z_{D_{2M}}$ is given by $\frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d}$.

The reflective part of the group $D_{2M}$, on the other hand, acts differently depending on the parity of $M$. Suppose first that $M$ is odd, in which case a typical reflection is illustrated in Fig. 5. Each of the $M$ possible reflections occurs across an axis consisting of one black container and one white container, both of which are fixed by the reflection. The rest of the containers are split into 2-cycles, and thus the bipartite cycle index $Z_{D_{2M}}$ for odd $M$ is given by:

$$Z_{D_{2M}}(x_1, \dots, x_M, y_1, \dots, y_M) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d} + \frac{1}{2}\, x_1 y_1\, x_2^{(M-1)/2} y_2^{(M-1)/2}. \tag{2.14}$$

If $M$ is even, a typical reflection is illustrated in Fig. 6. In this case, each possible reflection occurs across an axis consisting of either two white containers or two black containers. The rest of the containers again split into 2-cycles. Thus the bipartite cycle index $Z_{D_{2M}}$ for even $M$ is given by:

$$Z_{D_{2M}}(x_1, \dots, x_M, y_1, \dots, y_M) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, x_d^{M/d} y_d^{M/d} + \frac{1}{4}\, x_1^2\, x_2^{(M-2)/2} y_2^{M/2} + \frac{1}{4}\, y_1^2\, y_2^{(M-2)/2} x_2^{M/2}. \tag{2.15}$$

Fig. 5. The action of a reflection $r \in D_{2M}$ on the DNA necklace, for the case where $M$ is odd.

Fig. 6. The action of a reflection $r \in D_{2M}$ on the DNA necklace, for the case where $M$ is even.

2.5. Generating Functions as Formal Power Series

In our particular application of Pólya theory, the elements we are coloring are the $\alpha$ containers in the DNA necklace and the color of a particular container is defined to be the number of black or white beads it contains.
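To make the container picture concrete, the following sketch splits a periodic chain into its maximal runs of identical base pairs, so that each run is one container and its length is the color of that container. This is only an illustration under stated assumptions: the chain is encoded as a string with the hypothetical symbols "A" for an AT base pair and "G" for a GC base pair, and the function names are not taken from the original code.

```python
def alternations(chain):
    """Number of neighboring positions (cyclically) holding different symbols."""
    # chain[i - 1] wraps around to the last symbol when i == 0.
    return sum(chain[i] != chain[i - 1] for i in range(len(chain)))


def circular_runs(chain):
    """Split a periodic chain into maximal runs: a list of (symbol, length)."""
    n = len(chain)
    # Positions where a new run begins, i.e. the symbol differs from its
    # cyclic predecessor.
    starts = [i for i in range(n) if chain[i] != chain[i - 1]]
    if not starts:                       # homogeneous (or empty) chain
        return [(chain[0], n)] if n else []
    runs = []
    for j, start in enumerate(starts):
        end = starts[(j + 1) % len(starts)]
        runs.append((chain[start], (end - start) % n))
    return runs


chain = "AAGAGGGA"                       # read cyclically
print(alternations(chain))               # prints 4
print(circular_runs(chain))              # prints [('G', 1), ('A', 1), ('G', 3), ('A', 3)]
```

In this example the chain has $\alpha = 4$ containers, the two "A" containers hold $1 + 3 = 4$ beads and the two "G" containers hold $1 + 3 = 4$ beads, so every container holds at least one bead, as the counting argument requires.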
As each container must contain at least one bead, the set of colors is given by $\mathbb{N}^+$. We are interested in the total number of black and white beads, so the weight of each color will be given quite simply by $\omega(n) = n$ for each $n \in \mathbb{N}^+$. This weighting corresponds to the generating function (2.5) $f_\omega(x) = x + x^2 + x^3 + \cdots$.

To compute the number of distinct DNA necklaces with $N_{AT}$ white beads and $N_{GC}$ black beads, we need to calculate the coefficient of $x^{N_{AT}} y^{N_{GC}}$ in (2.12), where the bivariate cycle index is given by the appropriate $Z(D_{2M})$ from Sect. 2.4 and the weight generating function is given by $f_\omega(x)$. This requires us to calculate the coefficients of specific terms in $f_\omega^n(x) = (x + x^2 + x^3 + \dots)^n$ for potentially large $n$. However, doing this expansion naively requires a number of computing steps that grows exponentially fast as $n$ increases. Thus, this approach is impractical.

Fortunately, there exists a way to bypass this problem: treating $f_\omega(x)$ as a formal power series, we can manipulate it into a form that makes such computations significantly faster. An introduction to the theory of formal power series can be found, for instance, in [42]. For our purposes, we will only need the fact that a form of the binomial theorem holds in this setting:

Lemma 3. Letting $(1 - x)^{-n}$ denote the formal inverse of $(1 - x)^n$, we have:

$$(1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n + k - 1}{n - 1} x^k. \tag{2.16}$$

This implies the following useful lemma regarding powers of $f_\omega(x)$:

Lemma 4. As a formal power series $f_\omega^n(x)$ can be written as $f_\omega^n(x) = \sum_{k=0}^{\infty} \binom{n+k-1}{n-1} x^{n+k}$.

Proof. Note that $x f_\omega(x) = x^2 + x^3 + \cdots = f_\omega(x) - x$. Rearranging this for $f_\omega(x)$, we see that $f_\omega(x) = x(1 - x)^{-1}$, and hence $f_\omega^n(x) = x^n (1 - x)^{-n}$. The result now follows from Lemma 3.

In contrast to naively expanding powers of $f_\omega(x)$, computing binomial coefficients is computationally inexpensive, taking at most a linear number of steps in $n$. We now list a few results that will come in handy later, when we describe an explicit algorithm for computing the number of distinct DNA necklaces with the given constraints.

Lemma 5. The coefficient of $x^r$ in $f_\omega^b(x^a)$ is given by:

$$\big[f_\omega^b(x^a)\big]_r = \begin{cases} 1 & \text{if } b = 0 \text{ and } r = 0, \\ 0 & \text{if } b = 0 \text{ and } r > 0, \\ 0 & \text{if } b > 0 \text{ and } a \nmid r \text{ or } r < ab, \\ \binom{r/a - 1}{b - 1} & \text{otherwise.} \end{cases} \tag{2.17}$$

Lemma 6. The coefficient of $x^r$ in $f_\omega^{b_1}(x^{a_1}) \cdot f_\omega^{b_2}(x^{a_2})$ is given by:

$$\big[f_\omega^{b_1}(x^{a_1}) \cdot f_\omega^{b_2}(x^{a_2})\big]_r = \sum_{k=0}^{r} \big[f_\omega^{b_1}(x^{a_1})\big]_k \big[f_\omega^{b_2}(x^{a_2})\big]_{r-k}. \tag{2.18}$$

3. The Algorithm for Computing the Number of Distinct Valid Necklaces

We are now able to evaluate the number of distinct necklaces that correspond to a particular number of alternations $\alpha$. The algorithm is fairly straightforward and efficient. Its implementation requires the following steps:

a) Set the constraint parameters $N_{AT}$, $N_{GC}$, and $\alpha = 2M$.

b) Choose the partitioned cycle index polynomial of the dihedral group based on the parity of $M$. If $M$ is odd, use (2.14), while for $M$ even use (2.15).

c) By the corollary to Pólya's Enumeration Theorem (2.12), we know that the number of necklaces, up to symmetry, is given by

$$\mathrm{Orbits}_{\tilde{\Gamma}_G}(x, y) = Z_G\big(f_X(x), \dots, f_X(x^k), f_Y(y), \dots, f_Y(y^k)\big). \tag{3.1}$$

If $M$ is odd, using the outcome of the previous step we get

$$\mathrm{Orbits}(x, y) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, f^{M/d}(x^d)\, f^{M/d}(y^d) + \frac{1}{2}\, f(x) f(y)\, f^{(M-1)/2}(x^2)\, f^{(M-1)/2}(y^2). \tag{3.2}$$

If $M$ is even, then we have

$$\mathrm{Orbits}(x, y) = \frac{1}{2M} \sum_{d \mid M} \varphi(d)\, f^{M/d}(x^d)\, f^{M/d}(y^d) + \frac{1}{4}\, f^2(x)\, f^{(M-2)/2}(x^2)\, f^{M/2}(y^2) + \frac{1}{4}\, f^2(y)\, f^{(M-2)/2}(y^2)\, f^{M/2}(x^2). \tag{3.3}$$

d) Every term in the polynomial produced by (3.1) will be of the form in (2.17) or (2.18).
The number of necklaces with $N_{AT}$ white beads and $N_{GC}$ black beads is given by the coefficient of the term $x^{N_{AT}} y^{N_{GC}}$. To calculate the total number of necklaces, simply sum over each of these terms appearing in the polynomial.

A Python computer code implementing this algorithm is presented in the Appendix.

In order to illustrate the application of this algorithm let us consider a simple, but not trivial case: We set $\alpha = 2M = 10$, $N_{AT} = 8$, $N_{GC} = 6$. Clearly $M = 5$ is odd, so identifying white beads with AT base pairs and black beads with GC base pairs, we have the cycle index

$$\begin{aligned}
\tilde{Z}(D_{10}) &= \frac{1}{2} \tilde{Z}(C_5) + \frac{1}{2}\, x_1 y_1 (x_2)^2 (y_2)^2 \\
&= \frac{1}{2 \cdot 5} \sum_{d \mid 5} \varphi(d) (x_d)^{5/d} (y_d)^{5/d} + \frac{1}{2}\, x_1 y_1 (x_2)^2 (y_2)^2.
\end{aligned} \tag{3.4}$$

Now the partitioned Pólya Enumeration Theorem tells us that we can put the generating functions $f_W(x^d)$ and $f_B(y^d)$ in place of the $x_d$ and $y_d$ respectively to find the generating function for the number of orbits. So we have

$$\begin{aligned}
\mathrm{Orbits}(x, y) = {} & \frac{1}{2 \cdot 5} \Big[ 1 \cdot (x + x^2 + x^3 + \dots)^5 (y + y^2 + y^3 + \dots)^5 \\
& \quad + 4\, (x^5 + x^{10} + x^{15} + \dots)(y^5 + y^{10} + y^{15} + \dots) \Big] \\
& + \frac{1}{2}\, (x + x^2 + \dots)(x^2 + x^4 + \dots)^2\, (y + y^2 + \dots)(y^2 + y^4 + \dots)^2.
\end{aligned} \tag{3.5}$$

Let us first look at the cyclic part. Since 5 is prime, the only two integers that divide it are 1 and 5, so this polynomial will be

$$\frac{1}{2 \cdot 5} \Big[ 1 \cdot (x + x^2 + x^3 + \dots)^5 (y + y^2 + y^3 + \dots)^5 + 4\, (x^5 + x^{10} + x^{15} + \dots)(y^5 + y^{10} + y^{15} + \dots) \Big].$$

Now we extract the coefficients of the terms that are allowed. These are the terms in $x^{N_{AT}}$ and $y^{N_{GC}}$, and we can use (2.17) in order to calculate these coefficients directly. In this case, there will be no contribution from the second term, as it contains no terms in $x^8$ and $y^6$. So the total cyclic contribution will be (with $r = 8$ and $r = 6$ for the respective cases and $a = 1$, $b = 5$ for both)

$$\frac{1}{10} \binom{N_{GC} - 1}{5 - 1} \binom{N_{AT} - 1}{5 - 1} = \frac{1}{10} \binom{5}{4} \binom{7}{4} = \frac{175}{10}.$$

Then the same coefficient identifying process can be followed for the reflective part. Now the polynomial is given by

$$\frac{1}{2}\, (x + x^2 + \dots)(x^2 + x^4 + \dots)^2\, (y + y^2 + \dots)(y^2 + y^4 + \dots)^2.$$

So for both $x$ and $y$ the coefficients will come from the product of two series, one of them squared. Thus, the relevant terms will come in a series of products given in (2.18). In $y$ the sum of coefficients contracts to a single element, coming from $y^2 \cdot y^4$. That contribution is simply $\binom{1}{0}\binom{1}{1} = 1$. In $x$ however, there will be terms from $x^2 \cdot x^6$ as well as $x^4 \cdot x^4$. By (2.17), the sum will be

$$\binom{1}{0}\binom{2}{1} + \binom{3}{0}\binom{1}{1} = 3,$$

giving a total contribution of $\frac{1}{2}(1 \cdot 3) + \frac{175}{10} = 19$. Thus there are 19 DNA chains with 8 AT base pairs, 6 GC base pairs and 10 alternations.

4. Numerical Results

The developed algorithm for calculating the number of distinct DNA chains having $\alpha$ alternations can be used to produce the pdf of $\alpha$, $P(\alpha)$, which afterwards can be compared to pdfs numerically obtained from Monte Carlo (MC) simulations. In Figs. 7(a) and (b) we present such pdfs for a DNA chain containing $N = 100$ base pairs. In particular, we consider the case of $N_{AT} = 40$, $N_{GC} = 60$ in Fig. 7(a) and the case of $N_{AT} = 50$, $N_{GC} = 50$ in Fig. 7(b). From Figs. 7(a) and (b) we clearly see that the results obtained by the algorithm presented in Sect. 3 (empty circles) and by MC simulations of DNA chains with $N = 100$ base pairs (filled stars) agree very well. The slight differences between them are to be expected, as the number of possible chains is generally very large. For instance, in the case of $N_{AT} = 50$, $N_{GC} = 50$ and $\alpha = 50$, the number of possible necklaces is of the order of $10^{25}$. Thus, in general, the number of performed MC simulations cannot get close to the actual total number of possible chains. Nevertheless, although the results of Figs. 7(a) and (b) were obtained by only $N_{MC} = 20000$ MC simulations they manage to capture the theoretically obtained pdf quite accurately. Of course it is expected that increasing the number of MC simulations will improve the accuracy of the numerical results.

Fig. 7. Comparison of the pdf $P(\alpha)$ of the number of alternations $\alpha$, obtained by the algorithm presented in Sect. 3 [empty circles in panels (a) and (b)] and by randomly created DNA chains of $N = 100$ base pairs through MC simulations [filled stars in panels (a) and (b)]. The pdfs for $N_{AT} = 40$, $N_{GC} = 60$ and $N_{AT} = 50$, $N_{GC} = 50$ are presented in panels (a) and (b) respectively. The number of MC simulations used in (a) and (b) is $N_{MC} = 20000$. (c) The evolution of the average total absolute difference $\langle d \rangle$ between the theoretically and the numerically obtained pdfs as a function of $N_{MC}$ for the case of $N_{AT} = 50$, $N_{GC} = 50$. The values of $\langle d \rangle$ are obtained as the average of the quantity (4.1) evaluated for 5 different sets of $N_{MC}$ runs. The error bars denote the corresponding standard deviations.

As a measure of this accuracy we can consider the total absolute difference

$$d(N_{MC}) = \sum_{\alpha} \big| P_{MC}(N_{MC}, \alpha) - P(\alpha) \big|, \tag{4.1}$$

between the two distributions. In (4.1) $P_{MC}(N_{MC}, \alpha)$ is the probability of $\alpha$ alternations obtained by $N_{MC}$ MC simulations, $P(\alpha)$ is the one obtained theoretically, while the sum is performed over all possible values of $\alpha$. From the results of Fig. 7(c), where we plot the averaged value of $d(N_{MC})$ over 5 sets of $N_{MC}$ MC simulations as a function of $N_{MC}$, we see that as the number of simulations increases, the numerical results get closer to the theoretical ones.

The results of Fig. 7 clearly show that in order to study the dynamical properties of DNA chains, statistical analysis performed over a few thousand MC generated random chains (even of the order of 5000) would suffice, as such numbers of MC simulations are enough for capturing quite accurately the influence of alternations on the system's dynamics.

The shape of the pdfs in Figs. 7(a) and (b) suggests that they could possibly be fitted by Gaussian distributions. This is actually true as we can see from the results of Fig. 8, where we performed such a fit for the theoretically obtained pdf of Fig. 7(b).

Fig. 8. Fitting by a Gaussian of the theoretical pdf of Fig. 7(b) (empty circles) with $N_{AT} = 50$, $N_{GC} = 50$. The mean of the Gaussian is $\alpha_0 = 50.5$ and the standard deviation is $\sigma_\alpha = 5.1$.

The Gaussian approximation of the pdfs has several advantages as it allows us to easily quantify the influence of different variables on the number of alternations. Let us first look at the effect of increasing the number of only one type of base pair, keeping constant the number of the other type of base pair. In Fig. 9 we present some pdfs of $\alpha$ for $N_{AT} = 100$ and increasing values of $N_{GC}$ from 25 up to 2500. Starting from small values of $N_{GC}$, we find a very "lopsided" and narrow distribution which as $N_{GC}$ increases becomes gradually more symmetric and spreads out, up to a value of $N_{GC} = 200$. Then, increasing $N_{GC}$ further, as the numbers of different types of base pairs become more dissimilar we again find gradually more unbalanced pdfs with sharp peaks. The very "lopsided" distributions are obtained when the minority base pairs are significantly fewer than the majority ones and therefore are spread out and isolated among the others. In this case the distribution is sharply peaked around the corresponding maximum possible number of alternations. For the $N_{AT} = 100$, $N_{GC} = 25$ case this number is $\alpha = 50$, while for the $N_{AT} = 100$, $N_{GC} = 2500$ case it is $\alpha = 200$.

Fig. 9. Pdfs of $\alpha$ for fixed number of AT base pairs ($N_{AT} = 100$) and increasing values of $N_{GC}$ (curves shown for $N_{GC} = 25$, 50, 75, 100, 500 and 2500). Points correspond to the theoretically obtained values of the pdfs, while curves correspond to the Gaussian fits of these points. Note that even for long DNA chains the value of $\alpha$ cannot exceed $\alpha = 200$.

Fig. 10. The effect of increasing the number $N_{GC}$ of the GC base pairs for a fixed number of AT base pairs ($N_{AT} = 100$) on the Gaussian fit $P_G(\alpha)$ of the pdf values of $\alpha$, and in particular on (a) the mean value $\alpha_0$, (b) the standard deviation $\sigma_\alpha$ and (c) the maximum probability $\max[P_G(\alpha)]$. Some of these pdfs are shown in Fig. 9.

These changes of the distributions are quantitatively presented in Fig. 10 through the variations of the fitted Gaussian characteristics. The increase of the mean value $\alpha_0$ of the Gaussian fits as the number $N_{GC}$ increases is shown in Fig. 10(a). The upper limit of $\alpha_0$ is 200, when $N_{GC}$ becomes much larger than $N_{AT}$. The dependence of the width (standard deviation) $\sigma_\alpha$ of the Gaussian fits on $N_{GC}$ is depicted in Fig. 10(b).
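The mean and the standard deviation of $\alpha$ discussed here can also be estimated directly from randomly generated chains, in the spirit of the MC comparison of Fig. 7. The following is a minimal sketch, not the authors' simulation code: it assumes that a random chain is a uniformly random arrangement of the given numbers of AT and GC base pairs on a ring, and the names `count_alternations` and `sample_alternations` are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_alternations(chain):
    """Number of cyclically adjacent positions with different base pairs."""
    # chain[i - 1] wraps around to the last element when i == 0.
    return sum(chain[i] != chain[i - 1] for i in range(len(chain)))


def sample_alternations(n_at, n_gc, n_samples, seed=0):
    """Alternation counts of n_samples uniformly shuffled periodic chains.

    Assumption: every arrangement of the n_at + n_gc base pairs is
    equally likely; the seed only makes the run reproducible.
    """
    rng = random.Random(seed)
    chain = ["AT"] * n_at + ["GC"] * n_gc
    samples = []
    for _ in range(n_samples):
        rng.shuffle(chain)
        samples.append(count_alternations(chain))
    return samples


samples = sample_alternations(50, 50, 5000)
print(mean(samples), pstdev(samples))
```

For $N_{AT} = N_{GC} = 50$ the printed estimates can be compared with the fitted values $\alpha_0 = 50.5$ and $\sigma_\alpha = 5.1$ quoted in the caption of Fig. 8; repeating the run for other values of `n_gc` gives a rough numerical counterpart of the curves in Fig. 10.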
The initial increase with $N_{GC}$ corresponds to the spreading out of the distributions when the numbers of base pairs become more similar. Further increase of the $N_{GC}$ values pushes the pdfs to the other extreme and the lopsidedness comes through again, resulting in narrower distributions (see Fig. 9). This results in the decrease of $\sigma_\alpha$ for large values of $N_{GC}$. Finally, in Fig. 10(c) we observe that as $N_{GC}$ increases the maximum probability of the pdfs initially decreases rapidly and then increases slowly, in accordance with the results of Fig. 9 and of course with the fact that it is inversely proportional to the standard deviation of the Gaussian fit.

Let us now focus our attention on the effect of increasing the total number of base pairs $N = N_{AT} + N_{GC}$, i.e. the total "length" of the DNA chain, when the ratio $N_{GC} : N_{AT}$ is kept constant. Such cases are presented in Fig. 11, where we plot several pdfs for different values of $N$ but for fixed ratios $N_{GC} : N_{AT}$. In particular, the values of the ratios $N_{GC} : N_{AT}$ are 1 : 1 in panel (a), 2 : 1 in (b) and 6 : 1 in (c). In all cases the pdfs are fitted by appropriate Gaussian distributions whose characteristics are plotted in Fig. 12 as a function of $N$.

Fig. 11. Pdfs of $\alpha$ for fixed ratios $N_{GC} : N_{AT}$ = 1 : 1 (a), 2 : 1 (b) and 6 : 1 (c), shown for $N$ = 200, 400, 1000 in (a), $N$ = 150, 450, 900 in (b) and $N$ = 350, 700, 1050 in (c). Points correspond to the theoretically obtained values of the pdfs, while curves correspond to the Gaussian fits of these points.

Fig. 12. The effect of increasing the total number of base pairs $N$ for fixed ratios $N_{GC} : N_{AT}$ (1 : 1, 2 : 1 and 6 : 1) on the parameters of the Gaussian fit $P_G(\alpha)$ of the pdf for $\alpha$: (a) the mean value $\alpha_0$, (b) the standard deviation $\sigma_\alpha$ and (c) the maximum probability $\max[P_G(\alpha)]$. Some of these pdfs are shown in Fig. 11.

From the results of Figs. 11 and 12 we see that as the total number $N$ of base pairs increases the pdfs become broader, and consequently their maximum value decreases. This means that for large $N$ more $\alpha$ values have a relatively high probability to appear in a randomly created DNA chain. In addition, increasing the ratio $N_{GC} : N_{AT}$ results in a decrease of the spreading, as evidenced by the lower standard deviation in Fig. 12(b) and the higher maximum probability in Fig. 12(c). A linear relationship between $N$ and the mean $\alpha_0$ is observed for all ratios, with the slope of the line influenced by the ratio. The slope $m$ for each case is: $m = 0.25$ for ratio 6 : 1, $m = 0.45$ for 2 : 1 and $m = 0.5$ for 1 : 1.

5. Conclusions

Motivated by the possibility that the number $\alpha$ of base pair alternations in a circular or periodic DNA chain might affect the dynamics of the system, we have found a probability distribution for this number. Algorithms for such distributions are known for linear DNA sequences with fixed boundary conditions [31]. The introduction of the periodic boundary conditions we consider in our study makes the counting of alternations a much more complicated problem due to the appearance of additional rotational and reflectional symmetries. To account for the additional complexity arising from these symmetries we have applied Pólya counting theory.
In particular, extending Pólya's Enumeration Theorem for a partition-preserving group action on a partitioned set, we have constructed a well-defined algorithm for calculating the number of DNA chains having a given number of alternations for particular values of the number of AT ($N_{AT}$) and GC ($N_{GC}$) base pairs.

The obtained theoretical results were compared with numerically constructed pdfs through MC simulations. We found that, in general, by creating a few thousand random DNA chains (around 5000) through MC simulations we can approximate the theoretical pdf of $\alpha$ quite accurately. This means that a statistical analysis of these DNA chains will suffice to uncover the potential influence of heterogeneity on the dynamic behavior of the considered DNA model. In addition, approximating the obtained pdfs by Gaussians we investigated the effect of the number of the two base pairs, as well as their ratio, on various characteristics of the pdfs, like their mean value, their standard deviation and their maximum.

APPENDIX

Here we present a Python computer code implementing the algorithm of Sect. 3. The function necklace_count(n, B, W) returns the total number of possible necklaces under the symmetry constraints with 2n alternations, B black beads and W white beads.

    from math import gcd

    # Compute binomial coefficients in linear time.
    def binomial(n, k):
        if k > n or k < 0:
            return 0
        if k == 0:
            return 1
        if k > n // 2:
            return binomial(n, n - k)
        return (n * binomial(n - 1, k - 1)) // k

    # Compute the Euler totient function phi(n), which
    # gives the number of integers 0 < d <= n that are
    # relatively prime to n.
    def totient(n):
        count = 0
        for d in range(1, n + 1):
            if gcd(d, n) == 1:
                count += 1
        return count

    # Get the x^r coefficient of our weight generating functions f(x^m)^n,
    # where:
    #     f(x) = x + x^2 + x^3 + ...
    # This is Lemma 5, Eq. (2.17), with a = m and b = n.
    def weight_gf(r, m, n):
        if n == 0:
            if r == 0:
                return 1
            return 0
        if r % m != 0:
            return 0
        if (r // m) < n:
            return 0
        return binomial((r // m) - 1, n - 1)

    # Get the x^r coefficient of a binary product of weight generating
    # functions f(x^m1)^n1 * f(x^m2)^n2, where:
    #     f(x) = x + x^2 + x^3 + ...
    # The sum runs over k = 0, ..., r as in Lemma 6, Eq. (2.18), so that
    # a factor with exponent zero (constant term 1) is handled correctly.
    def binary_weight_gf(r, m1, n1, m2, n2):
        total = 0
        for i in range(r + 1):
            total += weight_gf(i, m1, n1) * weight_gf(r - i, m2, n2)
        return total

    # Compute the number of necklaces up to dihedral symmetry with
    # 2n alternations, B black beads and W white beads.
    def necklace_count(n, B, W):
        # First we count the contributions from the cyclic part
        # of the cycle index.
        count = 0
        for d in range(1, n + 1):
            if n % d != 0:
                continue
            count += totient(d) * weight_gf(B, d, n // d) * weight_gf(W, d, n // d)

        # Next we count the contributions from the dihedral part
        # of the cycle index.
        if n % 2 == 0:
            count += (weight_gf(B, 2, n // 2)
                      * binary_weight_gf(W, 1, 2, 2, (n - 2) // 2)
                      * (n // 2))
            count += (weight_gf(W, 2, n // 2)
                      * binary_weight_gf(B, 1, 2, 2, (n - 2) // 2)
                      * (n // 2))
        else:
            count += (binary_weight_gf(B, 1, 1, 2, (n - 1) // 2)
                      * binary_weight_gf(W, 1, 1, 2, (n - 1) // 2)
                      * n)

        return count // (2 * n)

Acknowledgements

M.H. and G.P-J. acknowledge financial assistance from the National Research Foundation (NRF) of South Africa towards this research. G.K. and Ch.S. were supported by the Erasmus+/International Credit Mobility KA107 program. Ch.S.
acknowledges support by the NRF of South Africa (IFRR and CPRR Programmes), the UCT (URC Conference Travel Grant) and thanks Hans-Peter Kunzi for useful discussions.
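As an independent check on the necklace_count routine of the appendix listing, a brute-force enumeration can be sketched as follows. This sketch is not part of the original code: the names alternations, canonical and brute_force_count are hypothetical, and the approach (exhaustive enumeration rather than Pólya counting) is assumed here purely for verification. It places B black beads on a ring of B + W sites, keeps the rings with exactly 2n colour alternations, and counts them up to rotation and reflection by reducing each ring to a canonical representative. The cost grows combinatorially, so it is only practical for short chains.

```python
from itertools import combinations


def alternations(ring):
    """Count colour changes between cyclically adjacent beads."""
    size = len(ring)
    return sum(ring[i] != ring[(i + 1) % size] for i in range(size))


def canonical(ring):
    """Smallest image of a ring under all rotations and reflections."""
    size = len(ring)
    images = []
    for seq in (ring, ring[::-1]):
        for shift in range(size):
            images.append(seq[shift:] + seq[:shift])
    return min(images)


def brute_force_count(n, B, W):
    """Rings with B black beads, W white beads and 2n alternations,
    counted up to dihedral symmetry by exhaustive enumeration."""
    size = B + W
    seen = set()
    for blacks in combinations(range(size), B):
        black_set = set(blacks)
        # 1 marks a black bead, 0 a white bead.
        ring = tuple(1 if i in black_set else 0 for i in range(size))
        if alternations(ring) == 2 * n:
            seen.add(canonical(ring))
    return len(seen)
```

For instance, brute_force_count(2, 4, 4) enumerates the rings of four black and four white beads with four alternations. For small n, B and W the result can be compared directly with necklace_count(n, B, W).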

Journal

Statistics, arXiv (Cornell University)

Published: May 16, 2018
