The structure of approximate groups

by EMMANUEL BREUILLARD, BEN GREEN, and TERENCE TAO

DOI 10.1007/s10240-012-0043-9

ABSTRACT

Let $K \geq 1$ be a parameter. A $K$-approximate group is a finite set $A$ in a (local) group which contains the identity, is symmetric, and such that $A \cdot A$ is covered by $K$ left translates of $A$. The main result of this paper is a qualitative description of approximate groups as being essentially finite-by-nilpotent, answering a conjecture of H. Helfgott and E. Lindenstrauss. This may be viewed as a generalisation of the Freiman-Ruzsa theorem on sets of small doubling in the integers to arbitrary groups. We begin by establishing a correspondence principle between approximate groups and locally compact (local) groups that allows us to recover many results recently established in a fundamental paper of Hrushovski. In particular we establish that approximate groups can be approximately modeled by Lie groups. To prove our main theorem we apply some additional arguments essentially due to Gleason. These arose in the solution of Hilbert's fifth problem in the 1950s. Applications of our main theorem include a finitary refinement of Gromov's theorem, as well as a generalized Margulis lemma conjectured by Gromov and a result on the virtual nilpotence of the fundamental group of Ricci almost nonnegatively curved manifolds.

CONTENTS

1. Introduction
2. Coset nilprogressions and a more detailed version of the Main Theorem
3. Ultra approximate groups and Hrushovski's Lie Model Theorem
4. An outline of the argument
5. Sanders-Croot-Sisask theory
6. Proof of the Hrushovski Lie model theorem
7. Strong approximate groups
8. The escape norm and a Gleason type theorem
9. Proof of the main theorem
10. A dimension bound
11. Applications to growth in groups and geometry
Acknowledgements
Appendix A: Basic theory of ultralimits and ultraproducts
Appendix B: Local groups
Appendix C: Nilprogressions and related objects
References

1. Introduction

Approximate groups. A fair proportion of the subject of additive combinatorics is concerned with approximate analogues of exact algebraic properties, and the extent to which they resemble those algebraic properties. In this paper we are concerned with sets that are approximately closed under multiplication, which we do not necessarily assume to be commutative, and more specifically with approximate groups. These are finite non-empty sets $A$ with group-like properties which we shall state precisely later. First we will motivate the definition of an approximate group with some discussion and examples.

Suppose first of all that $A$ is a finite subset of some ambient group $G = (G, \cdot)$. This is the setting considered in essentially all of the existing literature, and the one of importance in applications. However, as we shall see later, our method of proof is in fact more naturally adapted to a more general setting, in which $A$ lies in a local group rather than a global one.

It is easy to see that a finite non-empty subset $A$ of $G$ is a genuine subgroup if, and only if, we have $xy^{-1} \in A$ whenever $x, y \in A$.
Perhaps the most natural way in which a set $A$ may be approximately a subgroup, then, is if the set $A \cdot A^{-1} := \{xy^{-1} : x, y \in A\}$ has cardinality not much bigger than the cardinality of $A$: for example, we might ask that $|A \cdot A^{-1}| \leq K|A|$ for some constant $K$.

Sets with this property or with the closely related property $|A^2| \leq K|A|$, where $A^2 := A \cdot A = \{xy : x, y \in A\}$, are said to have small doubling, and this is indeed a commonly encountered condition in various fields of mathematics, in particular in additive combinatorics. It is a perfectly workable notion of approximate group in the abelian setting and the celebrated Freiman-Ruzsa theorem, Theorem 2.1 below, describes subsets of $\mathbf{Z}$ with this property. However in [52] it was noted that in non-commutative settings a somewhat different, though closely related, notion of approximate group is more natural: $A$ is an approximate group if it is symmetric in the sense that the identity $\mathrm{id}$ lies in $A$, if $a^{-1} \in A$ whenever $a \in A$, and if $A \cdot A$ is covered by $K$ left-translates of $A$.

As suggested above we consider in this paper a slightly more general (and perhaps more natural, in retrospect) "local" definition of approximate group in which there is no ambient global group $G$. It will be convenient to introduce the following definition. This requires the concept of a local group, which is discussed at some length in Appendix B.

Definition 1.1 (Multiplicative set). A multiplicative set is a finite non-empty set $A$ contained in a (symmetric) local group $G = (G, \cdot)$, such that the product set $(A \cup A^{-1})^{200}$ is well-defined, where $A^{-1} := \{a^{-1} : a \in A\}$ is the inverse of $A$. Strictly speaking, one should refer to the pair $(A, G)$ as the multiplicative set rather than just $A$, but we will usually abuse notation and omit the ambient local group $G$. In some (abelian) examples, we will use additive group notation $G = (G, +)$ rather than multiplicative notation $G = (G, \cdot)$.
In such cases, we will refer to multiplicative sets as additive sets instead.

Clearly, any finite non-empty subset of a (global) group $G$ is a multiplicative set. The reader should probably keep this model case in mind throughout a first reading of this paper. Indeed the additional generality afforded by the local setting is only needed at a single, albeit critical, place in the argument in Section 9. One should informally think of a multiplicative set $A$ as a set that behaves "as if" it were in a global group, so long as one only works "locally" in the sense that one only considers products of up to 200 elements of $A$ and their inverses. The exponent 200 in Definition 1.1 is somewhat arbitrary, but for the purposes of studying approximate groups, the exact choice of this exponent is not important in practice, so long as it is at least 8 (see Theorem 5.3 for a precise formalisation of this assertion). For the reader familiar with Freiman homomorphisms (cf. [54, §5.3]), we remark that these are essentially the morphisms in the category of multiplicative sets.

Definition 1.2 (Approximate groups). Let $K \geq 1$. A $K$-approximate group is a multiplicative set $A$ with the following properties:
(i) the set $A$ is symmetric in the sense that $\mathrm{id} \in A$ and $a^{-1} \in A$ if $a \in A$;
(ii) there is a symmetric subset $X \subset A^3$ with $|X| \leq K$ such that $A \cdot A \subseteq X \cdot A$.
We will sometimes refer to actual (global) groups as genuine groups, in order to distinguish them from approximate groups. We define a global $K$-approximate group to be a $K$-approximate group $A$ that lies inside a global group $G$. We refer to $K$ as the covering parameter of the approximate group $A$.

Remark 1.3. We will also have occasion to deal with infinite $K$-approximate groups, which are defined exactly as ordinary $K$-approximate groups, except that they are no longer required to be finite sets. A convex body in a Euclidean space, or a small ball in a Lie group, are examples of infinite approximate groups.
Later we will introduce the important notion of an ultra approximate group, which is another example. However, by default, approximate groups in this paper will be understood to be finite unless otherwise stated.

The connection between sets with small doubling and the apparently stronger property of being an approximate group was worked out in [52], building on work of Ruzsa [45]; see Remark 1.5 below.

When we speak of an "approximate group" we shall generally imagine that $K$ is fixed (e.g. $K = 10$) and that $|A|$ is large. Let us give some examples.

Example 1 (Finite group). A 1-approximate group is the same thing as a finite group.

Example 2 (Arithmetic/geometric progression). If $N \in \mathbf{N}$ is a natural number, then the arithmetic progression $P(1; N) := \{-N, \ldots, N\}$ (which one can view inside the (additive) global group $\mathbf{Z}$, or the local group $\{-200N, \ldots, 200N\}$) is a 2-approximate group. More generally, if $G = (G, \cdot)$ is any (global) group and $g \in G$ then the geometric progression $P(g; N) := \{g^{-N}, \ldots, g^{N}\}$ is a 2-approximate group.

Example 3 (Generalised arithmetic progression). Let $G = (G, +)$ be an abelian group, let $u_1, \ldots, u_r \in G$ for some $r \geq 0$, and let $N_1, \ldots, N_r > 0$ be real numbers. We refer to the set
$$P(u_1, \ldots, u_r; N_1, \ldots, N_r) := \{n_1 u_1 + \cdots + n_r u_r : n_1, \ldots, n_r \in \mathbf{Z};\ |n_1| \leq N_1, \ldots, |n_r| \leq N_r\}$$
as a generalised arithmetic progression of rank $r$. One easily verifies that this is a $2^r$-approximate group.

Example 4 (Homomorphic images). Let $\phi : G \to H$ be a homomorphism between local or global groups. If $A$ is a $K$-approximate subgroup of $G$, then $\phi(A)$ is a $K$-approximate subgroup of $H$. This observation can be generalised to the case when $\phi$ is a Freiman homomorphism (of order 3) rather than a group homomorphism; see [54, §5.3] for more discussion.
Indeed, Freiman homomorphisms are very similar to homomorphisms of local groups, although for technical reasons we will rely on the latter concept rather than the former. Conversely, if $B$ is a $K$-approximate subgroup of $H$, $\phi$ is surjective, and $\ker(\phi)$ is finite, then $\phi^{-1}(B)$ is a $K$-approximate subgroup of $G$. In the latter case one can view the $K$-approximate group $\phi^{-1}(B)$ as a "finite extension" of the $K$-approximate group $B$ by the genuine group $\ker(\phi)$.

Example 5 (Large subsets). Let $A$ be a $K$-approximate group, and let $A'$ be a symmetric neighbourhood of the identity in $A$ such that $A$ is covered by $K'$ left-translates of $A'$. Then $A'$ is a $KK'$-approximate group. This hints that approximate groups are considerably more numerous than genuine groups, because the property of being an approximate group is preserved under passage to "large" subsets, whereas the property of being a genuine group is not.

Example 6 (Heisenberg example). Let $G$ be the free nilpotent group of step 2 generated by two generators $u_1, u_2$. More concretely, one can take $G$ to be the Heisenberg group
$$G := \begin{pmatrix} 1 & \mathbf{Z} & \mathbf{Z} \\ 0 & 1 & \mathbf{Z} \\ 0 & 0 & 1 \end{pmatrix} \tag{1.1}$$
with generators
$$u_1 := \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad u_2 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$
Consider also the commutator
$$[u_1, u_2] := u_1^{-1} u_2^{-1} u_1 u_2 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};$$
one has
$$\begin{pmatrix} 1 & n_1 & n_{12} \\ 0 & 1 & n_2 \\ 0 & 0 & 1 \end{pmatrix} = u_2^{n_2} u_1^{n_1} [u_1, u_2]^{n_{12}}$$
for all integers $n_1, n_2, n_{12}$.

Let $N_1, N_2 \geq 10$ be real numbers. Define the nilprogression $P(u_1, u_2; N_1, N_2)$ to be the set of all words in $u_1, u_2, u_1^{-1}, u_2^{-1}$ that involve at most $N_1$ occurrences of $u_1, u_1^{-1}$ and at most $N_2$ occurrences of $u_2, u_2^{-1}$. It is not difficult to verify that $P(u_1, u_2; N_1, N_2)$ is a symmetric neighbourhood of the identity which contains the set
$$\{u_2^{n_2} u_1^{n_1} [u_1, u_2]^{n_{12}} : |n_1| \leq N_1/10,\ |n_2| \leq N_2/10,\ |n_{12}| \leq N_1 N_2/10\}$$
and is contained in the set
$$\{u_2^{n_2} u_1^{n_1} [u_1, u_2]^{n_{12}} : |n_1| \leq 10 N_1,\ |n_2| \leq 10 N_2,\ |n_{12}| \leq 10 N_1 N_2\}.$$
One can easily verify that $P(u_1, u_2; N_1, N_2)$ is a $K$-approximate group for some absolute constant $K$ (for instance, one could take $K = 100$).

Remark 1.4. The above example was constructed inside the Heisenberg group. Later on we will discuss a generalisation of this example to arbitrary nilpotent groups. These examples, which we will call nilprogressions, will be needed to state the precise version of our main theorem (Theorem 2.10) below. We will define them later in this introduction.

Example 7 (Direct products). The direct product of a $K_1$-approximate group and a $K_2$-approximate group is a $K_1 K_2$-approximate group, and so one may build up examples of approximate groups using both subgroups and nilprogressions.

Example 8 (Helfgott's example). The following example of Helfgott is a less obvious way of combining a subgroup and a nilprogression. Let $A \subseteq \mathrm{GL}_3(\mathbf{F}_p)$ be the following set of $3 \times 3$ matrices:
$$A := \left\{ \begin{pmatrix} r^n & x & z \\ 0 & s^n & y \\ 0 & 0 & (rs)^{-n} \end{pmatrix} : x, y, z \in \mathbf{F}_p,\ -N \leq n \leq N \right\}.$$
Here, $r, s \in \mathbf{F}_p^{\times}$ are fixed and $N$ is large yet much smaller than $p$. Then $A$ is a $O(1)$-approximate group. Note that $A$ has the following form: it admits a subgroup $H$, normalised by $A$, such that $A/H$ is a geometric progression. Indeed
$$H = \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : x, y, z \in \mathbf{F}_p \right\}.$$
In the language of Example 4, $A$ is a finite extension of a geometric progression by the finite group $H$.

Footnote. See terrytao.wordpress.com/2009/06/21/freimans-theorem-for-solvable-groups/#comment-39705.

Each of the above examples was rather "algebraic" in nature, whereas the definition of approximate group is somewhat combinatorial. We also have some more combinatorial criteria for generating approximate groups using sets of small doubling or tripling.

Remark 1.5 (Relationship between small doubling and approximate groups). Let $A$ be a non-empty finite subset of a global group $G$.
If $|A^3| \leq K|A|$, then the set $H := (A \cup \{\mathrm{id}\} \cup A^{-1})^2$ is a $O(K^{O(1)})$-approximate group that contains $A$; see [52, Theorem 3.9]. In a similar vein if $|A^2| \leq K|A|$ or $|A \cdot A^{-1}| \leq K|A|$, then there exists a $O(K^{O(1)})$-approximate group $H$ of size $|H| = O(K^{O(1)}|A|)$ such that $A$ can be covered by $O(K^{O(1)})$ left-translates $gH$ of $H$; see [52, Theorem 4.6].

Our aim in this paper is to "describe" the structure of approximate subgroups in an arbitrary ambient group in terms of more explicit algebraic objects such as those listed in the examples. Here is one form of our main result in this regard.

Theorem 1.6 (Main theorem, simple form). Let $A$ be a global $K$-approximate group, thus it is contained in a (global) group $G$. Then there exists a subgroup $G_0$ of $G$ and a finite normal subgroup $H$ of $G_0$ with the following properties:
(i) $A$ can be covered by $O_K(1)$ left-translates of $G_0$;
(ii) $G_0/H$ is nilpotent and finitely generated of rank and step at most $O_K(1)$;
(iii) $A^4$ contains $H$ and a generating set of $G_0$.
In particular, the group $G_0$ is finite-by-nilpotent, and hence also virtually nilpotent. Indeed, the stabiliser in $G_0$ of the conjugation action on $H$ has finite index in $G_0$ and is a central extension of a finite index subgroup of $G_0/H$, and therefore is also nilpotent.

By specialising Theorem 1.6 to the combinatorial examples in Remark 1.5 we obtain an analogous structure theorem for sets of small doubling.

Corollary 1.7 (Freiman-type theorem). Let $A$ and $B$ be finite non-empty subsets in a (global) group $G$ such that $|AB| \leq K|A|^{1/2}|B|^{1/2}$. Then there exists a subgroup $G_0$ of $G$ and a finite normal subgroup $H$ of $G_0$ with the following properties:
(i) $A$ can be covered by at most $O_K(1)$ right translates of $G_0$;
(ii) $G_0/H$ is nilpotent and finitely generated of rank and step $O_K(1)$.
In particular, $G_0$ is finite-by-nilpotent and hence also virtually nilpotent.
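The small doubling hypothesis of Corollary 1.7 (taking $B = A$) and the covering condition of Definition 1.2 can both be checked by brute force on small abelian examples. The following is a minimal sketch, not part of the paper: it works in the additive group of integers, and the helper names `sumset`, `doubling_constant` and `covered_by_translates` are hypothetical names chosen for this illustration. It confirms that the progression $\{-N, \ldots, N\}$ of Example 2 has doubling constant below 2 and that $A + A$ is covered by the two translates $-N + A$ and $N + A$, while a set of powers of 2 has much larger doubling.

```python
def sumset(a, b):
    """Return the sumset {x + y : x in a, y in b} of two finite integer sets."""
    return {x + y for x in a for y in b}


def doubling_constant(a):
    """Return |A + A| / |A| for a finite non-empty integer set A."""
    return len(sumset(a, a)) / len(a)


def covered_by_translates(a, xs):
    """Check the covering condition A + A is a subset of X + A."""
    return sumset(a, a) <= sumset(xs, a)


# Example 2 of the text: the arithmetic progression {-N, ..., N}.
N = 50
progression = set(range(-N, N + 1))

# A symmetric set with no additive structure (illustrative choice):
# 0 together with plus or minus the powers 2^0, ..., 2^9.
powers = {0} | {2 ** i for i in range(10)} | {-(2 ** i) for i in range(10)}

if __name__ == "__main__":
    print("doubling of progression:", doubling_constant(progression))
    print("covered by 2 translates:", covered_by_translates(progression, {-N, N}))
    print("doubling of powers of 2:", doubling_constant(powers))
```

Here $|A + A| = 4N + 1$ and $|A| = 2N + 1$, so the printed doubling constant of the progression is just under 2, in line with its being a 2-approximate group.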
Footnote. Here and in the rest of the paper we use $X = O_K(Y)$, $X \ll_K Y$, or $Y \gg_K X$ for two (standard) quantities $X, Y$ and a (standard) parameter $K$ to denote the assertion that $|X| \leq C_K Y$ for some (standard) quantity $C_K > 0$ depending only on $K$, and similarly for other choices of subscripted parameters. We also adopt an analogous notation for nonstandard quantities; see Appendix A.

Footnote. The rank of a finitely generated group is the least number of generators required to generate the group. The step is the length of the lower central series, minus 1.

Proof. By [52, Theorem 4.6], there exists a $O(K^{O(1)})$-approximate group $A'$ of size $O(K^{O(1)}|A|)$ such that $A$ can be covered by $O(K^{O(1)})$ right translates of $A'$ and $B$ can be covered by $O(K^{O(1)})$ left translates of $A'$. We may thus apply Theorem 1.6 to $A'$.

Theorem 1.6 (or Corollary 1.7) answers in the affirmative a conjecture that we have been referring to as the Helfgott-Lindenstrauss Conjecture, on account of its having been raised independently in private communications by both Harald Helfgott and Elon Lindenstrauss. In fact, the conjecture is reasonably explicit in the comments surrounding [31, Theorem 1.1].

Remark 1.8 (The linear case). Various forms of the main theorem are also known in groups of Lie type of bounded dimension, as a consequence of results of many authors [6–8, 18, 19, 30, 31, 33, 43]. For instance, in [18] an analogue of Theorem 1.6 was established in the case when $G$ is a solvable algebraic group of bounded dimension over a finite field of prime order. In that case, the group $G_0/H$ has bounded rank, and the number of cosets of $G_0$ needed to cover $A$ is polynomial in $K$. We have no examples to rule out the possibility that this polynomiality in $K$ holds in all groups $G$, perhaps at the cost of weakening the rank and step bounds on $G_0/H$. Unfortunately our methods, which rely on ultrafilter arguments, give no quantitative bounds on the covering number whatsoever.
Remark 1.9 (Bounds on the nilpotent group). Our method allows us to give an explicit bound on the dimension (rank and step) of the nilpotent group $G_0/H$ in Theorem 1.6 at the expense of replacing $A^4$ in item (iii) by a larger power of $A$. Namely, if we allow for $H$ and the generating set of $G_0$ to be contained in $A^{12}$, then we may ensure that the nilpotent group $G_0/H$ is $\ell$-nilpotent with $\ell = O(K^2 \log K)$. If we are happy to go as far as $A^{O_K(1)}$, then this may be further reduced to $\ell \leq 6 \log_2 K$. Here we say that a group is $\ell$-nilpotent if it admits a generating set $u_1, \ldots, u_\ell$ such that $[u_i, u_j] \in \langle u_{j+1}, \ldots, u_\ell \rangle$ for all $1 \leq i < j \leq \ell$. In particular such a group admits a normal series with cyclic factors of length at most $\ell$, and so is also nilpotent of step at most $\ell$. We refer the reader to Theorem 2.12 and to Section 10 for a detailed statement and proof.

Remark 1.10. Note that no bound is provided on the size of the finite group $H$ in Theorem 1.6, other than that it is finite. Indeed, by considering $A$ to be a large finite simple group it is not difficult to see that $H$ can be arbitrarily large.

We will in fact prove a much more precise version of Theorem 1.6 involving a slightly complicated type of approximate group which we call a coset nilprogression. We discuss this concept in some detail in the next section. For many applications, however, Theorem 1.6 is quite sufficient.

Applications. We now give a small selection of applications to growth in groups and to Riemannian geometry; a greater variety is assembled in Section 11, which also contains proofs of these statements.

Polynomial growth conditions and Gromov's theorem. Firstly, Theorem 1.6 yields a quick proof of Gromov's theorem [27] on groups of polynomial growth.

Theorem 1.11 (Gromov's theorem). Let $G$ be a group of polynomial growth. That is, $G$ is generated by a finite symmetric set $S$, and there are constants $C$ and $d$ such that $|S^n| \leq Cn^d$ for all $n \in \mathbf{N}$. Then $G$ is virtually nilpotent.
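The growth condition $|S^n| \leq Cn^d$ in Theorem 1.11 can be observed numerically in the Heisenberg group of Example 6, which is nilpotent and therefore has polynomial growth. The sketch below is an illustration only and is not taken from the paper: it assumes the coordinate encoding `(a, b, c)` for the upper unitriangular matrix with entries $a$, $b$ above and to the right of the diagonal and $c$ in the corner, and the names `mul`, `S` and `ball_sizes` are hypothetical. It computes $|S^n|$ for the symmetric generating set consisting of the identity and the two standard generators with their inverses, and prints a crude estimate of the growth exponent.

```python
import math


def mul(g, h):
    """Multiply two Heisenberg elements.

    The triple (a, b, c) stands for the integer matrix
    [[1, a, c], [0, 1, b], [0, 0, 1]] (an encoding assumed for this sketch).
    """
    a1, b1, c1 = g
    a2, b2, c2 = h
    return (a1 + a2, b1 + b2, c1 + c2 + a1 * b2)


IDENTITY = (0, 0, 0)
# Symmetric generating set containing the identity.
S = [IDENTITY, (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]


def ball_sizes(max_n):
    """Return [|S^1|, |S^2|, ..., |S^max_n|] by repeated multiplication by S."""
    sizes = []
    ball = {IDENTITY}
    for _ in range(max_n):
        ball = {mul(g, s) for g in ball for s in S}
        sizes.append(len(ball))
    return sizes


if __name__ == "__main__":
    sizes = ball_sizes(16)
    for n in (2, 4, 8):
        # log2(|S^{2n}| / |S^n|) estimates the growth exponent d at scale n.
        exponent = math.log2(sizes[2 * n - 1] / sizes[n - 1])
        print(n, sizes[n - 1], sizes[2 * n - 1], round(exponent, 2))
```

Since every element of $S^n$ has $|a|, |b| \leq n$ and $|c| \leq n^2$ in this encoding, the ball sizes are bounded by $(2n+1)^2(2n^2+1)$, which is polynomial in $n$; the same experiment in a free group on two generators would instead show exponential growth.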
Remark 1.12. In fact, our arguments show that there is some function $f : \mathbf{N} \to \mathbf{N}$, $f(n) \to \infty$, such that, if $G$ does not have polynomial growth, then $|S^n| \geq n^{f(n)}$ for all $n$. We do not get an explicit function $f$. However, if the control parameter $O_K(1)$ in Theorem 2.10 were known to be polynomial in $K$, we could take $f(n) = c \log n$. The best (in fact only) lower bound known for this function at present is $(\log \log n)^c$, due to Shalom and the third author [51]. It is conjectured by some, in the absence of any examples to the contrary, that $f(n) > n^c$, and possibly even that $|S^n| \geq e^{c\sqrt{n}}$.

In [33] Hrushovski also gave a derivation of Gromov's theorem from his Lie model theorem (see Theorem 3.10 below). He in fact proved a strengthening of Gromov's theorem (see [33, Theorem 7.1] or Theorem 11.1 below). We will be able to recover Hrushovski's result more directly (see Corollary 11.2 below). In fact, our approach can also yield the following other strengthening of Gromov's theorem, which is uniform in the size of the generating set $S$ and appears to be new. Recall that if $\ell \in \mathbf{N}$ then we say that a group is $\ell$-nilpotent if it admits a generating set $u_1, \ldots, u_\ell$ such that $[u_i, u_j] \in \langle u_{j+1}, \ldots, u_\ell \rangle$ for all $1 \leq i < j \leq \ell$.

Theorem 1.13. Let $d > 0$. Then there is $n_0 = n_0(d) > 0$ such that if $G$ is a group generated by a finite symmetric set $S$ with $1 \in S$ for which $|S^n| \leq n^d |S|$ for some $n \geq n_0(d)$, then $G$ is virtually nilpotent. In fact $G$ has a normal subgroup of index at most $O_d(1)$ which is finite-by-($O(d)$-nilpotent).

Proof. The proof of this (and hence of Theorem 1.11) is a short enough deduction that we can give it here in the introduction. We refer the reader to Section 11 for more details. Let $N = N(d)$ be a large quantity to be specified later, and let $n$ be sufficiently large depending on $N$ and $d$. By the pigeonhole principle and the hypothesis $|S^n| \leq n^d |S|$ we see that if $n$ is sufficiently large depending on $N$ then there exists $n_0$, $N \leq n_0 \leq n/100$, such that $|S^{100 n_0}| \leq (200)^d |S^{n_0}|$. By Corollary 5.2 (which is quite easy) this implies that $S^{2n_0}$ is a $e^{O(d)}$-approximate group. By our main theorem, Theorem 1.6 (and Remark 1.9), we can thus find a finite-by-($O(d)$-nilpotent) and hence virtually nilpotent group $G_0$ such that $S^{2n_0}$ is covered by $O_d(1)$ left-translates of $G_0$. By the pigeonhole principle, if $N$ is large enough, we can find a nonnegative $m < 2n_0$ such that $S^{m+1} G_0 = S^m G_0$. Multiplying on the left by $S$ repeatedly we conclude that $S^{m+k} G_0 = S^m G_0$ for all $k \geq 0$. Since $S$ generates $G$, we conclude that $G = S^m G_0 = S^{2n_0} G_0$. Since $S^{2n_0}$ was covered by $O_d(1)$ left-translates of $G_0$, $G_0$ has index $O_d(1)$ in $G$, and so $G$ is also virtually nilpotent.

Riemannian manifolds. A. Petrunin suggested to us some years ago that a result such as Theorem 1.13 would give a purely group-theoretical proof of a theorem of Fukaya and Yamaguchi [16] according to which fundamental groups of almost non-negatively curved manifolds are virtually nilpotent. Recall that a closed manifold $M$ is said to be almost non-negatively curved if one can find a sequence $g_n$ of Riemannian metrics on it for which $\mathrm{diam}(M, g_n) \leq 1$ while $K_{M,g_n} \geq -1/n$, where $K_{M,g_n}$ is the sectional curvature. Indeed, a simple application of the Bishop-Gromov inequalities combined with Corollary 11.5 yields the following improvement assuming only a lower bound on the Ricci curvature and an upper bound on the diameter.

Corollary 1.14 (Ricci gap). Given $d \in \mathbf{N}$, there is $\varepsilon(d) > 0$ such that the following holds. Let $M = (M, g)$ be a $d$-dimensional compact Riemannian manifold with Ricci curvature bounded below by $-\varepsilon$ and diameter $\mathrm{diam}(M) \leq 1$. Then $\pi_1(M)$ is virtually nilpotent.
This result is known to differential geometers and follows from the works of Cheeger-Colding [11] and Kapovitch and Wilking [36]. We refer the reader to Section 11.1 for more discussion and references concerning the above result. We only note that Corollary 11.5 yields in fact an explicit bound on the nilpotency class, namely that after passing to a subgroup of $\pi_1(M)$ with index $O_d(1)$ and quotienting by a finite normal subgroup, we obtain a $O(d)$-nilpotent group.

Generalised Margulis lemma. Another corollary of Theorem 1.6 is a "generalised Margulis lemma" for metric spaces of a type conjectured by Gromov in [28, §5.F]. A metric space $X$ is said to have bounded packing with packing constant $K$ if there is $K > 0$ such that every ball of radius 4 in $X$ can be covered by at most $K$ balls of radius 1. Say that a subgroup $\Gamma$ of isometries of $X$ acts discretely on $X$ if every orbit is discrete in the sense that $\{\gamma \in \Gamma : \gamma \cdot x \in \Sigma\}$ is finite for every $x \in X$ and for every bounded set $\Sigma \subseteq X$.

Corollary 1.15 (Generalized Margulis Lemma). Let $K \geq 1$ be a parameter. Then there is some $\varepsilon(K) > 0$ such that the following is true. Suppose that $X$ is a metric space with packing constant $K$, and that $\Gamma$ is a subgroup of isometries of $X$ which acts discretely. Then for every $x \in X$ the "almost stabiliser" $\Gamma_\varepsilon(x) = \langle S_\varepsilon(x) \rangle$, where $S_\varepsilon(x) := \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\}$, is virtually nilpotent.

Note that the space $X$ is not assumed to be a manifold. The traditional Margulis lemma establishes a similar statement for subgroups of isometries of pinched negatively curved manifolds, or more generally under a curvature lower bound.

Approximate groups and polynomial growth. Finally we remark on an additive-combinatorial application, which asserts that approximate groups have large subsets with "polynomial growth".

Footnote. See also http://mathoverflow.net/questions/11091.

Theorem 1.16 (Approximate groups are locally of polynomial growth). Suppose that $A$ is a global $K$-approximate group. Then $A^4$ contains a $O_K(1)$-approximate group $A'$ with $(A')^4 \subset A^4$ and $|A'| \gg_K |A|$ such that $|(A')^m| \ll_K m^{O_K(1)} |A'|$ for all $m \geq 1$.

This theorem is an immediate consequence of Theorem 2.10 and Proposition C.5 below.

Remark 1.17. The above argument converted nilpotent structure (or more precisely, coset nilprogression structure, see below) to polynomial growth. In the reverse direction, there is the result of Sanders [46] in certain monomial groups, in which polynomial growth is shown to imply a metric ball type structure, at least under the (rather strong) restriction that the approximate group $A$ is normal in the ambient group $G$.

2. Coset nilprogressions and a more detailed version of the Main Theorem

This section concerns the more precise variants of our main theorem, whose existence we hinted at in the first introductory section. Let us first recall the fundamental inverse sumset theorem for abelian approximate groups. This was first established by Freiman [15], and a simplified argument was subsequently given in the paper [44] of Ruzsa. Here is the theorem in the torsion-free setting. Recall the notion of a generalised arithmetic progression, defined in Example 3 above.

Theorem 2.1 (Freiman-Ruzsa theorem). Let $G = (G, +)$ be a torsion-free (global) abelian group, and let $K \geq 2$ be a parameter. Suppose that $A \subseteq G$ is a $K$-approximate group. Then $4A = A + A + A + A$ contains a generalised arithmetic progression $P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$ with $r \leq \log^{O(1)} K$ and $|P| \gg e^{-\log^{O(1)} K} |A|$. In particular $A$ can be covered by $O(e^{\log^{O(1)} K})$ translates of $P$.

Proof. See [48] for the main part of this; the final assertion is then a consequence of Ruzsa's covering lemma, Lemma 5.1. For earlier results of this type with weaker bounds on $r$ and $P$, see [10, 44].
In [26] it was noted that one can take $r$ as small as $\log_2 K + \varepsilon$ for any $\varepsilon > 0$, at the cost of decreasing the size of $|P|$ somewhat; see also [3, 4] for prior results along these lines.

Roughly speaking, Theorem 2.1 asserts that, in a global torsion-free abelian group such as the integers $\mathbf{Z}$, approximate groups are "controlled" by generalised arithmetic progressions of bounded rank. In the case of abelian groups with torsion, the class of generalised arithmetic progressions is not sufficient, as one must also now deal with the example of finite genuine groups (Example 1). It is thus natural to introduce the concept of a coset progression $H + P$: the sum of a finite genuine group $H$ and a generalised arithmetic progression $P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$. This concept is sufficient for the formulation of a Freiman type theorem in an arbitrary abelian group.

Theorem 2.2 (Abelian Freiman-Ruzsa theorem). Let $G = (G, +)$ be a (global) abelian group, and let $K \geq 2$ be a parameter. Suppose that $A \subseteq G$ is a $K$-approximate group. Then $4A$ contains a coset progression $H + P$, where $P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$ is a generalised arithmetic progression with $r \leq \log^{O(1)} K$, $H$ is a finite abelian subgroup disjoint from $P$, and $|H + P| = |H||P| \gg e^{-\log^{O(1)} K} |A|$. In particular, $A$ can be covered by $O(e^{\log^{O(1)} K})$ translates of $H + P$.

Proof. Again, see [48]; see also [25] for an earlier result in this direction.

We turn now to the business of dropping the commutativity assumption. We will also drop the assumption that $A$ is contained in a global group and merely assume that $A$ is a subset of a local group $G$. Informally, this means that we will not require the multiplication law to be defined everywhere in $G$, but only in a certain neighborhood of $\mathrm{id}$. We refer the reader to Appendix B for a precise definition and basic properties; see also [50, IV.3] for a discussion of the closely related notion of group chunk.
We generalise the concept of a generalised arithmetic progression to this setting as follows.

Definition 2.3 (Non-commutative progression). Let $u_1, \ldots, u_r$ be $r$ elements in a local group $G = (G, \cdot)$, and let $N_1, \ldots, N_r$ be $r$ positive real numbers. If all products $g_1 \cdots g_n$ are well-defined in $G$, where each $g_i$ is equal to one of $u_j$ or $u_j^{-1}$ and, for each $j = 1, \ldots, r$, the formal expression $u_j$ and its inverse $u_j^{-1}$ appear at most $N_j$ times (see footnote 5), then we call the set of such products a non-commutative progression of rank $r$ and side lengths $N_1, \ldots, N_r$ and we denote it by $P(u_1, \ldots, u_r; N_1, \ldots, N_r)$. We refer to $r$ as the rank of the non-commutative progression.

Footnote 5. For this definition, we consider $u_i$ and $u_j$ to be distinct formal expressions when $i \neq j$, even if $u_i$ and $u_j$ take the same value in $G$, and similarly for $u_i^{-1}, u_j^{-1}$. Thus, for instance, $P(u_1, u_2; 1, 1)$ contains $u_1 u_2$ even if $u_1, u_2$ are equal.

Remark 2.4. One can view non-commutative progressions as multiparameter variants of balls in a word metric. For instance when all $N_i$ take the same value $N$ and one is working in a global group, the progression $P(u_1, \ldots, u_r; N, \ldots, N)$ is comparable with the word ball $B(N)$ of radius $N$ in the group $\langle u_1, \ldots, u_r \rangle$ for the word metric with generating set $\{u_1, \ldots, u_r\}$ in the sense that $B(N) \subseteq P(u_1, \ldots, u_r; N, \ldots, N) \subseteq B(rN)$.

In the global abelian setting, all generalised arithmetic progressions of bounded rank are automatically approximate groups with a bounded covering parameter $K$. This is not the case in general non-abelian groups, even in the global setting.
For instance, if $F$ is the free non-abelian group on two generators $e_1, e_2$, then the non-commutative progression $P(e_1, e_2; N, N)$ (which, as remarked earlier, is essentially the ball of radius $N$ in $F$) grows exponentially in $N$, and one can easily verify that $P(e_1, e_2; N, N)$ is only a $K$-approximate group for $K$ growing exponentially in $N$. However, the situation is much closer to the abelian case if the ambient group $G$ is nilpotent. Given the link between progressions and balls, the reader familiar with Gromov's theorem on groups of polynomial growth [27] (to be discussed later on) will not find this surprising. Indeed, it can be shown (though we will not do so here) that if $G$ is a global nilpotent group of step $s$, a non-commutative progression $P(u_1, \ldots, u_r; N_1, \ldots, N_r)$ in $G$ will be a $O_{r,s}(1)$-approximate group if $N_1, \ldots, N_r$ are sufficiently large depending on $r$ and $s$.

This motivates the following definition. Given some generators $u_1, \ldots, u_r$, let us recursively define an iterated commutator of degree $k$ involving these generators for a natural number $k \geq 1$ by declaring $u_1^{\pm 1}, \ldots, u_r^{\pm 1}$ to be the iterated commutators of degree 1, and $[g, h]$ to be an iterated commutator of degree $j + k$ whenever $g, h$ are iterated commutators of degree $j, k$ respectively for some $j, k \geq 1$. Thus for instance $[[u_2^{-1}, u_4], [u_3, u_2^{-1}]]$ is an iterated commutator of $u_1, u_2, u_3, u_4$ of degree 4.

Definition 2.5 (Nilprogression). Suppose that $G$ is a local group and that $s \geq 0$ is an integer. A nilprogression of rank $r$ and step $s$ is a non-commutative progression $P(u_1, \ldots, u_r; N_1, \ldots, N_r)$ with the property that every iterated commutator of degree $s + 1$ in the generators $u_1, \ldots, u_r$ is well-defined and equals the identity $\mathrm{id}$.

Example 9. The generalised arithmetic progression $P(u_1, \ldots, u_r; N_1, \ldots, N_r)$ in Example 3 is a nilprogression (in additive notation) of rank $r$ and step 1.
The set $P(u_1,u_2;N_1,N_2)$ in Example 6 is a nilprogression of rank 2 and step 2.

It can be shown (though we shall not do so here) that if $N_1,\dots,N_r$ are sufficiently large depending on $r, s$, and $P(u_1,\dots,u_r;CN_1,\dots,CN_r)$ is a well-defined nilprogression of step $s$ for some sufficiently large $C$ depending on $r, s$, then $P(u_1,\dots,u_r;N_1,\dots,N_r)$ is an $O_{r,s}(1)$-approximate group.

The concept of a nilprogression as defined above is related to, though not quite identical with, the one given in [5]. As a byproduct of our proof methods, we will be able to work with a more tractable subclass of nilprogressions, which we will call nilprogressions in $C$-normal form. These generalise the notion of a proper generalised arithmetic progression in the additive combinatorics literature, and are also close in spirit to the nilprogressions introduced in [53].

Definition 2.6 ($C$-normal form). Let $C \geq 1$. A non-commutative progression $P(u_1,\dots,u_r;N_1,\dots,N_r)$ is said to be in $C$-normal form if the following axioms are obeyed.

(i) (Upper-triangular form) For every $i, j$ with $1 \leq i < j \leq r$ and for all four choices of signs $\pm$ one has
$$[u_i^{\pm 1}, u_j^{\pm 1}] \in P\Big(u_{j+1},\dots,u_r; \frac{CN_{j+1}}{N_iN_j},\dots,\frac{CN_r}{N_iN_j}\Big). \tag{2.1}$$
In particular, $[u_i,u_r] = \mathrm{id}$ whenever $1 \leq i < r$.

(ii) (Local properness) The expressions $u_1^{n_1} \cdots u_r^{n_r}$ are distinct as $n_1,\dots,n_r$ range over integers with $|n_i| \leq N_i$, $i = 1,\dots,r$.

(iii) (Volume bound) One has
$$(2\lfloor N_1\rfloor + 1) \cdots (2\lfloor N_r\rfloor + 1) \leq |P| \leq C\,(2\lfloor N_1\rfloor + 1) \cdots (2\lfloor N_r\rfloor + 1). \tag{2.2}$$

The somewhat ugly expression $(2\lfloor N_1\rfloor + 1) \cdots (2\lfloor N_r\rfloor + 1)$ is convenient to have in (2.2) for some minor technical reasons, but it would not do much harm for the reader to mentally substitute $N_1 \cdots N_r$ for this expression instead if desired.
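In the abelian case the local properness and volume axioms of Definition 2.6 are easy to test by brute force, since the upper-triangular axiom holds trivially there. The sketch below is only an illustration and is not part of the paper: it works in the additive group of integers, it reads the volume bound (2.2) as $\prod_i (2\lfloor N_i\rfloor + 1) \leq |P| \leq C \prod_i (2\lfloor N_i\rfloor + 1)$, and the function names are hypothetical.

```python
from itertools import product
from math import floor, prod


def gap_elements(generators, lengths):
    """Map each value n_1 u_1 + ... + n_r u_r (|n_i| <= N_i) to its coefficient vectors."""
    ranges = [range(-floor(N), floor(N) + 1) for N in lengths]
    table = {}
    for coeffs in product(*ranges):
        value = sum(n * u for n, u in zip(coeffs, generators))
        table.setdefault(value, []).append(coeffs)
    return table


def volume_expression(lengths):
    """The quantity (2*floor(N_1) + 1) ... (2*floor(N_r) + 1)."""
    return prod(2 * floor(N) + 1 for N in lengths)


def is_locally_proper(generators, lengths):
    """True if distinct coefficient vectors always give distinct elements."""
    return all(len(v) == 1 for v in gap_elements(generators, lengths).values())


def satisfies_volume_bound(generators, lengths, C):
    """Check the assumed reading of (2.2) for the progression as a set."""
    size = len(gap_elements(generators, lengths))
    vol = volume_expression(lengths)
    return vol <= size <= C * vol


if __name__ == "__main__":
    for gens in [(1, 10), (1, 5)]:
        print(gens,
              is_locally_proper(gens, (3, 3)),
              satisfies_volume_bound(gens, (3, 3), C=1))
```

With side lengths $(3,3)$, the generators $(1,10)$ give a proper progression, whereas $(1,5)$ produce collisions such as $-3 + 5 = 2 + 0$, so the second progression fails both checks.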
The volume bound (2.2) is morally (up to some degradation in the constants $C$) implied by the other axioms of a nilprogression in $C$-normal form, when the $N_1,\dots,N_r$ are sufficiently large, and one is working in a global group (or at least if one assumes $P(u_1,\dots,u_r;DN_1,\dots,DN_r)$ to be well-defined for some sufficiently large $D = D_{r,s}$), but for some further minor technical reasons it is convenient to state this bound explicitly in the definition.

Example 10. The generalised arithmetic progression $P(u_1,\dots,u_r;N_1,\dots,N_r)$ in Example 3 will be in 1-normal form if it is proper, i.e. if all the expressions $n_1u_1 + \cdots + n_ru_r$ for $|n_i| \leq N_i$ are distinct.

Example 11. The set $P(u_1,u_2;N_1,N_2)$ in Example 6 is not in $C$-normal form for any bounded $C$, because $[u_1,u_2]$ is non-trivial. However, the closely related nilprogression
$$P(u_1,u_2,[u_1,u_2];N_1,N_2,N_1N_2)$$
of rank 3 and step 2 is in 1-normal form. The two sets are "comparable" in a number of ways; for instance, one can easily verify that
$$P\Big(u_1,u_2;\tfrac{1}{C}N_1,\tfrac{1}{C}N_2\Big) \subset P(u_1,u_2,[u_1,u_2];N_1,N_2,N_1N_2) \subset P(u_1,u_2;CN_1,CN_2)$$
for some absolute constant $C$ (e.g. one can take $C = 100$).

Remark 2.7. Note that in the global group case, the step of a nilprogression in $C$-normal form is less than or equal to its rank.

In Lemma C.1 we will show that any non-commutative progression $P(u_1,\dots,u_r;N_1,\dots,N_r)$ in $C$-normal form is "essentially" an $O_{r,C}(1)$-approximate group. More precisely, we will show that $P(u_1,\dots,u_r;\varepsilon N_1,\dots,\varepsilon N_r)$ is an $O_{r,C,\varepsilon}(1)$-approximate group whenever $\varepsilon > 0$ is sufficiently small and the $N_i$'s are sufficiently large depending on $C, r$. We will also show that every element of $P(u_1,\dots,u_r;\varepsilon N_1,\dots,\varepsilon N_r)$ can be rewritten in the form $u_1^{n_1} \cdots u_r^{n_r} h$, where $h \in H$ and $|n_i| = O_{r,s}(\varepsilon N_i)$, while conversely every such product with $|n_i| \leq \varepsilon N_i$ obviously belongs to $P(u_1,\dots,u_r;\varepsilon N_1,\dots,\varepsilon N_r)$.

Just as in the abelian case, we need to account for genuine subgroups. The analogue of coset progression is a coset nilprogression, a concept we first define in the simpler setting of global groups.

Definition 2.8 (Global coset nilprogression). Let $G$ be a (global) group. By a coset nilprogression of rank $r$ and step $s$ in $G$, we mean a set $P$ of the form $\pi^{-1}(Q)$, where $G_0$ is a subgroup of $G$, $H$ is a finite normal subgroup of $G_0$, $\pi : G_0 \to G_0/H$ is the quotient map, and $Q$ is a nilprogression of rank $r$ and step $s$ in $G_0/H$. We say that $P$ is in $C$-normal form if $Q$ is in $C$-normal form.

We can extend this definition to local groups, using the local notion of quotient group reviewed in Lemma B.12.

Definition 2.9 ((Local) coset nilprogression). Let $G$ be a (local) group, which we endow with the discrete topology. By a coset nilprogression of rank $r$ and step $s$ in $G$, we mean a set $P$ of the form $\pi^{-1}(Q)$, where $H$ is a finite genuine subgroup of $G$ with a cancellative normalising neighbourhood $G_0$, $W$ is a neighbourhood of $H$ in $G$ with $W \subset G_0$, $WH = HW = W$, $\pi : W \to W/H$ is the quotient map defined in Lemma B.12, and $Q$ is a nilprogression of rank $r$ and step $s$ in $W/H$. We say that $P$ is in $C$-normal form if $Q$ is in $C$-normal form. We call $H$ the finite group associated with $P$, and $Q$ the nilprogression associated with $P$. If $Q = P(u_1,\dots,u_r;N_1,\dots,N_r)$, then we write $P = P_H(u_1,\dots,u_r;N_1,\dots,N_r)$.

Example 12. A subgroup is a coset nilprogression of rank 0 and step 0. More generally, the direct product of a subgroup with a nilprogression of rank $r$ and step $s$ is a coset nilprogression of rank $r$ and step $s$. The coset nilprogression will be in $C$-normal form if the associated nilprogression is.

Example 13.
The set $A$ constructed in Example 8 is a coset nilprogression of rank 1 and step 1, and is also in 1-normal form as long as $N < \frac{p-1}{2}$.

Again, coset nilprogressions in normal form are essentially approximate groups; see Lemma C.1 for a precise version of this statement.

We are now ready to state our main technical theorem, which among other things implies Theorem 1.6, and whose proof will occupy the bulk of this paper.

Theorem 2.10 (Main theorem). Let $A$ be a $K$-approximate group. Then $A^4$ contains a coset nilprogression $P$ of rank and step $O_K(1)$ and $|P| \gg_K |A|$. Furthermore, $P$ can be taken to be in $O_K(1)$-normal form.

We remark that precursor results to this theorem in the case of nilpotent or solvable groups were obtained in [5, 6, 14, 18, 52, 53]. Theorem 2.10 also provides an independent proof of a qualitative version of the abelian results of Theorem 2.1 and Theorem 2.2, which, in contrast to the other known proofs of these results, manages to almost completely avoid the use of Fourier analysis (footnote 6).

It is easy to see that Theorem 2.10 implies Theorem 1.6, by taking $G$ to be the global group generated by $P$. The key point here is that a group generated by a set $u_1,\dots,u_r$ is nilpotent of step at most $s$ if every iterated commutator of the $u_1,\dots,u_r$ of degree $s+1$ is trivial. A proof of this assertion may be found in Hall's book [29].

By standard non-commutative product estimates, we can also establish the following Freiman-type theorem for sets of bounded doubling.

Corollary 2.11 (Freiman-type theorem). Let $K \geq 1$. Let $G$ be a (global) group and $A, B$ be finite non-empty subsets of $G$ such that $|AB| \leq K|A|^{1/2}|B|^{1/2}$. Then there exists a coset nilprogression $P$ of rank and step $O_K(1)$ with $|P| \gg_K |A|$ which is in $O_K(1)$-normal form, such that $A$ can be covered by $O_K(1)$ left-translates of $P$, and $B$ can be covered by $O_K(1)$ right-translates of $P$.

Proof.
This follows immediately from combining Theorem 2.10 with [52, Theorem 4.6].

In Section 10, we will show the following explicit bounds on the rank and step of $P$.

Theorem 2.12 (Bounds on the rank and step of the nilprogression). In Corollary 2.11 (and in Theorem 2.10 if $A$ is assumed to be a global $K$-approximate group), at the expense of replacing the conclusion $P \subseteq A^4$ with the weaker statement that $P \subseteq A^{12}$, the coset nilprogression $P$ can be taken to have rank and step at most $O(K \log K)$ while remaining in $O_K(1)$-normal form. Moreover, if we settle for the even weaker inclusion $P \subset A^{O_K(1)}$, one can ensure that $P$ has rank and step at most $6 \log K$ (while still remaining in $O_K(1)$-normal form).

It is likely that the numerical constants 6 and 12 here can be improved, but we will not pursue such improvements here.

Local approximate groups can be embedded in global groups. As we have remarked above, the approximate groups $A$ considered in this paper are local in the sense that we do not need to assume that $A$ lies in a global group $G$. However as a consequence of Theorem 2.10, the more detailed version of our main theorem, we have the following statement. It asserts that, at least at the qualitative level, there is in fact no loss of generality in dealing with the global case.

Footnote 6 (on the avoidance of Fourier analysis noted after Theorem 2.10). However, our argument still uses results relating to Hilbert's fifth problem which require Fourier-analytic tools, such as Pontryagin duality, even in the abelian setting.

Theorem 2.13. Suppose that $A$ is a $K$-approximate group. Then $A^4$ contains an $O_K(1)$-approximate group $A'$ with $(A')^4 \subset A^4$ and $|A'| \gg_K |A|$ which is isomorphic to a subset of a global group $G$.

This theorem follows from Theorem 2.10 and the fact (which we prove in Lemma C.3) that a large portion of a coset nilprogression in normal form can be embedded in a global group.
This theorem can be viewed as a discrete analogue to a recent result of Goldbring and van den Dries [56], who established that every locally compact local group is locally isomorphic near the identity to a locally compact global group (thus there is a neighbourhood of the identity in the former group that is isomorphic to a neighbourhood of the identity in the latter group). One should also compare this result with Lie's third theorem that every local Lie group is locally isomorphic to a global Lie group (see Theorem B.16 and the discussion in Serre's book [50]).

3. Ultra approximate groups and Hrushovski's Lie Model Theorem

In the next section we will give an outline of the argument we shall use to prove Theorem 2.10. An extremely important component of it will be a Lie Model Theorem that implicitly appears in a remarkable paper of Hrushovski [33, Theorem 4.2], which provided the foundation for much of the work here, and for which we will give a self-contained proof later in this paper. We can state this theorem very informally as follows:

Theorem 3.1 (Hrushovski's Lie Model Theorem, informal version). In a suitable limit, an approximate group is virtually modelled by a precompact neighbourhood of the identity in a Lie group.

Of course, to make this theorem more precise, one has to formalise terms such as "suitable limit", "virtually", and "modelled". We shall do so presently, but first we point out that Theorem 3.1 is very similar in spirit to a key step [27, §7] in Gromov's proof of his celebrated theorem on groups of polynomial growth, which we state informally as follows.

Theorem 3.2 (Gromov's Lie Model Theorem, informal version). In a suitable limit, a group of polynomial growth can be modelled by a finite-dimensional locally compact space with a transitive isometric action of a Lie group.
To deepen the analogy between the two results, we note that Theorem 3.1 and Theorem 3.2 both require the deep body of results surrounding the solution to Hilbert's fifth problem on the topological description of the category of Lie groups (see [40]) in order to bring into view the Lie structure, which is not manifestly present when one first takes a limit.

There are however some technical differences between the precise formulations of Theorem 3.1 and Theorem 3.2. In the latter theorem, one has a group $G$ (of polynomial growth) generated by a finite set $S$. This gives a metric on $G$, the word metric given by the generating set $S$. Gromov then looks at the discrete balls $S^n$, $n = 1, 2, 3, \dots$ "from a distance" to get some continuous limit metric space $X$. For example if $G = \mathbb{Z}$ and $S = \{-1,0,1\}$, then $S^n = \{-n,\dots,n\}$, and it is heuristically clear that these discrete intervals $S^n$, after rescaling by $n$, "converge" in a suitable sense to the continuous interval $[-1,1] \subseteq \mathbb{R}$. To effect this limit, Gromov introduced what is now known as Gromov-Hausdorff convergence of a sequence of metric spaces. In subsequent work of van den Dries and Wilkie [58] a slightly different approach, using ultralimits (or non-standard analysis), was pioneered. This construction is now known, in the geometric group theory literature, as the asymptotic cone.

The asymptotic cone, then, is (a quotient of) an ultraproduct of the sequence of balls $(S^n)_{n \in \mathbb{N}}$ (footnote 7). We will use a similar limit in order to formalise Theorem 3.1, namely an ultraproduct $A$ of an arbitrary sequence $(A_n)_{n \in \mathbb{N}}$ of $K$-approximate groups, an object we call an ultra approximate group. We now define this term more precisely.

Definition 3.3 (Ultra approximate group). Throughout this paper, we fix a non-principal ultrafilter $\alpha \in \beta\mathbb{N} \setminus \mathbb{N}$ (see Lemma A.1 for a definition of this concept).
If $K > 0$ is a real number then an ultra $K$-approximate group is an ultraproduct $A := \prod_{n\to\alpha} A_n$, where each $A_n$ is a (standard) $K$-approximate group. Thus, $A$ is the space of all formal limits $\lim_{n\to\alpha} a_n$ with $a_n \in A_n$, where two formal limits $\lim_{n\to\alpha} a_n$ and $\lim_{n\to\alpha} a'_n$ are considered equal if $a_n = a'_n$ for all $n$ sufficiently close to $\alpha$ (i.e. for all $n$ in an $\alpha$-large subset of $\mathbb{N}$). See Appendix A for more discussion on ultraproducts. Often we will not need to refer to $K$ explicitly, in which case we speak simply of an ultra approximate group.

Note that we allow the approximate groups $A_n$ to lie in different ambient groups $G_n$ (much as the notion of Gromov-Hausdorff convergence also does not require the spaces $X_n$ involved to all live in a common ambient space). Ultraproducts are a model-theoretic limit, in contrast to the more geometric notion of a limit defined by Gromov-Hausdorff convergence. There are two key properties of these model-theoretic limits that make them convenient to use for our purposes. The first is Łoś's theorem, which roughly speaking asserts that any property that can be stated in the language of first-order logic holds for an ultraproduct $A = \prod_{n\to\alpha} A_n$ if and only if it holds for those $A_n$ with $n$ sufficiently close to $\alpha$; see Theorem A.6. The second is countable saturation, which we will use to establish the completeness of a certain (pseudo)metric space associated to an ultra approximate group; see Proposition 6.1.

Footnote 7. In [33], more saturated limits (not necessarily constructed using ultrafilters) were also considered, but we will not need such constructions here.

Next, we discuss what it would mean to "model" an ultra approximate group $A$ (footnote 8). Informally, a model would seek to describe the "coarse-scale" behaviour of $A$, and in particular be able to predict when an orbit $\mathrm{id}, a, a^2, a^3, \dots$ of an element $a$ of $A$ will "escape" $A$, while ignoring the "fine-scale" behaviour of $A$.
Such a model will be formalised by a homomorphism $\phi : A \to L$ of local groups that obeys certain good properties (see Definition 3.5 below). Before we present this formal definition, though, we first discuss some key examples of ultra approximate groups and their models.

Footnote 8. Our use of the term "model" here is not, strictly speaking, the precise notion that is used in model theory, but is closer to the notion of a "Freiman model" from additive combinatorics, as used for instance in [25, 44].

Example 14 (Nonstandard finite groups). Suppose that $A_n$ is a sequence of (standard) finite groups; then the ultraproduct $A := \prod_{n\to\alpha} A_n$ is an ultra approximate group. In this case, $A$ is in fact a genuine group, with group operation given by the law
$$\lim_{n\to\alpha} a_n \cdot \lim_{n\to\alpha} b_n := \lim_{n\to\alpha} (a_n b_n).$$
We will refer to such groups as nonstandard finite groups. A typical example of a nonstandard finite group is the nonstandard cyclic group
$${}^*\mathbb{Z}/N\,{}^*\mathbb{Z} := \prod_{n\to\alpha} \mathbb{Z}/n\mathbb{Z},$$
where $N \in {}^*\mathbb{N}$ is the nonstandard natural number
$$N := \lim_{n\to\alpha} n \tag{3.1}$$
(footnote 9). In a nonstandard finite group $A$, there are no elements that ever escape $A$: if $a \in A$, then one has $a^n \in A$ for all $n \in \mathbb{N}$. As such, it will turn out that $A$ can be modelled by a trivial homomorphism $\phi : A \to \{\mathrm{id}\}$ to the trivial group.

Footnote 9. This group is the analogue of the profinite completion $\hat{\mathbb{Z}} = \varprojlim \mathbb{Z}/n\mathbb{Z}$ of the integers, but is built using the machinery of ultralimits rather than inverse limits. The two groups are however not identical. For instance, $\hat{\mathbb{Z}}$ is torsion-free, whereas ${}^*\mathbb{Z}/N\,{}^*\mathbb{Z}$ can contain torsion; for example if $N$ is even, or equivalently if the set of even natural numbers is $\alpha$-large, then ${}^*\mathbb{Z}/N\,{}^*\mathbb{Z}$ contains the element $N/2 \bmod N$, which has order 2. But see Remark 3.4 below for a link between ultraproducts and inverse limits.

Example 15 (Nonstandard intervals). Now consider the sequence $A_n := P(1;n) = \{-n,\dots,n\}$ of (standard) arithmetic progressions in $\mathbb{Z}$. The ultraproduct $A := \prod_{n\to\alpha} A_n$ can be viewed as the nonstandard arithmetic progression $A = P(1;N) = \{-N,\dots,N\}$ in the nonstandard integers ${}^*\mathbb{Z} := \prod_{n\to\alpha} \mathbb{Z}$, where $N$ was defined in (3.1). Then $A$ is an ultra approximate group, and it can also be viewed as a local group inside the nonstandard integers ${}^*\mathbb{Z}$. Consider now the map $\pi : A \to \mathbb{R}$ defined by
$$\pi\Big(\lim_{n\to\alpha} a_n\Big) := \operatorname{st} \lim_{n\to\alpha} \frac{a_n}{n},$$
where $\operatorname{st} x$ is the standard part of a nonstandard real $x$ (see Appendix A). Thus, for every standard $\varepsilon > 0$, one has
$$\pi\Big(\lim_{n\to\alpha} a_n\Big) - \varepsilon \leq \frac{a_n}{n} \leq \pi\Big(\lim_{n\to\alpha} a_n\Big) + \varepsilon$$
for all $n$ sufficiently close to $\alpha$. One may also write $\pi(a) = \operatorname{st}\frac{a}{N}$ for all $a \in A$. The map $\pi$ is a homomorphism of local groups from $A$ into $[-1,1]$. It is surjective since, for any $\gamma \in [-1,1]$, the nonstandard integer
$$x := \lfloor \gamma N \rfloor = \lim_{n\to\alpha} \lfloor \gamma n \rfloor,$$
where $\lfloor \cdot \rfloor$ is the integer part function, has image $\pi(x) = \gamma$. The kernel $\ker(\pi)$ is the set of $x \in A$ with $x = o(N)$ (thus if $x = \lim_{n\to\alpha} x_n$ and $\varepsilon > 0$ is standard, then $|x_n| \leq \varepsilon n$ on an $\alpha$-large set of $n$). For instance, every standard integer lies in $\ker(\pi)$, as do some nonstandard integers such as $\lfloor \sqrt{N} \rfloor = \lim_{n\to\alpha} \lfloor \sqrt{n} \rfloor$.

There are similar maps from $A^m$ to $[-m,m]$ for any fixed natural number $m$, which by abuse of notation we also call $\pi$ (footnote 10). Informally, these maps model $A$ by the interval $[-1,1]$, and more generally model $A^m$ by $[-m,m]$.

In this particular case, the model $\pi : A \to [-1,1]$ of the ultraproduct $A$ can be viewed as a limiting object for models $\pi_n : A_n \to [-1,1]$ of the individual factors $A_n$, by defining $\pi_n(a) := \frac{a}{n}$. However, in more general situations, the model for the ultraproduct is only a limit for approximate models of the factors, and this is one reason why we need to work in the ultraproduct setting as much as we do.

The model $\pi : A^m \to [-m,m]$ is not injective: if $\pi(a)$ is trivial, this does not imply that $a$ is trivial. However, $\pi$ does have an injectivity-like property which will be important later, which roughly speaking asserts that if $\pi(a)$ is small, then $a$ is small. For instance, observe that if $a \in A^{1000}$ is such that $\pi(a) \in (-1,1)$, then $a \in A$ (footnote 11).
This property on the model $\pi$ can be used to derive some important facts about the ultraproduct $A$; for instance it implies the escape property that if $a, a^2, \dots, a^{100}$ all lie in $A^{10}$, then $a$ lies in $A$. These sorts of escape properties will play a major role in our arguments in later sections.

Footnote 10. Strictly speaking, as we are currently in an additive setting, one should write $mA = A + \cdots + A$ rather than $A^m = A \cdot \cdots \cdot A$ here.

Footnote 11. This claim is not quite true when $\pi(a)$ is $-1$ or $+1$, as can be seen for instance by considering $a = N + 1 = \lim_{n\to\alpha} n + 1$.

Example 16 (Generalised arithmetic progression). We still work in the integers $\mathbb{Z}$, but now take $A_n$ to be the rank two generalised arithmetic progression
$$A_n := P(1, n^{10}; n, n) := \{a + bn^{10} : a, b \in \{-n,\dots,n\}\}.$$
Then the ultraproduct $A := \prod_{n\to\alpha} A_n$ is the subset of the nonstandard integers ${}^*\mathbb{Z}$ of the form
$$A = P(1, N^{10}; N, N) = \{a + bN^{10} : a, b \in \{-N,\dots,N\}\}.$$
This is an ultra approximate group which can be modelled by the Euclidean plane $\mathbb{R}^2$, using the model maps $\pi : A^m \to \mathbb{R}^2$ defined for each standard $m$ by the formula
$$\pi(a + bN^{10}) := \Big(\operatorname{st}\frac{a}{N}, \operatorname{st}\frac{b}{N}\Big)$$
whenever $a, b = O(N)$. The image $\pi(A^m)$ is then the square $[-m,m]^2$. As before, if $a \in A^{1000}$ is such that $\pi(a) \in (-1,1)^2$, then $a \in A$; this can be used to conclude that if $a, a^2, \dots, a^{100} \in A^{10}$, then $a \in A$. Note here that while $A$ lives in a "one-dimensional" group ${}^*\mathbb{Z}$, the model $\mathbb{R}^2$ is "two-dimensional". This is also reflected in the volume growth of the powers $A_n^m$ of $A_n$ for small $m$ and large $n$, which grow quadratically rather than linearly in $m$.

Example 17 (Heisenberg box, I). This example is related to the Heisenberg example in Example 6. We take each $A_n$ to be the "nilbox"
$$A_n := \left\{ \begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} : |x_n|, |y_n| \leq n,\ |z_n| \leq n^2 \right\}. \tag{3.2}$$
This is not quite an approximate group because it is not quite symmetric (cf. Example 6), but we will ignore this technicality for sake of exposition.
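In any case it can be repaired in a number of ways, for instance by replacing $A_n$ with $A_n \cup A_n^{-1}$. Once again we consider the ultraproduct $A := \prod_{n\to\alpha} A_n$; this is a subset of the nilpotent (nonstandard) group $\begin{pmatrix} 1 & {}^*\mathbb{Z} & {}^*\mathbb{Z} \\ 0 & 1 & {}^*\mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}$, consisting of all elements $\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}$ with $|x|, |y| \leq N$ and $|z| \leq N^2$; again, this is a (discrete) local group. Consider now the map
$$\pi : A \to \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}$$
defined by
$$\pi\left(\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}\right) := \begin{pmatrix} 1 & \operatorname{st}\frac{x}{N} & \operatorname{st}\frac{z}{N^2} \\ 0 & 1 & \operatorname{st}\frac{y}{N} \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.3}$$
This is easily seen to be a homomorphism (of local groups) to the Heisenberg group, whose image is the compact set
$$\left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y|, |z| \leq 1 \right\}. \tag{3.4}$$
Informally, $\pi$ models $A$ (or $A_n$) by what is essentially a unit ball in this Lie group. As before, we have the injectivity-like property that if $a \in A$ is such that $\pi(a)$ is sufficiently close to the identity, then $a \in A$; as such, one can again establish the escape property that if $a, a^2, \dots, a^{100}$ all lie in $A^{10}$, then $a$ lies in $A$.

Example 18 (Heisenberg box, II). This is a variant of the preceding example, in which the (not quite) approximate groups $A_n$ now take the form
$$A_n := \left\{ \begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix} : |x_n|, |y_n| \leq n,\ |z_n| \leq n^{10} \right\} \tag{3.5}$$
so that the ultralimit $A := \prod_{n\to\alpha} A_n$ takes the form
$$A := \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & {}^*\mathbb{Z} & {}^*\mathbb{Z} \\ 0 & 1 & {}^*\mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq N,\ |z| \leq N^{10} \right\}.$$
Now consider the map $\pi : A^8 \to \mathbb{R}^3$ defined by
$$\pi\left(\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}\right) = \Big(\operatorname{st}\frac{x}{N}, \operatorname{st}\frac{y}{N}, \operatorname{st}\frac{z}{N^{10}}\Big).$$
The image of this map is the unit cube $[-1,1]^3$, and is in particular compact.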
It is also a homomorphism of local groups, since
$$\pi\left(\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & x' & z' \\ 0 & 1 & y' \\ 0 & 0 & 1 \end{pmatrix}\right) = \Big(\operatorname{st}\frac{x + x'}{N}, \operatorname{st}\frac{y + y'}{N}, \operatorname{st}\frac{z + z' + xy'}{N^{10}}\Big),$$
but the nonstandard real $xy'/N^{10} = O(N^2/N^{10})$ is infinitesimal, and so the previous expression is equal to
$$\Big(\operatorname{st}\frac{x + x'}{N}, \operatorname{st}\frac{y + y'}{N}, \operatorname{st}\frac{z + z'}{N^{10}}\Big),$$
which establishes the homomorphic nature of $\pi$.

Here we note that the homomorphism $\pi : A^8 \to \mathbb{R}^3$ is not associated to any exact homomorphisms $\pi_n$ from $A_n^8$ to $\mathbb{R}^3$. Instead, it is only associated to approximate homomorphisms
$$\pi_n\left(\begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix}\right) := \Big(\frac{x_n}{n}, \frac{y_n}{n}, \frac{z_n}{n^{10}}\Big)$$
into $\mathbb{R}^3$. Such approximate homomorphisms are somewhat less pleasant to work with than genuine homomorphisms; one of the main reasons why we work in the ultraproduct setting is so that we can use genuine group homomorphisms, or at least local group homomorphisms, throughout the paper.

Note that the preceding example (3.2) admits a homomorphism $\tilde{\pi}$ onto the abelian group $\mathbb{R}^2$ by composing the map (3.3) with the natural map from $\begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}$ to its abelianisation $\mathbb{R}^2$. However the kernel of $\tilde{\pi}$ is, for us, too "big". In particular it contains every $\begin{pmatrix} 1 & 0 & z \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, and in particular contains elements of powers of $A$ that do not lie in $A$. By contrast there are no such elements in the example (3.5). In particular, we can still use the model $\pi$ to establish the same escape property for $A$ as before, namely that whenever $a, \dots, a^{100} \in A^{10}$, one has $a \in A$.

We also note that the sets $A_n^m$ for small $m$ and large $n$ grow cubically in $m$ in this example, and quartically in $m$ in the previous example. This is consistent with the model groups having homogeneous dimension 3 in the current example and 4 in the previous example.

In all the above examples, the model group $L$ was a Lie group. We now give some examples to show that the model need not initially be of Lie type, but can then be replaced with a Lie model after some modification.

Example 19 (Nonstandard cyclic group, revisited).
The first example is the nonstandard cyclic group $A := {}^*\mathbb{Z}/2^N\,{}^*\mathbb{Z} = \prod_{n\to\alpha} \mathbb{Z}/2^n\mathbb{Z}$. This is a nonstandard finite group and can thus be modelled by the trivial group $\{\mathrm{id}\}$ as discussed in Example 14. However, it can also be modelled by the compact abelian group $\mathbb{Z}_2$ of 2-adic integers using the model $\pi : A \to \mathbb{Z}_2$ defined by the formula
$$\pi(a) := \lim_{n\to\infty} a \bmod 2^n,$$
where for each standard natural number $n$, $a \pmod{2^n} \in \{0,\dots,2^n - 1\}$ is the remainder of $a$ modulo $2^n$ (this is well-defined in $A$) and the limit is in the 2-adic metric. Note that the image $\pi(A)$ of $A$ is the entire group $\mathbb{Z}_2$, and conversely the preimage of $\mathbb{Z}_2$ in $A$ is trivially all of ${}^*\mathbb{Z}/2^N\,{}^*\mathbb{Z}$; as such, one can quotient out $\mathbb{Z}_2$ in this model and recover the trivial model of $A$.

Example 20 (Nonstandard abelian 2-torsion group). In a similar spirit to the preceding example, the nonstandard 2-torsion group $A := (\mathbb{Z}/2\mathbb{Z})^N = \prod_{n\to\alpha} (\mathbb{Z}/2\mathbb{Z})^n$ can be modelled by the compact abelian group $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ by the formula
$$\pi(a) := \lim_{n\to\infty} \pi_n(a),$$
where $\pi_n : A \to (\mathbb{Z}/2\mathbb{Z})^n$ is the obvious projection, and the limit is in the product topology of $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$. As before, we can quotient out $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ and model $A$ instead by the trivial group.

Remark 3.4. The above two examples can be generalised to model any nonstandard finite group $G = \prod_{n\to\alpha} G_n$ equipped with surjective homomorphisms from $G_{n+1}$ to $G_n$ by the inverse limit of the $G_n$.

Example 21 (Lamplighter group). Let $\mathbb{F}_2$ be the field of two elements. Let $G$ be the lamplighter group $\mathbb{Z} \ltimes \mathbb{F}_2^{\mathbb{Z}}$, where $\mathbb{Z}$ acts on $\mathbb{F}_2^{\mathbb{Z}}$ by the shift $T : \mathbb{F}_2^{\mathbb{Z}} \to \mathbb{F}_2^{\mathbb{Z}}$ defined by $T(a_n)_{n\in\mathbb{Z}} := (a_{n-1})_{n\in\mathbb{Z}}$. Thus the group law in $G$ is given by
$$(i,x)(j,y) := (i + j, x + T^i y).$$
For each $n$, we then set $A_n \subseteq G$ to be the set
$$A_n := \{(i,x) \in G : i \in \{-1,0,+1\};\ x \in \mathbb{F}_2^n\},$$
where we identify $\mathbb{F}_2^n$ with the space of elements $(a_m)_{m\in\mathbb{Z}}$ of $\mathbb{F}_2^{\mathbb{Z}}$ such that $a_m \neq 0$ only for $m \in \{1,\dots,n\}$.
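The sets $A_n$ of Example 21 are small enough to enumerate directly, which makes the group law $(i,x)(j,y) = (i+j, x + T^i y)$ easy to experiment with. The sketch below is an illustration only and is not from the paper: elements of $\mathbb{F}_2^{\mathbb{Z}}$ with finite support are stored as frozensets of lit lamp positions, and the helper names `mul`, `inv`, `box` and `product_set` are hypothetical.

```python
from itertools import combinations


def mul(g, h):
    """Group law (i, x)(j, y) = (i + j, x + T^i y) in the lamplighter group.

    x and y are frozensets of lit positions; T^i shifts every position by i,
    and addition in F_2^Z is symmetric difference of supports.
    """
    (i, x), (j, y) = g, h
    return (i + j, x ^ frozenset(k + i for k in y))


def inv(g):
    """Inverse of (i, x), namely (-i, T^{-i} x)."""
    i, x = g
    return (-i, frozenset(k - i for k in x))


def box(n):
    """The set A_n: i in {-1, 0, +1} and lamps supported in {1, ..., n}."""
    positions = range(1, n + 1)
    supports = [frozenset(c)
                for size in range(n + 1)
                for c in combinations(positions, size)]
    return {(i, x) for i in (-1, 0, 1) for x in supports}


def product_set(A, B):
    """The set of all products ab with a in A and b in B."""
    return {mul(a, b) for a in A for b in B}


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        A = box(n)
        AA = product_set(A, A)
        symmetric = all(inv(g) in A for g in A)
        print(n, len(A), len(AA), symmetric)
```

For these small values of $n$ the script reports $|A_n| = 3 \cdot 2^n$ and $|A_n \cdot A_n| = 11 \cdot 2^n$, so the doubling ratio stays at $11/3$, while the symmetry check fails because, for example, the inverse of $(1, \{1\})$ has a lamp lit at position 0.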
These sets $A_n$ are not quite approximate groups because they are not symmetric, but they are close enough to approximate groups for this discussion. For instance, they have bounded doubling or bounded tripling, and $A_n \cup A_n^{-1}$ is an approximate group. We model the ultraproduct $A := \prod_{n\to\alpha} A_n \subset {}^*\mathbb{Z} \ltimes \mathbb{F}_2^{{}^*\mathbb{Z}}$ by the group
$$G_0 \times_{\mathbb{Z}} G_0 := \{((i,x),(j,y)) \in G_0 \times G_0 : i = j\},$$
where $G_0$ is the modified lamplighter group $\mathbb{Z} \ltimes \mathbb{F}_2((t))$, where $\mathbb{F}_2((t))$ is the ring of formal Laurent series $\sum_{n\in\mathbb{Z}} a_n t^n$ over $\mathbb{F}_2$ with only finitely many non-zero $a_n$ for $n$ negative, and the shift given by the multiplication map $T : f \mapsto tf$. We give $\mathbb{F}_2((t))$ (and hence $G_0$ and $G_0 \times_{\mathbb{Z}} G_0$) a topology by declaring the norm of a non-zero element $\sum_{n\in\mathbb{Z}} a_n t^n$ of $\mathbb{F}_2((t))$ to be $2^{-n}$, where $n$ is the least integer for which $a_n$ is non-zero. The model map $\pi : A \to G_0 \times_{\mathbb{Z}} G_0$ is then given by the formula
$$\pi\Big(i, \lim_{n\to\alpha} (a^{(n)}_m)_{m\in\mathbb{Z}}\Big) := \Big(\Big(i, \lim_{n\to\alpha} \sum_{m\in\mathbb{Z}} a^{(n)}_m t^m\Big), \Big(i, \lim_{n\to\alpha} \sum_{m\in\mathbb{Z}} a^{(n)}_{n-m} t^m\Big)\Big).$$
Roughly speaking, $\pi(a)$ captures the behaviour of $a$ at the two "ends" of $\mathbb{F}_2^N$. The image $\pi(A)$ of $A$ under this model is then the compact neighbourhood of the identity
$$\pi(A) = \{((i,x),(i,y)) \in G_0 \times_{\mathbb{Z}} G_0 : i \in \{-1,0,+1\},\ x, y \in \mathbb{F}_2[[t]]\},$$
where $\mathbb{F}_2[[t]] \subset \mathbb{F}_2((t))$ is the ring of formal power series $\sum_{n=0}^{\infty} a_n t^n$ over $\mathbb{F}_2$. One can also compute the images $\pi(A^m)$ for larger values of $m$, although they are a bit more complicated. One can verify the escape property that if $g, g^2, \dots, g^{100} \in \pi(A^{10})$ for some $g \in G_0 \times_{\mathbb{Z}} G_0$, then $g \in \pi(A)$; here it is essential that we use both of the two factors of $G_0 \times_{\mathbb{Z}} G_0$, as the claim is false if we project $\pi$ to just one of the two factors $G_0$, or to the base group $\mathbb{Z}$.
So, in this case, one needs a moderately complicated (though still locally compact) group $G_0 \times_{\mathbb{Z}} G_0$ to properly model $A$ and its powers $A^m$ (footnote 12). However, if we pass to the large subset $A'$ of $A$ defined by $A' := \prod_{n\to\alpha} A'_n$, where
$$A'_n := \{(i,x) \in G : i = 0;\ x \in \mathbb{F}_2^n\},$$
then $A'$ is now a nonstandard finite group (isomorphic to the group $\mathbb{F}_2^N$ considered in Example 20) and can be modelled simply by the trivial group $\{\mathrm{id}\}$. Thus we see that we can sometimes greatly simplify the modelling of an ultra approximate group by passing to a large ultra approximate subgroup.

Let us formalise the properties enjoyed by the above examples in the following definition, which will play a key role in this paper.

Definition 3.5 (Good models). Let $A$ be an ultra approximate group. A good model for $A$ is a symmetric local topological group $L$ (see Definition B.1), together with a homomorphism $\pi : A \to L$ of local groups with the following properties:

(i) (Thick image) There exists an open neighbourhood of the identity $U_0$ in $L$ such that $\pi^{-1}(U_0) \subseteq A$ and $U_0 \subseteq \pi(A)$. In particular $\ker \pi \subseteq A$.

(ii) (Compact image) $\pi(A)$ is contained in a compact set.

(iii) (Approximation by "internal" sets) Suppose that $F \subseteq U \subseteq U_0$, where $F$ is compact and $U$ is open. Then there is an ultraproduct $A' = \prod_{n\to\alpha} A'_n$ of finite sets $A'_n \subseteq A_n$ such that $\pi^{-1}(F) \subseteq A' \subseteq \pi^{-1}(U)$.

We will often abuse notation and refer to just $L$ or $\pi$ as the good model for $A$, rather than the pair $(L,\pi)$.

Remark 3.6. Properties (i) and (ii) together imply that $L$ is locally compact.

We leave it to the reader to check that the examples given above have all of the properties of this definition. One can think of a good model as accurately describing the "coarse-scale" structure of the ultra approximate group $A$, without directly controlling the "fine-scale" structure. For instance, in the example (3.5) which is "abelian at coarse scales" but "2-step nilpotent at fine scales", the model $\pi$ only detects the abelian structure and not the 2-step nilpotent structure.
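The contrast between coarse and fine scales in example (3.5) can be seen numerically at a finite level. The following sketch is an illustration under stated assumptions rather than anything taken from the paper: Heisenberg matrices are stored as integer triples $(x,y,z)$, the map `model` is the finite-level map $\pi_n(x,y,z) = (x/n, y/n, z/n^{10})$, the commutator convention $[g,h] = ghg^{-1}h^{-1}$ is a choice made here, and all function names are hypothetical.

```python
from fractions import Fraction


def heis_mul(g, h):
    """Multiply upper unitriangular 3x3 integer matrices stored as (x, y, z)."""
    (x, y, z), (x2, y2, z2) = g, h
    return (x + x2, y + y2, z + z2 + x * y2)


def heis_inv(g):
    """Inverse of (x, y, z) in the Heisenberg group."""
    x, y, z = g
    return (-x, -y, x * y - z)


def commutator(g, h):
    """[g, h] = g h g^{-1} h^{-1} (a convention assumed for this sketch)."""
    return heis_mul(heis_mul(g, h), heis_mul(heis_inv(g), heis_inv(h)))


def model(g, n):
    """Finite-level model map pi_n(x, y, z) = (x/n, y/n, z/n^10), computed exactly."""
    x, y, z = g
    return (Fraction(x, n), Fraction(y, n), Fraction(z, n ** 10))


def defect(g, h, n):
    """pi_n(gh) - pi_n(g) - pi_n(h): how far pi_n is from being additive."""
    a, b, c = model(heis_mul(g, h), n), model(g, n), model(h, n)
    return tuple(p - q - r for p, q, r in zip(a, b, c))


if __name__ == "__main__":
    n = 5
    g, h = (5, -3, 5 ** 10), (2, 5, -7)
    print("commutator:", commutator(g, h))   # non-trivial: fine-scale structure
    print("defect:", defect(g, h, n))        # tiny: abelian at coarse scale
```

The only non-zero entry of the defect is $xy'/n^{10}$, which is at most $n^{-8}$ in absolute value when $|x|, |y'| \leq n$; the commutator, by contrast, is a non-trivial central element. This is the finite shadow of the statement that the model only sees the abelian structure.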
Footnote 12. This can also be seen from volume growth considerations: $A^m$ grows like $4^m$, which is also the rate of volume growth of $\pi(A)$ in $G_0 \times_{\mathbb{Z}} G_0$, whereas the volume growth in a single factor $G_0$ would only grow like $2^m$, and the volume growth in $\mathbb{Z}$ is only linear in $m$.

Remark 3.7. In (iii), if $F$ and $U$ are symmetric neighbourhoods of the identity, then $A'$ can be chosen to be symmetric (since one can replace $A'$ with $A' \cap (A')^{-1}$). As $L$ is locally compact, we may shrink $U$ to be precompact; then $U$ can be covered by finitely many translates of $F$, and thus $A'$ is then an ultra approximate group.

Finally, we need to explain the adjective "virtually" in Theorem 3.1. In group theory, "virtually" means "after passing to a finite index subgroup". Note that a subgroup $G'$ of a group $G$ has finite index if and only if $G$ can be covered by finitely many left-translates (or, equivalently, right-translates) of $G'$. This motivates the following definition.

Definition 3.8 (Large approximate subgroups). Let $A, A'$ be ultra approximate groups. We say that $A'$ is a large ultra approximate subgroup of $A$ if one has $(A')^4 \subset A^4$, and $A$ can be covered by finitely many left-translates of $A'$ (by elements of $A \cdot (A')^{-1}$, of course).

Remark 3.9. It would be more aesthetically pleasing to have $A' \subset A$ instead of $(A')^4 \subset A^4$, but we need the exponent 4 in the inclusion for some minor technical reasons. Note that the property of being a large ultra approximate subgroup is transitive.

We are now in a position to state Hrushovski's Lie Model Theorem.

Theorem 3.10 (Hrushovski Lie Model Theorem). Let $A$ be an ultra approximate group. Then there is a large ultra approximate subgroup $A'$ of $A$ such that $A'$ admits a local Lie group as a good model.

We will prove this theorem in Section 6. As stated above, the basic idea of the proof is to first establish that $A$ itself admits a locally compact local group as a good model.
Here results of multiplicative combinatorics, and in particular a lemma of Sanders [47] (see also [13]), are critical. Once this is done, Theorem 3.10 follows relatively quickly from the deep results in the literature on Hilbert's fifth problem.

This theorem will then play a key role in the proof of Theorem 2.10 in two ways: firstly by allowing us to establish certain “escape” properties on (ultra) approximate groups that will be used to build useful metric structures on these groups; and secondly by giving a natural notion of the “dimension” of an (ultra) approximate group which we will need to induct on.

Note that one can invoke Lie's third theorem (Theorem B.16) to upgrade the local Lie group in Theorem 3.10 to a connected, simply connected, global Lie group, but for technical inductive reasons it will be more convenient to keep the model in the category of local Lie groups for now.

Theorem 3.10 will be proven in Section 6. We will also establish a “global” variant of this theorem later, first in a weak form as Proposition 6.12 and then in a stronger form as Theorem 10.10.

4. An outline of the argument

In the previous section we introduced the notion of a (Hrushovski) Lie model, one of the key technical tools we will use to prove Theorem 2.10. In this section we outline the argument for this proof as a whole.

Our aim is to show that every $K$-approximate group is controlled in some sense by a coset nilprogression of rank and density $O_K(1)$. We shall prove this by contradiction, assuming that there is a sequence $(A_n)_{n \in \mathbf{N}}$ of $K$-approximate groups for which the statement fails in the limit for any given choice of implied constant in the $O_K(1)$ notation. In particular, the cardinality $|A_n|$ will go to infinity as $n \to \infty$. We assemble these approximate groups into an ultra approximate group $A := \prod_{n \to \alpha} A_n$.
Our assumption implies that $A$ is not “controlled” in a certain sense by what we call an ultra coset nilprogression, which we now define.

Definition 4.1 (Ultra coset nilprogression). An ultra coset nilprogression is an ultraproduct $P = \prod_{n \to \alpha} P_n$ of coset nilprogressions $P_n = P(u_{1,n}, \dots, u_{r,n}; N_{1,n}, \dots, N_{r,n})$ of fixed (standard) rank $r$ and step $s$. We then say that $P$ has rank $r$ and step $s$. If the $P_n$ are also all in $C$-normal form for some (standard) $C$ independent of $n$, we say that the ultra coset nilprogression is in normal form. We call $N_i := \lim_{n \to \alpha} N_{i,n}$ for $i = 1, \dots, r$ the lengths of the ultra coset nilprogression, and say that the nilprogression is nondegenerate if all the $N_i$ are unbounded. We define the concept of an ultra nilprogression similarly, but replacing “coset nilprogression” by “nilprogression” throughout.

As with all ultraproducts, it suffices to have the $P_n$ obey the stated properties for all $n$ sufficiently close to $\alpha$, as one can redefine $P_n$ arbitrarily on the remaining values of $n$ without affecting the ultraproduct $P$. Note that an ultra nilprogression $P$ can be expressed as

$P = P(u_1, \dots, u_r; N_1, \dots, N_r)$

where $r$ is the rank, $u_1, \dots, u_r$ are elements of the ambient nonstandard local group, and $N_1, \dots, N_r$ are nonstandard positive reals.

To obtain the contradiction, then, it is sufficient to establish the following ultraproduct version of our main theorem.

Theorem 4.2. Suppose that $A$ is an ultra approximate group. Then $A^4$ contains a nondegenerate ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$.

Here $|P| \gg |A|$ means that the nonstandard numbers $|A|$ and $|P|$ satisfy $|A| = O(|P|)$, or in other words that there is a (standard) number $C > 0$ such that $|A_n| \le C|P_n|$ for an $\alpha$-large set of $n \in \mathbf{N}$. See the end of Appendix A for more information.

The Hrushovski Lie model theorem, Theorem 3.10, will be a key tool in establishing this, as we discuss below.
In addition to this theorem, a further fundamental concept in our argument will be the notion of an escape norm.

Definition 4.3 (Escape norm). Let $A$ be a multiplicative set. For a group element $g \in A^{10}$, we define the escape norm $\|g\|_{e,A} \in [0,1]$ to be the quantity

$\|g\|_{e,A} := \inf\left\{ \frac{1}{n+1} : n \in \mathbf{N};\ g^i \in A \text{ for all } 0 \le i \le n \right\}$.

Recall that by convention, the statement $g^i \in A$ is false if $g^i$ is not well-defined. Now suppose that $A$ is a nonstandard multiplicative set, i.e. an ultraproduct $A = \prod_{n \to \alpha} A_n$ of standard multiplicative sets $A_n$. If $g = \lim_{n \to \alpha} g_n$ is an element of $A^{10}$, we define the escape norm $\|g\|_{e,A} \in {}^*[0,1]$ to be the quantity

$\|g\|_{e,A} := \lim_{n \to \alpha} \|g_n\|_{e,A_n}$.

The escape norm can always be defined, but there are some remarkable lemmas essentially due to Gleason [21] concerning its properties when $A$ is an approximate group. Specifically we will show in Section 8 that there is a set $A'$ controlling $A$ for which the escape norms satisfy (precise versions of) the following estimates:

(i) (Product property) If $g_1, \dots, g_n \in A'$ then $\|g_1 \cdots g_n\|_{e,A'} \ll \|g_1\|_{e,A'} + \cdots + \|g_n\|_{e,A'}$;
(ii) (Conjugation property) If $g, h \in A'$ then $\|h^{-1}gh\|_{e,A'} \ll \|g\|_{e,A'}$;
(iii) (Commutator property) If $g, h \in A'$ then $\|[g,h]\|_{e,A'} \ll \|g\|_{e,A'} \|h\|_{e,A'}$.

These estimates, which we shall informally term “Gleason's lemmas”, will be proven in Section 8. They are valid in both the finitary and the ultralimit settings; the latter will be deduced, quite straightforwardly, from the former. The remarks in the following paragraph pertain to the finitary situation.

To prove the Gleason lemmas, the set $A'$ must be what we call a strong approximate group. The precise definition of this is Definition 7.1. It is by no means obvious that there is a large strong approximate group $A'$ contained in $A^4$, but this will follow from the Hrushovski Lie model theorem (Theorem 3.10), basically because small balls in a Lie group are automatically strong approximate groups, and can then be pulled back by the model map.
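As a small illustration of Definition 4.3 in the finitary setting, the following Python sketch computes the escape norm by brute force. It is only a sketch under stated assumptions: the function name `escape_norm`, the representation of $A$ as a finite Python set, and the two example sets are hypothetical choices made here, and the ambient operation is assumed to be a genuine group law (so that the powers of $g$ either leave $A$ or cycle back to the identity).

```python
from fractions import Fraction


def escape_norm(g, A, op, identity):
    """Escape norm of g relative to a finite set A, following Definition 4.3.

    Returns inf{1/(n+1) : g^i in A for all 0 <= i <= n}.
    Assumes `op` is a group law, so powers of g either leave A
    or return to the identity after finitely many steps.
    """
    if identity not in A:
        raise ValueError("A must contain the identity")
    n = 0              # g^0, ..., g^n are known to lie in A
    power = identity   # the current power g^n
    while True:
        power = op(power, g)              # compute g^(n+1)
        if power not in A:
            return Fraction(1, n + 1)     # g escapes A at step n + 1
        if power == identity:
            return Fraction(0)            # powers cycle inside A: g never escapes
        n += 1


# Example 1: the interval A = [-10, 10] in the integers (written additively).
interval = set(range(-10, 11))
add = lambda x, y: x + y
norm_of_3 = escape_norm(3, interval, add, 0)    # 3, 6, 9 lie in A, 12 does not: 1/4

# Example 2: a subset of Z/12Z containing the subgroup {0, 4, 8}.
subset_mod_12 = {0, 1, 4, 8, 11}
add_mod_12 = lambda x, y: (x + y) % 12
norm_of_4 = escape_norm(4, subset_mod_12, add_mod_12, 0)   # 4 generates {0, 4, 8}: 0
```

The second example shows the phenomenon used later in the outline: elements of escape norm zero are exactly those whose powers never leave the set, and here they form a genuine subgroup.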
Once $A'$ has been defined, Gleason's lemmas are proven by an argument closely analogous to that of Gleason himself [21]. We shall say nothing further about the details here; the argument is self-contained and is discussed in Section 8.

With Gleason's lemmas in hand, let us describe the rest of the argument. Firstly, the set $H = \{g : \|g\|_{e,A'} = 0\}$ of elements which do not escape is a normal (genuine) subgroup of $A'$; this follows from (i) and (ii). We may quotient by $H$ to get an ultra approximate group $A_0 := A'/H$, all of whose non-identity elements have nonzero escape norm. We shall call such approximate groups NSS approximate groups, in analogy with the no small subgroups property in the theory of locally compact groups.

Now, if $g_1 \in A_0$ is an element other than the identity with smallest (nonzero) $\|\cdot\|_{e,A_0}$-escape norm then we shall see that in fact, if $A'$ is chosen appropriately, $g_1 \in A_0$. Item (iii) then implies that for any $h \in A_0$, $[g_1, h] \in A_0$ has smaller escape norm than $g_1$, and hence must be the identity. In other words, $g_1$ is central in $A_0$ and we may quotient again to get a new approximate group $A_1 := A_0/\langle g_1 \rangle$. We are being quite fuzzy at this point; in fact, the quotienting takes place in the category of local groups and one is quotienting not by the entire group $\langle g_1 \rangle$ but by an appropriate geometric progression within it.

Continuing in this vein we pick $g_2 \in A_1$ other than the identity with smallest $\|\cdot\|_{e,A_1}$-norm. We shall see that this norm is automatically nonzero, a consequence of the local nature of the quotienting operation. Continuing further, we pick $g_3, g_4, \dots$.

All of this makes sense at the level of ultralimits as well, and in this setting one can show that $A_i$ has a Hrushovski Lie model $L_i$ with $\dim L_i < \dim L_{i-1}$ for all $i$.
Because of this, the quotienting procedure terminates in finite time with an element $g_k$, and one concludes by reversing these finitely many quotienting operations that $A$ is controlled by an ultra coset nilprogression with “generators” $H, g_1, \dots, g_k$, thereby leading to a proof of Theorem 4.2. This concludes our brief summary of the argument.

Let us summarise the content of the remaining core sections of the paper.

• In Section 5, we discuss results from multiplicative combinatorics, essentially due to Sanders and Croot-Sisask, which are relevant to the proof of Hrushovski's Lie model theorem.
• In Section 6, we prove the Hrushovski Lie model theorem.
• In Section 7, we use the Hrushovski Lie model theorem to construct strong approximate groups.
• In Section 8, we state and prove Gleason's lemmas.
• In Section 9, we give details of the inductive strategy outlined above for constructing $H$ and $g_1, \dots, g_k$, and conclude the proof of Theorem 2.10 (except for the rank bound).
• In Section 10 we show that the rank and step of the coset nilprogression can be bounded by $6 \log_2 K$ in the global case.
• Section 11 is devoted to various applications to the growth of groups and to Riemannian geometry. We prove there the corollaries stated in the introduction.

5. Sanders-Croot-Sisask theory

In the next section we will establish Hrushovski's Lie Model Theorem (Theorem 3.10), in which an ultra approximate group is related first to a locally compact metrisable local group and then, via Goldbring's solution [24] of the local Hilbert's Fifth problem, to a local Lie group. In locally compact metrisable local groups we have total boundedness, which means that the unit (say) ball $B(\mathrm{id}, 1) := \{x \in G : d(x, \mathrm{id}) \le 1\}$ may be covered by $O_\varepsilon(1)$ smaller balls $B(x_i, \varepsilon) := \{x \in G : d(x, x_i) \le \varepsilon\}$. On the other hand, by continuity of the group operation, $B(\mathrm{id}, 1)$ will contain high powers like $B(\mathrm{id}, \varepsilon)^m$ for suitably small $\varepsilon$.
It is not surprising, then, that we need tools for showing (roughly speaking) that approximate groups $A$ contain high powers of somewhat smaller, but still quite large, approximate subgroups $A'$, which do not immediately escape $A$ in the sense that $(A')^m$ is contained inside $A$ (or perhaps a slightly larger set such as $A^4$) for a reasonably large value of $m$. Such a tool is provided by a result from multiplicative combinatorics due to Sanders [47] and to Croot-Sisask [13, Theorem 1.6], namely Theorem 5.3 below. We shall also need a “normal” variant of this result, which essentially follows by combining Theorem 5.3 with [49, Lemma 13.1]. Our version of this is Theorem 5.6 below, and once again we provide a self-contained proof.

Let us remark that by appealing to these results from multiplicative combinatorics we differ fairly substantially from the approach taken by Hrushovski [33], although one may perceive structural similarities in the model-theoretic arguments he uses.

All of the results below are essentially already in the literature, but always for subsets $A$ of some ambient (global) group $G$. As it turns out, though, the proofs of these results end up being equally valid for the more local setting of multiplicative sets. Indeed, most of the tools used in multiplicative combinatorics (with the notable exception of the Fourier transform) are already “local” in nature in that they only require one to do $O(1)$ multiplications.

Our first such tool is Ruzsa's covering lemma, which essentially allows one to select a “complete set of coset representatives” in the approximate group setting.

Lemma 5.1 (Local Ruzsa covering lemma). Let $A$, $B$ be finite sets, and suppose that $A \cup B$ is a multiplicative set. Then there exists a finite set $X \subseteq B$ with $|X| \le |AB|/|A|$ and $B \subseteq A^{-1}AX$. Similarly there exists a finite set $Y \subseteq B$ such that $|Y| \le |BA|/|A|$ and $B \subseteq YAA^{-1}$.

Proof.
Let $X$ be a subset of $B$ such that the sets $A \cdot x$ for $x \in X$ are disjoint, and such that $X$ is maximal with respect to set inclusion. Since these disjoint sets each have cardinality $|A|$ and are contained in $AB$, we have $|X| \le |AB|/|A|$. If $b \in B$, then by maximality $A \cdot b$ and $A \cdot x$ must intersect for some $x \in X$, thus $a \cdot b = a' \cdot x$ for some $a, a' \in A$. Multiplying on the left by $a^{-1}$, we conclude that $b = a^{-1} \cdot a' \cdot x$, and the claim follows. The second claim is proved in the same way.

A corollary of this is the following result, which allows one to produce an approximate group from a set with small growth.

Corollary 5.2. Let $A$ be a symmetric multiplicative set, and suppose that $|A^5| \le K|A|$. Then $A^2$ is a $2K$-approximate group.

Proof. Clearly $A^2$ is a symmetric set containing the identity. Since $|A^5| \le K|A| \le K|A^2|$, we see from Lemma 5.1 that there exists $X \subseteq A^4$ with $|X| \le K$ such that $A^4 \subseteq A^2 X$, and there similarly exists $Y \subseteq A^4$ with $|Y| \le K$ such that $A^4 \subseteq YA^2$. Taking the union of $X$ and $Y$ we obtain the claim.

We turn now to the result of Sanders [47] that drives our whole approach.

Theorem 5.3 (Small neighbourhoods). Suppose that $A$ is a $K$-approximate group, and let $m \ge 1$ be an integer. Then there is a $O_{K,m}(1)$-approximate group $S$ with $|S| \gg_{K,m} |A|$ such that $S^m \subseteq A^4$.

Remark 5.4. Explicit bounds for the implied constants are given in, for example, [13, Theorem 1.6]. As much of the remainder of the argument is not explicitly effective with respect to bounds, we do not worry about such quantitative issues here. Similar remarks can be made in connection with the normal variant, Theorem 5.6 below.

Proof. We use the argument from [47], generalised to the setting of multiplicative sets. For the convenience of the reader, we reproduce it here. A somewhat different proof of Theorem 5.3 can also be obtained by using the techniques of [13].

For each $0 < t < 1$, let $f(t)$ denote the quantity

$f(t) := \inf\left\{ \frac{|AB|}{|A|} : B \subseteq A;\ |B| \ge t|A| \right\}$.

Since $|A^2| \le K|A|$, we have $1 \le f(t) \le K$ for all $0 < t < 1$.
By the pigeonhole principle, we can thus find $t \gg_{K,m} 1$ such that

(5.1) $f\left(\frac{t^2}{2K}\right) \ge \left(1 - \frac{1}{100m}\right) f(t)$.

Fix this $t$. As there are only finitely many sets $B$ that make up the infimum for $f$, we can find a $B \subset A$ with $|B| \ge t|A|$ such that

(5.2) $|AB| = f(t)|A|$.

For each $a \in A$, the set $Ba$ has cardinality $|B|$ and is contained in $A^2$. Thus

$\sum_{a \in A} \sum_{x \in A^2} 1_{Ba}(x) = |A||B|$

and hence by Cauchy-Schwarz we obtain

$\sum_{x \in A^2} \left( \sum_{a \in A} 1_{Ba}(x) \right)^2 \ge \frac{|A|^2 |B|^2}{|A^2|}$.

The left-hand side can be rewritten as

$\sum_{a \in A} \sum_{a' \in A} |Ba \cap Ba'|$,

and so by the pigeonhole principle, there exists $a' \in A$ such that

$\sum_{a \in A} |Ba \cap Ba'| \ge \frac{|A||B|^2}{|A^2|}$.

Since $|B| \ge t|A|$ and $|A^2| \le K|A|$, we thus have

$\sum_{a \in A} |Ba \cap Ba'| \ge \frac{t^2}{K} |A|^2$,

and hence we can find a subset $C$ of $A$ of cardinality

(5.3) $|C| \ge \frac{t^2}{2K} |A|$

such that $|Ba \cap Ba'| \ge t^2|A|/2K$ for all $a \in C$. Multiplying by $a^{-1}$ and by $(a')^{-1}$, we see that $|Bh \cap B| \ge t^2|A|/2K$ for all $h \in S_0$, where $S_0 := a'C^{-1} \cup C(a')^{-1} \cup \{\mathrm{id}\}$ is a symmetric subset in $A^2$ containing the identity. From (5.1), we conclude that

$|A(Bh \cap B)| \ge \left(1 - \frac{1}{100m}\right) f(t)|A|$.

From (5.2), we conclude that

$|ABh \cap AB| \ge \left(1 - \frac{1}{100m}\right) |AB|$.

Using induction (and the hypothesis that $A^{100}$ is well-defined, noting that $B \subset A$ and $S_0 \subset A^2$) we then see that for any $1 \le m' < 100m$, the set $S_0^{m'}$ is well-defined and

$|ABh \cap AB| \ge \left(1 - \frac{m'}{100m}\right) |AB|$

for all $h \in S_0^{m'}$, which in particular implies that $S_0^{m'} \subset A^4$. On the other hand, from (5.3) we have $|S_0| \gg_{K,m} |A|$. From Corollary 5.2 we see that $S := S_0^2$ is a $O_{K,m}(1)$-approximate group. Since $S^m = S_0^{2m} \subset A^4$, we obtain the first claim of the lemma. The second claim follows by applying the Ruzsa covering lemma (with $B := S$).

Remark 5.5. Let us pause to note a consequence of this result. We defined multiplicative sets to be ones in which one was at liberty to take up to 100 multiplications (i.e. $A^{100}$ is well-defined), and the associative law would hold to this extent. Theorem 5.3, or
Theorem 5.3,or 146 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO more accurately a close examination of the proof of it, says that if A is an approximate group and a multiplicative set in which merely 8 multiplications are allowed (i.e. A is well-defined) then A is O (1)-controlled by an O (1)-approximate group A = Sin m,K m,K which up to m multiplications are defined an associative. For this reason Theorem 2.10 holds if only 8 multiplications are allowed. We shall not dwell on such details further in this paper, allowing ourselves the luxury of 100 multiplications. We turn now to proving a “normal” variant of Theorem 5.3.Here, we usethe notation b −1 a := b ab and B b A := a : a ∈ A, b ∈ B for elements a, b and subsets A, B of a local group. Theorem 5.6 (Small normal neighbourhoods). — Suppose that A is a K-approximate group, and let m  1 be an integer. Let S ⊆ A be a K -approximate group with |S|= δ|A|. Then there is an m A 4 ˜ ˜ ˜ O (1)-approximate group S with |S| |A| such that (S ) ⊆ S . m,K,K ,δ K,K ,m,δ Theorem 5.6 will be deduced from Theorem 5.3. To motivate the argument, let us first recall a standard lemma from group theory. Lemma 5.7. —Let A be a finite group, and let S be asubgroupof A with |S|  |A|/K. Then ˜ ˜ there exists a further subgroup S ⊂ S of A with |S| |A| which is normal in A. Note that this lemma would easily yield Theorem 5.6 from Theorem 5.3 in the special case when A and S are genuine groups and not merely approximate groups. Proof.—Let x ,..., x be a complete set of right coset representatives for S in A, 1 k and set −1 −1 S = x Sx = x Sx. x∈A All the claims of the lemma are immediate, except for the claim that |S| |A|.However, this follows from iterating the fact that if H , H  G are subgroups of small index in a 1 2 group G then so is H ∩ H ; in fact we have the well-known inequality 1 2 (5.4) [G : H ∩ H ]  [G : H ][G : H ]. 1 2 1 2 To adapt this argument to the approximate setting we need an analogue of (5.4) for approximate groups. 
The required analogue of (5.4) for approximate groups is provided by the following lemma.

Lemma 5.8. Suppose that $A$ is a $K$-approximate group and that $A_1, A_2 \subseteq A$ are sets with $|A_i| = \delta_i|A|$. Then $A_1A_1^{-1} \cap A_2A_2^{-1}$ contains a set $BB^{-1}$ with $B \subseteq A$ and $|B| \ge \delta_1\delta_2|A|/K$.

Proof. Since $A_1^{-1}A_2 \subseteq A^2$, we have $|A_1^{-1}A_2| \le K|A|$. It follows that there is some $x$ with at least $\delta_1\delta_2|A|/K$ representations as $a_1^{-1}a_2$. Let $B$ be the set of all values of $a_2$ that appear. Obviously $BB^{-1} \subseteq A_2A_2^{-1}$. Suppose that $a_2, a_2' \in B$. Then there are $a_1, a_1'$ such that $x = a_1^{-1}a_2 = (a_1')^{-1}a_2'$, and so $a_2(a_2')^{-1} = a_1(a_1')^{-1}$. Thus $BB^{-1}$ lies in $A_1A_1^{-1}$ as well.

By iterating the above lemma we obtain the following corollary.

Corollary 5.9. Suppose that $A$ is a $K$-approximate group and that $A_1, \dots, A_k \subseteq A$ are sets with $|A_i| \ge \delta|A|$ for each $i$. Then $\left| \bigcap_{i=1}^{k} A_iA_i^{-1} \right| \gg_{\delta,k,K} |A|$.

Now we can prove Theorem 5.6.

Proof of Theorem 5.6. By Theorem 5.3, there is an $O_{m,K,K'}(1)$-approximate subgroup

$S_0 \subseteq S^4$, $|S_0| \gg_{m,K,K',\delta} |A|$,

such that

(5.5) $S_0^{4m+4} \subseteq S^4$.

The Ruzsa covering lemma allows us to do the analogue of picking a complete set of coset representatives in the approximate group setting. Specifically, there are $x_1, \dots, x_k$, $k = O_{m,\delta,K,K'}(1)$, such that

(5.6) $A^4 \subseteq \bigcup_{i=1}^{k} S_0^2 x_i$.

Let us assume without loss of generality that $x_1 = \mathrm{id}$. By Corollary 5.9, the set

$T := \bigcap_{i=1}^{k} x_i^{-1} S_0^2 x_i$

has cardinality $|T| \gg_{m,K,K',\delta} |A|$.

We claim that the set $\tilde{S} := T^2$ has the required properties. First of all note that, by Corollary 5.2, $\tilde{S}$ is indeed an $O_{m,K,K',\delta}(1)$-approximate group. Next observe that, since $x_1 = \mathrm{id}$,

(5.7) $x_i T x_i^{-1} \subseteq S_0^2$

for each $i$.

Suppose that $x \in A^4$. Then, by (5.6), there is some $i$ with $1 \le i \le k$ and some $s \in S_0^2$ such that $x = sx_i$. It follows from this, (5.7) and (5.5) that

$x\tilde{S}^m x^{-1} = xT^{2m}x^{-1} = sx_iT^{2m}x_i^{-1}s^{-1} = s(x_iTx_i^{-1})^{2m}s^{-1} \subseteq S_0^{4m+4} \subseteq S^4$.

Since $A^4$ is symmetric, this concludes the proof.

6.
Proof of the Hrushovski Lie model theorem

In this section we establish Theorem 3.10. The reader may wish to reread Section 3, which gave an overview of this theorem. We will deduce this theorem from the following two propositions.

Proposition 6.1 (Locally compact model). Let $A$ be an ultra approximate group. Then $A^4$ admits a model $\pi : A^{32} \to G$ by a metrisable locally compact local group $G$.

Proposition 6.2 (From locally compact models to Lie models). Let $A$ be an ultra approximate group and suppose that $A^4$ admits a model $\pi : A^{32} \to G$ into a locally compact local group $G$. Then there is a large ultra approximate group $\tilde{A}$ of $A$ (thus $\tilde{A}^4 \subset A^4$) which admits a model $\tilde{\pi} : \tilde{A}^8 \to L$ into a connected, simply-connected Lie group $L$.

It is clear that the above two propositions together imply Theorem 3.10.

We will give a self-contained proof of Proposition 6.1, using the multiplicative combinatorics results of the previous section, together with the countable saturation property of ultraproducts. In contrast, the proof of Proposition 6.2 requires deep material related to (the local version of) Hilbert's fifth problem, for which we provide suitable references.

Building metrics on local groups. We now begin the proof of Proposition 6.1. Suppose that we have a pseudometric $d : G \times G \to [0, \infty)$ on some local group $G$, that is to say $d$ satisfies the axioms of a metric, except that we may have $d(x, y) = 0$ when $x \ne y$. Then we may of course define the balls $B(\mathrm{id}, \varepsilon) := \{x \in G : d(x, \mathrm{id}) < \varepsilon\}$, and these will be nested in the sense that $B(\mathrm{id}, \varepsilon) \subseteq B(\mathrm{id}, \varepsilon')$ if $\varepsilon < \varepsilon'$.

We now examine ways to reverse this construction, beginning with a quite general way to construct pseudometrics on symmetric local groups; this will be needed to prove Proposition 6.1.

Let $G$ be a symmetric local group. For any function $\psi : G \to \mathbf{R}$ and $g \in G$, we define the shift $T_g\psi : G \to \mathbf{R}$ by setting

$T_g\psi(x) := \psi(g^{-1}x)$

if $g^{-1}x$ is well-defined in $G$, and $T_g\psi(x) = 0$ otherwise.
We then define the “derivative” operator ∂ ψ := ψ − T ψ. g g The expression ∂ ψ := sup ∂ ψ(x) g  (G) g x∈G can be viewed heuristically as a “norm” of g relative to ψ , and this makes it natural to consider the function (6.1) d(g, h) := T ψ − T ψ ∞ =∂ −1 ψ ∞ . g h  (G) h g  (G) One can view d as the pullback of the metric on  (G) to G using the translation action g → T ψ of G on ψ . Lemma 6.3 (Using functions to build (pseudo-)metrics). — Let G be a local group, and let A be a symmetric neighbourhood of the identity such that A is well-defined in G.Let ψ : G → R be non-negative and supported on A. 128 2 ∞ ∞ (i) We have ∂ ψ  ψ for all g ∈ A , with equality holding when g ∈ A . g  (G)  (G) (ii) Whenever g, h ∈ A , one has (6.2) ∂ ψ ∞  ∂ ψ ∞ +∂ ψ ∞ . gh  (G) g  (G) h  (G) (iii) For any g ∈ A , we have −1 ∞ ∞ (6.3) ∂ ψ =∂ ψ . g  (G) g  (G) 64 64 + (iv) The function d : A × A → R defined by the formula (6.1) is a left-invariant pseudo- metric on A . Remark 6.4. — To spell out what we mean in (iv), we are asserting that d(g, g) = 0, that d(g, h) = d(h, g),and that d(g, k)  d(g, h) + d(h, k) for all g, h, k ∈ A . Furthermore 64 128 it has the left-invariance property d(gh, gk) = d(h, k) whenever h, k ∈ A , g ∈ A ,and gh, gk ∈ A . Later on, when proving Gleason’s lemmas, we shall require some slightly more exotic properties of these cocycle “norms”, related to commutation and a certain “Taylor expansion”. Proof. — The property (i) is clear from construction. For g, h ∈ A we have the representation property T T ψ = T ψ and hence the cocycle identity g h gh ∂ ψ = ∂ ψ + T ∂ ψ gh g g h 150 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO which gives (6.2). Similarly, for g ∈ A we have the inverse identity −1 −1 ∂ ψ =−T ∂ ψ g g which gives (6.3). The claims in (iv) follow easily from (ii) and (iii). 
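The construction of the pseudometric (6.1) from a function $\psi$ can be tried out directly when $G$ is a small finite (global) group, where all products are well-defined. The Python sketch below does this for the symmetric group on three points and checks the pseudometric axioms, left-invariance, and the identity $d(g,h) = \|\partial_{h^{-1}g}\psi\|_{\ell^\infty(G)}$ by exhaustive search. The test function `psi`, the helper names, and the choice of group are assumptions made here purely for illustration; the local-group subtleties of Lemma 6.3 do not arise in this global example.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


G = list(permutations(range(3)))   # the symmetric group on {0, 1, 2}


def psi(p):
    """A non-negative test function on G (an arbitrary illustrative choice)."""
    return p[0] + 2 * p[1]


def shift(g, f):
    """The shift T_g f, defined by (T_g f)(x) = f(g^{-1} x)."""
    g_inv = inverse(g)
    return lambda x: f(compose(g_inv, x))


def derivative_norm(g):
    """Sup norm over G of the derivative psi - T_g psi."""
    shifted = shift(g, psi)
    return max(abs(psi(x) - shifted(x)) for x in G)


def d(g, h):
    """The pseudometric d(g, h) = sup_x |T_g psi(x) - T_h psi(x)| from (6.1)."""
    Tg, Th = shift(g, psi), shift(h, psi)
    return max(abs(Tg(x) - Th(x)) for x in G)
```

Since `d` is obtained by pulling back the sup-norm distance along the map sending `g` to `shift(g, psi)`, symmetry and the triangle inequality are inherited automatically; left-invariance comes from the substitution $x \mapsto kx$ in the supremum.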
In the next lemma we give a variant of the Birkhoff-Kakutani construction [40, §1.22], in which a function ψ is constructed so that the pseudometric d(g, h) = ∂ −1 ψ ∞ is adapted to a given nested sequence of symmetric sets which are sup- h g  (G) posed to resemble “balls” in this pseudometric. Lemma 6.5 (Birkhoff-Kakutani construction). — Suppose that G is a local group and that we have a sequence of symmetric neighbourhoods A , A ,... of the identity in G with the nesting property 0 1 2 200 that A ⊆ A for i = 0, 1, 2,... , and with A well-defined. Then there is a pseudometric i+1 0 64 64 d : A × A →[0, 1] 0 0 such that we have the inclusions 64 −k 64 −k (6.4) g ∈ A : d(g, id)< 2 ⊆ A ⊆ g ∈ A : d(g, id)  2 · 2 0 0 for all nonnegative integers k. In particular x → x in the pseudometric d if and only if, for each k ∈ N, −1 we have x x ∈ A for all sufficiently large n. n k −i −i 1 k Proof. — Suppose that q = 2 + ··· + 2 ,0 < q < 1, is a dyadic rational, and define B := A A ... A . q i i i k k−1 1 Even though the definition uses a potentially large number k of multiplications, the nest- ing property of the A means that these sets B are well-defined in the local group G. i q −k We claim that B ⊆ B whenever q is a dyadic rational with denominator divid- q+2 ing 2 ; this easily implies that (6.5)B ⊆ B  whenever 0 < q < q < 1. q q The claim follows by repeated use of the nesting A ⊆ A (the number of times it will be i+1 −k required is the number of carries when 2 is added to q in binary). In particular, B ⊆ A ⊂ A . q i −1 0 Define ψ : A →[0, 1] by ψ(x) := sup{1 − q : 0 < q < 1; x ∈ B }∪{0}, q THE STRUCTURE OF APPROXIMATE GROUPS 151 and consider the pseudometric d(g, h) := ∂ −1 ψ ∞ as discussed in Lemma 6.3.Note h g  (G) 64 ∞ that for g, h ∈ A , ∂ −1 ψ is supported in A , and so one can replace  (G) here with h g 0 0 A if desired. −k k −k If d(g, id)< 2 then |∂ ψ(id)| < 2 , which implies that ψ(g)> 1 − 2 and there- fore g ∈ B −k and hence g ∈ A . 
2 k −k Conversely, suppose that g ∈ A : we are to show that d(g, id)  2 · 2 . To show this 1−k we must confirm that |∂ ψ(h)| < 2 for all h ∈ G. As discussed before, we may assume 192 −k −k that h ∈ A . Suppose that h ∈ B ,where 0 < q < 1− 2 is an integer multiple of 2 ,but −k −1 that h ∈ / B −k.Then ψ(h)  1 − q + 2 . On the other hand, g h ∈ A B ⊆ B −k,by q−2 k q q+2 −1 −k the claim established above, and therefore ψ(g h)  1− q− 2 . It follows that ∂ ψ(h) = −1 −k −k ψ(g h) − ψ(h)  −2 · 2 . Similarly, ∂ ψ(h)  2 · 2 . Since h was arbitrary it follows −k that d(g, id) =∂ ψ ∞  2 · 2 , and the claim follows. g  (G) If the sets A satisfy a certain normality condition, the group operations are con- tinuous with respect to the pseudometric d : Lemma 6.6 (Normal Birkhoff-Kakutani construction). — Suppose that G is a local group and that we have a sequence of symmetric sets A , A ,... in G with A well-defined and with 0 1 2 A the nesting property that (A ) ⊆ A for i = 0, 1,... (and so, in particular, we certainly have i+1 the weaker nesting property A ⊆ A required by the preceding lemma). Consider the pseudometric i+1 64 64 32 32 64 d : A × A →[0, 1] defined in the preceding lemma. Then the product map ·A × A → A 0 0 0 0 0 −1 32 32 and the inversion map : A → A are both continuous with respect to d . 0 0 Proof. — Suppose that g → g and that h → h. We wish to show that g h → gh,to n n n n −1 which end it suffices to establish that (gh) g h ∈ A for all sufficiently large n.However, n n k −1 for n sufficiently large in terms of k we have g g ∈ A , and hence n k+2 −1 −1 n 0 h g g h ∈ A ⊆ A ⊆ A . n n k+1 n k+2 k+2 −1 Furthermore, h h ∈ A for n sufficiently large, and so n k+1 −1 −1 −1 −1 2 (gh) g h = h h h g g h ∈ A ∈ A , n n n n n k n k+1 as required. The statement about the inverse map is easier. Suppose that g → g.Then −1 g g ∈ A for n sufficiently large, and so n k+1 −1 −1 −1 −1 g g = g g g g ∈ A ⊆ A ∈ A . n n k k+1 k+1 −1 But this means that g → g as n →∞. 
The previous lemma showed how to get a local topological group given a sequence of balls satisfying a suitable normalisation condition. The normal variant of the Croot- Sisask-Sanders lemma, Theorem 5.6, allows us to find precisely such a sequence of balls 152 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO given any K-approximate group A. Of course, these balls are just finite sets and, for sufficiently large i,A may well consist only of the identity element e. This will be the case, for example, when A =[−N, N]. However when transferred to the setting of an ultra approximate group A = A , these balls have “finite index” in A, and this n→α ultimately leads to the important conclusion that the metric d gives A the structure of a locally compact local group. Lemma 6.7. —Let A be an ultra approximate group. Then there is a sequence of ultra approx- 4 A imate groups A , A ,... such that A = A , we have the nesting property that (A ) ⊆ A for 0 1 0 i i+1 i = 0, 1,... ,and each A is large in the sense that A can be covered by finitely many left-translates of A . Proof. — By definition, one has A = A for some K-approximate groups A n n n→α and some fixed K. Applying Theorem 5.6 repeatedly we see that there are, for each n, n,0 O (1)-approximate groups S , i = 1, 2, 3 ... ,suchthat S := A and (S ) ⊆ K,i n,i n,0 n n,i+1 4 4 S for each i. Furthermore we have S ⊆ A and |S | |A | for each i. Setting n,i n,i K,i n n,i n A := S , n,i n→α all of the properties except the assertion about covering are immediate. To check that each A is large, we need only check that S is covered by O (1) left-translates of S , i n,0 K,i n,i for each i. This, however, is an immediate consequence of Lemma 5.1 and the lower bound on |S |. n,i Lemma 6.8. —Let A be an ultra approximate group. Consider a sequence of ultra approximate 32 32 groups A , A ,... as found in the preceding lemma, and let d : A × A →[0, 1] be the pseudometric 0 1 associated to these sets as in Lemma 6.5. 
Then A is locally compact with respect to the topology generated by d . Proof. — By the Heine-Borel theorem (which is usually stated for metrics, but which 13 32 extends without difficulty to pseudometrics) it suffices to show that A is complete and totally bounded. We deal with the latter task first. From the inclusion A ⊆{x : d(x, id) −k 32 2· 2 } and the left-invariance of d , this follows from the fact that A is covered by finitely many left-translates of A . We turn now to completeness. Suppose that (x ) is a Cauchy sequence. By re- n n∈N fining the sequence if necessary we may assume that it is rapidly Cauchy in the sense that −n−1 d(x , x )  2 . n m We claim that the sets x A are nested in the sense that x A ⊆ x A whenever n n m m n n −1 −n−1 m > n. To see this note that by left-invariance we have d(id, x x )  2 and hence, Indeed, one can deduce the pseudometric case from the metric case by quotienting out by the equivalence relation x ∼ y defined by the equation d(x, y) = 0. THE STRUCTURE OF APPROXIMATE GROUPS 153 −1 2 by the inclusions of Lemma 6.5, x x ∈ A . Since A A ⊆ A ⊆ A , it follows that m n+1 n+1 m n n n+1 −1 x x A ⊆ A , thereby confirming the claim. m m n Now each set x A is an ultraproduct S , by construction. The nesting m m m,n n→α property just established of course implies that, for any positive integer M, x A = m m mM ∅.Let y be an element of this intersection; this means that there is a set  ∈ α such that M M (y ) ∈ S for all n ∈  . By replacing  with  ∩  if necessary, and so on, M n m,n M 2 1 2 mM and using the basic properties of ultrafilters, we may assume that  ⊇  ⊇  ⊇··· . 1 2 3 By removing 2 from  ,3from  and so on, if necessary, we may also assume that no 2 3 integer lies in infinitely many  . Now define a sequence x by setting x = (y ) , where M is the largest integer for n M n which n ∈  . Then, by construction, x ∈ S for all n ∈  ,thatistosay fora M n m,n M mM set of n tending to α. 
This means that x ∈ x A for every M, and hence x ∈ x A . m m m m mM −1 −m In particular we have x x ∈ A for every m and hence d(x, x )  2 · 2 . It follows that m m x → x, thereby confirming that A is complete with the metric d . Remark 6.9. — The last part of this argument, in which an element is found in the infinite intersection x A given that each finite intersection x A is m m m m m mM nonempty, is an instance of the countable saturation property of the ultraproduct construc- tion. The completeness that is afforded by the countable saturation property is one of the main reasons why we work in the ultraproduct setting. Note that a similar complete- ness also appears in the ultralimit (X, d)/ ∼ of bounded metric spaces (X , d ),where n n X := X , d := st lim d ,and ∼ is the equivalence relation defined by setting n n→α n n→α x ∼ y whenever d(x, y) = 0. Indeed, it is not difficult to use countable saturation to verify that such ultralimits are automatically complete, even if the original spaces X are not. Proof of Proposition 6.1. — We have shown that A has the structure of a locally compact local group with respect to the metric d . To complete the proof of Proposition, we need only quotient by the equivalence relation ∼ on A ,definedby x ∼ y if and only if d(x, y) = 0. The quotient L := A / ∼ is then a metrisable, locally compact, local group 32 4 and there is a natural map π : A → L. We must check that L is a good model for A in the sense of Definition 3.5. Property (i) requires us to show that there is some open neighbourhood U of −1 4 4 the identity in L such that π (U ) ⊆ A and U ⊆ π(A ),orinother wordssomeball 0 0 32 4 {x ∈ A : d(x, id)<ε} lies in A . This again follows from (6.4) and the fact that each of the sets A constructed in Lemma 6.7 lies in A . Finally, property (ii) in the definition of good model requires us to show that π(A ) is compact. This is immediate. 
To prove property (iii), we first establish the following weaker property:

(iii)′: for any open neighbourhood $U$ of the identity in $L$ there is some $U' \subseteq U$ and some ultra finite set $A' = \prod_{n \to \alpha} A'_n$ with $\pi^{-1}(U') \subseteq A' \subseteq \pi^{-1}(U)$.

This is quite easily established: suppose that $U$ contains the ball $B(\mathrm{id}, 2^{-k})$. Then it follows immediately from the inclusions of Lemma 6.5 that we may take $A' := A_{k+1}$ and then $U' := B(\mathrm{id}, 2^{-k-1})$.

We now upgrade this to property (iii) in the definition of good model. Suppose that $F \subseteq U \subseteq U_0$ with $F$ compact and $U$ open. Then there is some open neighbourhood of the identity $U'$ such that $FU' \subseteq U$. Applying (iii)′, we may locate a further open set $U'' \subseteq U'$ and an ultra finite set $A''$ such that $\pi^{-1}(U'') \subseteq A'' \subseteq \pi^{-1}(U')$. By compactness there are elements $x_1, \dots, x_M$ such that $F \subseteq \bigcup_{m=1}^M x_m U''$; we may assume that these elements lie in $F(U'')^{-1} = FU'' \subseteq U \subseteq U_0$, and hence each is of the form $x_i = \pi(a_i)$ with $a_i \in A^4$. To conclude the proof of property (iii) simply take $A' := \bigcup_{m=1}^M a_m A''$. This completes the proof of Proposition 6.1.

To complete the proof of Theorem 3.10, we invoke results about Hilbert's fifth problem, and specifically the structural theorem of Goldbring [23] describing locally compact local groups, which we state as Theorem B.18 in Appendix B.

Proof of Proposition 6.2. Suppose that we have a model $\pi \colon A^{32} \to G$ from $A^{32}$ to a locally compact local group $G$, and let $U_0$ be the open neighbourhood of the identity featuring in the definition of good model (Definition 3.5), thus $\pi^{-1}(U_0) \subset A^4$ and $U_0 \subset \pi(A^4)$. By Theorem B.18, there are symmetric neighbourhoods $U_2 \subseteq U_1 \subseteq U_0 \subseteq G$, with a fixed power of $U_2$ contained in $U_1$, and a compact normal subgroup $H$ of $U_1$ such that $U_1/H$ is isomorphic to a local Lie group $L$. Let $\phi \colon U_1 \to U_1/H$ be the projection map.

By property (iii) of Definition 3.5 (applied to $\pi \colon A^{32} \to G$) there is a symmetric ultra finite set $\tilde A \subseteq A^4$ with $\pi^{-1}(U_2^2) \subseteq \tilde A \subseteq \pi^{-1}(U_2^3)$.
Certainly, the map $\tilde\pi := \phi \circ \pi$ is well-defined and gives a homomorphism from $\tilde A^8$ to $L$; since $\pi^{-1}(U_2^3)^4 \subset A^4$, we have $\tilde A^4 \subseteq A^4$, and by Remark 3.7, $\tilde A$ is an ultra approximate group. We verify that this is a good model by checking (i), (ii) and (iii) of Definition 3.5 in turn.

For (i), first note that $\tilde\pi(\tilde A)$ contains $\tilde U_0 := \phi(U_2) = U_2 H/H \subseteq L$, which is an open neighbourhood of the identity in $L$ since $U_2 H \subseteq G$ is open. Furthermore we have
$$\tilde\pi^{-1}(\tilde U_0) = \pi^{-1}\big(\phi^{-1}(\phi(U_2))\big) \subseteq \pi^{-1}(U_2 H) \subseteq \pi^{-1}(U_2^2) \subseteq \tilde A.$$
Turning to (ii), $\tilde\pi(\tilde A)$ is contained in the image under $\phi$ of a compact set, and is therefore precompact.

Finally, we check the “approximation by internal sets” property, which is (iii) in Definition 3.5. Suppose that $\tilde F \subseteq \tilde U \subseteq \tilde U_0$, with $\tilde F$ compact and $\tilde U$ open. Then $\phi^{-1}(\tilde F) = \tilde F H$ is compact, whilst $\phi^{-1}(\tilde U) = \tilde U H$ is open. The approximation by internal sets property then follows from the fact that $\pi \colon A^{32} \to G$ is a good model.

Finally, we check that $\tilde A$ is a large ultra approximate group. To see this note that $\tilde\pi(\tilde A^2)$ is contained in a compact subset of $L$; therefore there are finitely many elements $x_1, \dots, x_k$ such that $\tilde\pi(\tilde A^2) \subseteq \bigcup_{i=1}^k \tilde\pi(x_i) \tilde U_0$. It follows that
$$\tilde A^2 \subseteq \bigcup_{i=1}^k x_i \tilde\pi^{-1}(\tilde U_0) \subseteq \bigcup_{i=1}^k x_i \tilde A,$$
thereby confirming that $\tilde A$ is an ultra approximate group. By essentially the same argument, $A^4$ may be covered by finitely many translates of $\tilde A$; thus $\tilde A$ is indeed large.

We now record some analogues of the above results in the setting of global ultra approximate groups (i.e. ultraproducts of global $K$-approximate groups for some fixed $K$), which are closer to the results of Hrushovski [33]. Define a global model $\pi \colon \langle A \rangle \to G$ to be the same notion as a good model $\pi \colon A^8 \to G$ from Definition 3.5, except that $A^8$ is replaced by the whole group $\langle A \rangle$ generated by $A$, and $G$ is now required to be a global group rather than a local group.

Proposition 6.10 (Global locally compact model). Let $A$ be a global ultra approximate group.
Then $A$ admits a global model $\pi \colon \langle A \rangle \to G$ by a metrisable locally compact global group $G$.

Proof. This is obtained by a modification of the proof of Proposition 6.1. The one main change is that the nesting condition on conjugates of $A_{i+1}^2$ appearing in Lemma 6.7 needs to be strengthened to $(A_{i+1}^2)^{A^{100(i+1)}} \subseteq A_i$, but this is easily accomplished.

Proposition 6.11 (From locally compact models to Lie models). Let $A$ be a global ultra approximate group and suppose that $A$ admits a global model $\pi \colon \langle A \rangle \to G$ into a locally compact global group $G$. Then there is a large ultra approximate subgroup $\tilde A$ of $A$ which admits a global model $\tilde\pi \colon \langle \tilde A \rangle \to L$ into a connected Lie group $L$.

Proof. This is obtained by a modification of the proof of Proposition 6.2. The one main change is that one needs to replace Theorem B.18 with Theorem B.17.

Note that, in contrast to Proposition 6.2, we do not assert that the global Lie group $L$ is simply connected (as this is not provided by the global Gleason-Yamabe theorem (Theorem B.17), which only promises connectedness). And indeed, in general we do not have simple connectedness of the model. For instance, if $A = \{-N, \dots, N\} \subset \mathbb{Z}/100N\mathbb{Z}$ for some unbounded nonstandard natural number $N$, then the obvious global model here is the map $\pi \colon \mathbb{Z}/100N\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ defined by $\pi(x) = \operatorname{st}\big(\tfrac{x}{100N}\big) \bmod 1$, and of course the unit circle $\mathbb{R}/\mathbb{Z}$ is not simply connected. On the other hand, a bounded power of $A$ is all of $\mathbb{Z}/100N\mathbb{Z}$, which is globally modeled by the trivial group; and so one can still recover simple connectedness by passing from $A$ to a suitably large power. See [33, Remark 4.11] for some further discussion of this point, as well as Theorem 10.10 below.

Combining Proposition 6.10 and Proposition 6.11 we obtain the following result, originally due to Hrushovski [33].

Proposition 6.12 (Weak global Lie model theorem). Suppose that $A$ is a global ultra approximate group.
Then there is a large ultra approximate subgroup $\tilde A$ of $A$ which admits a global model $\tilde\pi \colon \langle \tilde A \rangle \to L$ into a connected Lie group $L$.

We will strengthen this proposition in Theorem 10.10 below.

Remark 6.13. Let $\pi \colon A^8 \to L$ be a good model for an ultra approximate group $A = \prod_{n \to \alpha} A_n$ by a locally compact local group $L$, and let $U_0$ be the neighbourhood in Definition 3.5. Let $U_1$ be a symmetric neighbourhood of the identity such that $U_1 \subset U_0$. For any continuous function $f \colon L \to \mathbb{R}$ with compact support in $U_1$, we can define a functional $I(f)$ by the formula
$$I(f) = \inf \operatorname{st} \frac{\sum_{a \in A} F^+(a)}{|A|},$$
where $F^+ = \lim_{n \to \alpha} F_n^+$ is the ultralimit of functions $F_n^+ \colon A_n \to \mathbb{R}$, with the nonstandard real $\sum_{a \in A} F^+(a)$ and nonstandard natural number $|A|$ defined in the usual fashion as
$$\sum_{a \in A} F^+(a) := \lim_{n \to \alpha} \sum_{a_n \in A_n} F_n^+(a_n) \quad \text{and} \quad |A| := \lim_{n \to \alpha} |A_n|,$$
and the infimum is over all $F^+$ for which $F^+(a) \ge f(\pi(a))$ for all $a \in A$. Using Definition 3.5(iii) it is not difficult to also obtain the equivalent formula
$$I(f) = \sup \operatorname{st} \frac{\sum_{a \in A} F^-(a)}{|A|},$$
where the supremum is over all $F^-$ for which $F^-(a) \le f(\pi(a))$ for all $a \in A$. From these two definitions we see that $I(f)$ is both super-linear and sub-linear, and is thus a continuous linear functional on the space $C_c(U_1)$ of continuous compactly supported functions in $U_1$. By the Riesz representation theorem, there thus exists a Radon measure $\mu$ on $U_1$ such that $I(f) = \int_{U_1} f \, d\mu$ for all $f \in C_c(U_1)$. From the translation invariance properties of $I(f)$, we see that $\mu(gE) = \mu(E)$ for any measurable subset $E$ of $U_1$, and any $g \in L$ such that $gE$ is defined in $U_1$, and similarly for $gE$ replaced by $Eg$. Thus $\mu$ is a bi-invariant Haar measure on $U_1$; since $A$ can be covered by finitely many left-translates of $\pi^{-1}(F)$ for any compact neighbourhood $F$ of the identity, we see that $\mu$ is non-trivial (which implies in particular by bi-invariance that the locally compact local group $L$ is unimodular).
This Haar measure can then be used to estimate the (nonstandard) cardinality of various nonstandard finite sets that are “close” to $A$ in some sense. Indeed, from the definitions (and the regular nature of Radon measures) we see that
$$\mu(F)|A| \le |A'| \le \mu(U)|A|$$
whenever $F \subseteq U \subseteq U_1$, $F$ is compact, $U$ is open, and $A'$ is a nonstandard set with $\pi^{-1}(F) \subset A' \subseteq \pi^{-1}(U)$. We will not use this measure $\mu$ in this paper, but see [33] for some further discussion of this measure and its relationship to Keisler measures from model theory. One can also use $\mu$ to relate the volume growth of $A$ to the volume growth of the model group $L$, giving some rigorous substance to some of the volume growth heuristics invoked in the examples in Section 3, but we will not formalise this relationship here.

Remark 6.14. As remarked in [33], the Lie Model theorem is not only valid in the context of nonstandard finite ultra approximate groups, i.e. the ultraproduct of finite $K$-approximate groups for a fixed $K$, but also for “continuous” ultra approximate groups, that is to say the ultraproduct of precompact open subsets of a locally compact local group that obey all of the approximate group axioms other than finiteness. See [52] for the basic theory of such continuous approximate groups. Indeed, one can check that the machinery in Section 5 can be adapted to this setting by replacing the cardinality of finite sets with the Haar measure of various precompact open subsets of a locally compact local group, as in [52]. Some other components of this paper, such as the construction of strong approximate groups and Gleason metrics, can also be extended to this setting after some minor notational changes. However, there will be a key place in the argument in Section 9 in which the (nonstandard) finiteness of the ultra approximate groups is used in an absolutely crucial way, namely to locate an element of such a group of minimal non-zero “escape norm”.
As such, the main result of this paper, Theorem 2.10, does not immediately extend to the continuous setting. Indeed, the basic example of a small ball in a Lie group shows that continuous approximate groups need not resemble coset nilprogressions at all. We will not pursue this matter further here. [Footnote: Another, much more minor, place where ultra finiteness is used is in Remark 6.13 above, as we implicitly used the trivial fact that counting measure is bi-invariant. In general, one can only conclude that the measure associated to a good model is bi-invariant if each of the individual approximate groups in the ultraproduct is also equipped with a finite bi-invariant measure.]

Some finitary consequences of the Lie Model Theorem. To illustrate the power of the Lie Model Theorem in the analysis of approximate groups, we offer two fairly quick applications. The reader interested in the proof of our main results may skip ahead to the next section.

The first application is a special case of our main theorem (Theorem 2.10), following Hrushovski [33, Corollary 4.18].

Theorem 6.15 (Hrushovski). Suppose that $G$ is a group of exponent $m$ and suppose that $A \subseteq G$ is a $K$-approximate group. Then $A^4$ contains a genuine subgroup $H$ of $G$ with $|H| \gg_{K,m} |A|$. In particular, by Lemma 5.1, $A$ is covered by $O_{K,m}(1)$ left-translates of $H$.

Remark 6.16. When $m = 2$ the group $G$ must be abelian, and in this case the theorem is due to Imre Ruzsa [45].

Proof. Suppose for sake of contradiction that the claim failed. Then we may find fixed $K$, $m$ and a sequence of $K$-approximate groups $A_n \subseteq G_n$ in groups $G_n$ of exponent $m$, such that for each $n$, $A_n^4$ does not contain a genuine subgroup $H_n$ of cardinality $|H_n| \ge |A_n|/n$. As usual we form the ultra approximate group $A := \prod_{n \to \alpha} A_n$. The ultraproduct group $G := \prod_{n \to \alpha} G_n$ also has exponent $m$, and by Hrushovski's Lie model theorem we can find a large ultra approximate subgroup $A' \subseteq A^4$ with a local Lie model $\pi \colon (A')^8 \to L$.
By Definition 3.5(i), we may find a neighbourhood $U_0$ of the identity in $L$ such that $\pi^{-1}(U_0) \subseteq A'$ and $U_0 \subseteq \pi(A')$. Using the fact that the exponential map is a homeomorphism near the identity of $L$, we may then find a neighbourhood $U_1$ of the identity with $U_1 \subseteq U_0$ such that $U_1$ contains no elements of order $m$ other than the identity. If $a \in \pi^{-1}(U_1)$, then we conclude that $a^m$ is well-defined in $A'$ with $\pi(a)^m = \pi(a^m) = \mathrm{id}$, and so $\pi(a)$ is trivial, which means that $\pi^{-1}(U_1) = \ker(\pi)$. As $\pi(A')$ is precompact, we conclude that $A'$ is covered by a finite number of translates of $\ker(\pi)$; as $A'$ is large, $A$ is also covered by $M$ such translates for some (standard) finite $M$.

From Definition 3.5(iii), we see that the set $\pi^{-1}(U_1) = \ker(\pi)$ is a nonstandard finite set, and so $\ker(\pi) = \prod_{n \to \alpha} H_n$ for some finite subsets $H_n$ of $G_n$. Since $\ker(\pi) \subseteq A^4$ is a group and $A$ is covered by $M$ translates of $\ker(\pi)$, we see from Łoś's theorem (Theorem A.6) that for all $n$ sufficiently close to $\alpha$, $H_n \subseteq A_n^4$ is a group and $A_n$ is covered by $M$ translates of $H_n$. However if one takes $n$ larger than $M$ then this contradicts the construction of $A_n$, and the claim follows.

Remark 6.17. The astute reader will notice that the only properties of the local Lie group $L$ that were really used in the above argument were that $L$ was locally compact and had the NSS (no small subgroups) property. Thus, one could prove Theorem 6.15 using a weaker form of the Gleason-Yamabe theorem (Theorem B.17), in which the model group is merely locally compact NSS rather than Lie. (The machinery of Hilbert's fifth problem implies that these two concepts coincide, but this is considerably deeper.) However, we do not know of a proof of Theorem 6.15 that avoids the machinery of Hilbert's fifth problem completely, and in particular some variant of the Gleason lemmas is required.
Next, we prove (a slight variant of) the main theorem from Hrushovski's paper [33, Theorem 1.1], which uses the Lie structure (via the Baker-Campbell-Hausdorff formula) more thoroughly than the preceding application.

Theorem 6.18 (Hrushovski's structure theorem). Let $A$ be a $K$-approximate group, and let $F \colon \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a function. Then there exist natural numbers $L, M, N$ with $N \ge F(L, M)$ and $L, M \ll_{K,F} 1$, and nested sets
$$\{\mathrm{id}\} \subset A_N \subseteq \cdots \subseteq A_1 \subseteq A^4$$
with the following properties:
(i) For each $1 \le n \le N$, $A_n$ is symmetric;
(ii) For each $1 \le n < N$, $A_{n+1}^2 \subseteq A_n$;
(iii) For each $1 \le n < N$, $A_n$ is contained in $M$ left-translates of $A_{n+1}$;
(iv) For $1 \le n, m, k \le N$ with $k < n + m$, the set $[A_n, A_m] := \{[g, h] : g \in A_n, h \in A_m\}$ is contained in $A_k$;
(v) $A$ can be covered by $L$ left-translates of $A_1$.

Proof. Suppose this is not the case. Carefully negating all the quantifiers, we conclude that there exist $K$, $F$ and a sequence $A^{(n)}$ of $K$-approximate groups, such that for each $n$ and each $L, M \le n$, there does not exist $N \ge F(L, M)$ and $A_1^{(n)}, \dots, A_N^{(n)}$ obeying the conclusions of the theorem.

As usual, we form the ultraproduct $A := \prod_{n \to \alpha} A^{(n)}$, which is an ultra approximate group. By Theorem 3.10, we may find a large ultra approximate subgroup $\tilde A = \prod_{n \to \alpha} \tilde A^{(n)}$ which has a good model $\pi \colon \tilde A^8 \to L$ by a local Lie group.

Let $\mathfrak{l}$ be the Lie algebra of $L$, and fix an open bounded convex symmetric body $B$ in $\mathfrak{l}$. Let $\varepsilon > 0$ be a sufficiently small (standard) real number depending on $B$, $L$ to be chosen later; in particular we may assume that the exponential map is a homeomorphism from $\varepsilon B$ to $\exp(\varepsilon B)$, and that $\exp(\varepsilon B)$ is contained in the neighbourhood $U_0$ appearing in Definition 3.5. For each standard natural number $n \ge 1$, we apply Definition 3.5 and Remark 3.7 to find an ultra approximate group $A_n$ with
$$\pi^{-1}\big(\exp(10^{-n}\varepsilon B)\big) \subseteq A_n \subseteq \pi^{-1}\big(\exp(2 \times 10^{-n}\varepsilon B)\big).$$
In particular we have the nesting $\cdots \subseteq A_2 \subseteq A_1 \subseteq \tilde A$.

From the Baker-Campbell-Hausdorff formula, we have
$$\exp(2 \times 10^{-n-1}\varepsilon B)^2 \subseteq \exp(10^{-n}\varepsilon B)$$
if $\varepsilon$ is small enough, and thus $A_{n+1}^2 \subseteq A_n$. In a similar spirit, we can find an $M$ depending only on the dimension of $L$ or $\mathfrak{l}$ such that each ball $10^{-n}\varepsilon B$ is covered by at most $M$ translates of a fixed small multiple of $10^{-n-1}\varepsilon B$, which by the Baker-Campbell-Hausdorff formula again implies, for small enough $\varepsilon$, that each $A_n$ is covered by at most $M$ left-translates of $A_{n+1}$. Finally, another application of the Baker-Campbell-Hausdorff formula reveals that
$$\big[\exp(2 \times 10^{-n}\varepsilon B), \exp(2 \times 10^{-m}\varepsilon B)\big] \subseteq \exp(10^{-k}\varepsilon B)$$
whenever $k < n + m$, and hence $[A_n, A_m] \subseteq A_k$.

Finally, since one can cover $\pi(\tilde A)$ by a finite number of translates of $\exp(\varepsilon B)$, we see that $A$ can be covered by at most $L$ left-translates of $A_1$ for some standard $L \in \mathbb{N}$.

Now write $A_m = \prod_{n \to \alpha} A_m^{(n)}$ for some finite sets $A_m^{(n)}$, and set $N := F(L, M)$. Applying Łoś's theorem (Theorem A.6) repeatedly (but only finitely many times), we see that for $n$ sufficiently close to $\alpha$ the sets $A_1^{(n)}, \dots, A_N^{(n)}, A^{(n)}$ obey all the properties in the conclusion of the theorem. This contradicts the construction of the $A^{(n)}$ for $n$ larger than $L$, $M$, and the claim follows.

Remark 6.19. One can also use the Lie Model Theorem to establish a stronger statement than Theorem 6.18, which roughly speaking asserts that given a (finite) $K$-approximate group $A$, one can find a large sub-approximate group $A'$ which has an approximate homomorphism $\pi$ from a bounded power of $A'$ into a local Lie group $L$ with bounded range that obeys an approximate version of Property (i) in Definition 3.5, where the accuracy of these approximations exceeds the “complexity” of the model by any given function $F$. [Footnote: The complexity, which we do not define here, would be some quantity taking account of the dimension and structure constants on the Lie algebra $\mathfrak{l}$ of $L$, the diameter of the range of $\pi$ and the inradius of the neighbourhood $U_0$ appearing in Definition 3.5(i).] The precise formulation of this statement, which is in fact a logically equivalent “finitisation” of Theorem 3.10, is somewhat complicated. We will not need it elsewhere in the paper, and so we leave it as an exercise to the reader.

7. Strong approximate groups

We now give a combinatorial consequence of the Lie Model Theorem (Theorem 3.10) which will be important later, involving a concept which we will call a strong approximate group.

Definition 7.1 (Strong Approximate Group). Let $A$ be a $K$-approximate group for some $K \ge 1$. We say that $A$ is a strong $K$-approximate group if it admits a symmetric subset $S$ such that
$$(7.1) \qquad (S^{A^4})^{1000K^3} \subseteq A$$
and for which the following two trapping conditions are satisfied:
(i) (First trapping condition) If $g, g^2, g^3, \dots, g^{1000} \in A^{100}$ then $g \in A$;
(ii) (Second trapping condition) If $g, g^2, \dots, g^{10^6 K^3} \in A$ then $g \in S$.
An ultra strong approximate group is an ultraproduct $A = \prod_{n \to \alpha} A_n$ of strong $K$-approximate groups $A_n$, for some $K \ge 1$ independent of $n$.

At present this definition will seem somewhat unmotivated, although it can be demystified to some extent by remarking that these properties suggest that $S$ and $A$ are behaving like very small neighbourhoods of the identity in a Lie group $L$, with $S$ much smaller than $A$. This point should become clearer shortly. The reader should not pay too much attention to exponents such as $1000K^3$ or $10^6 K^3$ in the definition; they are chosen for the sake of concreteness.

The main reason for introducing this concept is that we will be able to show, in the next section, that the escape norm $\|g\|_{e,A}$ (defined in Definition 4.3) for an ultra strong approximate group $A$ has the pleasant properties outlined in Section 4. There is scope for varying the parameters in the definition of strong approximate group, but the ones we have given here are strong enough to prove the desired properties of the escape norm.
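As a concrete illustration of Definition 7.1, the following Python sketch brute-force checks the inclusion (7.1) and the two trapping conditions for symmetric intervals in the integers, written additively. It is only a toy under stated assumptions: the function names and the small parameter values are hypothetical, chosen so that the check runs instantly, whereas the definition uses the constants $1000K^3$, $1000$, $100$ and $10^6K^3$. In an abelian group the conjugation in (7.1) is trivial, so $S^{A^4} = S$. The sketch does not verify that $A$ is a $K$-approximate group.

```python
def interval(n):
    """The symmetric interval {-n, ..., n} in Z."""
    return set(range(-n, n + 1))


def sumset_power(X, k):
    """k-fold sumset X + ... + X (additive analogue of the product set X^k)."""
    result = {0}
    for _ in range(k):
        result = {a + x for a in result for x in X}
    return result


def is_strong(A, S, s_power, trap1_len, trap1_ball, trap2_len):
    """Toy check of Definition 7.1 in Z with user-chosen (hypothetical) constants.

    s_power    plays the role of 1000 K^3 in (7.1),
    trap1_len  plays the role of 1000 and trap1_ball of 100 in condition (i),
    trap2_len  plays the role of 10^6 K^3 in condition (ii).
    """
    # S must be a symmetric subset of A.
    if not S <= A or any(-s not in S for s in S):
        return False
    # Analogue of (7.1): conjugation is trivial in Z, so S^{A^4} = S.
    if not sumset_power(S, s_power) <= A:
        return False
    # First trapping condition: g, 2g, ..., trap1_len*g in A^trap1_ball forces g in A.
    big = sumset_power(A, trap1_ball)
    for g in big:
        if all(i * g in big for i in range(1, trap1_len + 1)) and g not in A:
            return False
    # Second trapping condition: g, 2g, ..., trap2_len*g in A forces g in S.
    for g in A:
        if all(i * g in A for i in range(1, trap2_len + 1)) and g not in S:
            return False
    return True


if __name__ == "__main__":
    A = interval(60)
    S = interval(3)
    print(is_strong(A, S, 10, 10, 5, 20))
    # Puncturing the interval symmetrically breaks the conditions.
    print(is_strong(A - {7, -7}, S, 10, 10, 5, 20))
```

With these toy constants the full interval passes and the punctured interval fails, in the spirit of the observation that deleting elements from an interval tends to destroy the strong property.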
It is easy to give examples of strong approximate groups. For instance, if $A = \{-N, \dots, N\}$ (and $K = 3$) then we may take $S = \{-N_0, \dots, N_0\}$ with $N_0 \sim N/1000K^3$. If $A$ is a subgroup, then we may simply take $S = A$. On the other hand, if one randomly removes a small number (e.g. $N^{0.01}$) of elements symmetrically from $\{-N, \dots, N\}$, the resulting set is likely to remain an $O(1)$-approximate subgroup, but not a strong $O(1)$-approximate subgroup.

The main result of this section implies the following.

Proposition 7.2 (Finding an ultra strong approximate group). Let $A$ be an ultra approximate group. Then there is a large ultra approximate subgroup $\tilde A$ of $A$ which is a strong ultra approximate group.

For use in Section 9 we will require the following somewhat more precise result.

Proposition 7.3 (Balls are ultra strong approximate groups). Let $A$ be an ultra approximate group with a good model $\pi \colon A^8 \to L$ to a local Lie group $L$. Let $B$ be an open bounded convex symmetric subset of the Lie algebra $\mathfrak{l}$ of $L$. Then there exists a standard radius $r_0 > 0$ such that for all $0 < r < r_0$, any symmetric nonstandard finite set $\tilde A$ with
$$(7.2) \qquad \pi^{-1}\big(\exp(rB)\big) \subseteq \tilde A \subseteq \pi^{-1}\big(\exp(2rB)\big)$$
is a large strong ultra approximate subgroup of $A$.

It is clear that Proposition 7.2 follows from Proposition 7.3, Theorem 3.10, and Definition 3.5(iii); we will, however, only need Proposition 7.3 in the sequel.

We now prove Proposition 7.3. Let $r_0 > 0$ be a sufficiently small quantity depending on $A$, $\pi$, $L$, $B$ to be chosen later; in particular, we take $r_0$ so small that the exponential map is a homeomorphism from $2r_0 B$ to $\exp(2r_0 B)$, and $\exp(2r_0 B)$ is contained inside the open neighbourhood $U_0$ of $L$ from Definition 3.5.

Let $\tilde A$ be as in the proposition. In particular $\tilde A \subset A$. By Remark 3.7 we may take $\tilde A$ to be an ultra $K$-approximate group for some $K$, and therefore an ultra approximate subgroup of $A$.
Since $\pi(A)$ is precompact, it may be covered by finitely many left-translates of $\exp(rB)$, and so $A$ can be covered by finitely many left-translates of $\tilde A$. Thus $\tilde A$ is a large ultra approximate subgroup of $A$.

It remains to establish that $\tilde A$ is a strong ultra approximate subgroup. Suppose that $g \in \tilde A^{100}$ is such that $g, \dots, g^{1000} \in \tilde A^{100}$. Applying $\pi$, we see that
$$\pi(g), \dots, \pi(g)^{1000} \in \exp(2rB)^{100}.$$
Working in exponential coordinates and using the Baker-Campbell-Hausdorff formula we conclude, if $r_0$ is small enough, that $\pi(g) \in \exp(rB)$ and thus $g \in \tilde A$. We have thus shown the first trapping condition for $\tilde A$.

Next, we use Definition 3.5 to find a symmetric nonstandard finite set $S$ with
$$\pi^{-1}\big(\exp(10^{-5}K^{-3}rB)\big) \subseteq S \subseteq \pi^{-1}\big(\exp(10^{-4}K^{-3}rB)\big).$$
From the Baker-Campbell-Hausdorff formula we see that
$$\Big(\exp(10^{-4}K^{-3}rB)^{\exp(2rB)^4}\Big)^{1000K^3} \subseteq \exp(rB)$$
and thus
$$(7.3) \qquad (S^{\tilde A^4})^{1000K^3} \subseteq \tilde A.$$
Finally, suppose that $g \in \tilde A$ is such that $g, \dots, g^{10^6K^3} \in \tilde A$. Applying $\pi$, we conclude that
$$\pi(g), \dots, \pi(g)^{10^6K^3} \in \exp(2rB).$$
Working in exponential coordinates, we conclude that $\pi(g) \in \exp(10^{-5}K^{-3}rB)$ and hence $g \in S$. Thus we have verified the second trapping condition for $\tilde A$.

Finally, we need to push the trapping conditions from the ultraproduct $\tilde A$ back to the finitary setting. Write $A = \prod_{n \to \alpha} A_n$, $\tilde A = \prod_{n \to \alpha} \tilde A_n$ and $S = \prod_{n \to \alpha} S_n$ for some finite sets $A_n$, $\tilde A_n$, $S_n$. By Łoś's theorem (Theorem A.6), we see that for $n$ sufficiently close to $\alpha$, $\tilde A_n$ is symmetric and contains the identity with $\tilde A_n^4 \subset A_n^4$, with $\tilde A_n^2$ covered by $K$ left-translates of $\tilde A_n$, that
$$(S_n^{\tilde A_n^4})^{1000K^3} \subseteq \tilde A_n,$$
and that the first and second trapping properties hold for $\tilde A_n$ and $S_n$. Thus we see that $\tilde A_n$ is a strong $K$-approximate group for $n$ sufficiently close to $\alpha$. After redefining $\tilde A_n$ suitably for all other values of $n$, we conclude that $\tilde A$ is an ultra strong approximate group as required. This concludes the proof of Proposition 7.3 and hence Proposition 7.2.
[Footnote to the choice of $K$ in the proof of Proposition 7.3: Indeed, by using the Baker-Campbell-Hausdorff formula one can take $K$ to depend only on the dimension of $L$, if $r_0$ is small enough, but we will not need this fact here.]

Remark 7.4. This proposition represents by far the most serious use of Hrushovski's Lie Model Theorem in our paper. Although we use that theorem elsewhere in the paper, it is only for this proposition that we do not currently have a plausible alternative approach.

8. The escape norm and a Gleason type theorem

In this section we prove a variant of “Gleason's lemmas” in the setting of approximate groups. These show that if $A$ is a strong approximate group then the escape norm has pleasant properties with respect to product, conjugation and commutation. The role of these lemmas was briefly discussed in Section 4. Here is a precise statement.

Theorem 8.1 (Gleason-type theorem). Suppose that $A$ is a strong $K$-approximate group. Consider the escape norm
$$\|g\|_{e,A} := \inf\Big\{ \frac{1}{n+1} : n \in \mathbb{N};\ g^i \in A \text{ for all } 0 \le i \le n \Big\},$$
with the convention that $\|g\|_{e,A} = 1$ when $g$ is undefined. This has the following properties:
(i) (Conjugation) If $g, h \in A^{10}$ then $\|g^h\|_{e,A} \le 1000\|g\|_{e,A}$;
(ii) (Product) We have $\|g_1 \cdots g_n\|_{e,A} \ll K^{O(1)}(\|g_1\|_{e,A} + \cdots + \|g_n\|_{e,A})$ if $g_1, \dots, g_n \in A$;
(iii) (Commutators) If $g, h \in A^{10}$ then we have $\|[g, h]\|_{e,A} \ll K^{O(1)} \|g\|_{e,A} \|h\|_{e,A}$.

Note that, as a consequence of (i) and (ii), the set of $g \in A$ with $\|g\|_{e,A} = 0$ is a subgroup normalised by $A$.

Remark 8.2. Note that this theorem is trivial when the ambient local group is abelian. For that reason, this section can be ignored by those readers interested in seeing our alternative proof of the (abelian) Freiman's theorem.

Motivation and heuristic discussion.
We will shortly give a self-contained proof of Theorem 8.1, but as motivation we first offer some comments and discussion of the context in which these ideas were first invented: the solution of Hilbert's fifth problem by Gleason, Montgomery-Zippin and Yamabe [20, 21, 39, 40, 60, 61] (see also [23] for the local group analogue of these lemmas).

In that context, the Gleason lemmas show the existence, in an arbitrary locally compact group $G$, of arbitrarily small compact neighborhoods $A$ of the identity whose associated escape norm satisfies properties (i) to (iii) as above. The Gleason lemmas lie at the heart of Hilbert's fifth problem and are used at several places in its proof, both in the reduction step from general locally compact groups to NSS (No Small Subgroups) groups, and in order to deal with NSS groups.

For example, if $G$ is NSS, the Gleason lemmas are needed in order to establish that the set of one-parameter subgroups of $G$ forms a vector space. If $X(t)$ and $Y(t)$ are two one-parameter subgroups, then a natural candidate for $X + Y$ is $\lim_{n \to +\infty} (X(t/n)Y(t/n))^n$. In order to show that such a limit does exist, the bound (ii) on the escape norm of a product is precisely what is needed. For the full story, the reader may wish to consult the classical references [34, 40], the more recent non-standard treatments of the Gleason lemmas by Hirschfeld [32] and by Goldbring and van den Dries [57], or the blog posts of the third author.

To give a flavour of how the Gleason lemmas are proven, let us discuss a simple case of the product estimate, namely
$$(8.1) \qquad \|uv\|_{e,A} \le C\big(\|u\|_{e,A} + \|v\|_{e,A}\big).$$
Here, $A$ is a ball $B(\mathrm{id}, 1)$ about the identity in a locally compact group $G$ with the NSS property, where the ball is with respect to some left-invariant distance $d$, and $C$ is some finite quantity depending on $A$.
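To make the escape norm and an inequality of the shape (8.1) tangible, here is a small Python sketch in the simplest possible setting: the integers, written additively, with $A = \{-N, \dots, N\}$. This is an abelian toy case, chosen only to illustrate the definition, and is not the NSS setting of the discussion. The function names are hypothetical, and the constant $C = 2$ is specific to this example. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def escape_norm(g, N):
    """Escape norm of g in Z relative to A = {-N, ..., N} (additive notation).

    Returns 1/(n+1), where n is the largest integer with i*g in A for all 0 <= i <= n,
    and 0 if every multiple of g stays in A (which in Z only happens for g = 0).
    """
    if g == 0:
        return Fraction(0)
    n = 0
    while abs((n + 1) * g) <= N:  # does the next multiple still lie in A?
        n += 1
    return Fraction(1, n + 1)


def product_estimate_holds(N, C):
    """Check ||u+v|| <= C (||u|| + ||v||) for all u, v in {-2N, ..., 2N}."""
    values = range(-2 * N, 2 * N + 1)
    return all(
        escape_norm(u + v, N) <= C * (escape_norm(u, N) + escape_norm(v, N))
        for u in values
        for v in values
    )


if __name__ == "__main__":
    N = 10
    print(escape_norm(3, N))   # multiples 3, 6, 9 stay in A, 12 escapes
    print(product_estimate_holds(N, 2))
    print(product_estimate_holds(N, 1))
```

In this example the estimate holds with $C = 2$ but fails with $C = 1$ (take $u = 6$, $v = 5$, $N = 10$), which shows that some loss in the constant is unavoidable even in the abelian case.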
In the discussion below we will make use of the following points concerning this situation:
(i) We may construct a distance $d$ with the additional property that $d(\mathrm{id}, x^g) \le C d(\mathrm{id}, x)$ for $g, x \in B(\mathrm{id}, 2)$ (for example by the Birkhoff-Kakutani construction [40, §1.22]).
(ii) The balls in $G$ enjoy an escape property quite similar to that in the definition of a strong approximate group. More precisely, given $\varepsilon > 0$ there is an $M \in \mathbb{N}$ such that if $g, g^2, \dots, g^M \in B(\mathrm{id}, 1)$ then $g \in B(\mathrm{id}, \varepsilon)$. The proof of this is by contradiction: taking a limit of putative “bad” $g$s, one can contradict the NSS property.

The key idea behind the proof of the product estimate (8.1) is to relate the escape norm $\|g\|_{e,A}$ to the auxiliary quantity $\|\partial_g \Phi\|_\infty$, where $\partial_g \Phi(x) = \Phi(g^{-1}x) - \Phi(x)$ and $\Phi$ is a non-negative “bump” function supported on $B(\mathrm{id}, 1)$, let us say with $\|\Phi\|_\infty = \Phi(\mathrm{id}) = 1$. As noted in Lemma 6.3, such a “norm” automatically satisfies the product inequality (with $C = 1$), and so we need only show that $\|g\|_{e,A} \sim \|\partial_g \Phi\|_\infty$ in a suitable sense, and for a suitable $\Phi$.

In one direction, it is easy to link the two quantities. Indeed if $\|\partial_g \Phi\|_\infty \le \delta$ for some $\delta > 0$, then a simple telescoping sum argument confirms that $\Phi(g^i) > 0$, and hence $g^i \in A$, whenever $i < 1/\delta$. Therefore
$$(8.2) \qquad \|g\|_{e,A} \le \|\partial_g \Phi\|_\infty.$$
Suppose, conversely, that we know that $g, g^2, \dots, g^n \in A = B(\mathrm{id}, 1)$. Then certainly, by the escape property, we have $g, g^2, \dots, g^{n_\varepsilon} \in B(\mathrm{id}, \varepsilon)$ for some $n_\varepsilon \gg_\varepsilon n$. Now if $G$ were a Lie group, and if $\Phi$ were smooth with bounded derivatives, we would have
$$(8.3) \qquad \partial_{g^{n_\varepsilon}} \Phi \approx n_\varepsilon \, \partial_g \Phi,$$
the approximation being better as $\varepsilon$ gets smaller. This immediately gives the bound $\|\partial_g \Phi\|_\infty \ll_\varepsilon 1/n$, and thus we have linked the escape norm and the auxiliary norm $\|\partial_g \Phi\|_\infty$ in both directions.

[Footnote, the third author's blog posts on Hilbert's fifth problem: http://terrytao.wordpress.com/tag/hilberts-fifth-problem/.]

Now unfortunately (8.3) is only an approximate identity and, more seriously, $G$ is not known to be a Lie group.
In fact, as noted above, these Gleason lemmas are required to prove statements of that form. On a more positive note, observe that we only need to bound $\|\partial_g \Phi\|_\infty$ above in terms of $\|g\|_{e,A}$ when $g = u$ or $g = v$, and not for all $g$. We are at liberty to design the auxiliary function $\Phi$ with this in mind.

Now the exact version of (8.3) is basically Taylor's formula, and it reads
$$(8.4) \qquad \partial_{g^n} \Phi = n \, \partial_g \Phi + \sum_{i=0}^{n-1} \partial_g \partial_{g^i} \Phi.$$
(We replace $n_\varepsilon$ by $n$ for ease of notation.) This makes it desirable to bound the second derivatives $\partial_g \partial_{g^i} \Phi$. At this point another key idea enters: it is possible to get good control on these second derivatives when $\Phi = \phi * \psi$ is the convolution of two “Lipschitz” functions, that is
$$\Phi(x) = \phi * \psi(x) = \int \phi(xz^{-1}) \psi(z) \, dz,$$
the integral being with respect to Haar measure on $G$. This is because of the formula
$$(8.5) \qquad \partial_g \partial_h (\phi * \psi)(x) = \int \partial_g \phi(z) \, \partial_{h^z} \psi(z^{-1}x) \, dz.$$
To make this useful, $\phi$ is chosen to be somewhat Lipschitz with respect to shifts by $g = u$ and $g = v$, and $\psi$ is chosen to be Lipschitz with respect to the distance $d$. We omit the details.

Rigorous argument. We turn now to the details of such a strategy in the discrete setting, that is to say a rigorous proof of Theorem 8.1.

Proof of Theorem 8.1. To simplify the notation, we will abbreviate $\|\cdot\|_{e,A}$ in this proof as $\|\cdot\|_e$.

We start with (i), which is a relatively easy consequence of the first trapping property in the definition of strong approximate group (Definition 7.1). Indeed suppose that $g, g^2, \dots, g^n \in A$ for some $n$; then certainly $g^h, (g^h)^2, \dots, (g^h)^n$ all lie in the conjugate $A^h$, which is contained in $A^{100}$ since $h \in A^{10}$. By the first trapping property this implies that $g^h, (g^h)^2, \dots, (g^h)^{n'} \in A$ for any $n' \le n/1000$, and this confirms (i).

The proof of (ii) is significantly trickier and is based on the construction of Gleason that was briefly discussed earlier.
In order to facilitate a certain technical “bootstrap argument”, it will be convenient to temporarily replace the escape norm $\|g\|_e$ by the regularised version $\|g\|_e^{(\varepsilon)} := \|g\|_e + \varepsilon$, where $\varepsilon > 0$ is a small quantity. We shall obtain estimates uniform in $\varepsilon$, and then let $\varepsilon \to 0$.

It is natural to introduce the norm-like quantity
$$d^{(\varepsilon)}(g) := \inf\Big\{ \sum_{i=1}^n \|g_i\|_e^{(\varepsilon)} : g = g_1 \cdots g_n,\ n \ge 1 \Big\}.$$
It is clear that
$$(8.6) \qquad d^{(\varepsilon)}(g) \le \|g\|_e^{(\varepsilon)}.$$
We shall prove an estimate in the opposite direction, namely
$$(8.7) \qquad \|g\|_e \ll K^{O(1)} d^{(\varepsilon)}(g).$$
The exponent $O(1)$ will be independent of $\varepsilon$. This implies that, for each positive integer $n$ and all $g_1, \dots, g_n$,
$$\|g_1 \cdots g_n\|_e \ll K^{O(1)}\big(\|g_1\|_e^{(\varepsilon)} + \cdots + \|g_n\|_e^{(\varepsilon)}\big).$$
Letting $\varepsilon \to 0$, we recover the product estimate (ii).

In order to establish this we shall, as in Gleason's argument, relate $\|g\|_e$ and $d^{(\varepsilon)}(g)$ to an auxiliary quantity $\|\partial_g \Phi\|_\infty$, where $\Phi \colon A^{100} \to [0, \infty)$ is a certain “smooth” function supported on $A^4$. We will specify $\Phi$ shortly; as in Gleason's argument it will be constructed as a convolution of two functions $\phi^{(\varepsilon)}$ and $\psi$. The former is taken to be a kind of smoothed version of $1_A$ defined using the metric $d^{(\varepsilon)}$ and Lipschitz for this metric, and the latter is constructed using the set $S$ appearing in the definition of strong approximate group (Definition 7.1) and is Lipschitz with respect to the word metric on $S$.

One link between these quantities is relatively easy to establish for any function $\Phi$ with $\Phi(\mathrm{id}) \ge 1$. Indeed suppose that $\|\partial_g \Phi\|_\infty = \delta$ for some $g$ in the bounded power of $A$ under consideration. Then certainly $|\Phi(g^{i+1}) - \Phi(g^i)| \le \delta$ for all $i$ with $g^i \in A^{100}$, which implies by an easy telescoping sum argument that $\Phi(g^i) \ge 1 - \delta i$ for all $i$. In particular $g^i$ lies in the support of $\Phi$, and hence in $A^4$, for $i < 1/\delta$; note that the hypothesis $g^i \in A^{100}$ can be removed by induction. By the first trapping condition in Definition 7.1 this implies that $g^i \in A$ for $i < 1/1000\delta$, and hence $\|g\|_e \le 1000\delta$. Thus
$$(8.8) \qquad \|g\|_e \le 1000 \|\partial_g \Phi\|_\infty$$
for all such $g$.
To establish (8.7) and hence the product estimate (ii) it therefore suffices to prove a bound

$$\|\partial_g \Psi\|_\infty \ll K^{O(1)}\, d^{(\varepsilon)}(g) \tag{8.9}$$

in the opposite direction for all $g \in A^{100}$ (the claim for $g \notin A^{100}$ being an easy consequence). This argument will depend crucially on the specific form of $\Psi$. The following two lemmas describe the construction of the functions $\varphi$ and $\psi$.

Lemma 8.3 (Properties of $\varphi$). There is a function $\varphi^{(\varepsilon)} : A^{1000} \to [0, 1]$ such that
(i) $\varphi^{(\varepsilon)}(x) = 1$ for $x \in A$;
(ii) $\varphi^{(\varepsilon)}(x) = 0$ if $x \notin A^2$;
(iii) (Lipschitz bound) For all $g \in A^{100}$, one has
$$\|\partial_g \varphi^{(\varepsilon)}\|_\infty \le \frac{d^{(\varepsilon)}(g)}{d^{(\varepsilon)}(\mathrm{id}, A^c)}.$$
Here $d^{(\varepsilon)}(y, B) := \inf\{ d^{(\varepsilon)}(b^{-1}y) : b \in B \}$, and $A^c$ is the complement of $A$ in $G$.

Proof. Define

$$\varphi^{(\varepsilon)}(x) := \Big( 1 - \frac{d^{(\varepsilon)}(x, A)}{d^{(\varepsilon)}(\mathrm{id}, A^c)} \Big)_+ .$$

Note that this is well-defined since $d^{(\varepsilon)}(\mathrm{id}, A^c) \neq 0$; this would be an issue without the fudge factor of $\varepsilon$ that we have introduced.

Obviously $\varphi^{(\varepsilon)}(x) = 1$ for $x \in A$. If $\varphi^{(\varepsilon)}(x) \neq 0$ then $d^{(\varepsilon)}(\mathrm{id}, x^{-1}A) = d^{(\varepsilon)}(x, A) < d^{(\varepsilon)}(\mathrm{id}, A^c)$, and so $x^{-1}A$ contains a point outside of $A^c$. This implies that $x \in A^2$. The Lipschitz bound is easily established.

Lemma 8.4 (Properties of $\psi$). There is a function $\psi : A^{1000} \to [0, 1]$ such that
(i) $\psi(x) = 1$ for $x \in A$;
(ii) $\psi(x) = 0$ if $x \notin A^2$;
(iii) (Lipschitz bound) $\|\partial_{h^y} \psi\|_\infty \le 1/10^4 K^3$ for $h \in S$ and $y \in A^4$.

Proof. Let $Q := S^{A^4}$; recall from the definition of strong approximate group that $Q^N \subseteq A$, where $N := 10^4 K^3$. Define $\psi(g) = 0$ if $g \notin Q^N A$, $\psi(g) = 1$ if $g \in A$, and $\psi(g) = 1 - i/N$ if $g \in Q^{i+1}A \setminus Q^i A$ for $i = 0, 1, \ldots, N - 1$. The claimed properties of $\psi$ are easily checked.

We now define $\Psi$ to be the convolution

$$\Psi(x) := \frac{1}{|A|} \sum_{y \in A^4} \varphi^{(\varepsilon)}(y)\, \psi(y^{-1}x)$$

for all $x \in A^{100}$, with the convention that $\Psi(x) = 0$ for $x$ outside $A^{100}$. We note that

$$\Psi(\mathrm{id}) = \frac{1}{|A|} \sum_{x} \varphi^{(\varepsilon)}(x)\, \psi(x^{-1}) \ge \frac{1}{|A|} \sum_{x \in A} \varphi^{(\varepsilon)}(x)\, \psi(x^{-1}) = 1,$$

a property required in the proof of (8.8).
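Note also that since $\varphi^{(\varepsilon)}$ and $\psi$ are both at most 1 pointwise and are supported on $A^4$ we have, for all $x$ such that $\Psi(x) \neq 0$,

$$\Psi(x) = \frac{1}{|A|} \sum_{y} \varphi^{(\varepsilon)}(y)\, \psi(y^{-1}x) = \frac{1}{|A|} \sum_{y \in A^4} \varphi^{(\varepsilon)}(y)\, \psi(y^{-1}x) \le \frac{|A^4|}{|A|} \le K^3,$$

that is to say

$$\|\Psi\|_\infty \le K^3. \tag{8.10}$$

Let $g \in A^{100}$. Now since $\mathrm{id} \in A$ we have the crude bound $d^{(\varepsilon)}(\mathrm{id}, A^c) \ge \varepsilon$. It follows from Lemma 8.3 that $\|\partial_g \varphi^{(\varepsilon)}\|_\infty \le d^{(\varepsilon)}(g)/\varepsilon$. From the identity

$$\partial_g \Psi(x) = \frac{1}{|A|} \sum_{y \in A^4} \partial_g \varphi^{(\varepsilon)}(y)\, \psi(y^{-1}x),$$

we have that

$$|\partial_g \Psi(x)| \le \frac{1}{|A|} \sum_{y \in A^4} \|\partial_g \varphi^{(\varepsilon)}\|_\infty\, \psi(y^{-1}x) \le \frac{K^3}{\varepsilon}\, d^{(\varepsilon)}(g).$$

This immediately yields the crude bound

$$\|\partial_g \Psi\|_\infty \le \frac{K^3}{\varepsilon}\, d^{(\varepsilon)}(g) \tag{8.11}$$

in the direction of (8.9), the statement we are trying to prove.

Denote by $P(X)$ the bound

$$\|\partial_g \Psi\|_\infty \le X\, d^{(\varepsilon)}(g) \tag{8.12}$$

for all $g \in A^{100}$. We have just demonstrated $P(K^3/\varepsilon)$, and we wish to prove $P(K^{O(1)})$, which is (8.9). To this end we will implement a bootstrapping argument, showing that $P(X)$ implies a stronger version of itself, namely $P(X')$ with some $X' < X$, under appropriate conditions.

The hypothesis $P(X)$ (cf. (8.12)) implies an improved Lipschitz bound on $\varphi$. To see this note that if $d^{(\varepsilon)}(g) < 1/1000X$ then from assumption $P(X)$ we have $\|\partial_g \Psi\|_\infty < 1/1000$ and hence, from (8.8), that $\|g\|_e < 1$. By definition of the escape norm this implies that $g \in A$. Phrased in the contrapositive, it follows that $d^{(\varepsilon)}(\mathrm{id}, A^c) \ge 1/1000X$, and therefore the Lipschitz bound in Lemma 8.3 implies that

$$\|\partial_g \varphi^{(\varepsilon)}\|_\infty \le 1000X\, d^{(\varepsilon)}(g). \tag{8.13}$$

The bootstrapping argument hinges on the Taylor expansion identity

$$\partial_{g^n} \Psi = n\,\partial_g \Psi + \sum_{i=0}^{n-1} \partial_{g^i} \partial_g \Psi,$$

valid whenever $g, \ldots, g^n \in A^{200}$ (say). This identity implies, using the triangle inequality and (8.10), that

$$\|\partial_g \Psi\|_\infty \le \frac{1}{n} \|\partial_{g^n} \Psi\|_\infty + \frac{1}{n} \sum_{i=0}^{n-1} \|\partial_{g^i} \partial_g \Psi\|_\infty \le \frac{2K^3}{n} + \frac{1}{n} \sum_{i=0}^{n-1} \|\partial_{g^i} \partial_g \Psi\|_\infty. \tag{8.14}$$

To use this, we need to focus attention on the first and second derivatives of $\Psi$. To bound the first derivative we use the identity

$$\partial_h \Psi(x) = \frac{1}{|A|} \sum_{y} \varphi(y)\, \partial_{h^y} \psi(y^{-1}x),$$

valid for $h \in A^{10}$.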
Since $\varphi \le 1$, this and the Lipschitz bound on $\psi$ given in Lemma 8.4 imply that

$$\|\partial_h \Psi\|_\infty \le \frac{1}{|A|} \sum_{y \in A^2} \|\partial_{h^y} \psi\|_\infty \le 1/10^4 K^2 \tag{8.15}$$

if $h \in S$.

We turn to the second derivative $\partial_h \partial_g \Psi$ for $g \in A$ and $h \in S$. Here we use the identity

$$\partial_h \partial_g \Psi(x) = \frac{1}{|A|} \sum_{y} (\partial_g \varphi)(y)\, \partial_{h^y} \psi(y^{-1}x).$$

Recalling that $\varphi, \psi$ are supported on $A^4$ and using the Lipschitz bound (8.13) on $\varphi$ together with the Lipschitz bound on $\psi$ given in Lemma 8.4, we obtain the bound

$$\|\partial_h \partial_g \Psi\|_\infty \le \frac{1}{|A|} \sum_{y \in A^4} \|\partial_g \varphi\|_\infty \|\partial_{h^y} \psi\|_\infty \le \frac{1}{10} X\, d^{(\varepsilon)}(g) \tag{8.16}$$

if $g \in A$ and $h \in S$.

These bounds are useful in (8.14) provided that $n$ is such that $g, g^2, \ldots, g^n \in S$. However, the second trapping property in the definition of strong approximate group ensures that this is so for a reasonably large value of $n$, indeed for $n$ as large as $\frac{1}{10^6 K^3 \|g\|_e}$. Taking $n$ this large and substituting into (8.14) yields

$$\|\partial_g \Psi\|_\infty \le 10^7 K^6 \|g\|_e^{(\varepsilon)} + \tfrac{1}{10} X\, d^{(\varepsilon)}(g) \le X' \|g\|_e^{(\varepsilon)},$$

where $X' = 10^7 K^6 + \tfrac{1}{10} X$ and $g \in S$; the second inequality uses (8.6). The claim also trivially holds when $g \notin S$.

It is easy to improve this to the stronger statement $P(X')$ using the triangle inequality $\|\partial_{gh} \Psi\|_\infty \le \|\partial_g \Psi\|_\infty + \|\partial_h \Psi\|_\infty$, already observed in (6.2). Indeed for every $\eta > 0$ there are, by the definition of $d^{(\varepsilon)}$, elements $g_1, \ldots, g_n$ such that $g = g_1 \cdots g_n$ and

$$d^{(\varepsilon)}(g) > \|g_1\|_e^{(\varepsilon)} + \cdots + \|g_n\|_e^{(\varepsilon)} - \eta.$$

Therefore

$$\|\partial_g \Psi\|_\infty \le \|\partial_{g_1} \Psi\|_\infty + \cdots + \|\partial_{g_n} \Psi\|_\infty \le X'\big( \|g_1\|_e^{(\varepsilon)} + \cdots + \|g_n\|_e^{(\varepsilon)} \big) \le X'\big( \eta + d^{(\varepsilon)}(g) \big).$$

Since $\eta$ was arbitrary, we do indeed obtain the bound $\|\partial_g \Psi\|_\infty \le X' d^{(\varepsilon)}(g)$, which is the statement $P(X')$.

By repeating this deduction of $P(X')$ from $P(X)$ many times, we see that the crude bound $P(K^3/\varepsilon)$, established in (8.11), eventually implies $P(10^9 K^6)$, and hence (8.9). By earlier remarks, this concludes the proof of (ii), the inequality for products.

Finally, we turn to the commutator bound (iii). Now that we have the product inequality (ii), we may define a function $\varphi$ obeying the properties in Lemma 8.3 but using $\|g\|_e$ instead of the fudged quantity $\|g\|_e^{(\varepsilon)} = \|g\|_e + \varepsilon$, that is to say with

$$d(g) := \inf\Big\{ \sum_{i=1}^{n} \|g_i\|_e : g = g_1 \cdots g_n,\ n \ge 1 \Big\}.$$

This is because (ii) implies the lower bound $d(\mathrm{id}, A^c) \gg K^{-O(1)}$, and in particular $d(\mathrm{id}, A^c) \neq 0$. Moreover we have the Lipschitz bound

$$\|\partial_g \varphi\|_\infty \ll K^{O(1)} d(g). \tag{8.17}$$

We will use this function $\varphi$ in establishing (iii), the bound for commutators. Once again we consider an auxiliary function $\Psi$, defined now to be the convolution

$$\Psi(x) := \frac{1}{|A|} \sum_{y \in A^4} \varphi(y)\, \varphi(y^{-1}x),$$

again with the convention that $\Psi$ vanishes outside of $A^{100}$. We observe the identity

$$\partial_g \partial_h \Psi - \partial_h \partial_g \Psi = -T_{hg}\, \partial_{[g,h]} \Psi$$

for $g, h \in A^{10}$, where $T_g$ denotes the shift defined by $T_g f(x) := f(g^{-1}x)$ if $g^{-1}x$ is well-defined, and $0$ otherwise. It follows that

$$\|\partial_{[g,h]} \Psi\|_\infty \le \|\partial_h \partial_g \Psi\|_\infty + \|\partial_g \partial_h \Psi\|_\infty.$$

By the first bound in (8.16) (which holds equally well for this $\Psi$) we have

$$\|\partial_{[g,h]} \Psi\|_\infty \ll \frac{1}{|A|} \sum_{y \in A^4} \|\partial_g \varphi\|_\infty \|\partial_{h^y} \varphi\|_\infty.$$

From (8.17) we obtain

$$\|\partial_{[g,h]} \Psi\|_\infty \ll K^{O(1)} d(g) \sup_{y \in A^4} d(h^y) \ll K^{O(1)} \|g\|_e \sup_{y \in A^4} \|h^y\|_e.$$

By part (i), this implies

$$\|\partial_{[g,h]} \Psi\|_\infty \ll K^{O(1)} \|g\|_e \|h\|_e.$$

To conclude, we note that (8.8) holds for this new auxiliary function $\Psi$ as well, since the only fact we used in establishing that bound, other than the trapping properties of $A$, was the lower bound $\Psi(\mathrm{id}) \ge 1$. This, at last, concludes the proof of Theorem 8.1.

To conclude this section we assemble the main results of it and the previous section in a portable form. The following is the only result we shall need from Section 7 and the present section going forward to the next (and final) part of the paper.

Proposition 8.5. Suppose that $A$ is an ultra approximate group and that $\pi : A^8 \to L$ is a good model for $A$ into a connected Lie group $L$ with Lie algebra $\mathfrak{l}$. Let $B$ be an arbitrary compact convex neighbourhood of $0$ in $\mathfrak{l}$.
Then, for sufficiently small $r, r'$ with $2r > r' > r > 0$, we may find a large strong ultra approximate subgroup $A'$ of $A$ such that
(i) $\pi^{-1}(\exp(rB)) \subset A' \subset \pi^{-1}(\exp(r'B))$;
(ii) $(A')^{1000}$ is well defined;
(iii) The escape norm $\|g\|_{e,A'}$ satisfies
  (a) (Conjugation) If $g, h \in (A')^{10}$ then $\|h^{-1}gh\|_e = O(\|g\|_e)$;
  (b) (Product) If $n$ is a nonstandard natural number and $g_1, \ldots, g_n \in (A')^{10}$ is a nonstandard finite sequence of elements of $(A')^{10}$ (i.e. an ultraproduct of standard finite sequences, see Section A) then $\|g_1 \cdots g_n\|_e = O\big(\sum_{i=1}^{n} \|g_i\|_e\big)$;
  (c) (Commutators) If $g, h \in (A')^{10}$ then we have $\|[g, h]\|_e = O(\|g\|_e \|h\|_e)$.
(iv) The set $H := \{ g \in A' : \|g\|_e = 0 \}$ is a global internal subgroup, that is to say it is of the form $H = \prod_{n \to \alpha} H_n$, where $H_n \subset A'_n$ contains $\mathrm{id}$ and is stable under multiplication and inverse, which is contained in $A'$ and is normalised by $A'$.

Proof. The existence of $A'$ satisfying (i) and (ii) follows from part (iii) of Definition 3.5. If $r, r'$ are small enough then Proposition 7.3 ensures that $A'$ is an ultra strong approximate group in the sense of Definition 7.1. Properties (iii)(a), (b) and (c) then follow immediately from Theorem 8.1 and taking ultraproducts, and (iv) then follows from (iii).

Remark 8.6. Observe that if $A$ is a strong ultra approximate group, that is to say an ultraproduct of $K$-strong finite approximate groups, and if $L$ is a locally compact model of $A$ as given for example by Proposition 6.1, then from the strong approximate group hypothesis made on $A$ we see that the standard part of the escape norm $\mathrm{st}(\|g\|_{e,A})$ and the escape norm of $\pi(g) \in L$ with respect to the neighbourhood of the identity $\pi(A)$ of $L$ are comparable. Namely

$$\|\pi(g)\|_{e,\pi(A)} \ll \mathrm{st}(\|g\|_{e,A}) \ll \|\pi(g)\|_{e,\pi(A)}.$$

As a consequence, if we take the standard parts of the escape norm in properties (i) to (iii), then what we obtain is precisely the analogous properties for the escape norm in $L$ with respect to $\pi(A)$. In that case, the three properties are essentially equivalent to the original Gleason lemmas in the literature on Hilbert's fifth problem, applied to the locally compact (local) group $L$. In the sequel however, it will be very important that the three bounds (i) to (iii) obtained in Proposition 8.5 hold at the ultra level in ${}^*\mathbb{R}$ and not only at the level of standard parts.

9. Proof of the main theorem

In this section, we complete the proof of our main theorem, Theorem 4.2. We will do so by first reducing to the case when $A$ has no global internal subgroup. For convenience, we introduce the following definition.

Definition 9.1 (No small subgroups). An ultra approximate group $A$ has the NSS property if $A$ does not contain any non-trivial global internal subgroup.

By a global internal subgroup of $A = \prod_{n \to \alpha} A_n$, we mean a subset of the form $\prod_{n \to \alpha} H_n$, where $H_n \subseteq A_n$ is a genuine subgroup. Note that $A$ is NSS if and only if, for any $g \in A \setminus \{\mathrm{id}\}$, the escape norm $\|g\|_{e,A}$ is non-zero (though it may be infinitesimal). We remark that an analogous NSS condition for locally compact groups plays a key role in the theory of Hilbert's fifth problem.

Example 22. Let $N \in {}^*\mathbb{N}$ be an unbounded (nonstandard) integer. Then the interval $A := [-N, N]$ (in the nonstandard integers ${}^*\mathbb{Z}$) is NSS. Note that while $A$ contains global subgroups such as $\mathbb{Z}$ or $\{ x \in {}^*\mathbb{Z} : x = o(N) \}$, such subgroups are not internal (they are not the ultralimits of standard sets).

Clearly, any ultra approximate subgroup of an NSS ultra approximate group is also an NSS ultra approximate group. Using the Gleason lemmas from Section 8 we can reduce the proof of our main theorem to consideration of the NSS case.

Proposition 9.2 (NSS reduction). Let $A$ be an ultra approximate group.
Then there exists a large ultra approximate subgroup $A'$ of $A$, with $(A')^{1000}$ well-defined and contained in $A^4$, and a global internal subgroup $H$ contained in $A'$ and normalised by $(A')^{100}$, such that $A'/H$ is an NSS ultra approximate subgroup, which admits a connected Lie group as a good model.

We refer the reader to Definition 3.5 for the definition of a good model. Here $A'/H$ denotes the quotient local group as defined in Lemma B.12.

Proof. By Proposition 7.2 there is an ultra (strong) approximate group $A' \subseteq A$ which is large relative to $A$, for which $(A')^{1000}$ is well-defined and contained in $A^4$, and a good model $\pi : (A')^8 \to L$, where $L$ is a connected Lie group. Let $B$ be an open bounded convex symmetric neighbourhood of the identity in the Lie algebra of $L$. Then for sufficiently small $r > 0$, $\exp(rB)$ contains no non-trivial subgroups of $L$.

Let $H$ denote the global internal subgroup $H = \{ g \in A' : \|g\|_{e,A'} = 0 \}$ given by Proposition 8.5. Since $H$ is normalised by $A'$, it is also normalised by $(A')^{100}$. We may then apply Lemma B.12 and consider the quotient local group $(A')^{100}/H$. Then $(A')^8/H = (A'/H)^8$ is well-defined. Since $A'$, $H$ are nonstandard finite symmetric sets, $A'/H$ is also; since $(A')^2$ can be covered by finitely many left-translates of $A'$, $(A'/H)^2$ can be covered by finitely many left-translates of $A'/H$. We conclude that $A'/H$ is an ultra approximate group.

Since $\exp(rB)$ contains no non-trivial subgroups, the image of $H$ under $\pi$ has to be trivial, thus the homomorphism $\pi$ descends to a homomorphism of $A'/H$ to $L$, which satisfies the conditions for a good model (see Definition 3.5).

By construction, every element $g \in A'$ that is not in $H$ has positive (but nonstandard) escape norm $\|g\|_{e,A'}$. If $g \in A'$ and $\langle [g] \rangle \subseteq A'/H$, where $[g]$ is the class of $g$ in $A'/H$, then $\langle g \rangle \subseteq (A')^2$. On the other hand $A'$ is a strong ultra approximate group, and thus $\|g\|_{e,A'}$ is non-zero if and only if $\|g\|_{e,(A')^2}$ is non-zero.
This implies that every non-identity element $[g]$ in $A'/H$ also has positive escape norm $\|[g]\|_{e,A'/H}$. Thus $A'/H$ is NSS and the claim follows.

Let us now state Theorem 4.2 in the special case of NSS groups, and show how the general case of Theorem 4.2 follows from it.

Theorem 9.3 (NSS approximate groups contain large nilprogressions). Let $A$ be an NSS ultra approximate group which admits a connected Lie group $L$ as a good model. Then $A^4$ contains a nondegenerate ultra nilprogression $P$ in normal form, which is large relative to $A$. Furthermore, the rank and step of $P$ are no greater than the dimension of $L$.

Proof that Theorem 9.3 implies Theorem 4.2. Let $A$ be an ultra approximate group. We may find a large ultra approximate subgroup $A'$ of $A$ which satisfies the conclusions of Proposition 9.2. We may then apply Theorem 9.3 to $A'/H$ and find in $(A')^4/H$ a nondegenerate ultra nilprogression $P_0$ in normal form with $|P_0| \gg |A'/H|$. We can write $P_0 = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$, where the $N_i \in {}^*\mathbb{N}$ are unbounded and $u_i \in (A')^4/H$. We may then pick arbitrary lifts $\tilde{u}_i \in (A')^4$ and set $P = P(\tilde{u}_1, \ldots, \tilde{u}_r; N_1, \ldots, N_r)$. Then $HP$ is a nondegenerate ultra coset progression in normal form contained in $(A')^5 \subseteq A^4$, and $|HP| \gg |H||P_0| \gg |A|$ as desired.

We turn now to the proof of Theorem 9.3, which will occupy the remainder of this section. We begin with a brief sketch, fleshing out a little more the overview given in Section 4. The proof will proceed by induction on the dimension of the connected Lie group $L$. The base case of the induction, when $\dim L = 0$, is trivial as in this case the NSS ultra approximate group $A$ is also trivial. To treat the induction step we will consider an element $u$ of $A$ with smallest possible escape norm. The existence of such an element is guaranteed by our standing hypothesis that approximate groups are finite objects, i.e. that each $A_n$ in $A = \prod_{n \to \alpha} A_n$ is finite. Then we will mod out $A$ by the geometric progression $P := \{ u^n : |n| \le 1/\|u\|_{e,A} \}$, where $u$ is an element of $A$ which has the smallest possible escape norm $\|u\|_{e,A}$. The quotient local group $A/P$ (in the sense of Lemma B.12) will be shown to be both NSS and to admit a Lie group with dimension at most $\dim L - 1$ as a good model. It is at this step that we crucially rely on the fact that we are only quotienting out by a local group, the progression $P$, rather than a global one such as the group $\langle u \rangle$ generated by $u$. We do this in order to avoid accidentally creating torsion with an excessively large quotient. Indeed, it is because of this component of the induction that it was necessary to cast the entire argument in the setting of local groups rather than global groups, even if one had been willing to restrict the main results of the paper to the global group case. Finally, making key use of the properties of the escape norm given by the Gleason lemmas, we will lift the nilprogression from $A/P$ to $A$.

Let us turn to the details.

Proof of Theorem 9.3. Let $A$ be an NSS ultra approximate group which admits a connected Lie group $L$ as a good model $\pi : A^8 \to L$. We proceed by induction on $\dim L$ and first dispose of the trivial case when $L$ has dimension zero. As $L$ is connected, it must thus be trivial. Applying Definition 3.5(iii), we conclude that the kernel of $\pi$ is a large global internal subgroup of $A$. Since $A$ is NSS, this kernel must therefore be trivial. Therefore $A$ is trivial.

Now suppose that $\dim L \ge 1$, and that the claim has already been proven for connected Lie groups of smaller dimension. To complete the proof of Theorem 9.3 it suffices to establish the following lemma.

Lemma 9.4 (Induction step). Suppose that $A$ is an ultra approximate group admitting a connected Lie group $L$ of positive dimension as a good model. Then $A$ contains large ultra approximate subgroups $A''' \subseteq A'' \subseteq A' \subseteq A$ with the following properties.
Let $u \in A'$ be such that $\|u\|_{e,A'}$ is minimal and non-zero, and set $P := \{ u^n : |n| < 1/\|u\|_{e,A'} \}$. Then $P$ commutes with $(A'')^{10}$ and obeys the following properties:
(i) the quotient $A'''/P$ is an ultra approximate group which admits a connected Lie group of dimension $\dim L - 1$ as a good model, whose Lie algebra is formed from the Lie algebra of $L$ by quotienting out by a one-dimensional central subalgebra;
(ii) if $A$ is NSS, so is $A'''/P$;
(iii) to any large ultra nilprogression $Q$ in $A''/P$ in normal form, one can associate a large ultra nilprogression $Q'$ in $A''$ in normal form, whose rank exceeds the rank of $Q$ by at most one, and similarly for the step; and
(iv) $(A''')^4 \subseteq A''$.

Proof of Theorem 9.3. Indeed apply the induction hypothesis to $A'''/P$, which we can do by (i) and (ii). We may then conclude, using (iv), that $A''/P$ contains a large ultra nilprogression. Finally, apply (iii) to conclude.

Proof of Lemma 9.4. Take $B$ to be some small convex neighbourhood of $0$ in the Lie algebra $\mathfrak{l}$ of $L$. We shall take $A', A'', A'''$ to be such that

$$\pi^{-1}(\exp(B)) \subseteq A' \subseteq \pi^{-1}(\exp(1.001B)) \tag{9.1}$$

and

$$\pi^{-1}(\exp(\delta B)) \subseteq A'' \subseteq \pi^{-1}(\exp(1.001\delta B)) \tag{9.2}$$

and

$$\pi^{-1}\Big(\exp\Big(\frac{\delta}{10} B\Big)\Big) \subseteq A''' \subseteq \pi^{-1}\Big(\exp\Big(1.001\frac{\delta}{10} B\Big)\Big), \tag{9.3}$$

where $\delta > 0$ is a small (standard) real number to be specified later. It follows from Proposition 8.5 that large ultra approximate subgroups of $A$ exist with these properties, and furthermore that, if $B$ is small enough, the escape norm $\|\cdot\|_{e,A'}$ satisfies the conjugation, product and commutator inequalities laid out in (iii) of that proposition. Note also that $A', A''$ admit $L$ as a good model and have the NSS property.

Property (iv) of the lemma is essentially immediate; we turn to the more substantial (i), (ii) and (iii).

We begin with the proof of (i). Recall that $u \in A'$ is chosen so that $\|u\|_{e,A'}$ is minimal and non-zero.
Observing that $\|x\|_{e,A'} \le 100\delta$ for $x \in (A'')^{10}$, it follows from the commutator estimate of Proposition 8.5 (iii)(c) that for such $x$ we have

$$\|[x, u]\|_{e,A'} = O\big( \|x\|_{e,A'} \|u\|_{e,A'} \big) < \|u\|_{e,A'}$$

provided that $\delta$ is chosen sufficiently small in terms of the implied constant $O(\cdot)$.

Note that $[x, u]$ lies in $A'$, rather than merely $(A')^4$, since its escape norm is less than 1. From the extremal property of $u$, it follows that $[x, u] = \mathrm{id}$, that is to say $x$ commutes with $u$, whenever $x \in (A'')^{10}$. Recall that we are taking $P := \{ u^n : |n| \le 1/\|u\|_{e,A'} \}$. Since $(A'')^{10}$ is well defined, we may apply Lemma B.12 and form the quotient local group $A''/P = \prod_{n \to \alpha} A''_n/P_n$, which is clearly also an ultra approximate group. We now show that $A''/P$ admits a proper quotient of $L$ as a good model. To do this, we first verify that $\pi(P)$ is a non-trivial central one-parameter local subgroup of $L$.

Since $\dim L \ge 1$, the groups $A', A''$ are non-trivial, and this implies that $\|u\|_{e,A'}$ is infinitesimal, i.e. that $M_0 := 1/\|u\|_{e,A'}$ is unbounded. Let $n \in {}^*\mathbb{N}$ be such that $n = o(M_0)$. We must have $u^n \in \ker \pi$, because $u^{kn} \in A'$ for all $k \in \mathbb{N}$. Define a map $\phi : [-100, 100] \to L$ by setting

$$\phi(t) := \pi\big( u^{\lfloor t M_0 \rfloor} \big), \tag{9.4}$$

where $\lfloor \cdot \rfloor$ is the (nonstandard) greatest integer function. Then $\pi(u^n) = \phi(\mathrm{st}(n/M_0))$ for all $n \in [-100M_0, 100M_0]$, and $\phi$ is a local homomorphism in the sense that $\phi(t)\phi(s) = \phi(t + s)$ whenever $t, s, t + s \in [-100, 100]$. Also $\pi(P) = \phi([-1, 1])$. Finally, we verify that $\phi$ is continuous. Because of the local homomorphism property, it is enough to check this at $0$. If $t$ is small, then $(\phi(t))^k = \phi(tk) \in \exp(1.001B)$ for every integer $k \in [0, 1/t]$, hence $\phi(t) \in \exp(1.001B/k)$ is close to the identity in $L$, which gives the desired continuity.

As $\phi$ is a continuous homomorphism from $[-1, 1]$ to the Lie group $L$, there exists an element $X$ of the Lie algebra $\mathfrak{l}$ such that $\phi(t) = \exp(tX)$ for all $t \in [-1, 1]$. Moreover $X \in 1.001B$.
On the other hand by the definition of the escape norm we have $u^{M_0} \notin A'$, and hence $\phi(1) = \exp(X) \notin \exp(B)$ and thus $X \notin B$. In particular $X$ is non-zero, and it follows that $\phi([-1, 1])$ is a non-trivial local one-parameter subgroup of $L$. Finally it is central in $L$, because $L$ is connected and $\phi(t) = \pi(u^{\lfloor t M_0 \rfloor})$ commutes with the neighbourhood of the identity $\pi(A'')$ as shown above. Thus $X$ lies in the centre of the Lie algebra $\mathfrak{l}$.

If we choose a neighbourhood $U$ of the identity in $L$ small enough, then by Lemma B.12 we may form the quotient space $U/\phi([-1, 1])$, which one easily verifies to be a local Lie group of dimension $\dim L - 1$, whose Lie algebra is obtained from the Lie algebra of $L$ by quotienting out by a one-dimensional central subalgebra. By Lie's third theorem every local Lie group is locally identifiable with an open neighbourhood of a global connected Lie group $L'$, which in our case still has dimension $\dim L - 1$. Thus, by shrinking $U$ if necessary, we may find a local homomorphism $\eta : U \to L'$ whose kernel lies in $\phi([-1, 1])$.

The local homomorphism $\eta \circ \pi : (A'')^8 \to L'$ then pushes down to a local homomorphism $\psi : (A'')^8/P \to L'$. Choosing $\delta$ smaller if necessary, we may assume that $\psi$ is defined on all of $(A''/P)^8$, thus making $L'$ a good model for $A''/P$. Note we may also ensure that $\pi(A'')$ contains no non-trivial subgroup of $L'$, a property that will be needed in Lemma 9.5 below. This completes the proof of (i).

We turn now to (ii), which asserted that $A'''/P$ is NSS. In fact we shall prove the same statement for $A''/P$, from which the statement for $A'''/P$ follows (or note that an identical proof works). Key to this endeavour is the following lifting lemma, which we will require again in the proof of (iii).

Lemma 9.5 (Lifting lemma). Let $g \in A''/P$, and let $\kappa : (A'')^8 \to (A'')^8/P$ be the projection map. Then there exists $\tilde{g} \in A''$ such that $\kappa(\tilde{g}) = g$ and $\|\tilde{g}\|_{e,A''} = O(\|g\|_{e,A''/P})$.

Let us first remark on why the NSS property of $A''/P$ follows quickly from this.
Indeed suppose that $g \in A''/P$ is not the identity. Then the element $\tilde{g}$ generated by the above lemma is not the identity either, and hence has positive escape norm since $A$ is NSS. By the lemma, $g$ also has positive escape norm. Since $g \neq \mathrm{id}$ was arbitrary, this establishes the NSS property for $A''/P$.

Proof of Lemma 9.5. Fix $g \in A''/P$. Let $\tilde{g}$ be a lift of $g$ in $A''$ which minimizes the escape norm $\|\tilde{g}\|_{e,A''}$ among all possible lifts of $g$. If $\tilde{g}$ is trivial, then so is $g$ and there is nothing to prove. Therefore we may assume that $\tilde{g}$ is not the identity and hence, since $A$ is NSS, that it has positive escape norm. Suppose, by way of contradiction, that $\|g\|_{e,A''/P} = o(\|\tilde{g}\|_{e,A''})$. Our goal will be to reach a contradiction by finding another lift of $g$ with strictly smaller escape norm than $\tilde{g}$. Set $M_1 := 1/\|\tilde{g}\|_{e,A''} \in {}^*\mathbb{N}$.

We now make an important deduction from our hypothesis. For every $n \in {}^*\mathbb{N}$ such that $n = O(M_1)$, we have $g^n \in A''/P$. In particular, for every (standard) integer $k \in \mathbb{N}$, $g^{kM_1} \in A''/P$. This implies that the group generated by $g^{M_1}$ lies in $A''/P$. However, in projection to the Lie model, $A''/P$ gets mapped into a neighbourhood of the identity in $L'$, which we chose small enough so as not to contain any non-trivial subgroup. We thus conclude that $g^{M_1}$ maps to the identity in $L'$, and therefore $\tilde{g}^{M_1}$ maps into the local one-parameter subgroup $\phi([-1, 1])$. Now there is another element which maps to $\phi([-1, 1])$, namely $u$, the element for which $\|u\|_{e,A'}$ is minimal.

In order to motivate the rest of the argument, let us temporarily work in a heuristic setting (using informal notation such as $\approx$), returning to tighten the argument rigorously later. Since

$$A'' \approx \pi^{-1}(\exp(\delta B)), \tag{9.5}$$

and since $M_1$ is the least $n$ for which $\tilde{g}^n$ escapes $A''$, we have

$$\pi\big(\tilde{g}^{M_1}\big) \approx \phi(\delta) = \exp(\delta X). \tag{9.6}$$

Similarly, writing $M_0'$ for the least $n$ for which $u^n$ escapes $A''$,

$$\pi\big(u^{M_0'}\big) \approx \phi(\delta). \tag{9.7}$$

Now $u^n$ takes at least as long as $\tilde{g}^n$ to escape from $A' \approx \pi^{-1}(\exp B)$.
Hence (roughly) it takes at least as long to escape from $A'' \approx \pi^{-1}(\exp \delta B)$ as well, which means that

$$M_1 \le M_0'. \tag{9.8}$$

We are trying to find a lift of $g$ with smaller escape norm than that of $\tilde{g}$. To do this it is sensible to look for elements of the form $h := \tilde{g} u^{-m}$, $m \in {}^*\mathbb{N}$. Provided that $m$ is chosen judiciously, $h$ will also be a lift of $g$ since (by definition) $u^m$ lies in $P$. Since, measured in $\phi([-1, 1])$ by applying $\pi$, the element $u$ is "shorter" than $\tilde{g}$, it seems reasonable that by an appropriate choice of $m$ we can make $h$ shorter than $\tilde{g}$ as well.

Being a little more precise, suppose that $m, n \in {}^*\mathbb{N}$. Since $u$ is central in $A''$ we have $h^n = \tilde{g}^n u^{-mn}$ whenever these expressions are well-defined, and hence $\pi(h^n) = \pi(\tilde{g}^n)\pi(u^{-mn})$. From (9.6) and (9.7) we have

$$\pi\big(\tilde{g}^n\big) \approx \phi\big(\mathrm{st}(\delta n/M_1)\big) \quad \text{and} \quad \pi\big(u^{-mn}\big) \approx \phi\big(\mathrm{st}(-\delta m n/M_0')\big). \tag{9.9}$$

These expressions will be legitimate if $m, n$ are chosen so that the arguments of the $\phi$'s always lie in $[-50, 50]$ (say). It follows that

$$\pi\big(h^n\big) \approx \phi\Big( \mathrm{st}\Big( \Big( \frac{1}{M_1} - \frac{m}{M_0'} \Big) \delta n \Big) \Big). \tag{9.10}$$

However (by the Euclidean algorithm) there is a choice of $m \in {}^*\mathbb{N}$ such that $|1/M_1 - m/M_0'| \le 1/2M_0'$. Comparing with (9.10) we see that for $n = 1, \ldots, 2M_0'$ we have $\pi(h^n) = \phi(\delta')$ with $\delta' \le \delta$. Since $\pi^{-1}(\phi([0, \delta])) \subseteq A''$, we must raise $h$ to at least the power $2M_0'$ before it escapes $A''$. Since $2M_0' > M_0' \ge M_1$, this $h$ is a lift of $g$ with smaller escape norm than $\tilde{g}$. Note that the computations (9.10) are legitimate for this choice of $m$ and for $n \le 2M_0'$.

We now perform the above argument rigorously. Instead of the heuristic statement (9.5), we must work with the inclusions

$$\pi^{-1}(\exp(\delta B)) \subseteq A'' \subseteq \pi^{-1}(\exp(1.001\delta B)). \tag{9.11}$$

To get a precise form of (9.6), note that by definition of the escape norm we have $\tilde{g}^{M_1 - 1} \in A''$, whilst $\tilde{g}^{M_1} \notin A''$. In particular, as a consequence of (9.11), $\pi(\tilde{g}^{M_1 - 1}) \in \exp(1.001\delta B)$, whilst $\pi(\tilde{g}^{M_1}) \notin \exp(\delta B)$. Since $M_1$ is unbounded, the first of these actually implies that $\pi(\tilde{g}^{M_1}) \in \exp(1.001\delta B)$.

Similarly $\pi(u^{M_0' - 1}) \in \exp(1.001\delta B)$, whilst $\pi(u^{M_0'}) \notin \exp(\delta B)$.
Once again, the first of these implies that $\pi(u^{M_0'}) \in \exp(1.001\delta B)$.

Since $B$ is convex, comparison of these facts shows that $\pi(\tilde{g}^{M_1}) = \phi(t)$ and $\pi(u^{M_0'}) = \phi(t')$ with

$$t, t' \in [0.9\delta, 1.1\delta]. \tag{9.12}$$

Suppose that $M_1'$ and $M_0$ are the escape times of $\tilde{g}$ and $u$ from $A'$, respectively. Since $u \in A'$ was assumed to have minimal escape norm, $M_1' \le M_0$. On the other hand (9.11) implies that $M_0'/M_0,\ M_1/M_1' \in [0.99\delta, 1.01\delta]$, and so

$$M_1 \le 1.1 M_0'. \tag{9.13}$$

As in the heuristic discussion above, take $h := \tilde{g} u^{-m}$, for some $m \in {}^*\mathbb{N}$. Let $n \in {}^*\mathbb{N}$. Then we have

$$\pi\big(\tilde{g}^n\big) = \phi\big(\mathrm{st}(tn/M_1)\big) \quad \text{and} \quad \pi\big(u^{-mn}\big) = \phi\big(-\mathrm{st}(t' m n/M_0')\big)$$

provided that the arguments of the $\phi$'s are in $[-50, 50]$, which will always be the case later on in the argument. Since $u$ is central we have

$$\pi\big(h^n\big) = \phi\Big( \mathrm{st}\Big( \Big( \frac{t}{\delta M_1} - \frac{m t'}{\delta M_0'} \Big) \delta n \Big) \Big). \tag{9.14}$$

Roughly as before, we use the Euclidean algorithm to find $m \in {}^*\mathbb{N}$ such that $|1/M_1 - m t'/\delta M_0'| \le t'/2\delta M_0'$. By (9.12) and (9.13) it follows that

$$\Big| \frac{t}{\delta M_1} - \frac{m t'}{\delta M_0'} \Big| \le \frac{t'}{2\delta M_0'} + \Big| 1 - \frac{t}{\delta} \Big| \frac{1}{M_1} < \frac{0.9}{M_1}.$$

It follows from this and (9.14) that $\pi(h^n) \in \phi([0, \delta])$ for $n \le M_1$, and hence $h^n$ lies in $A''$ for these same values of $n$. As a consequence, $h$ has smaller $\|\cdot\|_{e,A''}$ escape norm than $\tilde{g}$, contrary to assumption.

Finally we prove item (iii) of Lemma 9.4. Suppose then that $Q$ is a nondegenerate large ultra nilprogression in $A''/P$ in normal form; we wish to lift this to a large ultra nilprogression $Q'$ in $A''$ of at most one higher rank and step, while preserving the nondegeneracy and normal form properties. The main difficulty is that if one lifts the generators of $Q$ arbitrarily then there is no guarantee that the progression they generate, or even a significant part of it, will be contained in $A''$. The key to ensuring that we do achieve this lies in making judicious use of the lifting lemma (Lemma 9.5) and the product and commutator properties of the escape norm (Proposition 8.5(iii) (b) and (c)).
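The "Euclidean algorithm" step in the proof of Lemma 9.5 above is just rounding to the nearest multiple: given a target $1/M_1$ and a step size $s$ (playing the role of $t'/\delta M_0'$), one picks $m$ with $|1/M_1 - m s| \le s/2$. The sketch below illustrates this with exact rational arithmetic. The numerical values of `M1`, `M0p`, `delta` and `t_prime` are made-up standard stand-ins for the nonstandard quantities in the text, chosen only to satisfy (9.12) and (9.13).

```python
from fractions import Fraction


def nearest_multiple(target, step):
    """Return the integer m >= 0 minimising |target - m * step|.

    Both arguments are positive rationals; rounding half up guarantees
    |target - m * step| <= step / 2.
    """
    return int(target / step + Fraction(1, 2))


# Hypothetical stand-ins for the quantities in the proof of Lemma 9.5.
M1 = 1000                    # escape time of the lift g~ from A''
M0p = 950                    # escape time M_0' of u from A'', so M1 <= 1.1 * M0p
delta = Fraction(1, 100)
t_prime = Fraction(21, 2000)  # some t' in [0.9 * delta, 1.1 * delta]

target = Fraction(1, M1)
step = t_prime / (delta * M0p)
m = nearest_multiple(target, step)
err = abs(target - m * step)
print(m, err <= step / 2)
```

With these sample values the rounding picks $m = 1$, which matches the intuition in the text: $u$ escapes at roughly the same rate as $\tilde{g}$, so subtracting about one copy of $u$ per copy of $\tilde{g}$ slows the escape of $h = \tilde{g}u^{-m}$.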
At this point we advise the reader to quickly review Definition 2.6 and Appendix C, where nilprogressions in $C$-normal form are discussed.

We may write the nondegenerate ultra nilprogression in normal form as

$$Q = P(u_1, \ldots, u_r; N_1, \ldots, N_r),$$

where the $u_i$ are in $A''/P$, the $N_i \in {}^*\mathbb{N}$ are unbounded, $r$ is the rank of $Q$, and $Q$ has some standard step $s$. From the normal form hypothesis (and taking ultraproducts), we have the following properties:

(i) (Upper-triangular form) For every $1 \le i < j \le r$ and $\epsilon_i, \epsilon_j \in \{-1, +1\}$, one has
$$[u_i^{\epsilon_i}, u_j^{\epsilon_j}] \in P\Big( u_{j+1}, \ldots, u_r; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_r}{N_i N_j} \Big) \Big). \tag{9.15}$$
(ii) (Local properness) The expressions $u_1^{n_1} \cdots u_r^{n_r}$ for nonstandard integers $n_1, \ldots, n_r$ with $|n_i| \le N_i/C$ for all $1 \le i \le r$ are all well-defined and distinct, if $C$ is a sufficiently large standard real.
(iii) (Volume bound) One has $N_1 \cdots N_r \ll |Q| \ll N_1 \cdots N_r$. (Note that as the $N_i$ are unbounded, $2N_i + 1$ and $N_i$ are comparable.)

Also, since $u_i^{n_i} \in Q \subseteq A''/P$ for all $1 \le i \le r$ and $|n_i| \le N_i$, we have

$$\|u_i\|_{e,A''/P} \ll \frac{1}{N_i}.$$

By Lemma 9.5, we may find lifts $\tilde{u}_i \in A''$ which project to $u_i$ in the quotient local group $A''/P$, and are such that

$$\|\tilde{u}_i\|_{e,A''} = O\big( \|u_i\|_{e,A''/P} \big)$$

and thus

$$\|\tilde{u}_i\|_{e,A''} \ll \frac{1}{N_i}. \tag{9.16}$$

In order to include $P$ in the lifted progression, we set $\tilde{u}_{r+1} := u$, the generator of $P$, and $N_{r+1} := 1/\|u\|_{e,A''}$. From (9.4) we see that

$$M_0 \ll N_{r+1} \ll M_0. \tag{9.17}$$

We then define

$$Q' := P(\tilde{u}_1, \ldots, \tilde{u}_r, \tilde{u}_{r+1}; \varepsilon N_1, \ldots, \varepsilon N_{r+1})$$

for some sufficiently small standard $\varepsilon > 0$. We claim that $Q'$ is well-defined in $A''$ as a nondegenerate ultra nilprogression in normal form, of rank $(r + 1)$ and step at most $s + 1$.

We begin with the claim that $Q'$ is well-defined in $A''$. From (9.16) and Proposition 8.5 one has

$$\|g\|_{e,A''} \ll \varepsilon$$

for all $g \in Q'$, and in particular every product in $Q'$ lies in $A''$ as required.

It is clear that $Q'$ is a nondegenerate ultra non-commutative progression of rank $(r + 1)$.
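The step bound for the lifted progression uses a Hall-Witt-type commutator identity, numbered (9.18) in the text. As a sanity check, the sketch below verifies the identity $[z,[x,y]] = [[y^{-1},z^{-1}],x]^{zy}\,[[z,x^{-1}],y^{-1}]^{xy}$ by free reduction of words in the free group on $x, y, z$. It assumes the conventions $[a,b] = a^{-1}b^{-1}ab$ and $a^b = b^{-1}ab$; the exact bracket placement is a reading of the flattened display in this copy of the paper, and the helper names are illustrative.

```python
def inv(w):
    """Inverse of a word; letters are (symbol, exponent) pairs with exponent +1 or -1."""
    return [(s, -e) for s, e in reversed(w)]


def free_reduce(w):
    """Freely reduce a word by cancelling adjacent mutually inverse letters."""
    out = []
    for s, e in w:
        if out and out[-1] == (s, -e):
            out.pop()
        else:
            out.append((s, e))
    return out


def mul(*words):
    """Concatenate words and reduce the result."""
    res = []
    for w in words:
        res.extend(w)
    return free_reduce(res)


def comm(a, b):
    """Commutator [a, b] = a^{-1} b^{-1} a b (assumed convention)."""
    return mul(inv(a), inv(b), a, b)


def conj(a, b):
    """Conjugate a^b = b^{-1} a b (assumed convention)."""
    return mul(inv(b), a, b)


x, y, z = [("x", 1)], [("y", 1)], [("z", 1)]
lhs = comm(z, comm(x, y))
rhs = mul(
    conj(comm(comm(inv(y), inv(z)), x), mul(z, y)),
    conj(comm(comm(z, inv(x)), inv(y)), mul(x, y)),
)
print(lhs == rhs)
```

Since reduced words are unique representatives of elements of a free group, equality of the two reduced words shows that the identity holds in every group under the stated conventions.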
To show that it is a nilprogression of step at most $s + 1$, it suffices to show that any iterated commutator $g$ of length $s + 2$ in the generators $\tilde{u}_1^{\pm 1}, \ldots, \tilde{u}_{r+1}^{\pm 1}$ is trivial. Using commutator identities such as the Hall-Witt identity

$$[z, [x, y]] = [[y^{-1}, z^{-1}], x]^{zy}\, [[z, x^{-1}], y^{-1}]^{xy}, \tag{9.18}$$

where $x^y := y^{-1}xy$ (using the unbounded nature of the $N_i$ to justify all operations), we may restrict attention to iterated commutators $g$ of the form $g = [h, \tilde{u}_i^{\pm 1}]$ where $h$ is an iterated commutator of length $s + 1$ and $1 \le i \le r + 1$. But by projecting down to $A''/P$, we know that the image of $h$ vanishes and thus $h \in P$. Since $P$ is central in $A''$, the claim follows.

Finally, we need to show that $Q'$ is in normal form. We begin by establishing the upper triangular form (2.1), i.e. that

$$[\tilde{u}_i^{\epsilon_i}, \tilde{u}_j^{\epsilon_j}] \in P\Big( \tilde{u}_{j+1}, \ldots, \tilde{u}_{r+1}; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_{r+1}}{N_i N_j} \Big) \Big)$$

whenever $1 \le i < j \le r + 1$ and $\epsilon_i, \epsilon_j \in \{-1, +1\}$.

If $j = r + 1$, then $\tilde{u}_j = u$ commutes with every element of $A''$, and in particular with $\tilde{u}_i$, so the claim follows in this case. Now suppose that $j \le r$. From (9.15) we then have

$$[u_i^{\epsilon_i}, u_j^{\epsilon_j}] \in P\Big( u_{j+1}, \ldots, u_r; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_r}{N_i N_j} \Big) \Big)$$

which lifts to

$$[\tilde{u}_i^{\epsilon_i}, \tilde{u}_j^{\epsilon_j}] \in P\Big( \tilde{u}_{j+1}, \ldots, \tilde{u}_r; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_r}{N_i N_j} \Big) \Big) \cdot P.$$

Thus we may write $[\tilde{u}_i^{\epsilon_i}, \tilde{u}_j^{\epsilon_j}] = g u^n$, where

$$g \in P\Big( \tilde{u}_{j+1}, \ldots, \tilde{u}_r; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_r}{N_i N_j} \Big) \Big)$$

and $|n| \le 1/\|u\|_{e,A'} = M_0$. From (9.16) and Proposition 8.5 one has

$$\big\| [\tilde{u}_i^{\epsilon_i}, \tilde{u}_j^{\epsilon_j}] \big\|_{e,A''} \ll \frac{1}{N_i N_j},$$

and therefore

$$\|g\|_{e,A''} \ll \frac{1}{N_i N_j}$$

and hence

$$\|u^n\|_{e,A''} \ll \frac{1}{N_i N_j}. \tag{9.19}$$

In particular, $\|u^n\|_{e,A''}$ is infinitesimal, which implies that $\pi(u^n) = \mathrm{id}$ and hence $n = o(M_0)$ by (9.4). Since $\|u\|_{e,A''} = 1/N_{r+1}$, we conclude that

$$|n| \ll \frac{N_{r+1}}{N_i N_j}$$

and thus

$$[\tilde{u}_i^{\epsilon_i}, \tilde{u}_j^{\epsilon_j}] \in P\Big( \tilde{u}_{j+1}, \ldots, \tilde{u}_{r+1}; O\Big( \frac{N_{j+1}}{N_i N_j} \Big), \ldots, O\Big( \frac{N_{r+1}}{N_i N_j} \Big) \Big).$$

Noting that $\varepsilon > 0$ is standard and thus can be absorbed into the $O()$ notation, this gives the desired upper triangular property.

Next, we establish the local properness. Suppose that

$$\tilde{u}_1^{n_1} \cdots \tilde{u}_{r+1}^{n_{r+1}} = \tilde{u}_1^{n_1'} \cdots \tilde{u}_{r+1}^{n_{r+1}'}$$

for some $|n_i|, |n_i'| \le \varepsilon N_i$. Quotienting by $P$, we conclude that

$$u_1^{n_1} \cdots u_r^{n_r} = u_1^{n_1'} \cdots u_r^{n_r'}.$$

This quotienting can be justified because all products here lie in $Q'$ and hence in $A''$. By the local properness of $Q$, we conclude if $\varepsilon$ is small enough that $n_i = n_i'$ for all $1 \le i \le r$; we may then cancel and conclude that

$$u^{n_{r+1} - n_{r+1}'} = \mathrm{id}.$$

Since $\|u\|_{e,A''} = 1/N_{r+1}$ and $|n_{r+1} - n_{r+1}'| < N_{r+1}$, this implies that $n_{r+1} = n_{r+1}'$, giving the desired local properness.

From local properness one immediately has the lower bound

$$|Q'| \gg N_1 \cdots N_r N_{r+1}.$$

Now we establish the matching upper bound

$$|Q'| \ll N_1 \cdots N_r N_{r+1}.$$

We first recall from the normal form of $Q$ that

$$|Q| \ll N_1 \cdots N_r.$$

From construction it is also clear that the image of $Q'$ under projection by $P$ lies in $Q$. It therefore suffices to show that the preimage of any element in $Q$ contains at most $O(N_{r+1})$ elements of $Q'$. By construction of the quotient map, we see that the preimage is contained in a translate of $P$, and thus has cardinality $O(M_0)$; the claim then follows from (9.17). This concludes the proof of Lemma 9.4 and thus Theorem 9.3.

We can now conclude the proof of Theorem 2.10, the most basic form of our main theorem.

Proof of the first part of Theorem 2.10. We argue by contradiction. Negating the quantifiers, we see that there exists some $K \ge 1$ and an infinite sequence of local groups $G_n$ and finite $K$-approximate groups $A_n \subseteq G_n$, $n \in \mathbb{N}$, for which the conclusion of the theorem fails, namely for which $A_n^4$ does not contain any coset nilprogression of rank and step at most $n$ in $n$-normal form and of cardinality at least $\frac{1}{n}|A_n|$.

Now form the ultraproduct $A = \prod_{n \to \alpha} A_n$ inside $G = \prod_{n \to \alpha} G_n$. By Łoś's Theorem (Theorem A.6), $G$ is a local group and $A$ an ultra approximate subgroup. We can now apply Theorem 4.2, whose proof we just completed, to conclude that $A^4$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$.
Using Łoś's theorem again we conclude that $P = \prod_{n \to \alpha} P_n$, where for an $\alpha$-large set of $n$, $P_n$ is a $1/c$-proper coset nilprogression contained in $A_n^4$ of rank and step at most $1/c$ and of size at least $c|A_n|$ for some standard positive number $c > 0$. But this contradicts the construction of the $A_n$, thereby yielding the claim.

To conclude this section we record another useful conclusion from the above analysis: Hrushovski's Lie model is nilpotent.

Proposition 9.6 (Nonstandard finite approximate groups have nilpotent Lie models). — Suppose that $A$ is an ultra approximate group and that $\pi : A^8 \to L$ is a good model for $A$ into a connected Lie group $L$ with Lie algebra $\mathfrak{l}$. Then $\mathfrak{l}$ and $L$ are nilpotent.

Proof. — By Proposition 8.5 we may find a large strong ultra approximate subgroup $A'$ of $A$ obeying the conclusion of that proposition. By quotienting out the elements $H$ of $A'$ of zero escape norm as in the proof of Proposition 9.2, we obtain an NSS ultra approximate group $A'/H$. Now one runs the argument in Theorem 9.3. An inspection of this argument shows that if one unfolds the induction from Lemma 9.4, the Lie algebra $\mathfrak{l}$ of $L$ is repeatedly quotiented out by central algebras until it becomes trivial. Thus, $\mathfrak{l}$ can be obtained from the trivial Lie algebra by a finite tower of central extensions and is therefore nilpotent as required. The nilpotence of $L$ is an immediate consequence of this and basic Lie theory.

10. A dimension bound

In this section we prove Theorem 2.12, in which it is shown that the rank of the nilprogression $P$ in the main theorem may be taken to be $O(K^{O(1)})$. We will also show that so long as we work in a global group $G$, and replace $A^4$ with $A^{O_K(1)}$, it may be taken to be $O(\log K)$. By the usual ultraproduct argument, it will suffice to establish the following nonstandard analysis formulation of the theorem.

Theorem 10.1.
— Suppose that $A$ is an ultra global $K$-approximate group, thus $A = \prod_{n \to \alpha} A_n$ for some finite $K$-approximate groups $A_n$, each contained in a global group $G_n$. Then $A^{12}$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$ and rank at most $O(K^2 \log K)$. Moreover, there exists a standard natural number $m$ such that $A^m$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$ and rank at most $6 \log_2 K$.

Recall that the step of a nilprogression in normal form is always less than or equal to its rank. The derivation of Theorem 2.12 from Theorem 10.1 proceeds analogously to the derivation of Theorem 2.10 from Theorem 4.2 and is omitted.

It remains to establish Theorem 10.1. The arguments here are inspired by some remarks of Hrushovski in [33, §4]. In particular, a key tool will be the following lemma from [33, Lemma 4.9].

Lemma 10.2 (Doubling in a simply connected nilpotent Lie group). — Let $G$ be a connected, simply connected nilpotent Lie group of dimension $d$, and let $A$ be a measurable subset of $G$. Let $\mu$ be a Haar measure on $G$ (note that nilpotent groups are automatically unimodular, and so there is no distinction between left and right Haar measure). Then $\mu(A^2) \ge 2^d \mu(A)$.

Proof. — We use an argument of Gelander from [33, Lemma 4.9]. As is well known (e.g. see [12]), in a simply connected nilpotent Lie group, the exponential map $\exp : \mathfrak{g} \to G$ is a diffeomorphism, which pushes forward the Lebesgue measure $\mu_{\mathfrak{g}}$ on the $d$-dimensional vector space $\mathfrak{g}$, the Lie algebra of $G$, to the Haar measure $\mu$ on $G$. Thus it will suffice to show that $\mu_{\mathfrak{g}}(\log(A^2)) \ge 2^d \mu_{\mathfrak{g}}(\log A)$, where $\log$ is the inverse of $\exp$. But as $A^2$ contains $\{a^2 : a \in A\}$, $\log(A^2)$ contains the dilate $2 \cdot \log A$ of $\log A$, and the claim follows.

One is tempted to combine this lemma with the Hrushovski Lie Model Theorem directly (i.e. Theorem 3.10), to get some dimensional control on the Lie group $L$.
However, there is a technical obstruction; the Lie model is only available for an ultra approximate subgroup $A'$ of $A$, and the covering parameter $K'$ of this subgroup $A'$ may be much worse than the covering parameter $K$ of the original ultra approximate group $A$. To get around this problem, we need to choose the subgroup $A'$ more carefully. A clue as to how to proceed is provided by the following basic observation (cf. [31, Lemma 7.3]).

Lemma 10.3 (Slicing approximate groups by genuine subgroups). — Let $A$ be a (possibly infinite) $K$-approximate group in a global group $G$, and let $G'$ be a genuine subgroup of $G$. Then $A' := A^2 \cap G'$ is a $K^3$-approximate subgroup and $A^4 \cap G'$ can be covered by at most $K^3$ left translates of $A'$.

Proof. — Since $(A')^2 \subseteq A^4 \cap G'$, it suffices to show that $A^4 \cap G'$ can be covered by $K^3$ left-translates of $A'$. But $A^4$ can be covered by $K^3$ left-translates of $A$ since $A$ is a $K$-approximate group. Next, observe that if a left-translate $gA$ of $A$ intersects $A^4 \cap G'$ in at least one point $g'$, then
$$gA \cap A^4 \cap G' \subseteq gA \cap G' \subseteq g'A'.$$
Thus $A^4 \cap G'$ can be covered by $K^3$ left-translates of $A'$, as required.

Lemma 10.3 suggests that we should look for Lie models of approximate groups $A'$ that are formed by slicing $A$ with a genuine subgroup $G'$ of $G$. We turn to the details.

Let $A_n$ be a sequence of $K$-approximate groups in global groups $G_n$, and let $A = \prod_{n \to \alpha} A_n$ be their ultraproduct; thus $A$ is an ultra $K$-approximate group that lies inside an ultra genuine group $\prod_{n \to \alpha} G_n$. By Proposition 6.10, we may find a model $\pi : \langle A \rangle \to G$ of $A$ by a locally compact group $G$.

Let $U_0$ be the neighbourhood in Definition 3.5. We have $\pi^{-1}(U_0) \subseteq A^4$ and $U_0 \subseteq \pi(A^4)$. By Theorem B.17, there is an open subgroup $G'$ of $G$ and a closed subgroup $H$ of $G'$ contained in $U_0$ and normalized by $G'$ such that $L := G'/H$ is a connected Lie group. Let $U_1 \subseteq G'$ be an open subset such that $H \subseteq U_1 \subseteq \overline{U_1} \subseteq U_0$ and let $\phi : G' \to L$ denote the quotient map.

Now set $A' := A^4 \cap \pi^{-1}(G')$.
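The covering statement of Lemma 10.3 can be checked by brute force on a toy example. The following Python sketch is only an illustration under stated assumptions: it works in the abelian group $\mathbb{Z}^2$, takes $A$ to be the box $\{-3,\dots,3\}^2$ (a 4-approximate group) and takes the diagonal $\{(t,t)\}$ as the genuine subgroup $G'$. The helper names (`sumset`, `power`, `greedy_cover`) are hypothetical and not taken from the paper.

```python
from itertools import product

def sumset(X, Y):
    """Pointwise sums x + y in Z^2 (the group law is written additively)."""
    return {(x[0] + y[0], x[1] + y[1]) for x in X for y in Y}

def power(A, k):
    """The k-fold product set A^k, here the k-fold sumset."""
    out = {(0, 0)}
    for _ in range(k):
        out = sumset(out, A)
    return out

def greedy_cover(target, tile):
    """Cover `target` greedily by translates g + tile with g in target.
    Returns the number of translates used (an upper bound for the covering number)."""
    remaining = set(target)
    count = 0
    while remaining:
        g = min(remaining)
        remaining -= {(g[0] + t[0], g[1] + t[1]) for t in tile}
        count += 1
    return count

n = 3
K = 4                                           # the box below is a 4-approximate group
A = set(product(range(-n, n + 1), repeat=2))    # A = {-n..n}^2, symmetric, contains 0

def in_subgroup(v):
    """Membership in the assumed subgroup G' = {(t, t)}."""
    return v[0] == v[1]

A_prime = {v for v in power(A, 2) if in_subgroup(v)}   # A' := A^2 ∩ G'
slice4 = {v for v in power(A, 4) if in_subgroup(v)}    # A^4 ∩ G'
translates_used = greedy_cover(slice4, A_prime)

print(translates_used, "translates of A' cover A^4 ∩ G'; Lemma 10.3 allows", K ** 3)
```

In this toy case far fewer than $K^3$ translates are needed; the lemma only asserts the upper bound.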
From Lemma 10.3 applied to $A^2$, we see that $A'$ is a $K^6$-approximate group. We now also claim that $A'$ is a nonstandard finite set, which would make $A'$ an ultra $K^6$-approximate group. To see this, observe first from Definition 3.5(ii) that $\pi(A^4)$ is contained in some compact set $F$. As $G'$ is an open subgroup of $G$, it is also closed and $F \cap G'$ is compact. We then see from Definition 3.5(iii) that we can find a nonstandard finite set $A_*$ such that $\pi^{-1}(F \cap G') \subseteq A_* \subseteq \pi^{-1}(G')$. Thus $A' = A^4 \cap A_*$, and so $A'$ is a nonstandard finite set as required.

Note that $\pi(A')$ contains the open set $U_0 \cap G'$ and is itself contained in a compact subset of $G'$. Hence the set $E := \phi \circ \pi(A')$ is precompact and contains a neighbourhood of the identity in $L$. Moreover $(\phi \circ \pi)^{-1}(\phi(U_1)) \subseteq A'$, hence it follows that $\phi \circ \pi$ is a good model for $A'$ with values in $L$. From Proposition 9.6, we conclude that $L$ is nilpotent. Every connected nilpotent Lie group admits a unique maximal compact subgroup which, moreover, is central. Let $N$ be the maximal compact subgroup of $L$ and $\theta : L \to L/N$ be the quotient map.

We claim that $\dim(L/N) \le 6 \log_2 K$. To see this note that, as $A'$ is a $K^6$-approximate group, $E^2$ is covered by at most $K^6$ left-translates of $E$. Therefore $\theta(E)^2$ can be covered by at most $K^6$ left-translates of $\theta(E)$, and hence $\overline{\theta(E)}^2$ can be covered by at most $K^6$ translates of $\overline{\theta(E)}$, where $\overline{\theta(E)}$ is the topological closure of $\theta(E)$, a compact set with non-empty interior. If we let $\mu$ be a Haar measure on $L/N$, it follows that
$$\mu\big(\overline{\theta(E)}^2\big) \le K^6 \mu\big(\overline{\theta(E)}\big).$$
On the other hand, from Lemma 10.2 one has
$$\mu\big(\overline{\theta(E)}^2\big) \ge 2^{\dim(L/N)} \mu\big(\overline{\theta(E)}\big).$$
Since $\overline{\theta(E)}$ has non-empty interior, $\mu(\overline{\theta(E)}) \ne 0$ and so comparison of these two inequalities implies that
$$\dim(L/N) \le 6 \log_2 K. \tag{10.1}$$

We now explain how to derive the second part of Theorem 10.1 from the above; we will turn to the first part later. We consider $\phi^{-1}(N)$, which is the kernel of the projection map from $G'$ to $L/N$ ($= (G'/H)/N$). This is a compact subgroup of $G'$.
Since $\pi(A')$ contains an open neighbourhood of the identity, we conclude that there exists a standard natural number $m$ such that $\phi^{-1}(N) \subseteq \pi((A')^{m-1})$, and this implies that $(A')^m$ contains the kernel of $\theta \circ \phi \circ \pi$, which implies that $\theta \circ \phi \circ \pi : (A')^{8m} \to L/N$ is a good model. By Proposition 9.2 we conclude that $A^{4m}$ contains a large approximate subgroup $A''$ with $(A'')^{1000}$ well-defined and contained in $A^{4m}$, and a global internal subgroup $H'$ of $A''$ such that $A''/H'$ is an NSS approximate subgroup with a connected Lie group as a good model. An inspection of the proof of that proposition reveals that we may take $(A'')^{1000}$ inside $(A')^m$ and that we may take the connected Lie group to be $L/N$, which we have shown to have dimension at most $6 \log_2 K$. Applying Theorem 9.3, we see that $(A''/H')^4$ contains a large ultra nilprogression in normal form of rank at most $6 \log_2 K$, and thus $(A'')^4$ contains a large ultra coset nilprogression in normal form of rank at most $6 \log_2 K$. As $(A'')^4$ is contained in $(A')^m$, which is in turn contained in $A^{4m}$, the second part of Theorem 10.1 follows (after redefining $m$).

We now turn to the first part of Theorem 10.1. As we see from the last paragraph, the difficulty here is that $\phi^{-1}(N)$ may not be contained in $\pi(A')$. We will show that nevertheless $\pi(A')$ still contains a subgroup $\phi^{-1}(N_0)$, where $N_0$ is a closed subgroup of $N$ with small codimension. For this the key is the following lemma, which is potentially of interest in its own right. Here, and below, we write $\mathbb{T}^d := \mathbb{R}^d/\mathbb{Z}^d$ for the $d$-dimensional torus. By a subtorus we mean a closed connected subgroup of $\mathbb{T}^d$.

Lemma 10.4. — Let $K, d \ge 1$ and $A$ be a closed $K$-approximate group in $\mathbb{T}^d$ containing a neighbourhood of 0. That is, $A$ is closed, contains a neighbourhood of 0, is centrally symmetric, and there is a finite set $X \subseteq \mathbb{T}^d$, $|X| \le K$, such that $A + A \subseteq A + X$. Then $4A := A + A + A + A$ contains a subtorus $T \subseteq \mathbb{T}^d$ of codimension at most $O(K^2 \log K)$.
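To get a feel for the statement of Lemma 10.4, one can test a discrete stand-in for it. The Python sketch below is an illustration under explicit assumptions: it replaces the torus $\mathbb{T}^2$ by the finite grid $(\mathbb{Z}/N\mathbb{Z})^2$ with $N = 24$, takes $A$ to be a small box around 0 (so that $A + A \subseteq A + X$ with $|X| = 4$), and checks that $4A$ already contains the whole vertical circle $\{0\} \times \mathbb{Z}/N\mathbb{Z}$, which plays the role of a subtorus of codimension 1. None of these choices come from the paper.

```python
N = 24          # grid size; (Z/NZ)^2 is an assumed discrete model of the torus T^2
a, b = 2, 3     # half-widths of the box A in the two coordinates

def centered(x):
    """Representative of x mod N in the range (-N/2, N/2]."""
    x %= N
    return x - N if x > N // 2 else x

def add(X, Y):
    """Sumset X + Y inside (Z/NZ)^2."""
    return {((x1 + x2) % N, (y1 + y2) % N) for (x1, y1) in X for (x2, y2) in Y}

# A is symmetric and contains a (discrete) neighbourhood of 0.
A = {(x, y) for x in range(N) for y in range(N)
     if abs(centered(x)) <= a and abs(centered(y)) <= b}

# Four translates witness the approximate-group property A + A ⊆ A + X.
X = {(sx * a % N, sy * b % N) for sx in (1, -1) for sy in (1, -1)}

two_A = add(A, A)
four_A = add(two_A, two_A)
circle = {(0, y) for y in range(N)}   # the subgroup {0} x Z/NZ

is_covered = two_A <= add(A, X)
contains_circle = circle <= four_A
print(is_covered, contains_circle, circle <= A)
```

Here $A$ itself does not contain the circle, but $4A$ does, because $4b = N/2$; in the lemma the codimension bound is what carries the content, and this toy case only illustrates the shape of the conclusion.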
Before proving Lemma 10.4, we explain how to conclude the proof of Theo- 2 −1 rem 10.1 with Lemma 10.4 in hand. First we observe that setting A := A ∩ π (G ), 4 4 π(A ) is a neighbourhood of id in G. Indeed A is and A ⊆ XA for some finite X, so that π(A) has non-empty interior, and hence π(A ) is a neighborhood of id. We −1  2 now apply the lemma to A = φ ◦ π(A ), and conclude that φ (T) ⊆ π(A ) ⊆ π(A ). 1 1 Writing θ : L → L/T for the projection map, we see that A contains the kernel of θ ◦ φ ◦ π and this implies that A admits the connected nilpotent Lie group L/Tas a d d 2 good model. Moreover dim L/T = dim L/T + dim T /T = O(K log K) by (10.1)and by Lemma 10.4. The rest of the proof is then identical to the previous case: by Proposition THE STRUCTURE OF APPROXIMATE GROUPS 187 9.2 and it’s proof we conclude that A contains a large approximate subgroup A with 1000 3 (A ) well-defined and contained in A , and a global internal subgroup H of A such that A /H is an NSS ultra approximate subgroup admitting L/N as a good model. By Theorem 9.3, we see that (A /H ) contains a large ultra nilprogression in normal form 2  4 of rank at most O(K log K),and thus (A ) contains a large ultra coset nilprogression 2  4 3 12 in normal form with rank at most O(K log K).As (A ) ⊆ A ⊆ A , the first part of Theorem 10.1 follows. We now turn to the proof of Lemma 10.4.Let μ be the normalized Haar measure d d d d on T . Note that the group T of characters of T identifies with Z . Our main tool is the notion of the (α-)large spectrum of an additive set A, defined by Spec (A) := ξ ∈ Z : 1 (ξ )  αμ(A) . See [54, Definition 4.34] for this definition and a further discussion. If S ⊆ Z is a set of characters, we write S := ker ξ. ξ∈S ⊥ d d Note that S is a closed subgroup of T with codimension the rank of the subgroup of Z generated by S. To prove Lemma 10.4 we first reduce to the case in which μ(A) is somewhat large by establishing the following lemma. Lemma 10.5. 
— Suppose that $A \subseteq \mathbb{T}^d$ is a closed $K$-approximate group containing a neighbourhood of 0. Then there is a subtorus $T_0 \subseteq \mathbb{T}^d$ with $\dim(T_0) \ge d - O(\log K)$ and some $x_0 \in \mathbb{T}^d$ such that, writing $\mu_0$ for the Haar measure on $T_0$, we have $\mu_0((A + x_0) \cap T_0) \ge e^{-O(K \log K)}$.

Then we handle the case in which $\mu(A)$ is somewhat large by proving the following, which is a straightforward continuous analogue of the so-called Bogolyubov-Chang lemma [10].

Lemma 10.6. — Suppose that $A \subseteq \mathbb{T}^d$ is measurable, that $\mu(A) \ge \alpha$, and that $\mu(2A) \le K\mu(A)$. Then $2A - 2A$ contains a subtorus $T \subseteq \mathbb{T}^d$ of codimension at most $O(K \log(1/\alpha))$.

To deduce Lemma 10.4 from Lemmas 10.5 and 10.6, we proceed as follows. Locate an $x_0$ as in Lemma 10.5, and suppose furthermore that for this $x_0$ the measure of $(A + x_0) \cap T_0$ is close to maximal in the sense that
$$\mu_0\big((A + x_0) \cap T_0\big) \ge \tfrac{1}{2}\,\mu_0\big((A + x) \cap T_0\big) \tag{10.2}$$
for all $x \in \mathbb{T}^d$. Set $A_1 := (A + x_0) \cap T_0$. Then, since $A + A \subseteq A + X$, we have
$$A_1 + A_1 \subseteq (A + X + 2x_0) \cap T_0.$$
By (10.2) it follows that $\mu_0(2A_1) \le 2K\mu_0(A_1)$. We are now in a position to apply Lemma 10.6 to $A_1$, with $\alpha = e^{-O(K \log K)}$. We conclude that there is a further subtorus $T \subseteq T_0$ of codimension $O(K^2 \log K)$ inside $2A_1 - 2A_1$. Since $2A_1 - 2A_1 \subseteq 4A$, this concludes the proof of Lemma 10.4.

For the proofs of both Lemmas 10.5 and 10.6 we will require the following lemma of Bogolyubov type.

Lemma 10.7 (Bogolyubov-type lemma). — Let $A \subseteq \mathbb{T}^d$ have positive measure and let $k \ge 2$ be a natural number. Suppose that
$$\delta \le \big(\mu(A)/2\mu(kA)\big)^{1/(2k-2)}.$$
Then $kA - kA$ contains $(\mathrm{Spec}_\delta(A))^\perp$.

Proof. — It suffices (in fact, it is equivalent) to show that if $x \in (\mathrm{Spec}_\delta(A))^\perp$ then $f(x) > 0$, where $f = 1_A^{*k} * 1_{-A}^{*k} = 1_A * \cdots * 1_A * 1_{-A} * \cdots * 1_{-A}$ is the convolution of $k$ copies of $1_A$ and $k$ copies of $1_{-A}$.
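Before the Fourier computation, the statement of Lemma 10.7 can be sanity-checked numerically. The Python sketch below does this in the finite cyclic group $\mathbb{Z}/N\mathbb{Z}$ instead of the torus, which is an assumption made purely for illustration (the same Fourier argument applies to finite abelian groups). The function name `bogolyubov_check`, the normalisation of the Fourier transform by $1/N$, and the test set $A$ are all hypothetical choices.

```python
import cmath

def bogolyubov_check(N, A, k):
    """Discrete analogue of Lemma 10.7 in Z/NZ with normalised counting measure.

    Returns (delta, spec, annihilator, difference), where delta is the largest
    value allowed by the lemma, spec is the delta-large spectrum of A,
    annihilator is the set of x with xi * x = 0 mod N for all xi in spec,
    and difference is kA - kA.
    """
    def mu(S):
        return len(S) / N

    kA = {0}
    for _ in range(k):
        kA = {(x + s) % N for x in kA for s in A}

    delta = (mu(A) / (2 * mu(kA))) ** (1 / (2 * k - 2))

    def fourier(xi):
        # Fourier coefficient of the indicator of A at frequency xi
        return sum(cmath.exp(-2j * cmath.pi * s * xi / N) for s in A) / N

    # A small tolerance only enlarges spec, which shrinks the annihilator.
    spec = {xi for xi in range(N) if abs(fourier(xi)) >= delta * mu(A) - 1e-12}
    annihilator = {x for x in range(N) if all((x * xi) % N == 0 for xi in spec)}
    difference = {(x - y) % N for x in kA for y in kA}
    return delta, spec, annihilator, difference

# Test set: a thickened copy of the subgroup of multiples of 12 in Z/60Z.
N = 60
A = {(12 * j + t) % N for j in range(5) for t in (-1, 0, 1)}
delta, spec, annihilator, difference = bogolyubov_check(N, A, k=2)
print(sorted(spec), sorted(annihilator), annihilator <= difference)
```

For this particular $A$ the annihilator of the large spectrum is the subgroup of multiples of 12, which indeed sits inside $2A - 2A$.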
Now by the Fourier inversion formula we have
$$f(x) = \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k}\,\xi(x) \ge \sum_{\xi \in \mathrm{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} - \sum_{\xi \notin \mathrm{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} = \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k} - 2\sum_{\xi \notin \mathrm{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k}, \tag{10.3}$$
where we have used the fact that $\xi(x) = 1$ if $\xi \in \mathrm{Spec}_\delta(A)$ and $x \in (\mathrm{Spec}_\delta(A))^\perp$. Now Parseval's identity and the Cauchy-Schwarz inequality imply that
$$\sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k} = \int_{x \in \mathbb{T}^d} 1_A^{*k}(x)^2\,d\mu(x) \ge \frac{1}{\mu(kA)}\Big(\int_{x \in \mathbb{T}^d} 1_A^{*k}(x)\,d\mu(x)\Big)^2 = \frac{\mu(A)^{2k}}{\mu(kA)}.$$
On the other hand, by a second application of Parseval's identity, we have
$$\sum_{\xi \notin \mathrm{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} < \delta^{2k-2}\mu(A)^{2k-2}\sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2} = \delta^{2k-2}\mu(A)^{2k-1}.$$
Substituting these inequalities into (10.3) yields
$$f(x) > \frac{\mu(A)^{2k}}{\mu(kA)} - 2\delta^{2k-2}\mu(A)^{2k-1} = \frac{\mu(A)^{2k-1}}{\mu(kA)}\big(\mu(A) - 2\delta^{2k-2}\mu(kA)\big).$$
The lemma follows immediately.

Lemma 10.6 is an immediate consequence of the case $k = 2$ of this lemma and (the continuous variant of) "Chang's lemma" [10], which is the following statement. For a proof, see [54, Lemma 4.36].

Lemma 10.8 (Chang's lemma). — Suppose that $\alpha < 1/2$ and that $A \subseteq \mathbb{T}^d$ is a measurable set with $\mu(A) \ge \alpha$. Then $\mathrm{Spec}_\delta(A)$ generates a subgroup of $\mathbb{Z}^d$ of rank at most $O(\delta^{-2}\log(1/\alpha))$.

Proof of Lemma 10.6. — Noting that $\mu(A)/\mu(2A) \ge 1/K$, the lemma follows from Lemma 10.7 with $k = 2$ and $\delta := 1/(2\sqrt{K})$ followed by an application of Lemma 10.8.

To prove Lemma 10.5 we will apply Lemma 10.7 with a much larger value of $k$, as well as the following result.

Lemma 10.9. — There is an absolute constant $c > 0$ with the following property. Suppose that $A \subseteq \mathbb{T}^d$ is a closed $K$-approximate group containing a neighbourhood of 0. Then $\mathrm{Spec}_{1 - c/\log K}(A)$ generates a subgroup of $\mathbb{Z}^d$ of rank $O(\log K)$.

Proof. — Let $\varepsilon = c/\log K$, where $c > 0$ is to be chosen later. Suppose that $\xi \in \mathrm{Spec}_{1-\varepsilon}(A)$ and that $\xi \ne 0$.
Let $\eta : \mathbb{T}^d \to \mathbb{R}/\mathbb{Z}$ be such that $\xi(x) = e^{2i\pi\eta(x)}$. Then $\int 1_A(x)\xi(x)\,d\mu(x)$ is real, since $A$ is symmetric, and at least $(1 - \varepsilon)\mu(A)$. Thus
$$\int 1_A(x)\cos(2\pi\eta(x))\,d\mu(x) \ge (1 - \varepsilon)\mu(A),$$
and hence the (symmetric) subset $A'(\xi) \subseteq A$, consisting of those $x$ for which $\cos(2\pi\eta(x)) \ge 99/100$, has measure $\mu(A'(\xi)) \ge (1 - 100\varepsilon)\mu(A)$. In particular $\|\eta(x)\| < \frac{1}{10}$ whenever $x \in A'(\xi)$, where $\|\theta\| := \inf_{z \in \mathbb{Z}} |\theta - z|$.

Suppose now that $\mathrm{Spec}_{1-\varepsilon}(A)$ contains elements $\xi_1, \dots, \xi_m$ which are linearly independent over $\mathbb{Q}$, and let $\eta_1, \dots, \eta_m$ be the corresponding $\mathbb{R}/\mathbb{Z}$-valued characters. Consider the set $A' := \bigcap_{i=1}^m A'(\xi_i)$; provided that $m < 1/100\varepsilon$, this will have $\mu(A') \ge \frac{1}{2}\mu(A)$. Note that $A' = -A'$.

Consider now the homomorphism $\psi : \mathbb{T}^d \to \mathbb{T}^m$ given by $x \mapsto (\eta_1(x), \dots, \eta_m(x))$. The image of $A'$ under $\psi$ lies in a box of diameter $1/5$.

Now any subset $U$ of $\mathbb{T}^m = (\mathbb{R}/\mathbb{Z})^m$ which lies in a box of diameter $< \frac{1}{2}$ is Freiman 2-isomorphic to an open subset of $\mathbb{R}^m$, and thus by the abelian case of Gelander's Lemma 10.2 (which, in this case, is just a very simple case of the Brunn-Minkowski inequality), we have
$$\mu_m(2U) \ge 2^m \mu_m(U),$$
where $\mu_m$ is the normalized Haar measure on $\mathbb{T}^m = (\mathbb{R}/\mathbb{Z})^m$. However, we have $\mu(5A') \le \mu(5A) \le K^4\mu(A) \le 2K^4\mu(A')$, and an application of the Ruzsa covering lemma (here our Lemma 5.2) shows that $2A'$ is a $4K^4$-approximate group, and consequently so is $U := \psi(2A')$. Therefore, noting that $\mu_m(U) \ne 0$ since $\mu(A') \ge \frac{1}{2}\mu(A) > 0$, we obtain
$$2^m \le 4K^4 \le (2K)^8,$$
a contradiction if $m > 8\log_2(2K)$. Such a choice of $m$ is acceptable if $\varepsilon < c/\log K$ with $c$ sufficiently small, and so we are forced to conclude that $\xi_1, \dots, \xi_m$ cannot exist. The lemma follows.

Proof of Lemma 10.5. — Note that $kA \subseteq (k - 1)X + A$, and that $|mX| \le m^{|X|} \le m^K$ for all natural numbers $m$. It follows that $\mu(kA) \le k^K\mu(A)$, and so Lemma 10.7 is applicable with $\delta = 1 - c/\log K$ for some $k = K^{O(1)}$.
The conclusion is that $2kA$ contains the subtorus $T_0 := (\mathrm{Spec}_{1 - c/\log K}(A))^\perp$ which, by Lemma 10.9, has codimension $O(\log K)$. However $2kA$ is covered by at most $(2k)^K = e^{O(K \log K)}$ translates of $A$, and so one of these translates has $\mu_0((A + x_0) \cap T_0) \ge e^{-O(K \log K)}$, which was precisely what we claimed.

To conclude this section we record the observation that the above arguments also yield the following more precise version of Proposition 6.12, the weak global Lie model theorem. This builds upon a previous result in this direction by Hrushovski: see [33, Theorem 4.2] and the discussion before [33, Lemma 4.9].

Theorem 10.10 (Strong global Lie Model Theorem). — Suppose that $A$ is a global ultra $K$-approximate group. Then there is a large ultra approximate subgroup $\tilde{A}$ of $A^m$ for some standard $m \ge 1$ which admits a global model $\tilde{\pi} : \langle \tilde{A} \rangle \to L$ into a connected, simply connected nilpotent Lie group $L$ of dimension at most $6 \log_2 K$. Furthermore, there exists a large ultra approximate group $A'$ of $A^4$ which admits a global model $\pi' : \langle A' \rangle \to L'$, a connected nilpotent Lie group, whose maximal (central) compact subgroup $N$ verifies $L'/N \simeq L$.

11. Applications to growth in groups and geometry

In this section we collect a variety of applications of our main results, in particular proving the various results stated in the introduction.

As an application of his method Hrushovski [33] established the following strengthening of Gromov's theorem on groups with polynomial growth.

Theorem 11.1. — Let $G$ be a finitely generated group and let $K \ge 1$. Suppose $G = \bigcup_{n \ge 1} A_n$, where the $A_n$ form an increasing sequence of finite subsets of $G$ such that $|A_n^2| \le K|A_n|$ for all $n \ge 1$. Then $G$ is virtually nilpotent.

This is indeed a strengthening of Gromov's theorem because if $G$ has polynomial growth with respect to some generating set $S$ then the $A_n$ may be taken to be some subsequence of the word metric balls relative to $S$. Unsurprisingly, our main theorem also admits an application of this kind.
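The doubling hypothesis on word metric balls in Theorem 11.1 can be observed experimentally in a standard example of polynomial growth. The Python sketch below enumerates balls in the discrete Heisenberg group, realised (as an assumption of this illustration) as integer triples $(a, b, c)$ with product $(a,b,c)(a',b',c') = (a+a', b+b', c+c'+ab')$ and generating set $S = \{e, x^{\pm 1}, y^{\pm 1}\}$. The function name `heisenberg_ball_sizes` is hypothetical.

```python
def heisenberg_ball_sizes(max_radius):
    """Return [|S^0|, |S^1|, ..., |S^max_radius|] for S = {e, x^±1, y^±1}
    in the discrete Heisenberg group, computed by breadth-first search."""
    def mul(g, h):
        a, b, c = g
        a2, b2, c2 = h
        return (a + a2, b + b2, c + c2 + a * b2)

    generators = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    identity = (0, 0, 0)
    ball = {identity}
    frontier = {identity}
    sizes = [1]
    for _ in range(max_radius):
        # New elements are products of frontier elements with one generator.
        frontier = {mul(g, s) for g in frontier for s in generators} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes

sizes = heisenberg_ball_sizes(8)
# Doubling ratios |S^{2n}| / |S^n| for n = 1..4
ratios = [sizes[2 * n] / sizes[n] for n in range(1, 5)]
print(sizes)
print([round(q, 2) for q in ratios])
```

The ratios stay bounded as the radius grows, so the balls $A_n = S^n$ satisfy a condition of the form $|A_n^2| \le K|A_n|$ with a fixed $K$, consistent with the group being nilpotent.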
The following is a corollary of Theorem 2.10 and subsumes Theorem 11.1 above.

Corollary 11.2 (Gromov-type theorem). — Let $K \ge 1$. Then there is some $K'$, depending on $K$, such that the following holds. Assume $G$ is a group generated by a finite symmetric set $S$ containing the identity. Let $A$ be a finite subset of $G$ such that $|A^2| \le K|A|$ and $S^{K'} \subseteq A$. Then there is a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that
(i) $G_1$ has index $O_K(1)$ in $G$;
(ii) $G_1/N$ is nilpotent with step and rank $O_K(1)$.
In particular $G$ is virtually nilpotent.

Proof of Corollary 11.2. — First we make the following simple observation. Suppose $G$ is a group generated by a finite symmetric set $S$ and let $G_0$ be a subgroup of index $n = [G : G_0]$. Then for every $k < n$ the ball $S^k$ meets at least $k + 1$ different left cosets of $G_0$ in $G$. Indeed if not then by the pigeonhole principle we have $S^iG_0 = S^{i+1}G_0$ for some $i < k$, and so by multiplying on the left with $S$ it follows that $S^kG_0 = S^{k+1}G_0$. Multiplying on the left by further copies of $S$ implies that $S^kG_0 = S^mG_0$ for every $m \ge k$, so that $S^kG_0 = G$, and so $G_0$ has index at most $k$ in $G$, contrary to assumption.

Now, we apply Corollary 1.7. Thus there exists a subgroup $G_0$ of $G$ and a finite normal subgroup $H$ of $G_0$ such that $A$ may be covered by $K'$ left-translates of $G_0$ for some $K' = O_K(1)$ depending only on $K$, and $G_0/H$ is nilpotent of step and rank $O_K(1)$. In particular, $G_0$ is finite-by-nilpotent. Using this value of $K'$, we see by assumption that $S^{K'}$ is contained in $A$ and thus $S^{K'}$ is covered by at most $K'$ cosets of $G_0$. From our initial observation we conclude that $[G : G_0] \le K'$.

Note that for some $s = O_K(1)$ the $s$-th term $C^s(G_0)$ of the central descending series is contained in $H$. Moreover, $G_1 := \bigcap_{g \in G} gG_0g^{-1}$ is a normal subgroup of $G$ with index at most $O_K(1)$ contained in $G_0$. Hence $N := C^s(G_1)$ is a normal subgroup of $G$ contained in $H$. On the other hand, $G_1/N$ is nilpotent of complexity bounded in terms of $K$ only and it has index $O_K(1)$ in $G/N$.
To conclude from this that $G$ is virtually nilpotent, it suffices to show that $G_1$ is. However $G_1$ is actually finite-by-nilpotent (the finite group being $N$) and any such group is virtually nilpotent. To see this note that the kernel of the action by conjugation on $N$ is a nilpotent subgroup of finite index.

Remark 11.3. — Recall that the condition $|A^2| \le K|A|$ implies the existence of an $O(K^{O(1)})$-approximate group $Z$ of size $O(K^{O(1)}|A|)$ and of $O(K^{O(1)})$ left translates of $Z$ which cover $A$ (see [52, Theorem 4.6]). Using Remark 1.9 and Theorem 2.12, we then see that $G_1$ can be taken so that $G_1/N$ is $O(\log K)$-nilpotent in the sense that it admits a generating set $u_1, \dots, u_\ell$ with $\ell = O(\log K)$ such that $[u_i, u_j] \in \langle u_{j+1}, \dots, u_\ell \rangle$ for all $i < j$. In particular such a group admits a normal series with cyclic factors of length at most $O(\log K)$.

Remark 11.4. — If one assumes that $A$ is a $K$-approximate group instead of the doubling condition $|A^2| \le K|A|$ in Corollary 11.2, then we may also conclude from Theorem 1.6 that $N$ and a generating set of $G_1$ are contained in $A^4$. Using Theorem 2.12 we see that if one additionally wishes to ensure the logarithmic bound as in the previous remark, then one can only guarantee that $N$ lies inside $A^{O_K(1)}$.

The following corollary is reminiscent of Gromov's theorem but it involves a weaker type of polynomial growth condition in which the generating set may be arbitrarily large. Furthermore it only requires the condition to hold at a single scale.

Corollary 11.5. — Let $d > 0$. Then there is $R(d) > 0$ such that the following holds. Suppose that $G$ is generated by a finite symmetric set $S$ and that there is some scale $r > R(d)$ such that $|S^r| \le r^d|S|$. Then there is a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that
(i) $N \subseteq S^r$;
(ii) $G_1$ has index $O_d(1)$ in $G$;
(iii) $G_1/N$ is $O(d)$-nilpotent (see Remark 1.9 for a definition).

Proof.
— Our assumption is that $|S^r| \le r^d|S|$. Let $K = 2 \cdot 10^d$ and $C_K$ be such that, in the last part of Remark 11.4, $N$ lies in $A^{C_K}$. We claim that there is some $r_0$, $\sqrt{r} \le r_0 \le r/2C_K$, such that $A := S^{r_0}$ has $|A^5| \le 10^d|A|$. Note that $A^2$ is then a $K$-approximate group with $K = 2 \cdot 10^d$ (see Lemma 5.2). Applying Corollary 11.2 and Remark 11.4 and ensuring that $R(d)$ is so large that $R(d) > (K')^2$ ($K'$ being the quantity in Corollary 11.2), we obtain a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that $G_1$ has index $O_d(1)$ in $G$ and $G_1/N$ is $O(\log K) = O(d)$-nilpotent. Furthermore $N$ and a set of generators for $G_1$ are contained in $A^{2C_K} = S^{2r_0C_K} \subseteq S^r$.

It remains to justify the claim. If it is false then $|S^{5^{i+1}\sqrt{r}}| > 10^d|S^{5^i\sqrt{r}}|$ whenever $5^i\sqrt{r} < r/10C_K$, and in particular $|S^r| \ge (10^d)^{\log_5(\sqrt{r}/10C_K) - 1}|S^{\sqrt{r}}|$. If $r$ is greater than some absolute constant, this is greater than $r^d|S|$, contrary to assumption.

Remark 11.6. — Note that there is no bound on the size of $N$. Indeed, if $G$ is a large finite simple group and $S = G$ then $N$ must equal $G$, which shows that $|N|$ can be arbitrarily large compared to $d$, $r$.

In [51] Y. Shalom and the third author gave a quantitative refinement of Gromov's theorem inspired by Kleiner's recent new proof (see also [38, Corollary 4.2] for an earlier result in that direction). A consequence of their result is that a polynomial growth condition at one large scale is enough to guarantee virtual nilpotence. We take the opportunity to record that this follows easily from Corollary 11.2.

Corollary 11.7. — Let $d > 0$. Then there is $R(d) > 0$ such that the following holds. Suppose that $G$ is generated by a finite symmetric set $S$ containing the identity and that there is some scale $r > R(d)$ such that $|S^r| \le r^d$. Then $G$ contains $G'$, where
(i) $G'$ has index $O_d((r^d)!)$ in $G$;
(ii) $G'$ is nilpotent with step $O_d(1)$.

Proof. — We apply Corollary 11.5 to obtain groups $N$, $G_1$ with the properties stated there.
As $N$ is contained in $S^r$, it has cardinality at most $r^d$. The group $G_1$ acts on $N$ by conjugation; since the permutation group of $N$ has cardinality at most $(r^d)!$, we conclude that the stabiliser $G'$ of this action has index at most $(r^d)!$ in $G_1$. As $G_1/N$ is nilpotent of step $O_d(1)$, we conclude that $G'$ is nilpotent of step $O_d(1) + 1 = O_d(1)$, and the claim follows.

Remark 11.8. — As observed in the last section of Gromov's original paper [27], Gromov's theorem on polynomial growth already easily implies a weaker result of this kind in which the hypothesis is that $|S^r| \le r^d$ for all $r = 1, 2, \dots, R(d)$. Note that this result of Gromov (and, a fortiori, Corollary 11.7) has content even when the group $G$ is finite. Another weakening of the above result appears in [58], where $|S^r| \le r^d$ is assumed for infinitely many $r$ rather than for all $r$.

Corollary 11.7 is stronger than the results in [38, 51] in the sense that the bounds do not depend on the cardinality $|S|$ of $S$. On the other hand, the results in [38, 51], which follow a strategy close to that of Kleiner's work [37], yield more effective quantitative control on the index and step of $G'$, especially in the case when $S$ is of bounded cardinality.

Another consequence of our main theorem is that polynomial growth in the sense of Corollary 11.5 at one large scale implies polynomial growth at all subsequent scales.

Corollary 11.9. — Let $d > 0$. Then there is $R(d) > 0$ such that the following holds. Suppose that $G$ is generated by a finite symmetric set $S$ and that $|S^r| \le r^d|S|$ for some $r \ge R(d)$. Then $|S^{r'}| \le (r')^{O_d(1)}|S|$ for all $r' \ge r$.

Proof. — A simple modification of the proof of Corollary 11.5 shows that there is some $r_0$, $\sqrt{r} \le r_0 \le r/6$, such that $|S^{5r_0}| \le K|S^{r_0}|$ where $K = 100^d$ (say). Applying Corollary 11.2 with $A := S^{r_0}$ (as before) we obtain a normal subgroup $H \subseteq S^{4r_0}$ such that $G/H$ is virtually nilpotent with the index, step and number of generators of the nilpotent subgroup $G_1/H$ all being $O_d(1)$.
Now by Lemma 5.2, $A^2 = S^{2r_0}$ is a $2K$-approximate group. This means that there is some set $X$, $|X| \le 2K$, such that $S^{4r_0} \subseteq XS^{2r_0}$. From this it follows that
$$S^{2mr_0} \subseteq X^mS^{2r_0} \tag{11.1}$$
for every positive integer $m$. Let $\pi : G \to G/H$ be the quotient homomorphism. We have
$$|S^{2mr_0}| \le |H|\,|\pi(S^{2mr_0})|, \tag{11.2}$$
since the cardinality of any fibre is at most $|H|$. From (11.1) and the fact that $\pi$ is a homomorphism we have
$$\pi(S^{2mr_0}) \subseteq \pi(X)^m\,\pi(S^{2r_0}). \tag{11.3}$$
On the other hand, since $H \subseteq S^{4r_0}$, we have
$$|H|\,|\pi(S^{2r_0})| \le |S^{6r_0}|. \tag{11.4}$$
Moreover, since $\sqrt{r} \le r_0 \le r/6$, we have
$$|S^{6r_0}| \le |S^r| \le r^d|S| \le r_0^{2d}|S|. \tag{11.5}$$
Putting (11.2), (11.3), (11.4) and (11.5) together gives
$$|S^{2mr_0}| \le |\pi(X)^m|\,r_0^{2d}|S|. \tag{11.6}$$
Now $\pi(X)$ is a set of size $O_d(1)$, contained in a virtually nilpotent group in which the index and step of the nilpotent subgroup are $O_d(1)$. Every such group is a quotient of one fixed virtually nilpotent group with number of generators, index and step of the nilpotent subgroup also $O_d(1)$ and whose generators are lifts of the elements in $\pi(X)$. Hence there is a bound of the form
$$|\pi(X)^m| \le m^{O_d(1)}$$
for all $m > 1$. Comparing this with (11.6) confirms that
$$|S^{r'}| \le (r')^{O_d(1)}|S|$$
whenever $r'$ is a multiple $2mr_0$ with $m > 1$. It is not hard to see that the same estimate therefore holds for all $r' \ge r$, at the expense of increasing the exponent $O_d(1)$ if necessary.

A consequence/reformulation of the preceding result is the following.

Corollary 11.10. — Let $\alpha > 0$. Then there are $r_0 \in \mathbb{N}$ and $\beta > 0$ with $\lim_{\alpha \to 0}\beta(\alpha) = 0$ such that the following holds. Let $G$ be a finite group generated by a symmetric set $S$ and assume that the diameter of the associated Cayley graph satisfies $\mathrm{diam}_S(G) \le (|G|/|S|)^\alpha$. Then
$$|S^r| \ge \min\{r^{1/\beta}|S|, |G|\}$$
if $r \ge r_0(\alpha)$.

Proof. — If this does not hold for some $r$, $r_0$ and $\beta$ then as soon as $r_0$ is large enough (in terms of $\beta$) Corollary 11.9 applies and yields $|S^n| \le n^e|S|$ for all $n \ge r$ and some $e = e(\beta) > 0$.
In particular, when $n$ reaches the diameter of $G$, we obtain $S^n = G$ so $|G| \le (\mathrm{diam}_S(G))^e|S|$. This contradicts our hypothesis if $e < 1/\alpha$.

We shall apply Corollary 11.10 later on to deduce an isoperimetric inequality; see Corollary 11.15.

Finally we show that by repeatedly applying Corollary 11.2 we can obtain the following more precise result, which says something non-trivial for finite groups as well. We say that a polycyclic group has length at most $L$ if it is obtained from the trivial group by at most $L$ successive extensions by a cyclic group.

Corollary 11.11. — Let $G$ be a group which has a left-invariant metric $d : G \times G \to [0, \infty)$ satisfying the following conditions for some $K \ge 1$:
(i) (Uniform doubling property) We have $|B(2r)| \le K|B(r)|$ for every $r > 0$;
(ii) (Finiteness condition) There are at most $K$ different subgroups of the form $\langle B(r) \rangle$ as $r$ ranges over $(0, \infty)$.
Then $G$ has a subgroup of index at most $O_K(1)$ which is polycyclic of length $O_K(1)$.

Proof. — Given $d \in \mathbb{N}$ and $R \ge 0$ we claim that if there are at most $d$ groups of the form $\langle B(r) \rangle$ for $r \le R$, then $\langle B(R) \rangle$ contains a polycyclic subgroup of index $O_{K,d}(1)$. This is clearly enough to establish the corollary.

To prove the claim, we proceed by induction on $d$. It is clear for $d = 1$, since $\langle B(R) \rangle$ is then the trivial group. Let $R_0$ be the upper bound of those $R' \ge 0$ such that there are at most $d - 1$ groups of the form $\langle B(r) \rangle$ for $r \le R'$. Without loss of generality $0 < R_0 \le R$. Then $\langle B(r) \rangle = \langle B(R) \rangle$ whenever $R_0 \le r \le R$. By the induction hypothesis, $\langle B(R_0/2) \rangle$ contains a polycyclic subgroup $P$ of index $O_{K,d}(1)$ and length $O_{K,d}(1)$.

Let $K' = O_K(1)$ be the constant obtained in Corollary 11.2. Setting $S = B(R_0)$ and $A = B(K'R_0)$, we may apply Corollary 11.2 and conclude that $\langle B(R_0) \rangle$ contains a subgroup $G_1$ of index $O_K(1)$ such that $G_1$ has a normal subgroup $N \subset B(4K'R_0)$ with $G_1/N$ nilpotent with step and number of generators $O_K(1)$. It is enough to show that $G_1$ has a polycyclic subgroup of index $O_{K,d}(1)$, because then so will $\langle B(R) \rangle$.
By the uniform doubling assumption and a covering argument, $B(4K'R_0)$ can be covered by $O_K(1)$ translates of $B(R_0/2)$. (Recall that $B(R)$ is the closed ball $\{g \in G : d(1, g) \le R\}$.) It follows that $N$ can be covered by $O_{K,d}(1)$ translates of $P$, and in particular $[N : N \cap P] = O_{K,d}(1)$. Now $N \cap P$ is a subgroup of $P$ and hence is also polycyclic of length $O_{K,d}(1)$; in particular, it is generated by $O_{K,d}(1)$ elements. Therefore so is $N$, and hence $N_0$, the intersection of all subgroups of $N$ of index at most $[N : N \cap P]$, has index $O_{K,d}(1)$ in $N$. (To see this recall Schreier's theorem that if $S$ is a symmetric generating set for a group $\Gamma$, and if $\Gamma' \le \Gamma$ has index $k$, then $S^{2k-1}$ contains a set of generators for $\Gamma'$.) The group $N_0$, being a subgroup of $N \cap P$, is polycyclic. It is also characteristic in $N$ and hence, since $N$ is normal in $G_1$, $N_0$ is also normal in $G_1$.

However $G_1$ acts by conjugation on $N/N_0$, and the kernel of this action is a subgroup $G_1'$ of $G_1$ with index $O_{K,d}(1)$. Now $(N \cap G_1')/N_0$ is central in $G_1'/N_0$ and of size $O_{K,d}(1)$. We thus have $N_0 \le N \cap G_1' \le G_1'$, where each successive quotient is polycyclic of length $O_{K,d}(1)$. It follows that $G_1'$ is polycyclic of length $O_{K,d}(1)$, which is what we wanted to establish.

Remark 11.12. — There are examples of groups which satisfy the assumptions of Corollary 11.11 yet have no nilpotent subgroup of index $O_K(1)$. For instance, let $p$ be a large prime and set $G := (\mathbf{Z}/p\mathbf{Z})^2 \rtimes \mathbf{Z}$, where the action is by an element of $\mathrm{SL}_2(\mathbf{Z}/p\mathbf{Z})$ which is a diagonal matrix $\gamma$ of the form $\gamma := \operatorname{diag}(x, x^{-1})$, where $x \in \mathbf{F}_p^{*}$ is a generator of the multiplicative group of $\mathbf{F}_p$. Then no subgroup of $G$ of index less than $p - 1$ is nilpotent (note that such a subgroup must contain $(\mathbf{Z}/p\mathbf{Z})^2$ and be the preimage of the subgroup of $\mathbf{Z}$ with that index). However we can endow $G$ with a uniformly doubling weighted word metric (with 3 generators) by letting the two standard generators of $(\mathbf{Z}/p\mathbf{Z})^2$ each have weight […] and $\gamma$ have weight 1.
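The arithmetic behind Remark 11.12 can be checked numerically. The sketch below is not taken from the paper. It assumes, as the remark indicates, that the relevant finite-index subgroups are of the form $(\mathbf{Z}/p\mathbf{Z})^2 \rtimes k\mathbf{Z}$, and it uses the elementary observation that such a subgroup can only be nilpotent if $\gamma^k$ acts trivially: otherwise $\gamma^k - 1 = \operatorname{diag}(x^k - 1, x^{-k} - 1)$ is invertible and the lower central series never drops below $(\mathbf{Z}/p\mathbf{Z})^2$. All function names are illustrative.

```python
def multiplicative_order(x, p):
    """Order of x in the multiplicative group of F_p.

    Assumes p is prime and x is not divisible by p.
    """
    k, y = 1, x % p
    while y != 1:
        y = (y * x) % p
        k += 1
    return k


def find_primitive_root(p):
    """Smallest generator of F_p^* for an odd prime p (brute force)."""
    for x in range(2, p):
        if multiplicative_order(x, p) == p - 1:
            return x
    return None


def smallest_trivial_power(p, x):
    """Smallest k >= 1 with gamma^k = diag(x^k, x^-k) equal to the identity.

    Under the assumptions stated above, this is the smallest index k for
    which the subgroup (Z/pZ)^2 x| kZ is nilpotent (in fact abelian).
    """
    x_inv = pow(x, p - 2, p)  # inverse of x mod p, by Fermat's little theorem
    for k in range(1, p):
        if pow(x, k, p) == 1 and pow(x_inv, k, p) == 1:
            return k
    return None
```

With $x$ a generator of $\mathbf{F}_p^*$, the value returned by `smallest_trivial_power` is $p - 1$, matching the index threshold in the remark; a non-generator $x$ would give a smaller value.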
We turn now to some geometric applications of the above results.

Manifolds with a lower bound on Ricci curvature. — A. Petrunin suggested to us some years ago that a result such as Corollary 11.5 would give a purely group-theoretical proof of a theorem of Fukaya and Yamaguchi [16] according to which fundamental groups of almost non-negatively curved manifolds are virtually nilpotent. (See also http://mathoverflow.net/questions/11091.) Recall that a closed manifold $M$ is said to be almost non-negatively curved if one can find a sequence of Riemannian metrics on it for which $\operatorname{diam}(M) \le 1$ while $K_M \ge -1/n$, where $K_M$ is the sectional curvature. Indeed, a simple application of the Bishop-Gromov inequalities combined with Corollary 11.5 yields the following improvement assuming only a condition on the Ricci curvature.

Corollary 11.13 (Ricci gap). — Given $d \in \mathbf{N}$, there is $\varepsilon(d) > 0$ such that the following holds. Let $M$ be a $d$-dimensional compact Riemannian manifold with Ricci curvature bounded below by $-\varepsilon$ and diameter at most 1. Then $\pi_1(M)$ has a normal subgroup of index $O_d(1)$, which is finite-by-($O(d)$-nilpotent). In particular $\pi_1(M)$ is virtually nilpotent.

Proof. — Fix a base point $x_0$ on the universal cover $\tilde{M}$ and let $F$ be a Dirichlet fundamental domain based at $x_0$ for the action of $\Gamma := \pi_1(M)$: that is,

$$F := \{p \in \tilde{M} : d(x_0, p) \le d(\gamma \cdot x_0, p) \text{ for all } \gamma \in \Gamma\}.$$

Set $S := \{\gamma \in \Gamma : d(\gamma \cdot x_0, x_0) \le 3\}$. Note that $\operatorname{diam}(F) \le 1$ and that $S$ is symmetric and contains 1. Observe further that $S$ generates $\Gamma$ and that for every integer $r \ge 1$ we have $B(x_0, r) \subset S^r \cdot F \subset B(x_0, 3r + 1)$, where $B(x_0, r)$ is the ball of radius $r$ on $\tilde{M}$ for the Riemannian metric lifted from $M$. It follows that

$$\frac{|S^r|}{|S|} \le \frac{|B(x_0, 3r + 1)|}{|B(x_0, 1)|}. \tag{11.7}$$

From the assumed Ricci curvature bound and the Bishop-Gromov volume comparison estimates (see [17, Theorem 4.19]) we have the bound

$$\frac{|B(x_0, r)|}{|B(x_0, 1)|} \le \frac{|B_{-\varepsilon}(r)|}{|B_{-\varepsilon}(1)|},$$

where $B_{-\varepsilon}(r)$ is a metric ball in the comparison model space with constant curvature $-\varepsilon$ and dimension $d$. The volume of this ball is

$$|B_{-\varepsilon}(r)| = \varepsilon^{-d/2}\,|B_{-1}(\sqrt{\varepsilon}\,r)| = c_d \int_0^r \left(\frac{\sinh(\sqrt{\varepsilon}\,t)}{\sqrt{\varepsilon}}\right)^{d-1} dt,$$

where $c_d > 0$ is the volume of the $(d-1)$-dimensional unit sphere (see [17, p. 138] for this volume computation). As $\varepsilon$ tends to 0, this tends to $c_d r^d/d$. Combining this with (11.7) we obtain that for every $R_0 \ge 1$ there is some $\varepsilon_0 = \varepsilon_0(d, R_0)$ such that

$$|S^r| \le 2(3r + 1)^d |S|$$

for all $r \le R_0$ provided that $0 < \varepsilon < \varepsilon_0$. Letting $R_0 = R_0(2d)$ be as in Corollary 11.5, we obtain the existence of some $\varepsilon = \varepsilon(d) > 0$ for which the conclusion of that statement holds. This completes the proof.

Remark 11.14. — The fact that $\pi_1(M)$ is virtually nilpotent under the above Ricci bounds assumptions was obtained by Cheeger and Colding in [11] (and had been conjectured earlier by Gromov) and their proof was recently completed and extended by Kapovitch and Wilking [36], who also established that the index of the nilpotent subgroup is uniformly bounded by a constant depending on the dimension $d$ only, an improvement which seems beyond the scope of our methods. This extended earlier work of Kapovitch, Petrunin and Tuschmann in [35] which proved the same result under sectional curvature bounds instead of Ricci. The work of these authors is, unlike our work, differential-geometric in nature. The linear dependence in $d$ of the nilpotency length proven in our Corollary 11.13 seems new however.

An isoperimetric inequality. — It has been well-known since the work of Varopoulos on Kesten's conjecture ([42, 59]) that isoperimetric inequalities on Cayley graphs are closely related to lower bounds on the volume growth.
Using this idea and Corollary 11.10, we can derive the following property of finite Cayley graphs with a polynomial upper bound on the diameter.

Corollary 11.15 (Isoperimetric inequality on finite groups). — Let $\alpha > 0$. Then there are $r_0 \in \mathbf{N}$ and $\beta > 0$ with $\lim_{\alpha \to 0} \beta(\alpha) = 0$ such that the following holds. Let $G$ be a finite group generated by a symmetric set $S$ and assume that the diameter of the associated Cayley graph satisfies $\operatorname{diam}(G) \le (|G|/|S|)^{\alpha}$. Then for every subset $E$ in $G$ with $\tfrac{1}{2} r_0 \le |E| \le \tfrac{1}{2}|G|$,

$$|\partial E| \ge \tfrac{1}{8}\,|S|^{\beta}\,|E|^{1-\beta}.$$

Proof. — This follows almost immediately from Corollary 11.10 and the following well-known lemma, which may be found in [28, Chapter 5] or [42] and references therein. For the convenience of the reader we offer a self-contained proof.

Lemma 11.16 (Isoperimetry versus growth). — Let $G$ be a group and $S$ some finite symmetric generating set containing 1. Let $B(r) = S^r$ be the word ball of radius $r$ in the word metric. Let $\partial E = SE \setminus E$ be the boundary of a subset $E \subset G$. If $E \subseteq G$ is a set, write $r(E)$ for the infimum of those $r$ for which $|B(r)| \ge 2|E|$. Then for all $E$ with $|E| < |G|/2$ we have $|E| \le 4r(E)|\partial E|$.

Proof. — Let $f = 1_E$ be the indicator function of the set $E$, and let

$$f_r := \frac{1}{|B(r)|} \sum_{g \in B(r)} g \cdot f$$

be the average of $f$ over balls of radius $r$. By the triangle inequality we have $\|g \cdot f - f\|_1 \le |g| \cdot \max_{s \in S} \|s \cdot f - f\|_1$, where $|g|$ is the distance to the identity in the word metric. Moreover $\|s \cdot f - f\|_1 = |sE \,\triangle\, E| \le 2|\partial E|$ for every $s \in S$. Hence $\|f_r - f\|_1 \le 2r|\partial E|$. On the other hand for every $x \in E$, there are at most $|E|$ elements $g \in B(r)$ such that $g \cdot 1_E(x) \ne 0$. Therefore if $|B(r)| \ge 2|E|$ then $f_r(x) \le \tfrac{1}{2}$ and hence $\|f_r - f\|_1 \ge \tfrac{1}{2}|E|$. The claim follows.

In [1], Benjamini and Kozma conjecture that one can take $\beta = \alpha$ in Corollary 11.15 (at the expense of introducing a possible multiplicative constant $c$ in place of $|S|^{\beta}/8$ in (ii)). This, however, is beyond the scope of our method.
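As a sanity check of Lemma 11.16, the inequality $|E| \le 4r(E)|\partial E|$ can be tested by brute force on a small example. The sketch below is only an illustration under stated assumptions: it works in the cyclic group $\mathbf{Z}/n\mathbf{Z}$ written additively, with the generating set $S = \{-1, 0, 1\}$, and the helper names (`word_ball`, `boundary`, `growth_radius`, `check_lemma`) are ours rather than the paper's.

```python
def word_ball(n, gens, r):
    """B(r) = S^r in Z/nZ, where gens is a symmetric set containing 0."""
    ball = {0}
    for _ in range(r):
        ball = {(b + s) % n for b in ball for s in gens}
    return ball


def boundary(n, gens, E):
    """The boundary of E, i.e. the set (S + E) minus E."""
    return {(s + e) % n for s in gens for e in E} - set(E)


def growth_radius(n, gens, E):
    """r(E): the least r with |B(r)| >= 2|E|.

    Assumes gens generates Z/nZ and 2|E| <= n, so that the loop terminates.
    """
    if 2 * len(E) > n:
        raise ValueError("need |E| <= n/2")
    r = 0
    while len(word_ball(n, gens, r)) < 2 * len(E):
        r += 1
    return r


def check_lemma(n, gens, E):
    """True if |E| <= 4 r(E) |boundary(E)| holds for this E."""
    E = set(E)
    return len(E) <= 4 * growth_radius(n, gens, E) * len(boundary(n, gens, E))
```

For instance, with `n = 30`, `gens = {29, 0, 1}` and `E = set(range(10))`, the ball $B(r)$ has $2r + 1$ elements, so $r(E) = 10$, while the boundary consists of the two points adjacent to the interval. The inequality then reads $10 \le 80$.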
We would like to thank Itai Benjamini for drawing our attention to the work of Benjamini and Kozma and its connection to Gromov-type theorems.

A generalized Margulis lemma. — In hyperbolic geometry, the Margulis lemma asserts that there is a constant $\varepsilon = \varepsilon(n) > 0$, the Margulis constant, such that for any discrete subgroup $\Gamma$ of isometries of the hyperbolic $n$-space $\mathbf{H}^n$, and any point $x \in \mathbf{H}^n$, the almost stabiliser $\Gamma_{\varepsilon}(x) := \langle \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\} \rangle$ is virtually cyclic. This lemma is important for describing the geometry of cusps in hyperbolic manifolds, or for establishing volume lower bounds (see e.g. [55]). Various generalisations of this lemma have been established in the past for more general Riemannian manifolds under curvature upper and lower bounds (e.g. [9, Chapter 6]). Typically in these results, unless the manifold has strictly negative curvature, "virtually cyclic" in the conclusion of the lemma must be replaced by "virtually nilpotent".

In [28, §5.F] Gromov raises the issue of establishing a generalized Margulis lemma under very weak assumptions on the metric space and he proposes a conjectural statement in this direction. Below we answer Gromov's question affirmatively.

A metric space $X$ is said to have bounded packing with packing constant $K$ if there is $K > 0$ such that every ball of radius 4 in $X$ can be covered by at most $K$ balls of radius 1. Say that a subgroup $\Gamma$ of isometries of $X$ acts discretely on $X$ if every orbit is discrete in the sense that $\{\gamma \in \Gamma : \gamma \cdot x \in \Sigma\}$ is finite for every $x \in X$ and for every bounded set $\Sigma \subseteq X$.

Corollary 11.17 (Generalized Margulis Lemma). — Let $K \ge 1$ be a parameter. Then there is some $\varepsilon(K) > 0$ such that the following is true. Suppose that $X$ is a metric space with packing constant $K$, and that $\Gamma$ is a subgroup of isometries of $X$ which acts discretely. Then for every $x \in X$ the "almost stabiliser" $\Gamma_{\varepsilon}(x) = \langle S_{\varepsilon}(x) \rangle$, where $S_{\varepsilon}(x) := \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\}$, is virtually nilpotent.

Proof.
— Each set $S_{\varepsilon}(x)$ is symmetric and contains the identity. Now by the assumption on $X$ the ball $B(x, 4)$ can be covered by a collection of balls $B(x_i, 1)$, $i = 1, 2, \ldots, K$. Suppose that for $i = 1, 2, \ldots, k$ there is at least one element $\gamma_i \in S_4(x)$ with $\gamma_i \cdot x \in B(x_i, 1)$. Suppose now that $\gamma \in S_4(x)$ is arbitrary; then there is some $i \in \{1, 2, \ldots, k\}$ such that $\gamma \cdot x \in B(x_i, 1)$. But this means that $d(\gamma \cdot x, \gamma_i \cdot x) < 2$, and therefore $\gamma_i^{-1}\gamma \in S_2(x)$. This implies that $S_4(x) \subseteq \bigcup_{i=1}^{k} \gamma_i S_2(x)$, which yields (since $S_2(x)^2 \subseteq S_4(x)$) the doubling estimate $|S_2(x)^2| \le K|S_2(x)|$.

Let $K' = K'(K) > 0$ be the constant from Corollary 11.2. Set $\varepsilon := 2/K'$, $S = S_{\varepsilon}(x)$ and $A = S_2(x)$. A direct application of Corollary 11.2 shows that $\Gamma_{\varepsilon}(x) = \langle S \rangle$ is virtually nilpotent.

Remark 11.18. — This confirms Gromov's conjecture, which suggested the same conclusion under the slightly stronger hypotheses that every ball of radius $R$ in $X$ can be covered by at most $C(R/r)^m$ balls of radius $r$ for all $0 < r < R \le 1$ and some fixed constants $C, m > 0$.

The assumptions of this generalized Margulis lemma are satisfied for example if $X$ is a complete Riemannian manifold with a lower bound on its Ricci curvature, by an immediate application of the Bishop-Gromov volume comparison estimates. In this case, the result was proved by Cheeger-Colding [11] and Kapovitch-Wilking [36], namely:

Corollary 11.19. — Let $d \ge 1$ be an integer. Then there is $\varepsilon = \varepsilon(d) > 0$ with the following property. Suppose that $M$ is a $d$-dimensional complete Riemannian manifold with a Ricci curvature lower bound $\mathrm{Ric} \ge -(d - 1)$ and that $\Gamma$ is a subgroup of $\mathrm{Isom}(M)$ which acts properly discontinuously by isometries on $M$. Then for every $x \in M$ the "almost stabiliser" $\Gamma_{\varepsilon}(x) := \langle \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\} \rangle$ is virtually nilpotent.

In fact the result in [36, Theorem 1] is a stronger version of Corollary 11.19, establishing that $\Gamma_{\varepsilon}(x)$ has a nilpotent subgroup of index $O_d(1)$.
This stronger result seems to be beyond the scope of our method.

We also note that Corollary 11.11 applies to the Margulis lemma in the context of Riemannian $d$-manifolds with a lower bound on sectional curvature, because then the Gromov short basis has bounded cardinality from Toponogov's theorem (see for instance [9, 37.3]). We thus get this way an alternate proof of the Fukaya-Yamaguchi theorem [16] according to which almost non-negatively curved $n$-manifolds have $O_n(1)$-virtually polycyclic fundamental group. Again, by [35] we know better, namely that they are $O_n(1)$-virtually nilpotent, but once again this seems beyond the scope of our method.

Finally we would like to remark that the usual proofs of the classical Margulis lemma bear some resemblance to the proof of our main theorem in as much as they use a similar "shrinking commutator trick" to establish nilpotence. While we proved this shrinking commutator estimate for the escape norm associated to an approximate group as part of the Gleason lemmas (Theorem 8.1), in the Margulis lemma, one proves a similar estimate for the norm $\|\gamma\| = d(\gamma \cdot x, x)$ by a Riemannian geometric argument using the assumed curvature bounds. This "shrinking commutator trick" dates back at least to Bieberbach [2] in his proof of Jordan's theorem on finite linear groups.

Acknowledgements

EB is supported in part by the ERC starting grant 208091-GADA. He also acknowledges support from MSRI where part of this work was finalized. TT is supported by a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and by the NSF Waterman award. The first author would like to thank E. Lindenstrauss from whom he first learned about these questions and for several related discussions. We also acknowledge the huge intellectual debt we owe to prior work of Hrushovski [33], without which we would not have started this project.
We are grateful to him for several enlightening discussions regarding his work and the subject matter of the present paper. We also thank I. Goldbring and L. van den Dries for showing us a preliminary version of their notes on Hilbert's fifth problem and its local versions, B. Hayes for corrections, and J. Lott for help with the references. Finally, all three authors would like to thank T. Sanders for a number of valuable discussions concerning this work.

Appendix A: Basic theory of ultralimits and ultraproducts

In this appendix we review the machinery of ultralimits and ultraproducts. We will borrow some terminology from nonstandard analysis in order to do this, although we will not rely too heavily on nonstandard machinery in this paper.

We will assume the existence of a standard universe $U$ which contains all the objects and spaces that one is interested in (such as the natural numbers $\mathbf{N}$, the real numbers $\mathbf{R}$, the classical Lie groups, etc.). The precise construction of this universe is not particularly important for our purposes, so long as it forms a set. We refer to objects and spaces inside the standard universe as standard objects and standard spaces, with the latter being sets whose elements are in the former category.

We will rely heavily on the existence of a nonprincipal ultrafilter.

Lemma A.1 (Ultrafilter lemma). — There exists a collection $\alpha$ of subsets of the natural numbers $\mathbf{N}$ with the following properties:
(i) (Monotonicity) If $A \in \alpha$ and $B \supseteq A$, then $B \in \alpha$.
(ii) (Closure under intersection) If $A, B \in \alpha$, then $A \cap B \in \alpha$.
(iii) (Maximality) If $A \subseteq \mathbf{N}$, then either $A \in \alpha$ or $\mathbf{N} \setminus A \in \alpha$, but not both.
(iv) (Non-principality) If $A \in \alpha$, and $A'$ is formed from $A$ by adding or deleting finitely many elements to or from $A$, then $A' \in \alpha$.
We refer to a collection $\alpha$ obeying the above axioms as a nonprincipal ultrafilter.

Proof.
— The collection of cofinite subsets of $\mathbf{N}$ already obeys the monotonicity, closure under intersection, and non-principality properties. Using Zorn's lemma, one can enlarge this collection to a maximal collection which, it may be verified, has all the required properties.

By using this lemma, our results thus rely on the axiom of choice, which we will of course assume throughout this paper. On the other hand, it is possible to rephrase the purely combinatorial results in this paper, such as Theorem 2.10, in the language of Peano arithmetic. Applying a famous theorem of Gödel [22], we then conclude that Theorem 2.10 is provable in ZFC if and only if it is provable in ZF. In fact it is possible, with significant effort, to directly translate these ultrafilter arguments to a much lengthier argument in which neither ultrafilters nor the axiom of choice are used. However, this would require one to "finitise" or "proof-mine" such infinitary results as the Heine-Borel theorem or Theorem B.18, and this in turn would require finitisations of the construction of Haar measure and the Peter-Weyl theorem. This would lead to a vastly messier argument.

Throughout the paper, we fix a non-principal ultrafilter $\alpha$. A property $P(n)$ depending on a natural number $n$ is said to hold for $n$ sufficiently close to $\alpha$ if the set of $n$ for which $P(n)$ holds lies in $\alpha$. A set of natural numbers lying in $\alpha$ will also be called an $\alpha$-large set.

Once we have fixed this ultrafilter, we can define nonstandard objects and spaces.

Definition A.2 (Nonstandard objects). — Given a sequence $(x_n)_{n \in \mathbf{N}}$ of standard objects in $U$, we define their ultralimit $\lim_{n \to \alpha} x_n$ to be the equivalence class of all sequences $(y_n)_{n \in \mathbf{N}}$ of standard objects in $U$ such that $x_n = y_n$ for $n$ sufficiently close to $\alpha$. Note that the ultralimit $\lim_{n \to \alpha} x_n$ can also be defined even if $x_n$ is only defined for $n$ sufficiently close to $\alpha$.

An ultralimit of standard natural numbers is known as a nonstandard natural number, an ultralimit of standard real numbers is known as a nonstandard real number, and so on.

For any standard object $x$, we identify $x$ with its own ultralimit $\lim_{n \to \alpha} x$. Thus, every standard natural number is a nonstandard natural number, etc.

Any operation or relation on standard objects can be extended to nonstandard objects in the obvious manner. Indeed, if $O$ is a $k$-ary operation, we define

$$O\left(\lim_{n \to \alpha} x^1_n, \ldots, \lim_{n \to \alpha} x^k_n\right) := \lim_{n \to \alpha} O(x^1_n, \ldots, x^k_n)$$

and if $R$ is a $k$-ary relation, we define $R(\lim_{n \to \alpha} x^1_n, \ldots, \lim_{n \to \alpha} x^k_n)$ to be true iff $R(x^1_n, \ldots, x^k_n)$ is true for all $n$ sufficiently close to $\alpha$. One easily verifies that these nonstandard extensions of $O$ and $R$ are well-defined.

Example 23. — The sum of two nonstandard real numbers $\lim_{n \to \alpha} x_n$, $\lim_{n \to \alpha} y_n$ is the nonstandard real number

$$\lim_{n \to \alpha} x_n + \lim_{n \to \alpha} y_n = \lim_{n \to \alpha} (x_n + y_n),$$

and the statement $\lim_{n \to \alpha} x_n < \lim_{n \to \alpha} y_n$ means that $x_n < y_n$ for all $n$ sufficiently close to $\alpha$.

Definition A.3 (Ultraproducts). — Let $(X_n)_{n \in \mathbf{N}}$ be a sequence of standard spaces $X_n$ in $U$ indexed by the natural numbers. The ultraproduct $\prod_{n \to \alpha} X_n$ of the $X_n$ is defined to be the space of all ultralimits $\lim_{n \to \alpha} x_n$, where $x_n \in X_n$ for all $n$. We refer to the ultraproduct of standard sets as a nonstandard set; in a similar vein, an ultraproduct of standard groups is a nonstandard group, and an ultraproduct of standard finite sets is a nonstandard finite set. We refer to ${}^*X := \prod_{n \to \alpha} X$ as the ultrapower of a standard set $X$; the identification of $x$ with $\lim_{n \to \alpha} x$ causes $X$ to be identified with a subset of ${}^*X$. We will refer to the ultrapower ${}^*U$ of the standard universe $U$ as the nonstandard universe.

Remark A.4.
— Nonstandard sets in nonstandard analysis behave analogously in some ways to measurable sets in measure theory; for instance, the union or intersection of two nonstandard sets is again a nonstandard set. (Actually, the notion of an elementary set, e.g. a finite union of intervals, would be an even closer analogy here than the notion of a measurable set.) Also, just as a subset of a measurable set need not be measurable, a subset of a nonstandard set need not be another nonstandard set. For instance, the nonstandard natural numbers ${}^*\mathbf{N}$ is a nonstandard set (being the ultraproduct of the sequence $\mathbf{N}, \mathbf{N}, \ldots$), but the standard natural numbers $\mathbf{N}$, despite being a subset of ${}^*\mathbf{N}$, is not a nonstandard set.

A fundamental property of ultralimits is that they preserve first-order statements and predicates, a fact known as Łoś's theorem. Here is one formalisation of this theorem.

Theorem A.5 (Łoś's theorem with parameters). — Let $m$ be a standard natural number, and for each $1 \le i \le m$, let $x_i = \lim_{n \to \alpha} x_{i,n}$ be a nonstandard object. If $P(y_1, \ldots, y_m)$ is a predicate, then $P(x_1, \ldots, x_m)$ is true (as quantified over the nonstandard universe ${}^*U$) if and only if $P(x_{1,n}, \ldots, x_{m,n})$ is true for all $n$ sufficiently close to $\alpha$ (as quantified over the standard universe $U$).

Proof. — (Sketch) By definition, Łoś's theorem is true for "primitive" predicates which take the form $R(x_1, \ldots, x_k)$ for some primitive $k$-ary relation $R$ and objects $x_1, \ldots, x_k$, or of the form $x_{k+1} = O(x_1, \ldots, x_k)$ for some primitive $k$-ary operator $O$. From the ultrafilter axioms, we also see that Łoś's theorem is closed with respect to boolean operations; for instance, if Theorem A.5 holds for $P(x_1, \ldots, x_m)$ and $Q(x_1, \ldots, x_m)$, then it also holds for $\neg P$ or $P \wedge Q$.
Now, we claim that if Łoś's theorem holds for the predicate $P(x_1, \ldots, x_m)$, then it also holds for the quantified predicates $\exists x_m : P(x_1, \ldots, x_m)$ and $\forall x_m : P(x_1, \ldots, x_m)$ (where now there are only $m - 1$ free variables $x_1, \ldots, x_{m-1}$, with $x_m$ being bound). We show this just for the existential quantifier $\exists$, as the case of the universal quantifier $\forall$ is similar (and can be deduced from the existential case by negation). Suppose first that $\exists x_m : P(x_1, \ldots, x_m)$ is true in ${}^*U$. Then there exists $x_m = \lim_{n \to \alpha} x_{m,n}$ such that $P(x_1, \ldots, x_m)$ holds; by hypothesis, this implies that $P(x_{1,n}, \ldots, x_{m,n})$ holds for $n$ sufficiently close to $\alpha$, and thus $\exists x_m : P(x_{1,n}, \ldots, x_{m-1,n}, x_m)$ holds in $U$ for $n$ sufficiently close to $\alpha$ as desired. Conversely, if $\exists x_m : P(x_{1,n}, \ldots, x_{m-1,n}, x_m)$ holds in $U$ for $n$ sufficiently close to $\alpha$, then by the axiom of (countable) choice, we may find $x_{m,n} \in U$ for such $n$ such that $P(x_{1,n}, \ldots, x_{m-1,n}, x_{m,n})$ holds. Setting $x_m := \lim_{n \to \alpha} x_{m,n}$, we conclude that $P(x_1, \ldots, x_m)$ holds, and the claim follows.

The above discussion yields Łoś's theorem for any predicate that can be built out of primitive predicates by a finite number of boolean operations and quantifications. However, it is easy to see that all predicates are logically equivalent to a predicate of this form. For instance,

$$\forall a \forall b \forall c : (a + b) + c = a + (b + c)$$

is equivalent to

$$\forall a \forall b \forall c \exists d \exists e \exists f : (d = a + b) \wedge (e = b + c) \wedge (f = d + c) \wedge (f = a + e).$$

This completes the proof.

In applications, we will actually use a slight generalisation of Łoś's theorem.

Theorem A.6 (Łoś's theorem with parameters and ultraproducts). — Let $m, k$ be standard natural numbers. For each $1 \le i \le m$, let $x_i = \lim_{n \to \alpha} x_{i,n}$ be a nonstandard object, and for each $1 \le j \le k$, let $A_j = \prod_{n \to \alpha} A_{j,n}$ be a nonstandard set.
If $P(y_1, \ldots, y_m; B_1, \ldots, B_k)$ is a predicate over $m$ objects and $k$ sets, with the sets $B_1, \ldots, B_k$ only appearing in $P$ through the membership predicate $x \in B_j$ for various $j$ and various objects $x$, then $P(x_1, \ldots, x_m; A_1, \ldots, A_k)$ is true (as quantified over the nonstandard universe ${}^*U$) if and only if $P(x_{1,n}, \ldots, x_{m,n}; A_{1,n}, \ldots, A_{k,n})$ is true for all $n$ sufficiently close to $\alpha$ (as quantified over the standard universe $U$).

Proof. — We replace each appearance of $x \in B_j$ in $P$ with a new primitive relation $R_j(x, n)$, which is interpreted in $U$ as $x \in A_{j,n}$. This replaces the predicate $P(y_1, \ldots, y_m; B_1, \ldots, B_k)$ by a predicate $Q(y_1, \ldots, y_m, n)$, with $P(x_{1,n}, \ldots, x_{m,n}; A_{1,n}, \ldots, A_{k,n})$ logically equivalent to $Q(x_{1,n}, \ldots, x_{m,n}, n)$. One easily verifies that $P(x_1, \ldots, x_m; A_1, \ldots, A_k)$ is logically equivalent to $Q(x_1, \ldots, x_m, \lim_{n \to \alpha} n)$, and the claim now follows from Theorem A.5.

Example 24. — Any ultraproduct $G := \prod_{n \to \alpha} G_n$ of groups $G_n$ is again a group, because one can write the property of $G$ being a group as a predicate $P(G)$ that involves membership in $G$ (as well as the constant $\mathrm{id}$ and the group operations $\cdot$, $()^{-1}$, of course). Conversely, if $G = \prod_{n \to \alpha} G_n$ is a group, then $G_n$ is a group for all $n$ sufficiently close to $\alpha$.

Example 25. — Let $G = \prod_{n \to \alpha} G_n$ be an ultraproduct of groups (and thus also a group), and let $A = \prod_{n \to \alpha} A_n$ and $B = \prod_{n \to \alpha} B_n$ be subsets of $G$ that are nonstandard sets. Then, for $n$ sufficiently close to $\alpha$, $A_n$ and $B_n$ are subsets of $G_n$ (because this statement can be written as a predicate involving membership in $A_n$, $B_n$, $G_n$). In a similar (but more complicated) spirit, for any standard $K \in \mathbf{N}$, $A$ can be covered by $K$ left-translates of $B$ if and only if, for $n$ sufficiently close to $\alpha$, $A_n$ can be covered by $K$ left-translates of $B_n$.

A nonstandard real number $x \in {}^*\mathbf{R}$ is said to be bounded if one has $|x| \le C$ for some standard $C > 0$, and unbounded otherwise.
Similarly, we say that $x$ is infinitesimal if $|x| \le c$ for all standard $c > 0$; in the former case we write $x = O(1)$, and in the latter $x = o(1)$. For every bounded real number $x \in {}^*\mathbf{R}$ there is a unique standard real number $\operatorname{st}(x) \in \mathbf{R}$, called the standard part of $x$, such that $x = \operatorname{st}(x) + o(1)$, or equivalently that $\operatorname{st}(x) - \varepsilon \le x \le \operatorname{st}(x) + \varepsilon$ for all standard $\varepsilon > 0$. Indeed, one can set $\operatorname{st}(x)$ to be the supremum of all the real numbers $y$ such that $x > y$ (or equivalently, the infimum of all the real numbers $y$ such that $x < y$). We write $X = O(Y)$, $X \ll Y$, or $Y \gg X$ if we have $X \le CY$ for some standard $C$.

Given a sequence $f_n : X_n \to Y_n$ of standard functions between standard sets $X_n$, $Y_n$, one can form the ultralimit $f := \lim_{n \to \alpha} f_n$, which is a function from the ultraproduct $X := \prod_{n \to \alpha} X_n$ to the ultraproduct $Y := \prod_{n \to \alpha} Y_n$ defined by the formula

$$f\left(\lim_{n \to \alpha} x_n\right) := \lim_{n \to \alpha} f_n(x_n).$$

Such ultralimits will be called nonstandard functions (and are also known as internal functions in the nonstandard analysis literature). In particular, since standard finite sequences $(a_n)_{n=1}^{N}$ of standard reals $a_n \in \mathbf{R}$ with some standard length $N \in \mathbf{N}$ can be viewed as a function $n \mapsto a_n$ from $\{1, \ldots, N\}$ to $\mathbf{R}$, one can thus define nonstandard finite sequences $(a_n)_{n=1}^{N}$ of nonstandard reals $a_n \in {}^*\mathbf{R}$ with some nonstandard length $N \in {}^*\mathbf{N}$ as an ultralimit of standard finite sequences $(a_{n',\mathbf{n}})_{n'=1}^{N_{\mathbf{n}}}$ (here the bold $\mathbf{n}$ is the ultralimit index), thus $N = \lim_{\mathbf{n} \to \alpha} N_{\mathbf{n}}$ and

$$a_{\lim_{\mathbf{n} \to \alpha} n'_{\mathbf{n}}} = \lim_{\mathbf{n} \to \alpha} a_{n'_{\mathbf{n}}, \mathbf{n}}.$$

One can then transplant various operations on standard finite sequences to their nonstandard counterparts, and can in particular define the sum $\sum_{n=1}^{N} a_n \in {}^*\mathbf{R}$ of a nonstandard finite sequence $(a_n)_{n=1}^{N} = \lim_{\mathbf{n} \to \alpha} (a_{n',\mathbf{n}})_{n'=1}^{N_{\mathbf{n}}}$ by the formula

$$\sum_{n=1}^{N} a_n := \lim_{\mathbf{n} \to \alpha} \sum_{n'=1}^{N_{\mathbf{n}}} a_{n',\mathbf{n}}.$$

Appendix B: Local groups

In this appendix we recall the basic definitions and notations of (symmetric) local group theory, following Goldbring [24].

Definition B.1 (Local group).
— A symmetric local group $G = (G, \mathrm{id}, \cdot, ()^{-1})$ is a topological space $G$ with a distinguished element $\mathrm{id} \in G$ (the identity element), together with a globally defined inversion map $()^{-1} : G \to G$ and a partially defined product map $\cdot : \Omega \to G$, obeying the following axioms:
(i) (Partial closure) $\Omega$ is an open neighbourhood of $(G \times \{\mathrm{id}\}) \cup (\{\mathrm{id}\} \times G)$ in $G \times G$.
(ii) (Continuity) The maps $()^{-1} : x \mapsto x^{-1}$ and $\cdot : (x, y) \mapsto x \cdot y$ are continuous on $G$ and $\Omega$ respectively.
(iii) (Local associativity) If $g, h, k \in G$ are such that $(g \cdot h) \cdot k$ and $g \cdot (h \cdot k)$ are well-defined (thus $(g, h)$, $(g \cdot h, k)$, $(h, k)$, $(g, h \cdot k)$ all lie in $\Omega$), then $(g \cdot h) \cdot k = g \cdot (h \cdot k)$.
(iv) (Identity) For any $g \in G$, one has $\mathrm{id} \cdot g = g \cdot \mathrm{id} = g$.
(v) (Invertibility) If $g \in G$, then $g \cdot g^{-1}$ and $g^{-1} \cdot g$ are well-defined (i.e. $(g, g^{-1}), (g^{-1}, g) \in \Omega$) and are equal to $\mathrm{id}$.
If necessary, we will write $\mathrm{id}$, $\cdot$ as $\mathrm{id}_G$, $\cdot_G$ to reduce confusion. If $\Omega = G \times G$, we call $G$ a global group or a topological group.
If $G$ has the structure of a smooth finite-dimensional real manifold, and the inversion map $()^{-1}$ and product map $\cdot$ are smooth maps, we say that $G$ is a local Lie group.

Remark B.2. — One can also consider non-symmetric local groups, in which the inversion map $()^{-1}$ is only defined on an open neighbourhood $\Lambda$ of the identity. However, the theory of non-symmetric local groups contains some minor additional technicalities caused by the existence of non-invertible elements which we wish to avoid here. As we will not consider non-symmetric local groups anywhere in this paper, we will often omit the adjective "symmetric" from the term "local group" when there is no chance of confusion.

Following [24], we do not explicitly assume that $G$ is Hausdorff. In practice, though, one can reduce to the Hausdorff case because the closure of the identity element will turn out to be a closed normal subgroup that one can quotient out by.

Example 26.
— If $G$ is a symmetric local group and $U$ is a symmetric open neighbourhood of the identity (thus $g^{-1} \in U$ whenever $g \in U$), then $U$ can also be viewed as a symmetric local group, by restricting the domain $\Omega$ of the product maps to $\{(g, h) \in \Omega \cap (U \times U) : g \cdot h \in U\}$ (and also restricting the topological structure of $G$ to $U$). We will sometimes write this symmetric local group as $G|_U$ to emphasise that it is the restriction of $G$ to $U$. In particular, an important source of local groups comes from restricting a global group to an open symmetric neighbourhood of the identity. One can also restrict $G$ to non-open symmetric neighbourhoods of the identity, but the resulting object obtained is not necessarily a symmetric local group (see e.g. Example 28 below).

We say that two symmetric local groups $G, G'$ are locally identical if they have a common restriction, thus there exists a $U$ which is an open symmetric neighbourhood of the identity $1_G = 1_{G'}$ in both $G$ and $G'$ for which the group operations on $G$ and $G'$, when restricted to $U$, agree completely (in particular, they have the same domain and range). This is an equivalence relation, and we will focus on those properties of symmetric local groups that are preserved up to local identity. In a similar spirit, we say that two subsets $A, B$ of a symmetric local group $G$ are locally identical if there exists an open neighbourhood $U$ of the identity in $G$ such that $A \cap U = B \cap U$. (Note that every open neighbourhood of the identity contains an open symmetric neighbourhood, so we can assume here that $U$ is symmetric without loss of generality.) For instance, all neighbourhoods of the identity are locally identical.

Remark B.3. — Symmetric local groups are defined as topological groups, but if one wishes, one can restrict attention to discrete symmetric local groups, in which every set is open. In this case, all references to continuity, openness, and the Hausdorff property in Definition B.1 can be omitted as being automatically satisfied.
On the other hand, all discrete local groups are locally equivalent to the trivial local group $\{\mathrm{id}\}$.

Example 27. — If $\mathfrak{g}$ is a (finite-dimensional) Lie algebra, and $B$ is a sufficiently small symmetric open neighbourhood of the identity in $\mathfrak{g}$, then $\exp(B)$ is a symmetric local group, with the multiplication law given by the Baker-Campbell-Hausdorff formula.

Example 28. — The closed interval $[-1,1]$ in $\mathbf{R}$ with the addition operation is not a symmetric local group, because the set $\{(x,y) \in [-1,1] \times [-1,1] : x + y \in [-1,1]\}$ is not open in $[-1,1] \times [-1,1]$. However, the open interval $(-1,1)$ is a symmetric local group.

Given any finite number of elements $g_1, \dots, g_m$ in a global group $G$, one can use the associativity axiom to unambiguously define the product $g_1 \dots g_m$. In a symmetric local group, one can only define this product $g_1 \dots g_m$ locally. We formalise this as a definition:

Definition B.4 (Finite products). — Let $g_1, \dots, g_m$ be a finite number of elements in a symmetric local group $G$. We say that the product $g_1 \dots g_m$ is well-defined in $G$ (or well-defined for short) if, for each $1 \le i \le j \le m$, we can find a group element $g_{[i,j]} \in G$ with the following properties:
• For each $1 \le i \le m$, we have $g_{[i,i]} = g_i$.
• If $1 \le i \le j < k \le m$, the product $g_{[i,j]} \cdot g_{[j+1,k]}$ is well-defined (i.e. $(g_{[i,j]}, g_{[j+1,k]}) \in \Omega$) and equal to $g_{[i,k]}$.
By induction we see that if these group elements $g_{[i,j]}$ exist, then they are unique. We then define $g_1 \dots g_k := g_{[1,k]}$. If $g_1 = \dots = g_k = g$, we abbreviate $g_1 \dots g_k$ as $g^k$. By abuse of notation, we also write $g_1 \dots g_m \in G$ to denote the assertion that $g_1 \dots g_m$ is defined in $G$. We adopt the convention that $g_1 \dots g_m = \mathrm{id}$ when $m = 0$.

An easy induction using the local associativity axiom shows that if $g_1, \dots, g_m \in G$ are such that $g_i \dots g_j$ is well-defined whenever $1 \le i < j \le m$ with $(i,j) \ne (1,m)$, and $(g_i \dots g_j) \cdot (g_{j+1} \dots g_k)$ is well-defined whenever $1 \le i \le j < k \le m$, then $g_1 \dots g_m$ is well-defined, and we have
$$(g_i \dots g_k) = (g_i \dots g_j) \cdot (g_{j+1} \dots g_k)$$
for all $1 \le i \le j < k \le m$.

Remark B.5. — It is worth pointing out one subtlety here: in order for $g_1 \dots g_m$ to be well-defined, it is necessary that all possible ways of decomposing this $m$-fold product into pairwise products be well-defined. For instance, for $g_1 g_2 g_3$ to be well-defined, both $(g_1 \cdot g_2) \cdot g_3$ and $g_1 \cdot (g_2 \cdot g_3)$ need to be well-defined. Similarly, if $g_1, g_2, g_3, g_4$ are such that $g_1 g_2 g_3$, $(g_1 g_2 g_3) \cdot g_4$, $g_2 g_3 g_4$, and $g_1 \cdot (g_2 g_3 g_4)$ are well-defined, this is not yet sufficient to deduce that $g_1 g_2 g_3 g_4$ is well-defined, because $(g_1 g_2) \cdot (g_3 g_4)$ need not be well-defined. For instance, in the (additive) local group $\{-1, 0, +1\}$, the expression $(+1) + (-1) + (-1) + (+1)$ is not well-defined, because $(-1) + (-1)$ is not well-defined.

Related to this is the well-known fact that local associativity does not imply global associativity: it is possible for two different ways of decomposing an $m$-fold product into pairwise products to both exist, but give distinct values; see [41] for further discussion. For instance, there exists a local group $G$ and elements $g_1, g_2, g_3, g_4 \in G$ such that $((g_1 \cdot g_2) \cdot g_3) \cdot g_4$ and $g_1 \cdot (g_2 \cdot (g_3 \cdot g_4))$ both exist, but are not equal to one another. Of course, in this case, we do not consider $g_1 g_2 g_3 g_4$ to be well-defined.

Another easy induction also shows that for each $m \ge 1$, the set of tuples $(g_1, \dots, g_m) \in G^m$ for which $g_1 \dots g_m$ is well-defined is an open subset of $G^m$.

Now we extend the notion of products and inverses from individual group elements to sets of such elements.

Definition B.6. — Let $G$ be a symmetric local group. A subset $A$ of $G$ is said to be symmetric if the set $A^{-1} := \{g^{-1} : g \in A\}$ is contained in $A$. If $A_1, \dots, A_m$ are subsets of $G$, we say that $A_1 \dots A_m$ is well-defined in $G$ (or well-defined for short) if $g_1 \dots g_m$ is well-defined for all $g_1 \in A_1, \dots, g_m \in A_m$, in which case we write $A_1 \dots A_m := \{g_1 \dots g_m : g_1 \in A_1, \dots, g_m \in A_m\}$. If $A_1 = \dots = A_m = A$, we abbreviate $A_1 \dots A_m$ as $A^m$. By abuse of notation, we write $A_1 \dots A_m \subset G$ for the assertion that $A_1 \dots A_m$ is well-defined in $G$. We adopt the convention that $A_1 \dots A_m = \{\mathrm{id}\}$ when $m = 0$. In particular, $A^0 = \{\mathrm{id}\}$ for any $A \subset G$.

An easy induction (see [24, Lemma 2.5]) shows that for any local group $G$ and any open neighbourhood $U$ of the identity, there exists a nested sequence $U \supset U_0 \supset U_1 \supset U_2 \supset \cdots$ of symmetric open neighbourhoods of the identity such that $U_{m+1}^2 \subset U_m$ for every $m \ge 0$, which in particular implies that $U_m^m$ is well-defined in $U_0$, and thus $A_1 \dots A_m$ is well-defined in $U_0$ whenever $A_1, \dots, A_m \subset U_m$.

We make the trivial remark that multiplication of sets is associative: if $A_1 \dots A_m$ is well-defined, then for any $1 \le i \le j < k \le m$, $(A_i \dots A_j) \cdot (A_{j+1} \dots A_k)$ and $A_i \dots A_k$ are well-defined and equal to each other.

By passing to neighbourhoods such as $U_m$, one can improve the group-like properties of a local group. To illustrate this principle, let us first introduce the following definition.

Definition B.7 (Cancellative local groups). — A symmetric local group $G$ is said to be cancellative if the following assertions hold:
(i) Whenever $g, h, k \in G$ are such that $gh$ and $gk$ are well-defined and equal to each other, then $h = k$. (Note that this implies in particular that $(g^{-1})^{-1} = g$.)
(ii) Whenever $g, h, k \in G$ are such that $hg$ and $kg$ are well-defined and equal to each other, then $h = k$.
(iii) Whenever $g, h \in G$ are such that $gh$ and $h^{-1} g^{-1}$ are well-defined, then $(gh)^{-1} = h^{-1} g^{-1}$. (In particular, if $U \subset G$ is symmetric and $U^m$ is well-defined in $G$ for some $m \ge 1$, then $U^m$ is also symmetric.)

Clearly all global groups are cancellative.
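To make Definition B.4 and Remark B.5 concrete, the following Python sketch tests well-definedness of a finite product in the additive local group $\{-1, 0, +1\}$ by building the table of partial products $g_{[i,j]}$. It is only an illustration under the assumption of a finite discrete local group given by a partial binary operation; the names `local_add`, `well_defined_product` and `left_to_right` are hypothetical and do not come from the paper.

```python
CARRIER = frozenset({-1, 0, 1})


def local_add(g, h):
    """Partial addition on {-1, 0, +1}: g + h if it stays in the carrier, else None."""
    s = g + h
    return s if s in CARRIER else None


def well_defined_product(gs, op=local_add, identity=0):
    """Return g_1 ... g_m if it is well-defined in the sense of Definition B.4, else None.

    table[(i, k)] plays the role of g_[i,k]; every split of every sub-interval
    must be defined, and all splits must give the same value.
    """
    m = len(gs)
    if m == 0:
        return identity  # convention: the empty product is the identity
    table = {(i, i): gs[i] for i in range(m)}
    for length in range(2, m + 1):
        for i in range(m - length + 1):
            k = i + length - 1
            values = {op(table[(i, j)], table[(j + 1, k)]) for j in range(i, k)}
            if None in values or len(values) != 1:
                return None
            table[(i, k)] = values.pop()
    return table[(0, m - 1)]


def left_to_right(gs, op=local_add):
    """Evaluate ((g_1 g_2) g_3) ... only; returns None if some step is undefined."""
    acc = gs[0]
    for g in gs[1:]:
        acc = op(acc, g)
        if acc is None:
            return None
    return acc
```

On the word `[1, -1, -1, 1]` from Remark B.5, `left_to_right` succeeds (every partial sum stays in the carrier), whereas `well_defined_product` returns `None`, because the sub-product $(-1) + (-1)$ is not defined. This is the distinction the remark draws between one bracketing existing and the product being well-defined.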
A local group need not be cancellative everywhere; however, we can restrict to a large subset on which it is cancellative, by using the following proposition.

Proposition B.8. — Let $G$ be a symmetric local group, and let $U$ be an open symmetric neighbourhood of the identity in $G$ such that $U^6$ is well-defined. Then the restriction $G|_U$ of $G$ to $U$ is cancellative.

In particular, the restriction of $G$ to the open symmetric neighbourhood $U_6$ discussed earlier is cancellative. We shall see later that the property of being cancellative is hereditary in that it is inherited by passing to subgroups and quotients, and because of this we will be able to easily restrict attention to the cancellative case in our arguments.

Proof. — If $g, h \in U$, then $(gh)^{-1} g h h^{-1} g^{-1}$ is well-defined in $G$. By evaluating this well-defined expression in two different ways we conclude property (iii). In a similar spirit, by evaluating $g^{-1} g h$ and $g^{-1} g k$ for $g, h, k \in U$ in two different ways, we obtain (i); and similarly for (ii). □

Lemma B.9. — Let $G$ be a symmetric local group, and let $U, V$ be open sets with $\mathrm{id} \in V$. Then $\overline{U} \subset U \cdot V$ if $U \cdot V$ is well-defined, and similarly $\overline{U} \subset V \cdot U$ if $V \cdot U$ is well-defined.

Proof. — We prove the first claim only, as the second is similar. Suppose that $g$ is an adherent point of $U$. By continuity, we can find an open neighbourhood $W$ of $g$ and an open neighbourhood $Y$ of the identity such that $g \cdot g^{-1} \cdot W \cdot Y^{-1}$ is well-defined and $Y^{-1} \subset V$. By continuity, the set $\{h \in W : g^{-1} h \in Y\}$ is an open neighbourhood of $g$, and thus contains an element $h$ of $U$. Writing $v := g^{-1} h$ and expanding out $g \cdot g^{-1} \cdot h \cdot v^{-1}$ in two different ways, we conclude that $g = h v^{-1}$, and thus $g \in U \cdot V$ as required. □

We can give the class of local groups the structure of a category by defining the notion of a (continuous) homomorphism.

Definition B.10 (Homomorphisms). — Let $G, H, K$ be symmetric local groups.
A continuous homomorphism $\phi : G \to H$ is a continuous map from $G$ to $H$ with the following properties:
(i) $\phi$ maps the identity of $G$ to the identity of $H$: $\phi(\mathrm{id}_G) = \mathrm{id}_H$.
(ii) For every $g \in G$, we have $\phi(g)^{-1} = \phi(g^{-1})$.
(iii) If $g, h \in G$ are such that $g \cdot h$ is well-defined, then $\phi(g) \cdot \phi(h)$ is well-defined and is equal to $\phi(g \cdot h)$.
We will often omit the adjective “continuous” when $G$ is discrete. A local homomorphism from $G$ to $H$ is a continuous homomorphism $\phi : U \to H$ from a symmetric open neighbourhood $U$ of the identity of $G$ to $H$, where of course we give $U$ the structure of the restricted local group $G|_U$ from Example 26. Two local homomorphisms $\phi : U \to H$, $\phi' : U' \to H$ are equivalent if there exists a neighbourhood $V$ of the identity contained in both $U$ and $U'$ such that $\phi$ and $\phi'$ agree on $V$; this is an equivalence relation. A local morphism is an equivalence class of local homomorphisms. Given two local homomorphisms $\phi : U \to H$ and $\psi : V \to K$ from $G$ to $H$ and $H$ to $K$ respectively, we define the composition map $\psi \circ \phi : U' \to K$ by $\psi \circ \phi(g) := \psi(\phi(g))$, where $U' := \{g \in U : \phi(g) \in V\}$. This allows one to define a composition of two local morphisms in the obvious manner.

Example 29. — There are no non-trivial global morphisms from the unit circle $\mathbf{R}/\mathbf{Z}$ to $\mathbf{R}$. However, there do exist non-trivial local morphisms, such as (the equivalence class of) the map $\phi$ from $(-1/4, 1/4) \bmod 1$ to $\mathbf{R}$ defined by setting $\phi(x \bmod 1) := x$ for all $x \in (-1/4, 1/4)$. The concept of a local homomorphism is closely related to that of a Freiman homomorphism in additive combinatorics, as discussed for example in [54].

One easily verifies that continuous homomorphisms and local morphisms both obey the axioms of a category; in particular, the composition of two continuous homomorphisms is a continuous homomorphism, and the composition of two local morphisms is again a local morphism.
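As an informal sanity check of Example 29, the Python sketch below tests property (iii) of Definition B.10 for the map $\phi(x \bmod 1) := x$ on a finite grid of rational sample points in $(-1/4, 1/4) \bmod 1$. It is a numerical illustration only, not a proof; the helper names `lift`, `in_U`, `phi` and `check_local_homomorphism` are hypothetical, and elements of the circle are represented by arbitrary rational representatives.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
QUARTER = Fraction(1, 4)


def lift(t):
    """Representative of the class t mod 1 lying in [-1/2, 1/2)."""
    return (t + HALF) % 1 - HALF


def in_U(t):
    """True if the class t mod 1 lies in U = (-1/4, 1/4) mod 1."""
    return -QUARTER < lift(t) < QUARTER


def phi(t):
    """The local homomorphism of Example 29, defined only on U."""
    if not in_U(t):
        raise ValueError("phi is only defined on (-1/4, 1/4) mod 1")
    return lift(t)


def check_local_homomorphism(samples):
    """Check phi(x + y) == phi(x) + phi(y) whenever x, y and x + y all lie in U.

    The condition 'x + y in U' is exactly when the product is well-defined
    in the restricted local group G|_U.
    """
    for x, y in product(samples, repeat=2):
        if in_U(x) and in_U(y) and in_U(x + y):
            if phi(x + y) != phi(x) + phi(y):
                return False
    return True


SAMPLES = [Fraction(k, 40) for k in range(-40, 41)]
```

Calling `check_local_homomorphism(SAMPLES)` exercises the homomorphism property on the grid. By contrast, the same lift is not additive on the whole circle: with $x = 2/5$ one has `lift(x + x) != lift(x) + lift(x)`, in line with the absence of non-trivial global morphisms from $\mathbf{R}/\mathbf{Z}$ to $\mathbf{R}$.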
As usual in category theory, we can now say that two local groups $G, G'$ are locally isomorphic if there exists a local morphism $\phi$ from $G$ to $G'$ with an inverse $\phi^{-1}$ from $G'$ to $G$ which is also a local morphism, such that the compositions $\phi^{-1} \circ \phi$ and $\phi \circ \phi^{-1}$ are equivalent to the identity. Thus, for instance, the unit circle $\mathbf{R}/\mathbf{Z}$ and the line $\mathbf{R}$ are locally isomorphic. This notion of local isomorphism generalises the notion of local identity from Example 26.

Definition B.11 (Sub-local groups [24]). — Given two symmetric local groups $G'$ and $G$, we say that $G'$ is a sub-local group of $G$ if $G'$ is the restriction of $G$ to a symmetric neighbourhood of the identity, and there exists an open neighbourhood $V$ of $G'$ with the property that whenever $g, h \in G'$ are such that $gh$ is defined in $V$, then $gh \in G'$; we refer to $V$ as an associated neighbourhood for $G'$. If $G'$ is also a global group, we say that $G'$ is a subgroup of $G$.

If $G'$ is a sub-local group of $G$, we say that $G'$ is normal if there exists an associated neighbourhood $V$ for $G'$ with the additional property that whenever $g' \in G'$, $h \in V$ are such that $h g' h^{-1}$ is well-defined and lies in $V$, then $h g' h^{-1} \in G'$. We call $V$ a normalising neighbourhood of $G'$.

Example 30. — If $G, G'$ are the (additive) local groups $G := \{-2, -1, 0, +1, +2\}$ and $G' := \{-1, 0, +1\}$, then $G'$ is a sub-local group of $G$ (with associated neighbourhood $V = G'$). Note that this is despite $G'$ not being closed with respect to addition in $G$; thus we see why it is necessary to allow the associated neighbourhood $V$ to be strictly smaller than $G$. In a similar vein, the open interval $(-1, 1)$ is a sub-local group of $(-2, 2)$. The interval $(-1, 1) \times \{0\}$ is also a sub-local group of $\mathbf{R}^2$; here, one can take for instance $(-1, 1)^2$ as the associated neighbourhood. As all these examples are abelian, they are clearly normal.

Example 31. — Let $T : V \to V$ be a linear transformation on a finite-dimensional vector space $V$, and let $G := \mathbf{Z} \ltimes_T V$ be the associated semi-direct product.
Let $G' := \{0\} \times W$, where $W$ is a subspace of $V$ that is not preserved by $T$. Then $G'$ is not a normal subgroup of $G$, but it is a normal sub-local group of $G$, where one can take $\{0\} \times V$ as a normalising neighbourhood of $G'$.

Observe that any sub-local group of a cancellative local group is again a cancellative local group. One also easily verifies that if $\phi : U \to H$ is a local homomorphism from $G$ to $H$ for some open neighbourhood $U$ of the identity in $G$, then $\ker(\phi)$ is a normal sub-local group of $U$, and hence of $G$. Note that the kernel of a local morphism is well-defined up to local identity. If $H$ is Hausdorff, then the kernel $\ker(\phi)$ will also be closed.

Conversely, normal sub-local groups give rise to local homomorphisms into quotient spaces.

Lemma B.12 (Quotient spaces [24]). — Let $G$ be a cancellative local group, and let $H$ be a normal sub-local group with normalising neighbourhood $V$. Let $W$ be a symmetric open neighbourhood of the identity such that $W^6 \subset V$. Then there exists a cancellative local group $W/H$ and a surjective continuous homomorphism $\phi : W \to W/H$ such that, for any $g, h \in W$, one has $\phi(g) = \phi(h)$ if and only if $g h^{-1} \in H$, and for any $E \subset W/H$, one has $E$ open if and only if $\phi^{-1}(E)$ is open.

Proof. — We define an equivalence relation $\sim$ on $W$ by declaring $g \sim h$ if $g h^{-1} \in H$. Using the cancellative properties of $V$ (and hence of $W^6$) we see that this is indeed an equivalence relation. We let $W/H := \{[g]_\sim : g \in W\}$ be the set of equivalence classes $[g]_\sim := \{h \in W : g \sim h\}$, with the obvious projection map $\pi : W \to W/H$. We define an inversion relation on $W/H$ by setting $[g]_\sim^{-1} := [g^{-1}]_\sim$, and a product operation by setting $[g]_\sim [h]_\sim$ to equal $[g' h']_\sim$ if $g' h' \in W$ for at least one pair of representatives $g', h'$ of $[g]_\sim, [h]_\sim$ respectively.

We now verify that these relations are well-defined. To make the inversion relation well-defined, we need to verify that if $g \sim h$, then $g^{-1} \sim h^{-1}$.
But from the cancellative properties of $W^6$, we have $g^{-1} (h^{-1})^{-1} = g^{-1} (g h^{-1})^{-1} g$, and the claim follows as $W^6$ is a normalising neighbourhood for $H$. Similarly, to make the multiplication relation well-defined, we need to verify that if $g, g', h, h'$ are such that $g \sim g'$, $h \sim h'$, and $gh, g'h' \in W$, then $gh \sim g'h'$. But
$$(gh)(g'h')^{-1} = \big(g (g')^{-1}\big)\, g' \big(h (h')^{-1}\big) (g')^{-1},$$
and the claim follows as $W^6$ is a normalising neighbourhood for $H$. Similar arguments (which we omit) show that $W/H$ obeys the identity, inverse, and local associativity axioms.

Next, we give $W/H$ the quotient topology, declaring a set $E$ in $W/H$ open iff its inverse image $\pi^{-1}(E)$ is open in $W$ (or equivalently, in $G$). One easily verifies that $W/H$ becomes a symmetric local group, and the claim follows. □

Example 32. — Let $G$ be the additive local group $G := (-2, 2)^2$, and let $H$ be the sub-local group $H := \{0\} \times (-1, 1)$, with normalising neighbourhood $V := (-1, 1)^2$. If we then set $W := (-0.1, 0.1)^2$, then the hypotheses of Lemma B.12 are obeyed, and $W/H$ can be identified with $(-0.1, 0.1)$, with the projection map $\phi : (x, y) \mapsto x$.

Example 33. — Let $G$ be the torus $(\mathbf{R}/\mathbf{Z})^2$, and let $H$ be the sub-local group $H = \{(x, \alpha x) \bmod \mathbf{Z}^2 : x \in (-0.1, 0.1)\}$, where $0 < \alpha < 1$ is an irrational number, with normalising neighbourhood $(-0.1, 0.1)^2 \bmod \mathbf{Z}^2$. Set $W := (-0.01, 0.01)^2 \bmod \mathbf{Z}^2$. Then the hypotheses of Lemma B.12 are again obeyed, and $W/H$ can be identified with the interval $I := (-0.01(1 + \alpha), 0.01(1 + \alpha))$, with the projection map $\phi : (x, y) \bmod \mathbf{Z}^2 \mapsto y - \alpha x$ for $(x, y) \in (-0.01, 0.01)^2$. Note, in contrast, that if one quotiented $G$ by the global group $\langle H \rangle = \{(x, \alpha x) \bmod \mathbf{Z}^2 : x \in \mathbf{R}\}$ generated by $H$, the quotient would be a non-Hausdorff space (and would also contain a dense set of torsion points, in contrast to the interval $I$ which is “locally torsion free”).
It is because of this pathological behaviour of quotienting by global groups that we need to work with local group quotients instead.

Remark B.13. — As we have seen in the above discussion, many familiar concepts in (global) group theory have analogues in the local group setting. We will however mention one important global group-theoretic concept that does not have a convenient local analogue, and that is the notion of the global group $\langle A \rangle$ generated by a set $A$ of generators. The problem is that this global group $\langle A \rangle$ consists of words in $A$ of arbitrary length, whereas in a local group one can typically only multiply together a bounded number of elements of $A$. However, sets such as $A^m$ or $(A \cup A^{-1} \cup \{\mathrm{id}\})^m$ for various choices of exponent $m$ can sometimes serve as a partial substitute for this concept in local group theory, though one of course has to keep track of the precise value of $m$ throughout the argument.

Locally compact local groups. — Recall that a topological space $X$ is said to be locally compact if every point in $X$ has a compact neighbourhood. In particular, one can speak of a locally compact symmetric local group. To verify local compactness of a symmetric local group, it suffices to do so at the identity.

Lemma B.14. — Let $G$ be a symmetric local group. Then $G$ is locally compact if and only if there is a compact symmetric neighbourhood of the identity.

Proof. — [24, Lemma 2.16] The “only if” part is clear (since $\mathrm{id}$ already has a compact neighbourhood). Now we turn to the “if” part. Let $K$ be a compact symmetric neighbourhood of the identity, and let $g \in G$ be arbitrary. By continuity, there exists an open neighbourhood $V$ of $g$ such that $g^{-1} \cdot V \cdot V^{-1} \cdot g$ is well-defined and $g^{-1} \cdot V \cdot V^{-1} \cdot g \subset K$. In particular, $h \mapsto g^{-1} h$ is a homeomorphism from $V \cdot V^{-1} \cdot g$ to $g^{-1} \cdot V \cdot V^{-1} \cdot g$ which is inverted by the map $k \mapsto gk$. By Lemma B.9, we conclude that $h \mapsto g^{-1} h$ is also a homeomorphism from $\overline{V}$ to $g^{-1} \cdot \overline{V} = \overline{g^{-1} \cdot V}$.
In particular, since $\overline{g^{-1} \cdot V}$ is a closed subset of $K$, it is compact, and so $\overline{V}$ is compact also. Thus $g$ has a precompact neighbourhood as required. □

Corollary B.15. — If $G$ is a locally compact symmetric local group, and $U$ is a symmetric open neighbourhood of the identity, then $U$ is also a locally compact local group.

Proof. — By Lemma B.14, $G$ contains a symmetric precompact open neighbourhood $V$ of the identity. By continuity, one can find a symmetric open neighbourhood $W$ of the identity such that $W \cdot W$ is well-defined in $V \cap U$. By Lemma B.9, we conclude that the closure of $W$ in $U$ is the same as the closure of $W$ in $G$; as it is contained in the precompact set $V$, it is thus precompact. The claim then follows from another application of Lemma B.14. □

An important subclass of the locally compact local groups are the (symmetric) local Lie groups, defined as those (symmetric) local groups which are also smooth finite-dimensional real manifolds, such that the group operations are smooth on their domain of definition. We have the following basic theorem.

Theorem B.16 (Lie’s third theorem). — Every local Lie group is locally isomorphic to a global Lie group. Furthermore, one can take the global Lie group to be both connected and simply connected.

See e.g. [50] for a proof.

We have the following deep structure theorem for locally compact global groups, due to Gleason and Yamabe [61].

Theorem B.17 (Gleason-Yamabe). — Suppose that $G$ is a locally compact global group. Then there is an open subgroup $G'$ of $G$ with the following property: inside any neighbourhood of the identity $U \subseteq G'$, there is a compact normal subgroup $H$ such that $G'/H$ is isomorphic to a connected global Lie group.

The analogous theorem for locally compact local groups was established more recently by Goldbring.

Theorem B.18 (Goldbring). — Suppose that $G$ is a locally compact local group. Then some restriction $G'$ of $G$ to a symmetric neighbourhood of the identity has the following property.
Inside any neighbourhood of the identity $U \subseteq G'$, there is a compact normal subgroup $H$ such that $G'/H$ is isomorphic to a local Lie group.

Proof. — The only self-contained proof of Theorem B.18 in the literature is in the thesis [23], where it follows from a combination of Section 4.5 and [23, Proposition 4.7.1]. A more easily accessible account of essentially the same material follows by combining [56, Proposition 4.1] (reduction to the NSS case) with [24, §8] (treatment of the NSS case). Alternatively (though ultimately more circuitously) one may apply the main result of [56], which shows that $G$ has a restriction in common with a global locally compact group, followed by Theorem B.17. □

For our applications, we only need to apply Theorem B.18 when $G$ is metrisable, although the general case can be deduced from the metrisable case without much effort.

Appendix C: Nilprogressions and related objects

In this appendix we prove two basic facts about coset nilprogressions in normal form, namely that after shrinking the length parameter slightly they are approximate groups, and are globalisable: that is to say isomorphic to subsets of a global group. The proofs of these facts are quite short due to the strength of the normal form axioms. One can establish similar assertions without the normal form hypothesis, but the arguments are much more complicated in that they require one to work with an explicit basis for the free nilpotent group. They are not needed in this paper.

Lemma C.1. — Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form. Then for all $\varepsilon > 0$ that are sufficiently small depending on $r, C$, one has
$$(1 + N_1) \dots (1 + N_r) |H| \ll_{\varepsilon, C, r} |P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)| \ll_C (1 + N_1) \dots (1 + N_r) |H| \tag{C.1}$$
and hence, by the volume bounds on $P$,
$$|P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)| \gg_{\varepsilon, C, r} |P|.$$
Furthermore, $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is a $O_{\varepsilon, C, r}(1)$-approximate group.

Proof.
— By quotienting out the finite group $H$, which is normalised by $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ (say) if $\varepsilon$ is small enough, we may assume that $H$ is trivial. The upper bound in (C.1) is then immediate from the upper bound in (2.2), while the lower bound follows from the local properness axiom in Definition 2.6. From (C.1) and the Ruzsa covering lemma we see that for $\varepsilon$ small enough, $P_H(u_1, \dots, u_r; 2\varepsilon N_1, \dots, 2\varepsilon N_r)$ is covered by $O_{\varepsilon, C, r}(1)$ translates of $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$, and so the final claim follows from Lemma 5.1. □

Remark C.2. — It is in fact possible to show that $|P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)|$ decays at a polynomial rate in $\varepsilon$, and that $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is a $O_{C, r}(1)$-approximate group uniformly in $\varepsilon$, but we will not need these stronger conclusions here.

Lemma C.3. — Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form. Then for all $\varepsilon > 0$ that are sufficiently small depending on $r, C$, the set $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is isomorphic to a subset of a global group $G$.

From this lemma (and Lemma C.1) we see that Theorem 2.13 follows immediately from Theorem 2.10.

Proof. — We first establish the claim under the additional hypothesis that the $N_1, \dots, N_r$ are sufficiently large depending on $r, C$; we will remove this hypothesis at the end of the argument.

Let $v_1, \dots, v_r$ be lifts of the generators $u_1, \dots, u_r$ of $P/H$ to $P$. By Definition 2.6 and the normality of $H$, one has
$$[v_i, v_j] \in P_H\Big(v_{j+1}, \dots, v_r; O_C\Big(\frac{N_{j+1}}{N_i N_j}\Big), \dots, O_C\Big(\frac{N_r}{N_i N_j}\Big)\Big) \tag{C.2}$$
for all $1 \le i < j \le r$; note that the hypothesis that the $N_i$ are large ensures that the right-hand side is well-defined in $P$.

Consider a word in $P\big(v_{j+1}, \dots, v_r; O_C\big(\frac{N_{j+1}}{N_i N_j}\big), \dots, O_C\big(\frac{N_r}{N_i N_j}\big)\big)$, which therefore contains $O_C\big(\frac{N_{j+1}}{N_i N_j}\big)$ copies of $v_{j+1}^{\pm 1}$, $O_C\big(\frac{N_{j+2}}{N_i N_j}\big)$ copies of $v_{j+2}^{\pm 1}$, and so forth.
Let us take the leftmost copy of $v_{j+1}^{\pm 1}$ and move it all the way to the left. Each time it passes through a $v_k^{\pm 1}$ for some $j + 1 < k \le r$, we use (C.2) and create $O_C\big(\frac{N_l}{N_{j+1} N_k}\big)$ new copies of $v_l^{\pm 1}$ for each $l > k$, plus an element of $H$ which can be pushed all the way to the right using the normality of $H$. Thus, if one initially had $a_k \frac{N_k}{N_i N_j}$ copies of $v_k^{\pm 1}$ for each $j + 1 < k \le r$ before one started moving the leftmost $v_{j+1}^{\pm 1}$ to the left, then by the end of the move, one would have
$$a_l \frac{N_l}{N_i N_j} + O_C\Big( \sum_{j+1 < k < l} a_k \frac{N_k}{N_i N_j} \cdot \frac{N_l}{N_{j+1} N_k} \Big) \tag{C.3}$$
copies of $v_l^{\pm 1}$ for each $j + 1 < l \le r$. We may simplify the expression (C.3) as
$$\frac{N_l}{N_i N_j} \Big( a_l + O_C\Big( \frac{1}{N_{j+1}} \sum_{j+1 < k < l} a_k \Big) \Big).$$
Thus we have effectively replaced the sequence $(a_k)_{j+1 < k \le r}$ by the sequence
$$\Big( a_l + O_C\Big( \frac{1}{N_{j+1}} \sum_{j+1 < k < l} a_k \Big) \Big)_{j+1 < l \le r}.$$
We iterate this process $O_C\big(\frac{N_{j+1}}{N_i N_j}\big) = O_C(N_{j+1})$ times, and note that the $a_k$ were initially of size $O_C(1)$, and end up at a sequence, all of whose entries are of size $O_{C,r}(1)$. In other words, after moving all copies of $v_{j+1}^{\pm 1}$ to the left, and all copies of $H$ to the right, we end up with $O_{C,r}\big(\frac{N_k}{N_i N_j}\big)$ copies of $v_k^{\pm 1}$ in the middle for each $j + 1 < k \le r$. We conclude that
$$[v_i, v_j] \in v_{j+1}^{n_{i,j,j+1}} P_H\Big(v_{j+2}, \dots, v_r; O_{C,r}\Big(\frac{N_{j+2}}{N_i N_j}\Big), \dots, O_{C,r}\Big(\frac{N_r}{N_i N_j}\Big)\Big)$$
for some $n_{i,j,j+1} = O_C\big(\frac{N_{j+1}}{N_i N_j}\big)$; note that as long as the $N_i$ are large enough, all words that appear in this reorganisation will lie inside $P$ and so the algebraic manipulations can be justified. Iterating this procedure $r - j$ times (which will be justified if the $N_i$ are large enough) we see that
$$[v_i, v_j] = v_{j+1}^{n_{i,j,j+1}} \dots v_r^{n_{i,j,r}} h_{i,j} \tag{C.4}$$
for some $n_{i,j,k} = O_{C,r}\big(\frac{N_k}{N_i N_j}\big)$ and $h_{i,j} \in H$. Also, one has
$$v_i h v_i^{-1} = \phi_i(h) \tag{C.5}$$
for some (outer) automorphism $\phi_i : H \to H$ of $H$.

Now let $G$ be the global group generated by $H$ and formal generators $e_1, \dots, e_r$, subject to the relations
$$[e_i, e_j] = e_{j+1}^{n_{i,j,j+1}} \dots e_r^{n_{i,j,r}} h_{i,j} \tag{C.6}$$
and
$$e_i h e_i^{-1} = \phi_i(h) \tag{C.7}$$
for $1 \le i < j \le r$. We claim that for $\varepsilon$ small enough, there is an injective homomorphism from $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ to $G$, which will give the claim.

To see this, first observe from the normality of $H$ that
$$P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) = P(v_1, \dots, v_r; \varepsilon N_1, \dots, \varepsilon N_r) H.$$
Organising the words in $P(v_1, \dots, v_r; \varepsilon N_1, \dots, \varepsilon N_r)$ by moving all occurrences of $v_1$ to the left (using (C.4)) and all occurrences of $H$ to the right (using the normality of $H$) we then have
$$P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) \subseteq P(v_1; \varepsilon N_1)\, P_H\big(v_2, \dots, v_r; O_{C,r}(\varepsilon N_2), \dots, O_{C,r}(\varepsilon N_r)\big) \tag{C.8}$$
assuming $\varepsilon$ is small enough in order to justify all the algebraic manipulations. Iterating this we see that
$$P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) \subseteq P\big(v_1; O_{C,r}(\varepsilon N_1)\big) \dots P\big(v_r; O_{C,r}(\varepsilon N_r)\big) H. \tag{C.9}$$
Thus it suffices to establish an injective homomorphism $\phi$ from the set
$$\big\{ v_1^{n_1} \dots v_r^{n_r} h : n_i = O_{C,r}(\varepsilon N_i);\ h \in H \big\} \tag{C.10}$$
to $G$. From the local properness property in Definition 2.6, all the products in (C.10) are distinct if $\varepsilon$ is small enough. We may thus define $\phi$ by the formula
$$\phi\big(v_1^{n_1} \dots v_r^{n_r} h\big) := e_1^{n_1} \dots e_r^{n_r} h.$$

Next, we show that $\phi$ is injective. Indeed, suppose that there exist $n_i, n_i' = O_{C,r}(\varepsilon N_i)$ and $h, h' \in H$ with
$$\phi\big(v_1^{n_1} \dots v_r^{n_r} h\big) = \phi\big(v_1^{n_1'} \dots v_r^{n_r'} h'\big)$$
and thus
$$e_1^{n_1} \dots e_r^{n_r} h = e_1^{n_1'} \dots e_r^{n_r'} h'.$$
By the universal properties of $G$, there is a homomorphism from $G$ to $\mathbf{Z}$ that maps $e_1$ to $1$ and annihilates the other $e_i$ and $H$. This implies that $n_1 = n_1'$. We can then eliminate $n_1, n_1'$ and work with the subgroup $G_2$ of $G$ generated by $e_2, \dots, e_r$ and $H$. From abstract nonsense we see that $G_2$ is universal with respect to the constraints (C.6), (C.7) for $i \ge 2$, and that $G$ is the semidirect product of $G_2$ with $\mathbf{Z}$ using the conjugation action of $e_1$ on $G_2$ defined using (C.6), (C.7) for $i = 1$.
In particular, there is a homomorphism from $G_2$ to $\mathbf{Z}$ that maps $e_2$ to $1$ and annihilates the $e_i$ and $H$ for $i > 2$. This gives $n_2 = n_2'$. Continuing in this fashion we see that $n_i = n_i'$ for all $i$ and hence $h = h'$, which establishes injectivity.

Finally, we need to show that $\phi$ is a homomorphism. It suffices to show that if $n_i, n_i', n_i'' = O_{C,r}(\varepsilon N_i)$ and $h, h', h'' \in H$ are such that
$$v_1^{n_1} \dots v_r^{n_r} h\, v_1^{n_1'} \dots v_r^{n_r'} h' = v_1^{n_1''} \dots v_r^{n_r''} h'' \tag{C.11}$$
then
$$e_1^{n_1} \dots e_r^{n_r} h\, e_1^{n_1'} \dots e_r^{n_r'} h' = e_1^{n_1''} \dots e_r^{n_r''} h''. \tag{C.12}$$
To see this, we rearrange the word on the left-hand side of (C.11) by moving all occurrences of $v_1$ to the left, and all occurrences of elements of $H$ to the right, using (C.4) and (C.5); if $\varepsilon$ is small enough, then all manipulations take place inside $P$ and can thus be justified. Iterating this process, we must eventually be able to express this word in the form $v_1^{\tilde n_1} \dots v_r^{\tilde n_r} \tilde h$ for some $\tilde n_i = O_{C,r}(\varepsilon N_i)$ and $\tilde h \in H$. By injectivity, we then have $\tilde n_i = n_i''$ and $\tilde h = h''$. But then if one formally replaces all the $v_i$ by $e_i$ and uses (C.6), (C.7) in place of (C.4), (C.5) in the rearrangement procedure just described, we conclude (C.12), and the claim follows.

Now we remove the hypothesis that the $N_1, \dots, N_r$ are sufficiently large depending on $r, C$. Let $F : \mathbf{R}^+ \to \mathbf{R}^+$ be a function depending on $r, C$ to be chosen later. By the pigeonhole principle, we can find a threshold $M \ge 1$ with $M = O_{F,r}(1)$ such that every length $N_i$ is either less than $M$, or larger than $F(M)$. If we let $1 \le i_1 < \dots < i_{r'} \le r$ be those indices $i_j$ with $N_{i_j} > F(M)$, then we see (if $F$ is sufficiently rapidly growing) that $P_H(u_{i_1}, \dots, u_{i_{r'}}; N_{i_1}, \dots, N_{i_{r'}})$ will be a coset nilprogression in $O_{C,r,M}(1)$-normal form. For $F$ sufficiently rapidly growing, the preceding argument then applies to conclude that $P_H(u_{i_1}, \dots, u_{i_{r'}}; \varepsilon N_{i_1}, \dots, \varepsilon N_{i_{r'}})$ is isomorphic to a subset of a global group if $\varepsilon$ is small enough depending on $C, r, M$, and the claim follows. □
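The rearrangement step used in the proof above (moving generators to the left at the cost of commutators further down the list) can be seen in miniature in the discrete Heisenberg group, where $r = 3$, $H$ is trivial, and the only non-trivial relation is $[e_1, e_2] = e_3$ with $e_3$ central. The Python sketch below is a toy model under these assumptions, not the general construction of the lemma: group elements are integer triples, the commutator convention $[x, y] := x^{-1} y^{-1} x y$ is one common choice that may differ from the paper's, and all function names are hypothetical.

```python
IDENT = (0, 0, 0)
E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def mul(x, y):
    """Product in the discrete Heisenberg group (upper unitriangular 3x3 integer matrices)."""
    a, b, c = x
    a2, b2, c2 = y
    return (a + a2, b + b2, c + c2 + a * b2)


def inv(x):
    """Inverse of x, so that mul(x, inv(x)) == IDENT."""
    a, b, c = x
    return (-a, -b, -c + a * b)


def power(x, n):
    """x^n for any integer n, by repeated multiplication."""
    base = x if n >= 0 else inv(x)
    result = IDENT
    for _ in range(abs(n)):
        result = mul(result, base)
    return result


def commutator(x, y):
    """[x, y] := x^{-1} y^{-1} x y (convention assumed for this sketch)."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def normal_form(x):
    """Exponents (n1, n2, n3) with x == e1^n1 e2^n2 e3^n3."""
    a, b, c = x
    return (a, b, c - a * b)


def from_normal_form(n1, n2, n3):
    """Rebuild the group element e1^n1 e2^n2 e3^n3."""
    return mul(mul(power(E1, n1), power(E2, n2)), power(E3, n3))


def evaluate_word(word):
    """Multiply out a word given as a list of (generator index, exponent) pairs."""
    gens = {1: E1, 2: E2, 3: E3}
    result = IDENT
    for index, exponent in word:
        result = mul(result, power(gens[index], exponent))
    return result
```

For example, `normal_form(evaluate_word([(2, 1), (1, 1)]))` rewrites $e_2 e_1$ as $e_1 e_2 e_3^{-1}$: moving $e_1$ to the left past $e_2$ costs one copy of $e_3^{\pm 1}$, which is the toy analogue of the new copies of $v_l^{\pm 1}$ created in the proof. Uniqueness of the exponent triple plays the role of the distinctness of the products in (C.10).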
Remark C.4. — From (C.9) we see that every element in $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ takes the form
$$v_1^{a_1} \dots v_r^{a_r} h \tag{C.13}$$
for some integers $a_1, \dots, a_r$ with $a_i = O_{C,r}(\varepsilon N_i)$ and $h \in H$. Conversely, it is clear that if $|a_i| \le \varepsilon N_i$ then all expressions of the form (C.13) lie in $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$. Informally, we thus see that the nilprogression $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is comparable in some sense to the nilbox
$$\big\{ v_1^{a_1} \dots v_r^{a_r} h : |a_i| \le \varepsilon N_i;\ h \in H \big\}.$$
We will however not exploit this description of nilprogressions in this paper.

A variant of the above analysis also gives polynomial growth of progressions in $C$-normal form in the global case.

Proposition C.5 (Polynomial growth). — Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form in a global group. Then for all $m \ge 1$, one has $|P^m| \ll_{C,r} m^{O_{C,r}(1)} |P|$.

Proof. — We allow all implied constants to depend on $C, r$. As $H$ is normalised by $P$, we may quotient out by $H$ and reduce to the case when $H$ is trivial. Then
$$P^m \subseteq P(u_1, \dots, u_r; m N_1, \dots, m N_r)$$
and so it suffices (by the volume bound (2.2)) to show that
$$|P(u_1, \dots, u_r; m N_1, \dots, m N_r)| \ll m^{O(1)} (N_1 + 1) \dots (N_r + 1).$$
By modifying the proof of (C.8), one easily verifies that
$$P(u_1, \dots, u_r; m N_1, \dots, m N_r) \subseteq P(v_1; m N_1)\, P\big(v_2, \dots, v_r; O(m^2 N_2), \dots, O(m^2 N_r)\big);$$
iterating this, one sees that
$$P(u_1, \dots, u_r; m N_1, \dots, m N_r) \subseteq P\big(v_1; O(m^{O(1)} N_1)\big) \dots P\big(v_r; O(m^{O(1)} N_r)\big),$$
and the claim follows. □

REFERENCES

1. I. BENJAMINI and G. KOZMA, A resistance bound via an isoperimetric inequality, Combinatorica, 25 (2005), 645–650.
2. L. BIEBERBACH, Über einen Satz des Herrn C. Jordan in der Theorie der endlichen Gruppen linearer Substitutionen, Sitzber. Preuss. Akad. Wiss, Berlin, 1911.
3. Y. BILU, Addition of sets of integers of positive density, J. Number Theory, 64 (1997), 233–275.
4. Y. BILU, Structure of sets with small sumset, Astérisque, 258 (1999), 77–108. Structure theory of set addition.
5. E. BREUILLARD and B. GREEN, Approximate groups. I: the torsion-free nilpotent case, J. Inst. Math. Jussieu, 10 (2011), 37–57.
6. E. BREUILLARD and B. GREEN, Approximate groups. II: the solvable linear case, Q. J. of Math., Oxf., 62 (2011), 513–521.
7. E. BREUILLARD and B. GREEN, Approximate groups. III: the unitary case, Turk. J. Math., 36 (2012), 199–215.
8. E. BREUILLARD, B. GREEN, and T. TAO, Approximate subgroups of linear groups, Geom. Funct. Anal., 21 (2011), 774–
9. Y. D. BURAGO and V. A. ZALGALLER, Geometric Inequalities, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 285, Springer, Berlin, 1988. Translated from the Russian by A. B. Sosinskiĭ, Springer Series in Soviet Mathematics.
10. M.-C. CHANG, A polynomial bound in Freiman’s theorem, Duke Math. J., 113 (2002), 399–419.
11. J. CHEEGER and T. H. COLDING, Lower bounds on Ricci curvature and the almost rigidity of warped products, Ann. Math., 144 (1996), 189–237.
12. L. J. CORWIN and F. GREENLEAF, Representations of Nilpotent Lie Groups and Their Applications, CUP, Cambridge, 1990.
13. E. CROOT and O. SISASK, A probabilistic technique for finding almost-periods of convolutions, Geom. Funct. Anal., 20 (2010), 1367–1396.
14. D. FISHER, N. H. KATZ, and I. PENG, Approximate multiplicative groups in nilpotent Lie groups, Proc. Am. Math. Soc., 138 (2010), 1575–1580.
15. G. A. FREIMAN, Foundations of a Structural Theory of Set Addition, American Mathematical Society, Providence, 1973. Translated from the Russian, Translations of Mathematical Monographs, vol. 37.
16. K. FUKAYA and T. YAMAGUCHI, The fundamental groups of almost non-negatively curved manifolds, Ann. Math., 136 (1992), 253–333.
17. S. GALLOT, D. HULIN, and J. LAFONTAINE, Riemannian Geometry, Universitext, Springer, Berlin, 1987.
18. N. GILL and H. HELFGOTT, Growth in solvable subgroups of $GL_r(\mathbf{Z}/p\mathbf{Z})$, preprint (2010), arXiv:1008.5264.
19. N. GILL and H. HELFGOTT, Growth of small generating sets in $SL_n(\mathbf{Z}/p\mathbf{Z})$, Int. Math. Res. Not., 18 (2011), 4226–4251.
20. A. M. GLEASON, The structure of locally compact groups, Duke Math. J., 18 (1951), 85–104.
21. A. M. GLEASON, Groups without small subgroups, Ann. Math., 56 (1952), 193–212.
22. K. GÖDEL, Consistency of the axiom of choice and of the generalized continuum-hypothesis with the axioms of set theory, Proc. Natl. Acad. Sci, 24 (1938), 556–557.
23. I. GOLDBRING, Nonstandard Methods in Lie Theory, Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2009.
24. I. GOLDBRING, Hilbert’s fifth problem for local groups, Ann. Math., 172 (2010), 1269–1314.
25. B. GREEN and I. Z. RUZSA, Freiman’s theorem in an arbitrary abelian group, J. Lond. Math. Soc., 75 (2007), 163–175.
26. B. GREEN and T. TAO, Compressions, convex geometry and the Freiman-Bilu theorem, Q. J. Math., 57 (2006), 495–
27. M. GROMOV, Groups of polynomial growth and expanding maps, Publ. Math. IHÉS, 53 (1981), 53–73.
28. M. GROMOV, Metric Structures for Riemannian and Non-Riemannian Spaces, Modern Birkhäuser Classics, Birkhäuser, Boston, 2007. Based on the 1981 French original, With appendices by M. Katz, P. Pansu and S. Semmes, Translated from the French by Sean Michael Bates.
29. M. HALL Jr., The Theory of Groups, Chelsea Publishing Co., New York, 1976. Reprinting of the 1968 edition.
30. H. A. HELFGOTT, Growth and generation in $SL_2(\mathbf{Z}/p\mathbf{Z})$, Ann. Math., 167 (2008), 601–623.
31. H. A. HELFGOTT, Growth in $SL_3(\mathbf{Z}/p\mathbf{Z})$, J. Eur. Math. Soc., 13 (2011), 761–851.
32. J. HIRSCHFELD, The nonstandard treatment of Hilbert’s fifth problem, Trans. Am. Math. Soc., 321 (1990), 379–400.
33. E. HRUSHOVSKI, Stable group theory and approximate subgroups, J. Am. Math. Soc., 25 (2012), 189–243.
34. I. KAPLANSKY, Lie Algebras and Locally Compact Groups, The University of Chicago Press, Chicago, 1971.
35. V. KAPOVITCH, A. PETRUNIN, and W. TUSCHMANN, Nilpotency, almost nonnegative curvature, and the gradient flow on Alexandrov spaces, Ann. Math., 171 (2010), 343–373.
36. V. KAPOVITCH and B. WILKING, Structure of fundamental groups of manifolds with Ricci curvature bounded below, preprint (2011), arXiv:1105.5955.
37. B. KLEINER, A new proof of Gromov’s theorem on groups of polynomial growth, J. Am. Math. Soc., 23 (2010), 815–829.
38. J. LEE and Y. MAKARYCHEV, Eigenvalue multiplicity and volume growth, preprint (2008), arXiv:0806.1745.
39. D. MONTGOMERY and L. ZIPPIN, Small subgroups of finite-dimensional groups, Ann. Math., 56 (1952), 213–241.
40. D. MONTGOMERY and L. ZIPPIN, Topological Transformation Groups, Interscience Publishers, New York, 1955.
41. P. J. OLVER, Non-associative local Lie groups, J. Lie Theory, 6 (1996), 23–51.
42. C. PITTET and L. SALOFF-COSTE, A survey on the relationships between volume growth, isoperimetry, and the behavior of simple random walk on Cayley graphs, with examples, survey, preprint (2000).
43. L. PYBER and E. SZABÓ, Growth in finite simple groups of Lie type of bounded rank, preprint (2010), arXiv:1005.1858.
44. I. Z. RUZSA, Generalized arithmetical progressions and sumsets, Acta Math. Hung., 65 (1994), 379–388.
45. I. Z. RUZSA, An analog of Freiman’s theorem in groups, Astérisque, 258 (1999), 323–326.
46. T. SANDERS, From polynomial growth to metric balls in monomial groups, preprint (2009), arXiv:0912.0305.
47. T. SANDERS, On a non-abelian Balog-Szemerédi-type lemma, J. Aust. Math. Soc., 89 (2010), 127–132.
48. T. SANDERS, On the Bogolyubov-Ruzsa lemma, Anal. Partial Differ. Equ. (2010), to appear, arXiv:1011.0107.
49. T. SANDERS, A quantitative version of the non-abelian idempotent theorem, Geom. Funct. Anal., 21 (2011), 141–221.
50. J.-P. SERRE, Lie Algebras and Lie Groups, Lecture Notes in Mathematics, vol. 1500, Springer, Berlin, 2006. 1964 lectures given at Harvard University, Corrected fifth printing of the second (1992) edition.
51. Y. SHALOM and T. TAO, A finitary version of Gromov’s polynomial growth theorem, Geom. Funct. Anal., 20 (2010), 1502–1547.
52. T. TAO, Product set estimates for non-commutative groups, Combinatorica, 28 (2008), 547–594.
53. T. TAO, Freiman’s theorem for solvable groups, Contrib. Discrete Math., 5 (2010), 137–184.
54. T. TAO and V. VU, Additive Combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2006.
55. W. P. THURSTON, Three-Dimensional Geometry and Topology, vol. 1, Princeton Mathematical Series, vol. 35, Princeton University Press, Princeton, 1997. Edited by Silvio Levy.
56. L. van den DRIES and I. GOLDBRING, Globalizing locally compact local groups, J. Lie Theory, 20 (2010), 519–524.
57. L. van den DRIES and I. GOLDBRING, Seminar notes on Hilbert’s 5th problem, preprint (2010).
58. L. van den DRIES and A. J. WILKIE, Gromov’s theorem on groups of polynomial growth and elementary logic, J. Algebra, 89 (1984), 349–374.
59. N. T. VAROPOULOS, L. SALOFF-COSTE, and T. COULHON, Analysis and Geometry on Groups, Cambridge Tracts in Mathematics, vol. 100, Cambridge University Press, Cambridge, 1992.
60. H. YAMABE, A generalization of a theorem of Gleason, Ann. Math., 58 (1953), 351–365.
61. H. YAMABE, On the conjecture of Iwasawa and Gleason, Ann. Math., 58 (1953), 48–54.

E. Breuillard
Laboratoire de Mathématiques, Bâtiment 425, Université Paris Sud 11, 91405 Orsay, France
emmanuel.breuillard@math.u-psud.fr

B. Green
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, England
b.j.green@dpmms.cam.ac.uk

T. Tao
Department of Mathematics, UCLA, 405 Hilgard Ave, Los Angeles, CA 90095, USA
tao@math.ucla.edu

Manuscript received 7 November 2011
Manuscript accepted 18 September 2012
Published online 19 October 2012.
Publications mathématiques de l'IHÉS, Springer Journals

The structure of approximate groups



Publisher: Springer Journals
Copyright: Copyright © 2012 by IHES and Springer-Verlag Berlin Heidelberg
Subject: Mathematics; Analysis; Mathematics, general; Number Theory; Geometry; Algebra
ISSN: 0073-8301
eISSN: 1618-1913
DOI: 10.1007/s10240-012-0043-9

Abstract

by EMMANUEL BREUILLARD, BEN GREEN, and TERENCE TAO

ABSTRACT

Let $K \geq 1$ be a parameter. A $K$-approximate group is a finite set $A$ in a (local) group which contains the identity, is symmetric, and such that $A \cdot A$ is covered by $K$ left translates of $A$. The main result of this paper is a qualitative description of approximate groups as being essentially finite-by-nilpotent, answering a conjecture of H. Helfgott and E. Lindenstrauss. This may be viewed as a generalisation of the Freiman-Ruzsa theorem on sets of small doubling in the integers to arbitrary groups. We begin by establishing a correspondence principle between approximate groups and locally compact (local) groups that allows us to recover many results recently established in a fundamental paper of Hrushovski. In particular we establish that approximate groups can be approximately modeled by Lie groups. To prove our main theorem we apply some additional arguments essentially due to Gleason. These arose in the solution of Hilbert’s fifth problem in the 1950s. Applications of our main theorem include a finitary refinement of Gromov’s theorem, as well as a generalized Margulis lemma conjectured by Gromov and a result on the virtual nilpotence of the fundamental group of Ricci almost nonnegatively curved manifolds.

CONTENTS

1. Introduction
2. Coset nilprogressions and a more detailed version of the Main Theorem
3. Ultra approximate groups and Hrushovski’s Lie Model Theorem
4. An outline of the argument
5. Sanders-Croot-Sisask theory
6. Proof of the Hrushovski Lie model theorem
7. Strong approximate groups
8. The escape norm and a Gleason type theorem
9. Proof of the main theorem
10. A dimension bound
11. Applications to growth in groups and geometry
Acknowledgements
Appendix A: Basic theory of ultralimits and ultraproducts
Appendix B: Local groups
Appendix C: Nilprogressions and related objects
References

1. Introduction

Approximate groups. A fair proportion of the subject of additive combinatorics is concerned with approximate analogues of exact algebraic properties, and the extent to which they resemble those algebraic properties. In this paper we are concerned with sets that are approximately closed under multiplication, which we do not necessarily assume to be commutative, and more specifically with approximate groups. These are finite non-empty sets $A$ with group-like properties which we shall state precisely later. First we will motivate the definition of an approximate group with some discussion and examples.

Suppose first of all that $A$ is a finite subset of some ambient group $G = (G, \cdot)$. This is the setting considered in essentially all of the existing literature, and the one of importance in applications. However, as we shall see later, our method of proof is in fact more naturally adapted to a more general setting, in which $A$ lies in a local group rather than a global one.

It is easy to see that a finite non-empty subset $A$ of $G$ is a genuine subgroup if, and only if, we have $xy^{-1} \in A$ whenever $x, y \in A$.
Perhaps the most natural way in which a set $A$ may be approximately a subgroup, then, is if the set $A \cdot A^{-1} := \{xy^{-1} : x, y \in A\}$ has cardinality not much bigger than the cardinality of $A$: for example, we might ask that $|A \cdot A^{-1}| \leq K|A|$ for some constant $K$.

Sets with this property or with the closely related property $|A^2| \leq K|A|$, where $A^2 := A \cdot A = \{xy : x, y \in A\}$, are said to have small doubling, and this is indeed a commonly encountered condition in various fields of mathematics, in particular in additive combinatorics. It is a perfectly workable notion of approximate group in the abelian setting and the celebrated Freiman-Ruzsa theorem, Theorem 2.1 below, describes subsets of $\mathbb{Z}$ with this property. However in [52] it was noted that in non-commutative settings a somewhat different, though closely related, notion of approximate group is more natural: $A$ is an approximate group if it is symmetric in the sense that the identity $\mathrm{id}$ lies in $A$, if $a^{-1} \in A$ whenever $a \in A$, and if $A \cdot A$ is covered by $K$ left-translates of $A$.

As suggested above we consider in this paper a slightly more general (and perhaps more natural, in retrospect) “local” definition of approximate group in which there is no ambient global group $G$. It will be convenient to introduce the following definition. This requires the concept of a local group, which is discussed at some length in Appendix B.

Definition 1.1 (Multiplicative set). A multiplicative set is a finite non-empty set $A$ contained in a (symmetric) local group $G = (G, \cdot)$, such that the product set $(A \cup A^{-1})^{200}$ is well-defined, where $A^{-1} := \{a^{-1} : a \in A\}$ is the inverse of $A$.

Strictly speaking, one should refer to the pair $(A, G)$ as the multiplicative set rather than just $A$, but we will usually abuse notation and omit the ambient local group $G$. In some (abelian) examples, we will use additive group notation $G = (G, +)$ rather than multiplicative notation $G = (G, \cdot)$.
In such cases, we will refer to multiplicative sets as additive sets instead.

Clearly, any finite non-empty subset of a (global) group $G$ is a multiplicative set. The reader should probably keep this model case in mind throughout a first reading of this paper. Indeed the additional generality afforded by the local setting is only needed at a single, albeit critical, place in the argument in Section 9. One should informally think of a multiplicative set $A$ as a set that behaves “as if” it were in a global group, so long as one only works “locally” in the sense that one only considers products of up to 200 elements of $A$ and their inverses. The exponent 200 in Definition 1.1 is somewhat arbitrary, but for the purposes of studying approximate groups, the exact choice of this exponent is not important in practice, so long as it is at least 8 (see Theorem 5.3 for a precise formalisation of this assertion). For the reader familiar with Freiman homomorphisms (cf. [54, §5.3]), we remark that these are essentially the morphisms in the category of multiplicative sets.

Definition 1.2 (Approximate groups). Let $K \geq 1$. A $K$-approximate group is a multiplicative set $A$ with the following properties:
(i) the set $A$ is symmetric in the sense that $\mathrm{id} \in A$ and $a^{-1} \in A$ if $a \in A$;
(ii) there is a symmetric subset $X \subset A^3$ with $|X| \leq K$ such that $A \cdot A \subseteq X \cdot A$.

We will sometimes refer to actual (global) groups as genuine groups, in order to distinguish them from approximate groups. We define a global $K$-approximate group to be a $K$-approximate group $A$ that lies inside a global group $G$. We refer to $K$ as the covering parameter of the approximate group $A$.

Remark 1.3. We will also have occasion to deal with infinite $K$-approximate groups, which are defined exactly as ordinary $K$-approximate groups, except that they are no longer required to be finite sets. A convex body in a Euclidean space, or a small ball in a Lie group, are examples of infinite approximate groups.
Later we will introduce the important notion of an ultra approximate group, which is another example. However, by default, approximate groups in this paper will be understood to be finite unless otherwise stated.

The connection between sets with small doubling and the apparently stronger property of being an approximate group was worked out in [52], building on work of Ruzsa [45]; see Remark 1.5 below.

When we speak of an “approximate group” we shall generally imagine that $K$ is fixed (e.g. $K = 10$) and that $|A|$ is large. Let us give some examples.

Example 1 (Finite group). A 1-approximate group is the same thing as a finite group.

Example 2 (Arithmetic/geometric progression). If $N \in \mathbb{N}$ is a natural number, then the arithmetic progression $P(1; N) := \{-N, \ldots, N\}$ (which one can view inside the (additive) global group $\mathbb{Z}$, or the local group $\{-200N, \ldots, 200N\}$) is a 2-approximate group. More generally, if $G = (G, \cdot)$ is any (global) group and $g \in G$ then the geometric progression $P(g; N) := \{g^{-N}, \ldots, g^{N}\}$ is a 2-approximate group.

Example 3 (Generalised arithmetic progression). Let $G = (G, +)$ be an abelian group, let $u_1, \ldots, u_r \in G$ for some $r \geq 0$, and let $N_1, \ldots, N_r > 0$ be real numbers. We refer to the set
$$P(u_1, \ldots, u_r; N_1, \ldots, N_r) := \{n_1 u_1 + \cdots + n_r u_r : n_1, \ldots, n_r \in \mathbb{Z};\ |n_1| \leq N_1, \ldots, |n_r| \leq N_r\}$$
as a generalised arithmetic progression of rank $r$. One easily verifies that this is a $2^r$-approximate group.

Example 4 (Homomorphic images). Let $\phi : G \to H$ be a homomorphism between local or global groups. If $A$ is a $K$-approximate subgroup of $G$, then $\phi(A)$ is a $K$-approximate subgroup of $H$. This observation can be generalised to the case when $\phi$ is a Freiman homomorphism (of order 3) rather than a group homomorphism; see [54, §5.3] for more discussion.
Indeed, Freiman homomorphisms are very similar to homomorphisms of local groups, although for technical reasons we will rely on the latter concept rather than the former. Conversely, if $B$ is a $K$-approximate subgroup of $H$, $\phi$ is surjective, and $\ker(\phi)$ is finite, then $\phi^{-1}(B)$ is a $K$-approximate subgroup of $G$. In the latter case one can view the $K$-approximate group $\phi^{-1}(B)$ as a “finite extension” of the $K$-approximate group $B$ by the genuine group $\ker(\phi)$.

Example 5 (Large subsets). Let $A$ be a $K$-approximate group, and let $A'$ be a symmetric neighbourhood of the identity in $A$ such that $A$ is covered by $K'$ left-translates of $A'$. Then $A'$ is a $KK'$-approximate group. This hints that approximate groups are considerably more numerous than genuine groups, because the property of being an approximate group is preserved under passage to “large” subsets, whereas the property of being a genuine group is not.

Example 6 (Heisenberg example). Let $G$ be the free nilpotent group of step 2 generated by two generators $u_1, u_2$. More concretely, one can take $G$ to be the Heisenberg group
$$G := \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} \tag{1.1}$$
with generators
$$u_1 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad u_2 := \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Consider also the commutator
$$[u_2, u_1] := u_2^{-1} u_1^{-1} u_2 u_1 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};$$
one has
$$\begin{pmatrix} 1 & n_1 & n_{12} \\ 0 & 1 & n_2 \\ 0 & 0 & 1 \end{pmatrix} = u_1^{n_2} u_2^{n_1} [u_2, u_1]^{n_{12}}$$
for all integers $n_1, n_2, n_{12}$.

Let $N_1, N_2 \geq 10$ be real numbers. Define the nilprogression $P(u_1, u_2; N_1, N_2)$ to be the set of all words in $u_1, u_2, u_1^{-1}, u_2^{-1}$ that involve at most $N_1$ occurrences of $u_1, u_1^{-1}$ and at most $N_2$ occurrences of $u_2, u_2^{-1}$. It is not difficult to verify that $P(u_1, u_2; N_1, N_2)$ is a symmetric neighbourhood of the identity which contains the set
$$\{u_1^{n_1} u_2^{n_2} [u_2, u_1]^{n_{12}} : |n_1| \leq N_1/10,\ |n_2| \leq N_2/10,\ |n_{12}| \leq N_1 N_2/10\}$$
and is contained in the set
$$\{u_1^{n_1} u_2^{n_2} [u_2, u_1]^{n_{12}} : |n_1| \leq 10 N_1,\ |n_2| \leq 10 N_2,\ |n_{12}| \leq 10 N_1 N_2\}.$$
One can easily verify that $P(u_1, u_2; N_1, N_2)$ is a $K$-approximate group for some absolute constant $K$ (for instance, one could take $K = 100$).

Remark 1.4. The above example was constructed inside the Heisenberg group. Later on we will discuss a generalisation of this example to arbitrary nilpotent groups. These examples, which we will call nilprogressions, will be needed to state the precise version of our main theorem (Theorem 2.10) below. We will define them later in this introduction.

Example 7 (Direct products). The direct product of a $K_1$-approximate group and a $K_2$-approximate group is a $K_1 K_2$-approximate group, and so one may build up examples of approximate groups using both subgroups and nilprogressions.

Example 8 (Helfgott’s example). The following example of Helfgott is a less obvious way of combining a subgroup and a nilprogression. Let $A \subseteq \mathrm{GL}_3(\mathbb{F}_p)$ be the following set of $3 \times 3$ matrices:
$$A := \left\{ \begin{pmatrix} r^n & x & z \\ 0 & s^n & y \\ 0 & 0 & (rs)^{-n} \end{pmatrix} : x, y, z \in \mathbb{F}_p,\ -N \leq n \leq N \right\}.$$
Here, $r, s \in \mathbb{F}_p^{\times}$ are fixed and $N$ is large yet much smaller than $p$. Then $A$ is a $O(1)$-approximate group. Note that $A$ has the following form: it admits a subgroup $H$, normalised by $A$, such that $A/H$ is a geometric progression. Indeed
$$H = \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : x, y, z \in \mathbb{F}_p \right\}.$$
In the language of Example 4, $A$ is a finite extension of a geometric progression by the finite group $H$. (For the source of this example, see terrytao.wordpress.com/2009/06/21/freimans-theorem-for-solvable-groups/#comment-39705.)

Each of the above examples was rather “algebraic” in nature, whereas the definition of approximate group is somewhat combinatorial. We also have some more combinatorial criteria for generating approximate groups using sets of small doubling or tripling.

Remark 1.5 (Relationship between small doubling and approximate groups). Let $A$ be a non-empty finite subset of a global group $G$.
If $|A^3| \leq K|A|$, then the set $H := (A \cup \{\mathrm{id}\} \cup A^{-1})^2$ is a $O(K^{O(1)})$-approximate group that contains $A$; see [52, Theorem 3.9]. In a similar vein if $|A^2| \leq K|A|$ or $|A \cdot A^{-1}| \leq K|A|$, then there exists a $O(K^{O(1)})$-approximate group $H$ of size $|H| = O(K^{O(1)}|A|)$ such that $A$ can be covered by $O(K^{O(1)})$ left-translates $gH$ of $H$; see [52, Theorem 4.6].

Our aim in this paper is to “describe” the structure of approximate subgroups in an arbitrary ambient group in terms of more explicit algebraic objects such as those listed in the examples. Here is one form of our main result in this regard.

Theorem 1.6 (Main theorem, simple form). Let $A$ be a global $K$-approximate group, thus it is contained in a (global) group $G$. Then there exists a subgroup $G_0$ of $G$ and a finite normal subgroup $H$ of $G_0$ with the following properties:
(i) $A$ can be covered by $O_K(1)$ left-translates of $G_0$;
(ii) $G_0/H$ is nilpotent and finitely generated of rank and step at most $O_K(1)$;
(iii) $A^4$ contains $H$ and a generating set of $G_0$.
In particular, the group $G_0$ is finite-by-nilpotent, and hence also virtually nilpotent.

[Footnote: Indeed, the stabiliser in $G_0$ of the conjugation action on $H$ has finite index in $G_0$ and is a central extension of a finite index subgroup of $G_0/H$, and therefore is also nilpotent.]

By specialising Theorem 1.6 to the combinatorial examples in Remark 1.5 we obtain an analogous structure theorem for sets of small doubling.

Corollary 1.7 (Freiman-type theorem). Let $A$ and $B$ be finite non-empty subsets in a (global) group $G$ such that $|AB| \leq K|A|^{1/2}|B|^{1/2}$. Then there exists a subgroup $G_0$ of $G$ and a finite normal subgroup $H$ of $G_0$ with the following properties:
(i) $A$ can be covered by at most $O_K(1)$ right translates of $G_0$;
(ii) $G_0/H$ is nilpotent and finitely generated of rank and step $O_K(1)$.
In particular, $G_0$ is finite-by-nilpotent and hence also virtually nilpotent.
[Footnote: Here and in the rest of the paper we use $X = O_K(Y)$, $X \ll_K Y$, or $Y \gg_K X$ for two (standard) quantities $X, Y$ and a (standard) parameter $K$ to denote the assertion that $|X| \leq C_K Y$ for some (standard) quantity $C_K > 0$ depending only on $K$, and similarly for other choices of subscripted parameters. We also adopt an analogous notation for nonstandard quantities; see Appendix A.]

[Footnote: The rank of a finitely generated group is the least number of generators required to generate the group. The step is the length of the lower central series, minus 1.]

Proof. By [52, Theorem 4.6], there exists a $O(K^{O(1)})$-approximate group $A'$ of size $O(K^{O(1)}|A|)$ such that $A$ can be covered by $O(K^{O(1)})$ right translates of $A'$ and $B$ can be covered by $O(K^{O(1)})$ left translates of $A'$. We may thus apply Theorem 1.6 to $A'$.

Theorem 1.6 (or Corollary 1.7) answers in the affirmative a conjecture that we have been referring to as the Helfgott-Lindenstrauss Conjecture, on account of its having been raised independently in private communications by both Harald Helfgott and Elon Lindenstrauss. In fact, the conjecture is reasonably explicit in the comments surrounding [31, Theorem 1.1].

Remark 1.8 (The linear case). Various forms of the main theorem are also known in groups of Lie type of bounded dimension, as a consequence of results of many authors [6–8, 18, 19, 30, 31, 33, 43]. For instance, in [18] an analogue of Theorem 1.6 was established in the case when $G$ is a solvable algebraic group of bounded dimension over a finite field of prime order. In that case, the group $G_0/H$ has bounded rank, and the number of cosets of $G_0$ needed to cover $A$ is polynomial in $K$. We have no examples to rule out the possibility that this polynomiality in $K$ holds in all groups $G$, perhaps at the cost of weakening the rank and step bounds on $G_0/H$. Unfortunately our methods, which rely on ultrafilter arguments, give no quantitative bounds on the covering number whatsoever.
Remark 1.9 (Bounds on the nilpotent group). Our method allows us to give an explicit bound on the dimension (rank and step) of the nilpotent group $G_0/H$ in Theorem 1.6 at the expense of replacing $A^4$ in item (iii) by a larger power of $A$. Namely, if we allow for $H$ and the generating set of $G_0$ to be contained in $A^{12}$, then we may ensure that the nilpotent group $G_0/H$ is $\ell$-nilpotent with $\ell = O(K^2 \log K)$. If we are happy to go as far as $A^{O_K(1)}$, then this may be further reduced to $\ell \leq 6 \log_2 K$. Here we say that a group is $\ell$-nilpotent if it admits a generating set $u_1, \ldots, u_\ell$ such that $[u_i, u_j] \in \langle u_{j+1}, \ldots, u_\ell \rangle$ for all $1 \leq i < j \leq \ell$. In particular such a group admits a normal series with cyclic factors of length at most $\ell$, and so is also nilpotent of step at most $\ell$. We refer the reader to Theorem 2.12 and to Section 10 for a detailed statement and proof.

Remark 1.10. Note that no bound is provided on the size of the finite group $H$ in Theorem 1.6, other than that it is finite. Indeed, by considering $A$ to be a large finite simple group it is not difficult to see that $H$ can be arbitrarily large.

We will in fact prove a much more precise version of Theorem 1.6 involving a slightly complicated type of approximate group which we call a coset nilprogression. We discuss this concept in some detail in the next section. For many applications, however, Theorem 1.6 is quite sufficient.

Applications. We now give a small selection of applications to growth in groups and to Riemannian geometry; a greater variety is assembled in Section 11, which also contains proofs of these statements.

Polynomial growth conditions and Gromov’s theorem. Firstly, Theorem 1.6 yields a quick proof of Gromov’s theorem [27] on groups of polynomial growth.

Theorem 1.11 (Gromov’s theorem). Let $G$ be a group of polynomial growth. That is, $G$ is generated by a finite symmetric set $S$, and there are constants $C$ and $d$ such that $|S^n| \leq Cn^d$ for all $n \in \mathbb{N}$. Then $G$ is virtually nilpotent.
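The growth hypothesis $|S^n| \leq Cn^d$ in Theorem 1.11 can be explored numerically on small examples. The sketch below is an illustration under stated assumptions rather than anything from the paper: `ball_sizes` is a hypothetical helper that computes $|S^n|$ by repeated multiplication, elements of the Heisenberg group of Example 6 are encoded as triples $(a, b, c)$ standing for the upper unitriangular matrix with entries $a, c$ in the first row and $b$ in the second, and the generating sets are chosen for convenience (with the identity added so that $S^n$ is the word ball of radius $n$).

```python
def ball_sizes(gens, identity, mul, radius):
    """Return [|S^1|, ..., |S^radius|] for S = gens together with the identity."""
    S = set(gens) | {identity}
    ball = {identity}
    sizes = []
    for _ in range(radius):
        ball = {mul(g, s) for g in ball for s in S}
        sizes.append(len(ball))
    return sizes

def add_z2(x, y):
    """Group law of Z^2."""
    return (x[0] + y[0], x[1] + y[1])

def mul_heis(x, y):
    """Group law of the integer Heisenberg group on triples (a, b, c),
    where (a, b, c) encodes the matrix [[1, a, c], [0, 1, b], [0, 0, 1]]."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

RADIUS = 6
z2_sizes = ball_sizes([(1, 0), (-1, 0), (0, 1), (0, -1)], (0, 0), add_z2, RADIUS)
heis_sizes = ball_sizes(
    [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)], (0, 0, 0), mul_heis, RADIUS
)

for n, (z, h) in enumerate(zip(z2_sizes, heis_sizes), start=1):
    print(n, z, h)
```

For $\mathbb{Z}^2$ the ball sizes follow the quadratic formula $2n^2 + 2n + 1$, while for the Heisenberg group they stay below the crude quartic bound $(2n+1)^2(2n^2+1)$, in line with both groups having polynomial growth.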
Remark 1.12. In fact, our arguments show that there is some function $f : \mathbb{N} \to \mathbb{N}$, $f(n) \to \infty$, such that, if $G$ does not have polynomial growth, then $|S^n| \geq n^{f(n)}$ for all $n$. We do not get an explicit function $f$. However, if the control parameter $O_K(1)$ in Theorem 2.10 were known to be polynomial in $K$, we could take $f(n) = c \log n$. The best (in fact only) lower bound known for this function at present is $(\log \log n)^c$, due to Shalom and the third author [51]. It is conjectured by some, in the absence of any examples to the contrary, that $f(n) > n^c$, and possibly even that $|S^n| \geq e^{c\sqrt{n}}$.

In [33] Hrushovski also gave a derivation of Gromov’s theorem from his Lie model theorem (see Theorem 3.10 below). He in fact proved a strengthening of Gromov’s theorem (see [33, Theorem 7.1] or Theorem 11.1 below). We will be able to recover Hrushovski’s result more directly (see Corollary 11.2 below). In fact, our approach can also yield the following other strengthening of Gromov’s theorem, which is uniform in the size of the generating set $S$ and appears to be new. Recall that if $\ell \in \mathbb{N}$ then we say that a group is $\ell$-nilpotent if it admits a generating set $u_1, \ldots, u_\ell$ such that $[u_i, u_j] \in \langle u_{j+1}, \ldots, u_\ell \rangle$ for all $1 \leq i < j \leq \ell$.

Theorem 1.13. Let $d > 0$. Then there is $n_0 = n_0(d) > 0$ such that if $G$ is a group generated by a finite symmetric set $S$ with $1 \in S$ for which $|S^n| \leq n^d |S|$ for some $n \geq n_0(d)$, then $G$ is virtually nilpotent. In fact $G$ has a normal subgroup of index at most $O_d(1)$ which is finite-by-($O(d)$-nilpotent).

Proof. The proof of this (and hence of Theorem 1.11) is a short enough deduction that we can give it here in the introduction. We refer the reader to Section 11 for more details.

Let $N = N(d)$ be a large quantity to be specified later, and let $n$ be sufficiently large depending on $N$ and $d$. By the pigeonhole principle and the hypothesis $|S^n| \leq n^d |S|$ we see that if $n$ is sufficiently large depending on $N$ then there exists $n_0$, $N \leq n_0 \leq n/100$, such that $|S^{100 n_0}| \leq (200)^d |S^{n_0}|$. By Corollary 5.2 (which is quite easy) this implies that $S^{2n_0}$ is a $e^{O(d)}$-approximate group. By our main theorem, Theorem 1.6 (and Remark 1.9), we can thus find a finite-by-($O(d)$-nilpotent) and hence virtually nilpotent group $G_0$ such that $S^{2n_0}$ is covered by $O_d(1)$ left-translates of $G_0$. By the pigeonhole principle, if $N$ is large enough, we can find a nonnegative $m < 2n_0$ such that $S^{m+1} G_0 = S^m G_0$. Multiplying on the left by $S$ repeatedly we conclude that $S^{m+k} G_0 = S^m G_0$ for all $k \geq 0$. Since $S$ generates $G$, we conclude that $G = S^m G_0 = S^{2n_0} G_0$. Since $S^{2n_0}$ was covered by $O_d(1)$ left-translates of $G_0$, $G_0$ has index $O_d(1)$ in $G$, and so $G$ is also virtually nilpotent.

Riemannian manifolds. A. Petrunin suggested to us some years ago that a result such as Theorem 1.13 would give a purely group-theoretical proof of a theorem of Fukaya and Yamaguchi [16] according to which fundamental groups of almost non-negatively curved manifolds are virtually nilpotent. Recall that a closed manifold $M$ is said to be almost non-negatively curved if one can find a sequence $g_n$ of Riemannian metrics on it for which $\mathrm{diam}(M, g_n) \leq 1$ while $K_{M, g_n} \geq -1/n$, where $K_{M, g_n}$ is the sectional curvature. Indeed, a simple application of the Bishop-Gromov inequalities combined with Corollary 11.5 yields the following improvement assuming only a lower bound on the Ricci curvature and an upper bound on the diameter.

Corollary 1.14 (Ricci gap). Given $d \in \mathbb{N}$, there is $\varepsilon(d) > 0$ such that the following holds. Let $M = (M, g)$ be a $d$-dimensional compact Riemannian manifold with Ricci curvature bounded below by $-\varepsilon$ and diameter $\mathrm{diam}(M) \leq 1$. Then $\pi_1(M)$ is virtually nilpotent.
This result is known to differential geometers and follows from the works of Cheeger-Colding [11] and Kapovitch and Wilking [36]. We refer the reader to Section 11.1 for more discussion and references concerning the above result. We only note that Corollary 11.5 yields in fact an explicit bound on the nilpotency class, namely that after passing to a subgroup of $\pi_1(M)$ with index $O_d(1)$ and quotienting by a finite normal subgroup, we obtain a $O(d)$-nilpotent group.

Generalised Margulis lemma. Another corollary of Theorem 1.6 is a “generalised Margulis lemma” for metric spaces of a type conjectured by Gromov in [28, §5.F]. A metric space $X$ is said to have bounded packing with packing constant $K$ if there is $K > 0$ such that every ball of radius 4 in $X$ can be covered by at most $K$ balls of radius 1. Say that a subgroup $\Gamma$ of isometries of $X$ acts discretely on $X$ if every orbit is discrete in the sense that $\{\gamma \in \Gamma : \gamma \cdot x \in \Omega\}$ is finite for every $x \in X$ and for every bounded set $\Omega \subseteq X$.

Corollary 1.15 (Generalized Margulis Lemma). Let $K \geq 1$ be a parameter. Then there is some $\varepsilon(K) > 0$ such that the following is true. Suppose that $X$ is a metric space with packing constant $K$, and that $\Gamma$ is a subgroup of isometries of $X$ which acts discretely. Then for every $x \in X$ the “almost stabiliser” $\Gamma_\varepsilon(x) = \langle S_\varepsilon(x) \rangle$, where $S_\varepsilon(x) := \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\}$, is virtually nilpotent.

Note that the space $X$ is not assumed to be a manifold. The traditional Margulis lemma establishes a similar statement for subgroups of isometries of pinched negatively curved manifolds, or more generally under a curvature lower bound.

Approximate groups and polynomial growth. Finally we remark on an additive-combinatorial application, which asserts that approximate groups have large subsets with “polynomial growth”.

[Footnote: See also http://mathoverflow.net/questions/11091.]

Theorem 1.16 (Approximate groups are locally of polynomial growth).
Suppose that $A$ is a global $K$-approximate group. Then $A^4$ contains a $O_K(1)$-approximate group $A'$ with $(A')^4 \subset A^4$ and $|A'| \gg_K |A|$ such that $|(A')^m| \leq m^{O_K(1)} |A'|$ for all $m \geq 1$.

This theorem is an immediate consequence of Theorem 2.10 and Proposition C.5 below.

Remark 1.17. The above argument converted nilpotent structure (or more precisely, coset nilprogression structure, see below) to polynomial growth. In the reverse direction, there is the result of Sanders [46] in certain monomial groups, in which polynomial growth is shown to imply a metric ball type structure, at least under the (rather strong) restriction that the approximate group $A$ is normal in the ambient group $G$.

2. Coset nilprogressions and a more detailed version of the Main Theorem

This section concerns the more precise variants of our main theorem, whose existence we hinted at in the first introductory section. Let us first recall the fundamental inverse sumset theorem for abelian approximate groups. This was first introduced by Freiman [15], and a simplified argument was subsequently given in the paper [44] of Ruzsa. Here is the theorem in the torsion-free setting. Recall the notion of a generalised arithmetic progression, defined in Example 3 above.

Theorem 2.1 (Freiman-Ruzsa theorem). Let $G = (G, +)$ be a torsion-free (global) abelian group, and let $K \geq 2$ be a parameter. Suppose that $A \subseteq G$ is a $K$-approximate group. Then $4A = A + A + A + A$ contains a generalised arithmetic progression
$$P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$$
with $r \leq \log^{O(1)} K$ and $|P| \geq e^{-\log^{O(1)} K} |A|$. In particular $A$ can be covered by $O(e^{\log^{O(1)} K})$ translates of $P$.

Proof. See [48] for the main part of this; the final assertion is then a consequence of Ruzsa’s covering lemma, Lemma 5.1. For earlier results of this type with weaker bounds on $r$ and $P$, see [10, 44].
In [26] it was noted that one can take $r$ as small as $\log_2 K + \varepsilon$ for any $\varepsilon > 0$, at the cost of decreasing the size of $|P|$ somewhat; see also [3, 4] for prior results along these lines.

Roughly speaking, Theorem 2.1 asserts that, in a global torsion-free abelian group such as the integers $\mathbb{Z}$, approximate groups are “controlled” by generalised arithmetic progressions of bounded rank. In the case of abelian groups with torsion, the class of generalised arithmetic progressions is not sufficient, as one must also now deal with the example of finite genuine groups (Example 1). It is thus natural to introduce the concept of a coset progression $H + P$: the sum of a finite genuine group $H$ and a generalised arithmetic progression $P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$. This concept is sufficient for the formulation of a Freiman type theorem in an arbitrary abelian group.

Theorem 2.2 (Abelian Freiman-Ruzsa theorem). Let $G = (G, +)$ be a (global) abelian group, and let $K \geq 2$ be a parameter. Suppose that $A \subseteq G$ is a $K$-approximate group. Then $4A$ contains a coset progression $H + P$, where
$$P = P(u_1, \ldots, u_r; N_1, \ldots, N_r)$$
is a generalised arithmetic progression with $r \leq \log^{O(1)} K$, $H$ is a finite abelian subgroup disjoint from $P$, and $|H + P| = |H||P| \geq e^{-\log^{O(1)} K} |A|$. In particular, $A$ can be covered by $O(e^{\log^{O(1)} K})$ translates of $H + P$.

Proof. Again, see [48]; see also [25] for an earlier result in this direction.

We turn now to the business of dropping the commutativity assumption. We will also drop the assumption that $A$ is contained in a global group and merely assume that $A$ is a subset of a local group $G$. Informally, this means that we will not require the multiplication law to be defined everywhere in $G$, but only in a certain neighborhood of $\mathrm{id}$. We refer the reader to Appendix B for a precise definition and basic properties; see also [50, IV.3] for a discussion of the closely related notion of group chunk.
We generalise the concept of a generalised arithmetic progression to this setting as follows.

Definition 2.3 (Non-commutative progression). — Let $u_1, \dots, u_r$ be $r$ elements in a local group $G = (G, \cdot)$, and let $N_1, \dots, N_r$ be $r$ positive real numbers. If all products $g_1 \cdots g_n$ are well-defined in $G$, where each $g_i$ is equal to one of $u_j$ or $u_j^{-1}$ and, for each $j = 1, \dots, r$, the formal expression$^5$ $u_j$ and its inverse $u_j^{-1}$ appear at most $N_j$ times, then we call the set of such products a non-commutative progression of rank $r$ and side lengths $N_1, \dots, N_r$ and we denote it by $P(u_1, \dots, u_r; N_1, \dots, N_r)$. We refer to $r$ as the rank of the non-commutative progression.

Remark 2.4. — One can view non-commutative progressions as multiparameter variants of balls in a word metric. For instance when all $N_i$ take the same value $N$ and one is working in a global group, the progression $P(u_1, \dots, u_r; N, \dots, N)$ is comparable with the word ball $B(N)$ of radius $N$ in the group $\langle u_1, \dots, u_r \rangle$ for the word metric with generating set $\{u_1, \dots, u_r\}$ in the sense that $B(N) \subseteq P(u_1, \dots, u_r; N, \dots, N) \subseteq B(rN)$.

Footnote 5. For this definition, we consider $u_i$ and $u_j$ to be distinct formal expressions when $i \neq j$, even if $u_i$ and $u_j$ take the same value in $G$, and similarly for $u_i^{-1}$, $u_j^{-1}$. Thus, for instance, $P(u_1, u_2; 1, 1)$ contains $u_1 u_2$ even if $u_1, u_2$ are equal.

In the global abelian setting, all generalised arithmetic progressions of bounded rank are automatically approximate groups with a bounded covering parameter $K$. This is not the case in general non-abelian groups, even in the global setting.
For instance, if $F$ is the free non-abelian group on two generators $e_1, e_2$, then the non-commutative progression $P(e_1, e_2; N, N)$ (which, as remarked earlier, is essentially the ball of radius $N$ in $F$) grows exponentially in $N$, and one can easily verify that $P(e_1, e_2; N, N)$ is only a $K$-approximate group for $K$ growing exponentially in $N$. However, the situation is much closer to the abelian case if the ambient group $G$ is nilpotent. Given the link between progressions and balls, the reader familiar with Gromov's theorem on groups of polynomial growth [27] (to be discussed later on) will not find this surprising. Indeed, it can be shown (though we will not do so here) that if $G$ is a global nilpotent group of step $s$, a non-commutative progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ in $G$ will be a $O_{r,s}(1)$-approximate group if $N_1, \dots, N_r$ are sufficiently large depending on $r$ and $s$.

This motivates the following definition. Given some generators $u_1, \dots, u_r$, let us recursively define an iterated commutator of degree $k$ involving these generators for a natural number $k \geq 1$ by declaring $u_1^{\pm 1}, \dots, u_r^{\pm 1}$ to be the iterated commutators of degree 1, and $[g, h]$ to be an iterated commutator of degree $j + k$ whenever $g, h$ are iterated commutators of degree $j, k$ respectively for some $j, k \geq 1$. Thus for instance $[[u_2, u_4^{-1}], [u_3^{-1}, u_2]]$ is an iterated commutator of $u_1, u_2, u_3, u_4$ of degree 4.

Definition 2.5 (Nilprogression). — Suppose that $G$ is a local group and that $s \geq 0$ is an integer. A nilprogression of rank $r$ and step $s$ is a non-commutative progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ with the property that every iterated commutator of degree $s + 1$ in the generators $u_1, \dots, u_r$ is well-defined and equals the identity $\mathrm{id}$.

Example 9. — The generalised arithmetic progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ in Example 3 is a nilprogression (in additive notation) of rank $r$ and step 1.
The set $P(u_1, u_2; N_1, N_2)$ in Example 6 is a nilprogression of rank 2 and step 2.

It can be shown (though we shall not do so here) that if $N_1, \dots, N_r$ are sufficiently large depending on $r, s$, and $P(u_1, \dots, u_r; CN_1, \dots, CN_r)$ is a well-defined nilprogression of step $s$ for some sufficiently large $C$ depending on $r, s$, then $P(u_1, \dots, u_r; N_1, \dots, N_r)$ is a $O_{r,s}(1)$-approximate group.

The concept of a nilprogression as defined above is related to, though not quite identical with, the one given in [5]. As a byproduct of our proof methods, we will be able to work with a more tractable subclass of nilprogressions, which we will call nilprogressions in $C$-normal form. These generalise the notion of a proper generalised arithmetic progression in the additive combinatorics literature, and are also close in spirit to the nilprogressions introduced in [53].

Definition 2.6 ($C$-normal form). — Let $C \geq 1$. A non-commutative progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ is said to be in $C$-normal form if the following axioms are obeyed.

(i) (Upper-triangular form) For every $i, j$ with $1 \leq i < j \leq r$ and for all four choices of signs $\pm$ one has
\[ [u_i^{\pm 1}, u_j^{\pm 1}] \in P\Big(u_{j+1}, \dots, u_r; \frac{C N_{j+1}}{N_i N_j}, \dots, \frac{C N_r}{N_i N_j}\Big). \tag{2.1} \]
In particular, $[u_i, u_r] = \mathrm{id}$ whenever $1 \leq i < r$.

(ii) (Local properness) The expressions $u_1^{n_1} \cdots u_r^{n_r}$ are distinct as $n_1, \dots, n_r$ range over integers with $|n_i| \leq N_i$, $i = 1, \dots, r$.

(iii) (Volume bound) One has
\[ (2\lfloor N_1 \rfloor + 1) \cdots (2\lfloor N_r \rfloor + 1) \leq |P| \leq C (2\lfloor N_1 \rfloor + 1) \cdots (2\lfloor N_r \rfloor + 1). \tag{2.2} \]

The somewhat ugly expression $(2\lfloor N_1 \rfloor + 1) \cdots (2\lfloor N_r \rfloor + 1)$ is convenient to have in (2.2) for some minor technical reasons, but it would not do much harm for the reader to mentally substitute $N_1 \cdots N_r$ for this expression instead if desired.
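To make axioms (i) and (ii) of Definition 2.6 concrete, here is a small sanity check in the discrete Heisenberg group. It is a sketch under stated assumptions, not part of the paper's argument: we assume that $u_1, u_2$ are the two standard generators of the group of $3 \times 3$ upper unitriangular integer matrices (Example 6 is not reproduced in this section), we encode the matrix with entries $x, y, z$ above the diagonal as the tuple `(x, y, z)`, and all function names are hypothetical. The code checks that $[u_1^{\pm 1}, u_2^{\pm 1}]$ is always $u_3^{\pm 1}$ with $u_3 := [u_1, u_2]$ central, and that the products $u_1^{n_1} u_2^{n_2} u_3^{n_3}$ with $|n_1| \leq N_1$, $|n_2| \leq N_2$, $|n_3| \leq N_1 N_2$ are pairwise distinct.

```python
IDENTITY = (0, 0, 0)


def mul(g, h):
    """Multiply two upper unitriangular 3x3 integer matrices encoded as (x, y, z)."""
    x, y, z = g
    a, b, c = h
    return (x + a, y + b, z + c + x * b)


def inv(g):
    """Inverse of (x, y, z) in the Heisenberg group."""
    x, y, z = g
    return (-x, -y, -z + x * y)


def comm(g, h):
    """Commutator [g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(inv(g), inv(h)), mul(g, h))


def power(g, n):
    """g^n for any integer n, by repeated multiplication."""
    base = g if n >= 0 else inv(g)
    out = IDENTITY
    for _ in range(abs(n)):
        out = mul(out, base)
    return out


U1, U2 = (1, 0, 0), (0, 1, 0)
U3 = comm(U1, U2)


def normal_form_products(n1, n2):
    """All products u1^a u2^b u3^c with |a| <= n1, |b| <= n2, |c| <= n1 * n2."""
    n3 = n1 * n2
    return [
        mul(mul(power(U1, a), power(U2, b)), power(U3, c))
        for a in range(-n1, n1 + 1)
        for b in range(-n2, n2 + 1)
        for c in range(-n3, n3 + 1)
    ]


if __name__ == "__main__":
    signs = (1, -1)
    print(U3, {comm(power(U1, s), power(U2, t)) for s in signs for t in signs})
    prods = normal_form_products(2, 3)
    print(len(prods), len(set(prods)))
```

With side lengths $(N_1, N_2, N_1 N_2)$ the bound in (2.1) for the pair $(u_1, u_2)$ reads $C N_1 N_2 / (N_1 N_2) = C$, which is why a single copy of $u_3^{\pm 1}$ is permitted for any $C \geq 1$. The check says nothing about the volume bound (iii), which concerns the full set of words rather than the normal-form products.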
The volume bound (2.2) is morally (up to some degradation in the constants $C$) implied by the other axioms of a nilprogression in $C$-normal form, when the $N_1, \dots, N_r$ are sufficiently large, and one is working in a global group (or at least if one assumes $P(u_1, \dots, u_r; DN_1, \dots, DN_r)$ to be well-defined for some sufficiently large $D = D_{r,s}$), but for some further minor technical reasons it is convenient to state this bound explicitly in the definition.

Example 10. — The generalised arithmetic progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ in Example 3 will be in 1-normal form if it is proper, i.e. if all the expressions $n_1 u_1 + \cdots + n_r u_r$ for $|n_i| \leq N_i$ are distinct.

Example 11. — The set $P(u_1, u_2; N_1, N_2)$ in Example 6 is not in $C$-normal form for any bounded $C$, because $[u_1, u_2]$ is non-trivial. However, the closely related nilprogression
\[ P(u_1, u_2, [u_1, u_2]; N_1, N_2, N_1 N_2) \]
of rank 3 and step 2 is in 1-normal form. The two sets are "comparable" in a number of ways; for instance, one can easily verify that
\[ P\Big(u_1, u_2; \frac{1}{C} N_1, \frac{1}{C} N_2\Big) \subset P(u_1, u_2, [u_1, u_2]; N_1, N_2, N_1 N_2) \subset P(u_1, u_2; CN_1, CN_2) \]
for some absolute constant $C$ (e.g. one can take $C = 100$).

Remark 2.7. — Note that in the global group case, the step of a nilprogression in $C$-normal form is less than or equal to its rank.

In Lemma C.1 we will show that any non-commutative progression $P(u_1, \dots, u_r; N_1, \dots, N_r)$ in $C$-normal form is "essentially" a $O_{r,C}(1)$-approximate group. More precisely, we will show that $P(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is a $O_{r,C,\varepsilon}(1)$-approximate group whenever $\varepsilon > 0$ is sufficiently small and the $N_i$'s are sufficiently large depending on $C, r$. We will also show that every element of $P(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ can be rewritten in the form $u_1^{n_1} \cdots$
$u_r^{n_r} h$, where $h \in H$ and $|n_i| = O_{r,s}(\varepsilon N_i)$, while conversely every such product with $|n_i| \leq \varepsilon N_i$ obviously belongs to $P(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$.

Just as in the abelian case, we need to account for genuine subgroups. The analogue of coset progression is a coset nilprogression, a concept we first define in the simpler setting of global groups.

Definition 2.8 (Global coset nilprogression). — Let $G$ be a (global) group. By a coset nilprogression of rank $r$ and step $s$ in $G$, we mean a set $P$ of the form $\pi^{-1}(Q)$, where $G_0$ is a subgroup of $G$, $H$ is a finite normal subgroup of $G_0$, $\pi : G_0 \to G_0/H$ is the quotient map, and $Q$ is a nilprogression of rank $r$ and step $s$ in $G_0/H$. We say that $P$ is in $C$-normal form if $Q$ is in $C$-normal form.

We can extend this definition to local groups, using the local notion of quotient group reviewed in Lemma B.12.

Definition 2.9 ((Local) coset nilprogression). — Let $G$ be a (local) group, which we endow with the discrete topology. By a coset nilprogression of rank $r$ and step $s$ in $G$, we mean a set $P$ of the form $\pi^{-1}(Q)$, where $H$ is a finite genuine subgroup of $G$ with a cancellative normalising neighbourhood $G_0$, $W$ is a neighbourhood of $H$ in $G$ with $W \subset G_0$, $WH = HW = W$, $\pi : W \to W/H$ is the quotient map defined in Lemma B.12, and $Q$ is a nilprogression of rank $r$ and step $s$ in $W/H$. We say that $P$ is in $C$-normal form if $Q$ is in $C$-normal form. We call $H$ the finite group associated with $P$, and $Q$ the nilprogression associated with $P$. If $Q = P(u_1, \dots, u_r; N_1, \dots, N_r)$, then we write $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$.

Example 12. — A subgroup is a coset nilprogression of rank 0 and step 0. More generally, the direct product of a subgroup with a nilprogression of rank $r$ and step $s$ is a coset nilprogression of rank $r$ and step $s$. The coset nilprogression will be in $C$-normal form if the associated nilprogression is.

Example 13.
— The set $A$ constructed in Example 8 is a coset nilprogression of rank 1 and step 1, and is also in 1-normal form as long as $N < \frac{p-1}{2}$.

Again, coset nilprogressions in normal form are essentially approximate groups; see Lemma C.1 for a precise version of this statement.

We are now ready to state our main technical theorem, which among other things implies Theorem 1.6, and whose proof will occupy the bulk of this paper.

Theorem 2.10 (Main theorem). — Let $A$ be a $K$-approximate group. Then $A^4$ contains a coset nilprogression $P$ of rank and step $O_K(1)$ and $|P| \gg_K |A|$. Furthermore, $P$ can be taken to be in $O_K(1)$-normal form.

We remark that precursor results to this theorem in the case of nilpotent or solvable groups were obtained in [5, 6, 14, 18, 52, 53]. Theorem 2.10 also provides an independent proof of a qualitative version of the abelian results of Theorem 2.1 and Theorem 2.2, which, in contrast to the other known proofs of these results, manages to almost completely avoid the use of Fourier analysis.$^6$

It is easy to see that Theorem 2.10 implies Theorem 1.6, by taking $G$ to be the global group generated by $P$. The key point here is that a group generated by a set $u_1, \dots, u_r$ is nilpotent of step at most $s$ if every iterated commutator of the $u_1, \dots, u_r$ of degree $s + 1$ is trivial. A proof of this assertion may be found in Hall's book [29].

By standard non-commutative product estimates, we can also establish the following Freiman-type theorem for sets of bounded doubling.

Corollary 2.11 (Freiman-type theorem). — Let $K \geq 1$. Let $G$ be a (global) group and $A, B$ be finite non-empty subsets of $G$ such that $|AB| \leq K |A|^{1/2} |B|^{1/2}$. Then there exists a coset nilprogression $P$ of rank and step $O_K(1)$ with $|P| \gg_K |A|$ which is in $O_K(1)$-normal form, such that $A$ can be covered by $O_K(1)$ left-translates of $P$, and $B$ can be covered by $O_K(1)$ right-translates of $P$.

Proof.
— This follows immediately from combining Theorem 2.10 with [52, Theorem 4.6].

In Section 10, we will show the following explicit bounds on the rank and step of $P$.

Theorem 2.12 (Bounds on the rank and step of the nilprogression). — In Corollary 2.11 (and in Theorem 2.10 if $A$ is assumed to be a global $K$-approximate group), at the expense of replacing the conclusion $P \subseteq A^4$ with the weaker statement that $P \subseteq A^{12}$, the coset nilprogression $P$ can be taken to have rank and step at most $O(K^2 \log K)$ while remaining in $O_K(1)$-normal form. Moreover, if we settle for the even weaker inclusion $P \subset A^{O_K(1)}$, one can ensure that $P$ has rank and step at most $6 \log_2 K$ (while still remaining in $O_K(1)$-normal form).

It is likely that the numerical constants 6 and 12 here can be improved, but we will not pursue such improvements here.

Footnote 6. However, our argument still uses results relating to Hilbert's fifth problem which require Fourier-analytic tools, such as Pontryagin duality, even in the abelian setting.

Local approximate groups can be embedded in global groups. — As we have remarked above, the approximate groups $A$ considered in this paper are local in the sense that we do not need to assume that $A$ lies in a global group $G$. However as a consequence of Theorem 2.10, the more detailed version of our main theorem, we have the following statement. It asserts that, at least at the qualitative level, there is in fact no loss of generality in dealing with the global case.

Theorem 2.13. — Suppose that $A$ is a $K$-approximate group. Then $A^4$ contains a $O_K(1)$-approximate group $A'$ with $(A')^4 \subset A^4$ and $|A'| \gg_K |A|$ which is isomorphic to a subset of a global group $G$.

This theorem follows from Theorem 2.10 and the fact (which we prove in Lemma C.3) that a large portion of a coset nilprogression in normal form can be embedded in a global group.
This theorem can be viewed as a discrete analogue to a recent result of Goldbring and van den Dries [56], who established that every locally compact local group is locally isomorphic near the identity to a locally compact global group (thus there is a neighbourhood of the identity in the former group that is isomorphic to a neighbour- hood of the identity in the latter group). One should also compare this result with Lie’s third theorem that every local Lie group is locally isomorphic to a global Lie group (see Theorem B.16 and the discussion in Serre’s book [50]). 3. Ultra approximate groups and Hrushovski’s Lie Model Theorem In the next section we will give an outline of the argument we shall use to prove Theorem 2.10. An extremely important component of it will be a Lie Model Theorem that implicitly appears in a remarkable paper of Hrushovski [33, Theorem 4.2], which provided the foundation for much of the work here, and for which we will give a self- contained proof later in this paper. We can state this theorem very informally as follows: Theorem 3.1 (Hrushovski’s Lie Model Theorem, informal version). — In a suitable limit, an approximate group is virtually modelled by a precompact neighbourhood of the identity in a Lie group. Of course, to make this theorem more precise, one has to formalise terms such as “suitable limit”, “virtually”, and “modelled”. We shall do so presently, but first we point out that Theorem 3.1 is very similar in spirit to a key step [27, §7] in Gromov’s proof of his celebrated theorem on groups of polynomial growth, which we state informally as follows. Theorem 3.2 (Gromov’s Lie Model Theorem, informal version). — In a suitable limit, a group of polynomial growth can be modeled by a finite-dimensional locally compact space with a transitive isometric action of a Lie group. 
To deepen the analogy between the two results, we note that Theorem 3.1 and Theorem 3.2 both require the deep body of results surrounding the solution to Hilbert's fifth problem on the topological description of the category of Lie groups (see [40]) in order to bring into view the Lie structure, which is not manifestly present when one first takes a limit.

There are however some technical differences between the precise formulations of Theorem 3.1 and Theorem 3.2. In the latter theorem, one has a group $G$ (of polynomial growth) generated by a finite set $S$. This gives a metric on $G$, the word metric given by the generating set $S$. Gromov then looks at the discrete balls $S^n$, $n = 1, 2, 3, \dots$ "from a distance" to get some continuous limit metric space $X$. For example if $G = \mathbb{Z}$ and $S = \{-1, 0, 1\}$, then $S^n = \{-n, \dots, n\}$, and it is heuristically clear that these discrete intervals $S^n$, after rescaling by $n$, "converge" in a suitable sense to the continuous interval $[-1, 1] \subseteq \mathbb{R}$. To effect this limit, Gromov introduced what is now known as Gromov-Hausdorff convergence of a sequence of metric spaces. In subsequent work of van den Dries and Wilkie [58] a slightly different approach, using ultralimits (or non-standard analysis), was pioneered. This construction is now known, in the geometric group theory literature, as the asymptotic cone.

The asymptotic cone, then, is (a quotient of) an ultraproduct of the sequence of balls $(S^n)_{n \in \mathbb{N}}$.$^7$ We will use a similar limit in order to formalise Theorem 3.1, namely an ultraproduct $A$ of an arbitrary sequence $(A_n)_{n \in \mathbb{N}}$ of $K$-approximate groups, an object we call an ultra approximate group. We now define this term more precisely.

Definition 3.3 (Ultra approximate group). — Throughout this paper, we fix a non-principal ultrafilter $\alpha \in \beta\mathbb{N} \setminus \mathbb{N}$ (see Lemma A.1 for a definition of this concept).
If $K > 0$ is a real number then an ultra $K$-approximate group is an ultraproduct $A := \prod_{n \to \alpha} A_n$, where each $A_n$ is a (standard) $K$-approximate group. Thus, $A$ is the space of all formal limits $\lim_{n \to \alpha} a_n$ with $a_n \in A_n$, where two formal limits $\lim_{n \to \alpha} a_n$ and $\lim_{n \to \alpha} a'_n$ are considered equal if $a_n = a'_n$ for all $n$ sufficiently close to $\alpha$ (i.e. for all $n$ in an $\alpha$-large subset of $\mathbb{N}$). See Appendix A for more discussion on ultraproducts. Often we will not need to refer to $K$ explicitly, in which case we speak simply of an ultra approximate group.

Note that we allow the approximate groups $A_n$ to lie in different ambient groups $G_n$ (much as the notion of Gromov-Hausdorff convergence also does not require the spaces $X_n$ involved to all live in a common ambient space). Ultraproducts are a model-theoretic limit, in contrast to the more geometric notion of a limit defined by Gromov-Hausdorff convergence. There are two key properties of these model-theoretic limits that make them convenient to use for our purposes. The first is Łoś's theorem, which roughly speaking asserts that any property that can be stated in the language of first-order logic holds for an ultraproduct $A = \prod_{n \to \alpha} A_n$ if and only if it holds for those $A_n$ with $n$ sufficiently close to $\alpha$; see Theorem A.6. The second is countable saturation, which we will use to establish the completeness of a certain (pseudo)metric space associated to an ultra approximate group; see Proposition 6.1.

Footnote 7. In [33], more saturated limits (not necessarily constructed using ultrafilters) were also considered, but we will not need such constructions here.

Next, we discuss what it would mean to "model" an ultra approximate group $A$. Informally, a model would seek to describe the "coarse-scale" behaviour of $A$, and in particular be able to predict when an orbit $\mathrm{id}, a, a^2, a^3, \dots$ of an element $a$ of $A$ will "escape" $A$, while ignoring the "fine-scale" behaviour of $A$.
Such a model will be formalised by a homomorphism $\varphi : A \to L$ of local groups that obey certain good properties (see Definition 3.5 below). Before we present this formal definition, though, we first discuss some key examples of ultra approximate groups and their models.

Example 14 (Nonstandard finite groups). — Suppose that $A_n$ is a sequence of (standard) finite groups; then the ultraproduct $A := \prod_{n \to \alpha} A_n$ is an ultra approximate group. In this case, $A$ is in fact a genuine group, with group operation given by the law
\[ \lim_{n \to \alpha} a_n \cdot \lim_{n \to \alpha} b_n := \lim_{n \to \alpha} (a_n b_n). \]
We will refer to such groups as nonstandard finite groups. A typical example of a nonstandard finite group is the nonstandard cyclic group$^9$
\[ \mathbb{Z}/N\mathbb{Z} := \prod_{n \to \alpha} \mathbb{Z}/n\mathbb{Z}, \]
where $N \in {}^*\mathbb{N}$ is the nonstandard natural number
\[ N := \lim_{n \to \alpha} n. \tag{3.1} \]
In a nonstandard finite group $A$, there are no elements that ever escape $A$: if $a \in A$, then one has $a^n \in A$ for all $n \in \mathbb{N}$. As such, it will turn out that $A$ can be modeled by a trivial homomorphism $\varphi : A \to \{\mathrm{id}\}$ to the trivial group.

Example 15 (Nonstandard intervals). — Now consider the sequence $A_n := P(1; n) = \{-n, \dots, n\}$ of (standard) arithmetic progressions in $\mathbb{Z}$. The ultraproduct $A := \prod_{n \to \alpha} A_n$ can be viewed as the nonstandard arithmetic progression $A = P(1; N) = \{-N, \dots, N\}$ in the nonstandard integers ${}^*\mathbb{Z} := \prod_{n \to \alpha} \mathbb{Z}$, where $N$ was defined in (3.1). Then $A$ is an ultra approximate group, and it can also be viewed as a local group inside the nonstandard integers ${}^*\mathbb{Z}$. Consider now the map $\pi : A \to \mathbb{R}$ defined by
\[ \pi\Big(\lim_{n \to \alpha} a_n\Big) := \operatorname{st} \lim_{n \to \alpha} \frac{a_n}{n}, \]

Footnote 8. Our use of the term "model" here is not, strictly speaking, the precise notion that is used in model theory, but is closer to the notion of a "Freiman model" from additive combinatorics, as used for instance in [25, 44].

Footnote 9. This group is the analogue of the profinite completion $\hat{\mathbb{Z}} = \varprojlim \mathbb{Z}/n\mathbb{Z}$ of the integers, but is built using the machinery of ultralimits rather than inverse limits. The two groups are however not identical.
For instance, $\hat{\mathbb{Z}}$ is torsion-free, whereas $\mathbb{Z}/N\mathbb{Z}$ can contain torsion; for example if $N$ is even, or equivalently if the set of even natural numbers is $\alpha$-large, then $\mathbb{Z}/N\mathbb{Z}$ contains the element $N/2 \bmod N$, which has order 2. But see Remark 3.4 below for a link between ultraproducts and inverse limits.

where $\operatorname{st} x$ is the standard part of a nonstandard real $x$ (see Appendix A). Thus, for every standard $\varepsilon > 0$, one has
\[ \pi\Big(\lim_{n \to \alpha} a_n\Big) - \varepsilon \leq \frac{a_n}{n} \leq \pi\Big(\lim_{n \to \alpha} a_n\Big) + \varepsilon \]
for all $n$ sufficiently close to $\alpha$. One may also write $\pi(a) = \operatorname{st}(a/N)$ for all $a \in A$. The map $\pi$ is a homomorphism of local groups from $A$ into $[-1, 1]$. It is surjective since, for any $\gamma \in [-1, 1]$, the nonstandard integer
\[ x := \lfloor \gamma N \rfloor = \lim_{n \to \alpha} \lfloor \gamma n \rfloor, \]
where $\lfloor \cdot \rfloor$ is the integer part function, has image $\pi(x) = \gamma$. The kernel $\ker(\pi)$ is the set of $x \in A$ with $x = o(N)$ (thus if $x = \lim_{n \to \alpha} x_n$ and $\varepsilon > 0$ is standard, then $|x_n| \leq \varepsilon n$ on an $\alpha$-large set of $n$). For instance, every standard integer lies in $\ker(\pi)$, as do some nonstandard integers such as $\lfloor \sqrt{N} \rfloor = \lim_{n \to \alpha} \lfloor \sqrt{n} \rfloor$.

There are similar maps$^{10}$ from $A^m$ to $[-m, m]$ for any fixed natural number $m$, which by abuse of notation we also call $\pi$. Informally, these maps model $A$ by the interval $[-1, 1]$, and more generally model $A^m$ by $[-m, m]$. In this particular case, the model $\pi : A \to [-1, 1]$ of the ultraproduct $A$ can be viewed as a limiting object for models $\pi_n : A_n \to [-1, 1]$ of the individual factors $A_n$, by defining $\pi_n(a_n) := a_n / n$. However, in more general situations, the model for the ultraproduct is only a limit for approximate models of the factors, and this is one reason why we need to work in the ultraproduct setting as much as we do.

The model $\pi : A^m \to [-m, m]$ is not injective: if $\pi(a)$ is trivial, this does not imply that $a$ is trivial. However, $\pi$ does have an injectivity-like property which will be important later, which roughly speaking asserts that if $\pi(a)$ is small, then $a$ is small. For instance, observe$^{11}$ that if $a \in A^{1000}$ is such that $\pi(a) \in (-1, 1)$, then $a \in A$.
This property on the model $\pi$ can be used to derive some important facts about the ultraproduct $A$; for instance it implies the escape property that if $a, a^2, \dots, a^{100}$ all lie in $A^{10}$, then $a$ lies in $A$. These sorts of escape properties will play a major role in our arguments in later sections.

Example 16 (Generalised arithmetic progression). — We still work in the integers $\mathbb{Z}$, but now take $A_n$ to be the rank two generalised arithmetic progression
\[ A_n := P(1, n^{10}; n, n) := \{ a + b n^{10} : a, b \in \{-n, \dots, n\} \}. \]

Footnote 10. Strictly speaking, as we are currently in an additive setting, one should write $mA = A + \cdots + A$ rather than $A^m = A \cdot \cdots \cdot A$ here.

Footnote 11. This claim is not quite true when $\pi(a)$ is $-1$ or $+1$, as can be seen for instance by considering $a = N + 1 = \lim_{n \to \alpha} n + 1$.

Then the ultraproduct $A := \prod_{n \to \alpha} A_n$ is the subset of the nonstandard integers ${}^*\mathbb{Z}$ of the form
\[ A = P(1, N^{10}; N, N) = \{ a + b N^{10} : a, b \in \{-N, \dots, N\} \}. \]
This is an ultra approximate group which can be modeled by the Euclidean plane $\mathbb{R}^2$, using the model maps $\pi : A^m \to \mathbb{R}^2$ defined for each standard $m$ by the formula
\[ \pi(a + b N^{10}) := \Big( \operatorname{st} \frac{a}{N}, \operatorname{st} \frac{b}{N} \Big) \]
whenever $a, b = O(N)$. The image $\pi(A^m)$ is then the square $[-m, m]^2$. As before, if $a \in A^{1000}$ is such that $\pi(a) \in (-1, 1)^2$, then $a \in A$; this can be used to conclude that if $a, a^2, \dots, a^{100} \in A^{10}$, then $a \in A$. Note here that while $A$ lives in a "one-dimensional" group ${}^*\mathbb{Z}$, the model $\mathbb{R}^2$ is "two-dimensional". This is also reflected in the volume growth of the powers $A_n^m$ of $A_n$ for small $m$ and large $n$, which grow quadratically rather than linearly in $m$.

Example 17 (Heisenberg box, I). — This example is related to the Heisenberg example in Example 6. We take each $A_n$ to be the "nilbox"
\[ A_n := \left\{ \begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} : |x_n|, |y_n| \leq n, \ |z_n| \leq n^2 \right\}. \tag{3.2} \]
This is not quite an approximate group because it is not quite symmetric (cf. Example 6), but we will ignore this technicality for sake of exposition.
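In any case it can be repaired in a number of ways, for instance by replacing $A_n$ with $A_n \cup A_n^{-1}$. Once again we consider the ultraproduct $A := \prod_{n \to \alpha} A_n$; this is a subset of the nilpotent (nonstandard) group $\begin{pmatrix} 1 & {}^*\mathbb{Z} & {}^*\mathbb{Z} \\ 0 & 1 & {}^*\mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}$, consisting of all elements $\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}$ with $|x|, |y| \leq N$ and $|z| \leq N^2$; again, this is a (discrete) local group. Consider now the map
\[ \pi : A \to \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} \]
defined by
\[ \pi \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} := \begin{pmatrix} 1 & \operatorname{st} \frac{x}{N} & \operatorname{st} \frac{z}{N^2} \\ 0 & 1 & \operatorname{st} \frac{y}{N} \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.3} \]
This is easily seen to be a homomorphism (of local groups) to the Heisenberg group, whose image is the compact set
\[ \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y|, |z| \leq 1 \right\}. \tag{3.4} \]
Informally, $\pi$ models $A$ (or its powers) by what is essentially a unit ball in this Lie group. As before, we have the injectivity-like property that if $a \in A$ is such that $\pi(a)$ is sufficiently close to the identity, then $a \in A$; as such, one can again establish the escape property that if $a, a^2, \dots, a^{100}$ all lie in $A^{10}$, then $a$ lies in $A$.

Example 18 (Heisenberg box, II). — This is a variant of the preceding example, in which the (not quite) approximate groups $A_n$ now take the form
\[ A_n := \left\{ \begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix} : |x_n|, |y_n| \leq n, \ |z_n| \leq n^{10} \right\} \tag{3.5} \]
so that the ultralimit $A := \prod_{n \to \alpha} A_n$ takes the form
\[ A := \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \in \begin{pmatrix} 1 & {}^*\mathbb{Z} & {}^*\mathbb{Z} \\ 0 & 1 & {}^*\mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq N, \ |z| \leq N^{10} \right\}. \]
Now consider the map
\[ \pi : A^8 \to \mathbb{R}^3 \]
defined by
\[ \pi \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} = \Big( \operatorname{st} \frac{x}{N}, \operatorname{st} \frac{y}{N}, \operatorname{st} \frac{z}{N^{10}} \Big). \]
The image of this map is the unit cube $[-1, 1]^3$, and is in particular compact.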
It is also a homomorphism of local groups, since
\[ \pi \left( \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & x' & z' \\ 0 & 1 & y' \\ 0 & 0 & 1 \end{pmatrix} \right) = \Big( \operatorname{st} \frac{x + x'}{N}, \operatorname{st} \frac{y + y'}{N}, \operatorname{st} \frac{z + z' + x y'}{N^{10}} \Big), \]
but the nonstandard real $x y' / N^{10} = O(N^2 / N^{10})$ is infinitesimal, and so the previous expression is equal to
\[ \Big( \operatorname{st} \frac{x + x'}{N}, \operatorname{st} \frac{y + y'}{N}, \operatorname{st} \frac{z + z'}{N^{10}} \Big) \]
which establishes the homomorphic nature of $\pi$.

Here we note that the homomorphism $\pi : A^8 \to \mathbb{R}^3$ is not associated to any exact homomorphisms $\pi_n$ from $A_n^8$ to $\mathbb{R}^3$. Instead, it is only associated to approximate homomorphisms
\[ \pi_n \begin{pmatrix} 1 & x_n & z_n \\ 0 & 1 & y_n \\ 0 & 0 & 1 \end{pmatrix} := \Big( \frac{x_n}{n}, \frac{y_n}{n}, \frac{z_n}{n^{10}} \Big) \]
into $\mathbb{R}^3$. Such approximate homomorphisms are somewhat less pleasant to work with than genuine homomorphisms; one of the main reasons why we work in the ultraproduct setting is so that we can use genuine group homomorphisms, or at least local group homomorphisms, throughout the paper.

Note that the preceding example (3.2) admits a homomorphism $\tilde{\pi}$ onto the abelian group $\mathbb{R}^2$ by composing the map (3.3) with the natural map from $\begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}$ to its abelianisation $\mathbb{R}^2$. However the kernel of $\tilde{\pi}$ is, for us, too "big". In particular it contains every $\begin{pmatrix} 1 & 0 & z \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, and in particular contains elements of (a power of) $A$ that are not in $A$. By contrast there are no such elements in the example (3.5). In particular, we can still use the model $\pi$ to establish the same escape property for $A$ as before, namely that whenever $a, \dots, a^{100} \in A^{10}$, one has $a \in A$.

We also note the sets $A_n^m$ for small $m$ and large $n$ grow cubically in $m$ in this example, and quartically in $m$ in the previous example. This is consistent with the model groups having homogeneous dimension 3 in the current example and 4 in the previous example.

In all the above examples, the model group $L$ was a Lie group. We now give some examples to show that the model need not initially be of Lie type, but can then be replaced with a Lie model after some modification.

Example 19 (Nonstandard cyclic group, revisited).
— The first example is the nonstandard cyclic group $A := \mathbb{Z}/2^N\mathbb{Z} = \prod_{n \to \alpha} \mathbb{Z}/2^n\mathbb{Z}$. This is a nonstandard finite group and can thus be modeled by the trivial group $\{\mathrm{id}\}$ as discussed in Example 14. However, it can also be modeled by the compact abelian group $\mathbb{Z}_2$ of 2-adic integers using the model $\pi : A \to \mathbb{Z}_2$ defined by the formula
\[ \pi(a) := \lim_{n \to \infty} a \bmod 2^n \]
where for each standard natural number $n$, $a \ (\mathrm{mod}\ 2^n) \in \{0, \dots, 2^n - 1\}$ is the remainder of $a$ modulo $2^n$ (this is well-defined in $A$) and the limit is in the 2-adic metric. Note that the image $\pi(A)$ of $A$ is the entire group $\mathbb{Z}_2$, and conversely the preimage of $\mathbb{Z}_2$ in $A$ is trivially all of $\mathbb{Z}/2^N\mathbb{Z}$; as such, one can quotient out $\mathbb{Z}_2$ in this model and recover the trivial model of $A$.

Example 20 (Nonstandard abelian 2-torsion group). — In a similar spirit to the preceding example, the nonstandard 2-torsion group $A := (\mathbb{Z}/2\mathbb{Z})^N = \prod_{n \to \alpha} (\mathbb{Z}/2\mathbb{Z})^n$ can be modeled by the compact abelian group $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ by the formula
\[ \pi(a) := \lim_{n \to \infty} \pi_n(a) \]
where $\pi_n : A \to (\mathbb{Z}/2\mathbb{Z})^n$ is the obvious projection, and the limit is in the product topology of $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$. As before, we can quotient out $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ and model $A$ instead by the trivial group.

Remark 3.4. — The above two examples can be generalised to model any nonstandard finite group $G = \prod_{n \to \alpha} G_n$ equipped with surjective homomorphisms from $G_{n+1}$ to $G_n$ by the inverse limit of the $G_n$.

Example 21 (Lamplighter group). — Let $\mathbb{F}_2$ be the field of two elements. Let $G$ be the lamplighter group $\mathbb{Z} \ltimes \mathbb{F}_2^{\mathbb{Z}}$, where $\mathbb{Z}$ acts on $\mathbb{F}_2^{\mathbb{Z}}$ by the shift $T : \mathbb{F}_2^{\mathbb{Z}} \to \mathbb{F}_2^{\mathbb{Z}}$ defined by $T (a_m)_{m \in \mathbb{Z}} := (a_{m-1})_{m \in \mathbb{Z}}$. Thus the group law in $G$ is given by
\[ (i, x)(j, y) := (i + j, x + T^i y). \]
For each $n$, we then set $A_n \subseteq G$ to be the set
\[ A_n := \{ (i, x) \in G : i \in \{-1, 0, +1\}; \ x \in \mathbb{F}_2^n \}, \]
where we identify $\mathbb{F}_2^n$ with the space of elements $(a_m)_{m \in \mathbb{Z}}$ of $\mathbb{F}_2^{\mathbb{Z}}$ such that $a_m \neq 0$ only for $m \in \{1, \dots, n\}$.
These sets $A_n$ are not quite approximate groups because they are not symmetric, but they are close enough to approximate groups for this discussion. For instance, they have bounded doubling or bounded tripling, and $A_n \cup A_n^{-1}$ is an approximate group. We model the ultraproduct $A := \prod_{n \to \alpha} A_n \subset {}^*\mathbb{Z} \ltimes \mathbb{F}_2^{\mathbb{Z}}$ by the group
\[ G_0 \times_{\mathbb{Z}} G_0 := \{ ((i, x), (j, y)) \in G_0 \times G_0 : i = j \}, \]
where $G_0$ is the modified lamplighter group $\mathbb{Z} \ltimes \mathbb{F}_2((t))$, where $\mathbb{F}_2((t))$ is the ring of formal Laurent series $\sum_{m \in \mathbb{Z}} a_m t^m$ over $\mathbb{F}_2$ with only finitely many non-zero $a_m$ for $m$ negative, and the shift given by the multiplication map $T : f \mapsto t f$. We give $\mathbb{F}_2((t))$ (and hence $G_0$ and $G_0 \times_{\mathbb{Z}} G_0$) a topology by declaring the norm of a non-zero element $\sum_{m \in \mathbb{Z}} a_m t^m$ of $\mathbb{F}_2((t))$ to be $2^{-m_0}$, where $m_0$ is the least integer for which $a_{m_0}$ is non-zero. The model map $\pi : A \to G_0 \times_{\mathbb{Z}} G_0$ is then given by the formula
\[ \pi\Big(i, \lim_{n \to \alpha} \big(a^{(n)}_m\big)_{m \in \mathbb{Z}}\Big) := \Big( \Big(i, \lim_{n \to \alpha} \sum_{m \in \mathbb{Z}} a^{(n)}_m t^m\Big), \Big(i, \lim_{n \to \alpha} \sum_{m \in \mathbb{Z}} a^{(n)}_{n-m} t^m\Big) \Big). \]
Roughly speaking, $\pi(a)$ captures the behaviour of $a$ at the two "ends" of $\mathbb{F}_2^N$. The image $\pi(A)$ of $A$ under this model is then the compact neighbourhood of the identity
\[ \pi(A) = \{ ((i, x), (i, y)) \in G_0 \times_{\mathbb{Z}} G_0 : i \in \{-1, 0, +1\}, \ x, y \in \mathbb{F}_2[[t]] \} \]
where $\mathbb{F}_2[[t]] \subset \mathbb{F}_2((t))$ is the ring of formal power series $\sum_{m=0}^{\infty} a_m t^m$ over $\mathbb{F}_2$. One can also compute the images $\pi(A^m)$ for larger values of $m$, although they are a bit more complicated. One can verify the escape property that if $g, g^2, \dots, g^{100} \in \pi(A^{10})$ for some $g \in G_0 \times G_0$, then $g \in \pi(A)$; here it is essential that we use both of the two factors of $G_0 \times_{\mathbb{Z}} G_0$, as the claim is false if we project $\pi$ to just one of the two factors $G_0$, or to the base group $\mathbb{Z}$.
So, in this case, one needs a moderately complicated (though still locally compact) group $G_0 \times_{\mathbb{Z}} G_0$ to properly model$^{12}$ $A$ and its powers $A^m$. However, if we pass to the large subset $A'$ of $A$ defined by $A' := \prod_{n \to \alpha} A'_n$, where
\[ A'_n := \{ (i, x) \in G : i = 0; \ x \in \mathbb{F}_2^n \} \]
then $A'$ is now a nonstandard finite group (isomorphic to the group $\mathbb{F}_2^N$ considered in Example 20) and can be modeled simply by the trivial group $\{\mathrm{id}\}$. Thus we see that we can sometimes greatly simplify the modeling of an ultra approximate group by passing to a large ultra approximate subgroup.

Let us formalise the properties enjoyed by the above examples in the following definition, which will play a key role in this paper.

Definition 3.5 (Good models). — Let $A$ be an ultra approximate group. A good model for $A$ is a symmetric local topological group $L$ (see Definition B.1), together with a homomorphism $\pi : A \to L$ of local groups with the following properties:

(i) (Thick image) There exists an open neighbourhood of the identity $U_0$ in $L$ such that $\pi^{-1}(U_0) \subseteq A$ and $U_0 \subseteq \pi(A)$. In particular $\ker \pi \subseteq A$;

(ii) (Compact image) $\pi(A)$ is contained in a compact set.

(iii) (Approximation by "internal" sets) Suppose that $F \subseteq U \subseteq U_0$, where $F$ is compact and $U$ is open. Then there is an ultraproduct $A' = \prod_{n \to \alpha} A'_n$ of finite sets $A'_n \subseteq A_n$ such that $\pi^{-1}(F) \subseteq A' \subseteq \pi^{-1}(U)$.

We will often abuse notation and refer to just $L$ or $\pi$ as the good model for $A$, rather than the pair $(L, \pi)$.

Remark 3.6. — Properties (i) and (ii) together imply that $L$ is locally compact.

We leave it to the reader to check that the examples given above have all of the properties of this definition. One can think of a good model as accurately describing the "coarse-scale" structure of the ultra approximate group $A$, without directly controlling the "fine-scale" structure. For instance, in the example (3.5) which is "abelian at coarse scales" but "2-step nilpotent at fine scales", the model $\pi$ only detects the abelian structure and not the 2-step nilpotent structure.
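The remark that the model for (3.5) sees only the abelian structure can be checked numerically at the finite level. The sketch below is illustrative only and rests on assumptions not made in the paper: Heisenberg matrices are encoded as integer triples `(x, y, z)`, the finite maps $\pi_n(x, y, z) = (x/n, y/n, z/n^{10})$ are computed with exact rationals, and the names `heis_mul`, `pi_n` and `additive_defect` are hypothetical. It computes the failure of $\pi_n$ to be additive, which is $(0, 0, x y'/n^{10})$ and hence at most $n^{-8}$ in size for elements of $A_n$.

```python
from fractions import Fraction


def heis_mul(g, h):
    """Product of upper unitriangular matrices encoded as (x, y, z)."""
    x, y, z = g
    a, b, c = h
    return (x + a, y + b, z + c + x * b)


def pi_n(g, n, z_exp=10):
    """Finite-level map (x, y, z) -> (x/n, y/n, z/n^z_exp), with exact rationals."""
    x, y, z = g
    return (Fraction(x, n), Fraction(y, n), Fraction(z, n ** z_exp))


def additive_defect(g, h, n, z_exp=10):
    """pi_n(g h) - pi_n(g) - pi_n(h), computed coordinate by coordinate."""
    lhs = pi_n(heis_mul(g, h), n, z_exp)
    pg, ph = pi_n(g, n, z_exp), pi_n(h, n, z_exp)
    return tuple(l - p - q for l, p, q in zip(lhs, pg, ph))


if __name__ == "__main__":
    for n in (2, 10, 100):
        # An extreme element of the box (3.5): |x|, |y| <= n and |z| <= n^10.
        corner = (n, n, n ** 10)
        print(n, additive_defect(corner, corner, n))
```

Passing `z_exp=2` instead gives the scaling of the first Heisenberg box, where the same defect is $x y'/n^2$ and does not tend to zero. That matches the text: in that example the limit map is a homomorphism into the Heisenberg group rather than into $\mathbb{R}^3$.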
12. This can also be seen from volume growth considerations: $A^m$ grows like $4^m$, which is also the rate of volume growth of $\pi(A)$ in $G_0 \times_{\mathbf{Z}} G_0$, whereas the volume growth in a single factor $G_0$ would only grow like $2^m$, and the volume growth in $\mathbf{Z}$ is only linear in $m$.

Remark 3.7. In (iii), if $F$ and $U$ are symmetric neighbourhoods of the identity, then $A'$ can be chosen to be symmetric (since one can replace $A'$ with $A' \cap (A')^{-1}$). As $L$ is locally compact, we may shrink $U$ to be precompact; then $U$ can be covered by finitely many translates of $F$, and thus $A'$ is then an ultra approximate group.

Finally, we need to explain the adjective “virtually” in Theorem 3.1. In group theory, “virtually” means “after passing to a finite index subgroup”. Note that a subgroup $G'$ of a group $G$ has finite index if and only if $G$ can be covered by finitely many left-translates (or, equivalently, right-translates) of $G'$. This motivates the following definition.

Definition 3.8 (Large approximate subgroups). Let $A, A'$ be ultra approximate groups. We say that $A'$ is a large ultra approximate subgroup of $A$ if one has $(A')^4 \subset A^4$, and $A$ can be covered by finitely many left-translates of $A'$ (by elements of $A \cdot (A')^{-1}$, of course).

Remark 3.9. It would be more aesthetically pleasing to have $A' \subset A$ instead of $(A')^4 \subset A^4$, but we need the exponent 4 in the inclusion for some minor technical reasons. Note that the property of being a large ultra approximate subgroup is transitive.

We are now in a position to state Hrushovski’s Lie Model Theorem.

Theorem 3.10 (Hrushovski Lie Model Theorem). Let $A$ be an ultra approximate group. Then there is a large ultra approximate subgroup $A'$ of $A$ such that $A'$ admits a local Lie group as a good model.

We will prove this theorem in Section 6. As stated above, the basic idea of the proof is to first establish that $A$ itself admits a locally compact local group as a good model.
Here results of multiplicative combinatorics, and in particular a lemma of Sanders [47] (see also [13]), are critical. Once this is done, Theorem 3.10 follows relatively quickly from the deep results in the literature on Hilbert’s fifth problem.

This theorem will then play a key role in the proof of Theorem 2.10 in two ways: firstly by allowing us to establish certain “escape” properties on (ultra) approximate groups that will be used to build useful metric structures on these groups; and secondly by giving a natural notion of the “dimension” of an (ultra) approximate group which we will need to induct on.

Note that one can invoke Lie’s third theorem (Theorem B.16) to upgrade the local Lie group in Theorem 3.10 to a connected, simply connected, global Lie group, but for technical inductive reasons it will be more convenient to keep the model in the category of local Lie groups for now.

Theorem 3.10 will be proven in Section 6. We will also establish a “global” variant of this theorem later, first in a weak form as Proposition 6.12 and then in a stronger form as Theorem 10.10.

4. An outline of the argument

In the previous section we introduced the notion of a (Hrushovski) Lie model, one of the key technical tools we will use to prove Theorem 2.10. In this section we outline the argument for this proof as a whole.

Our aim is to show that every $K$-approximate group is controlled in some sense by a coset nilprogression of rank and density $O_K(1)$. We shall prove this by contradiction, assuming that there is a sequence $(A_n)_{n \in \mathbf{N}}$ of $K$-approximate groups for which the statement fails in the limit for any given choice of implied constant in the $O_K(1)$ notation. In particular, the cardinality $|A_n|$ will go to infinity as $n \to \infty$. We assemble these approximate groups into an ultra approximate group $A := \prod_{n \to \alpha} A_n$.
Our assumption implies that this ultraproduct $A = \prod_{n \to \alpha} A_n$ is not “controlled” in a certain sense by what we call an ultra coset nilprogression, which we now define.

Definition 4.1 (Ultra coset nilprogression). An ultra coset nilprogression is an ultraproduct $P = \prod_{n \to \alpha} P_n$ of coset nilprogressions $P_n = P(u_{1,n}, \dots, u_{r,n}; N_{1,n}, \dots, N_{r,n})$ of fixed (standard) rank $r$ and step $s$. We then say that $P$ has rank $r$ and step $s$. If the $P_n$ are also all in $C$-normal form for some (standard) $C$ independent of $n$, we say that the ultra coset nilprogression is in normal form. We call $N_i := \lim_{n \to \alpha} N_{i,n}$ for $i = 1, \dots, r$ the lengths of the ultra coset nilprogression, and say that the nilprogression is nondegenerate if all the $N_i$ are unbounded. We define the concept of an ultra nilprogression similarly, but replacing “coset nilprogression” by “nilprogression” throughout.

As with all ultraproducts, it suffices to have the $P_n$ obey the stated properties for all $n$ sufficiently close to $\alpha$, as one can redefine $P_n$ arbitrarily on the remaining values of $n$ without affecting the ultraproduct $P$. Note that an ultra nilprogression $P$ can be expressed as
$$P = P(u_1, \dots, u_r; N_1, \dots, N_r)$$
where $r$ is the rank, $u_1, \dots, u_r$ are elements of the ambient nonstandard local group, and $N_1, \dots, N_r$ are nonstandard positive reals.

To obtain the contradiction, then, it is sufficient to establish the following ultraproduct version of our main theorem.

Theorem 4.2. Suppose that $A$ is an ultra approximate group. Then $A$ contains a nondegenerate ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$.

Here $|P| \gg |A|$ means that the nonstandard numbers $|A|$ and $|P|$ satisfy $|A| = O(|P|)$, or in other words that there is a (standard) number $C > 0$ such that $|A_n| \le C|P_n|$ for an $\alpha$-large set of $n \in \mathbf{N}$. See the end of Appendix A for more information.

The Hrushovski Lie model theorem, Theorem 3.10, will be a key tool in establishing this, as we discuss below.
In addition to this theorem, a further fundamental concept in our argument will be the notion of an escape norm.

Definition 4.3 (Escape norm). Let $A$ be a multiplicative set. For a group element $g \in A^{10}$, we define the escape norm $\|g\|_{e,A} \in [0,1]$ to be the quantity
$$\|g\|_{e,A} := \inf\left\{ \frac{1}{n+1} : n \in \mathbf{N};\ g^i \in A \text{ for all } 0 \le i \le n \right\}.$$
Recall that by convention, the statement $g^i \in A$ is false if $g^i$ is not well-defined. Now suppose that $A$ is a nonstandard multiplicative set, i.e. an ultraproduct $A = \prod_{n \to \alpha} A_n$ of standard multiplicative sets $A_n$. If $g = \lim_{n \to \alpha} g_n$ is an element of $A^{10}$, we define the escape norm $\|g\|_{e,A} \in {}^*[0,1]$ to be the quantity
$$\|g\|_{e,A} := \lim_{n \to \alpha} \|g_n\|_{e,A_n}.$$

The escape norm can always be defined, but there are some remarkable lemmas essentially due to Gleason [21] concerning its properties when $A$ is an approximate group. Specifically we will show in Section 8 that there is a set $A'$ controlling $A$ for which the escape norms satisfy (precise versions of) the following estimates:

(i) (Product property) If $g_1, \dots, g_n \in A'$ then $\|g_1 \cdots g_n\|_{e,A'} \ll \|g_1\|_{e,A'} + \cdots + \|g_n\|_{e,A'}$;
(ii) (Conjugation property) If $g, h \in A'$ then $\|h^{-1}gh\|_{e,A'} \ll \|g\|_{e,A'}$;
(iii) (Commutator property) If $g, h \in A'$ then $\|[g,h]\|_{e,A'} \ll \|g\|_{e,A'} \|h\|_{e,A'}$.

These estimates, which we shall informally term “Gleason’s lemmas”, will be proven in Section 8. They are valid in both the finitary and the ultralimit settings; the latter will be deduced, quite straightforwardly, from the former. The remarks in the following paragraph pertain to the finitary situation.

To prove the Gleason lemmas, the set $A'$ must be what we call a strong approximate group. The precise definition of this is Definition 7.1. It is by no means obvious that there is a large strong approximate group $A'$ contained in $A$, but this will follow from the Hrushovski Lie model theorem (Theorem 3.10), basically because small balls in a Lie group are automatically strong approximate groups, and can then be pulled back by the model map.
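To make Definition 4.3 concrete, here is a minimal sketch in a toy setting that is not taken from the paper: the additive group of integers with $A = \{-N, \dots, N\}$, so that the power $g^i$ becomes the multiple $i \cdot g$. The function names `escape_norm` and `escape_norm_from_definition` and the parameter `cap` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def escape_norm(g: int, N: int) -> Fraction:
    """Escape norm of g relative to A = {-N, ..., N} in (Z, +).

    Toy assumption: the group is written additively, so g^i means i*g.
    Example: escape_norm(3, 10) is 1/4, since 0, 3, 6, 9 lie in A but 12 does not.
    """
    if g == 0:
        # Every multiple of 0 stays in A, so the infimum over n is 0.
        return Fraction(0)
    # Largest n such that i*g lies in A for all 0 <= i <= n.
    n_max = N // abs(g)
    return Fraction(1, n_max + 1)


def escape_norm_from_definition(g: int, N: int, cap: int = 10_000) -> Fraction:
    """Same quantity, computed by walking through the multiples of g.

    `cap` is a hypothetical cut-off: an element that has not left A after
    `cap` steps is treated as never escaping (norm 0).
    """
    n = 0
    while n < cap and abs((n + 1) * g) <= N:
        n += 1
    if n == cap:
        return Fraction(0)
    return Fraction(1, n + 1)
```

In this toy example one can check by hand that $|g|/2N \le \|g\|_{e,A} < |g|/N$ for $0 < |g| \le N$, so the escape norm is comparable to the usual size of $g$ relative to $N$, and a product property of the type (i) holds with constant 2. That constant is specific to this example and says nothing about the constants in Gleason’s lemmas.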
Once $A'$ has been defined, Gleason’s lemmas are proven by an argument closely analogous to that of Gleason himself [21]. We shall say nothing further about the details here; the argument is self-contained and is discussed in Section 8.

With Gleason’s lemmas in hand, let us describe the rest of the argument. Firstly, the set $H = \{g : \|g\|_{e,A'} = 0\}$ of elements which do not escape is a normal (genuine) subgroup of $A'$; this follows from (i) and (ii). We may quotient by $H$ to get an ultra approximate group $A_0 := A'/H$, all of whose non-identity elements have nonzero escape norm. We shall call such approximate groups NSS approximate groups, in analogy with the no small subgroups property in the theory of locally compact groups.

Now, if $g_1$ is an element other than the identity with smallest (nonzero) $\|\cdot\|_{e,A_0}$-escape norm then we shall see that in fact, if $A'$ is chosen appropriately, $g_1 \in A_0$. Item (iii) then implies that for any $h \in A_0$, the commutator $[g_1, h]$ has smaller escape norm than $g_1$, and hence must be the identity. In other words, $g_1$ is central in $A_0$ and we may quotient again to get a new approximate group $A_1 := A_0/\langle g_1 \rangle$. We are being quite fuzzy at this point; in fact, the quotienting takes place in the category of local groups and one is quotienting not by the entire group $\langle g_1 \rangle$ but by an appropriate geometric progression within it.

Continuing in this vein we pick $g_2$ other than the identity with smallest $\|\cdot\|_{e,A_1}$-norm. We shall see that this norm is automatically nonzero, a consequence of the local nature of the quotienting operation. Continuing further, we pick $g_3, g_4, \dots$.

All of this makes sense at the level of ultralimits as well, and in this setting one can show that $A_i$ has a Hrushovski Lie model $L_i$ with $\dim L_i < \dim L_{i-1}$ for all $i$.
Because the dimensions $\dim L_i$ of these Lie models strictly decrease, the quotienting procedure terminates in finite time with an element $g_k$ and one concludes by reversing these finitely many quotienting operations that $A$ is controlled by an ultra coset nilprogression with “generators” $H, g_1, \dots, g_k$, thereby leading to a proof of Theorem 4.2.

This concludes our brief summary of the argument. Let us summarise the content of the remaining core sections of the paper.

• In Section 5, we discuss results from multiplicative combinatorics, essentially due to Sanders and Croot-Sisask, which are relevant to the proof of Hrushovski’s Lie model theorem.
• In Section 6, we prove the Hrushovski Lie model theorem.
• In Section 7, we use the Hrushovski Lie model theorem to construct strong approximate groups.
• In Section 8, we state and prove Gleason’s lemmas.
• In Section 9, we give details of the inductive strategy outlined above for constructing $H$ and $g_1, \dots, g_k$, and conclude the proof of Theorem 2.10 (except for the rank bound).
• In Section 10 we show that the rank and step of the coset nilprogression can be bounded by $6 \log K$ in the global case.
• Section 11 is devoted to various applications to the growth of groups and to Riemannian geometry. We prove there the corollaries stated in the introduction.

5. Sanders-Croot-Sisask theory

In the next section we will establish Hrushovski’s Lie Model Theorem (Theorem 3.10), in which an ultra approximate group is related first to a locally compact metrisable local group and then, via Goldbring’s solution [24] of the local Hilbert’s Fifth problem, to a local Lie group. In locally compact metrisable local groups we have total boundedness, which means that the unit (say) ball $B(\mathrm{id}, 1) := \{x \in G : d(x, \mathrm{id}) \le 1\}$ may be covered by $O_\varepsilon(1)$ smaller balls $B(x_i, \varepsilon) := \{x \in G : d(x, x_i) \le \varepsilon\}$. On the other hand, by continuity of the group operation, $B(\mathrm{id}, 1)$ will contain high powers of $B(\mathrm{id}, \varepsilon)$ for suitably small $\varepsilon$.
It is not surprising, then, that we need tools for showing (roughly speaking) that approximate groups $A$ contain high powers of somewhat smaller, but still quite large, approximate subgroups $A'$, which do not immediately escape $A$ in the sense that $(A')^m$ is contained inside $A$ (or perhaps a slightly larger set such as $A^4$) for a reasonably large value of $m$. Such a tool is provided by a result from multiplicative combinatorics due to Sanders [47] and to Croot-Sisask [13, Theorem 1.6], namely Theorem 5.3 below. We shall also need a “normal” variant of this result, which essentially follows by combining Theorem 5.3 with [49, Lemma 13.1]. Our version of this is Theorem 5.6 below, and once again we provide a self-contained proof. Let us remark that by appealing to these results from multiplicative combinatorics we differ fairly substantially from the approach taken by Hrushovski [33], although one may perceive structural similarities in the model-theoretic arguments he uses.

All of the results below are essentially already in the literature, but always for subsets $A$ of some ambient (global) group $G$. As it turns out, though, the proofs of these results end up being equally valid for the more local setting of multiplicative sets. Indeed, most of the tools used in multiplicative combinatorics (with the notable exception of the Fourier transform) are already “local” in nature in that they only require one to do $O(1)$ multiplications.

Our first such tool is Ruzsa’s covering lemma, which essentially allows one to select a “complete set of coset representatives” in the approximate group setting.

Lemma 5.1 (Local Ruzsa covering lemma). Let $A, B$ be finite sets, and suppose that $A \cup B$ is a multiplicative set. Then there exists a finite set $X \subseteq B$ with $|X| \le |AB|/|A|$ and $B \subseteq A^{-1}AX$. Similarly there exists a finite set $Y \subseteq B$ such that $|Y| \le |BA|/|A|$ and $B \subseteq YAA^{-1}$.

Proof.
Let $X$ be a subset of $B$ such that the sets $A \cdot x$ for $x \in X$ are disjoint, and such that $X$ is maximal with respect to set inclusion. Since these disjoint sets each have cardinality $|A|$ and lie in $AB$, we have $|X| \le |AB|/|A|$. If $b \in B$, then $A \cdot b$ and $A \cdot x$ must intersect for some $x \in X$, thus $a \cdot b = a' \cdot x$ for some $a, a' \in A$. Multiplying on the left by $a^{-1}$, we conclude that $b = a^{-1} \cdot a' \cdot x$, and the claim follows. The second claim is proven similarly.

A corollary of this is the following result, which allows one to produce an approximate group from a set with small growth.

Corollary 5.2. Let $A$ be a symmetric multiplicative set, and suppose that $|A^5| \le K|A|$. Then $A^2$ is a $2K$-approximate group.

Proof. Clearly $A^2$ is a symmetric set containing the identity. Since $|A^5| \le K|A| \le K|A^2|$, we see from Lemma 5.1 that there exists $X \subseteq A^4$ with $|X| \le K$ such that $A^4 \subseteq A^2 X$, and there similarly exists $Y \subseteq A^4$ with $|Y| \le K$ such that $A^4 \subseteq YA^2$. Taking the union of $X$ and $Y$ we obtain the claim.

We turn now to the result of Sanders [47] that drives our whole approach.

Theorem 5.3 (Small neighbourhoods). Suppose that $A$ is a $K$-approximate group, and let $m \ge 1$ be an integer. Then there is a $O_{K,m}(1)$-approximate group $S$ with $|S| \gg_{K,m} |A|$ such that $S^m \subseteq A^4$.

Remark 5.4. Explicit bounds for the implied constants are given in, for example, [13, Theorem 1.6]. As much of the remainder of the argument is not explicitly effective with respect to bounds, we do not worry about such quantitative issues here. Similar remarks can be made in connection with the normal variant, Theorem 5.6 below.

Proof. We use the argument from [47], generalised to the setting of multiplicative sets. For the convenience of the reader, we reproduce it here. A somewhat different proof of Theorem 5.3 can also be obtained by using the techniques of [13].

For each $0 < t < 1$, let $f(t)$ denote the quantity
$$f(t) := \inf\left\{ \frac{|AB|}{|A|} : B \subseteq A;\ |B| \ge t|A| \right\}.$$
Since $|A^2| \le K|A|$, we have $1 \le f(t) \le K$ for all $0 < t < 1$.
By the pigeonhole principle, we can thus find $t \gg_{K,m} 1$ such that
$$f\left(\frac{t^2}{2K}\right) \ge \left(1 - \frac{1}{100m}\right) f(t). \qquad (5.1)$$
Fix this $t$. As there are only finitely many sets $B$ that make up the infimum for $f$, we can find a $B \subset A$ with $|B| \ge t|A|$ such that
$$|AB| = f(t)|A|. \qquad (5.2)$$
For each $a \in A$, the set $Ba$ has cardinality $|B|$ and is contained in $A^2$. Summing over $a$, we have
$$\sum_{x \in A^2} \sum_{a \in A} 1_{Ba}(x) = |A||B|$$
and hence by Cauchy-Schwarz we obtain
$$\sum_{x \in A^2} \Big( \sum_{a \in A} 1_{Ba}(x) \Big)^2 \ge \frac{|A|^2 |B|^2}{|A^2|}.$$
The left-hand side can be rewritten as
$$\sum_{a \in A} \sum_{a' \in A} |Ba \cap Ba'|,$$
and so by the pigeonhole principle, there exists $a' \in A$ such that
$$\sum_{a \in A} |Ba \cap Ba'| \ge \frac{|A||B|^2}{|A^2|}.$$
Since $|B| \ge t|A|$ and $|A^2| \le K|A|$, we thus have
$$\sum_{a \in A} |Ba \cap Ba'| \ge \frac{t^2}{K}|A|^2,$$
and hence we can find a subset $C$ of $A$ of cardinality
$$|C| \ge \frac{t^2}{2K}|A| \qquad (5.3)$$
such that $|Ba \cap Ba'| \ge t^2|A|/2K$ for all $a \in C$. Multiplying by $a^{-1}$ and by $(a')^{-1}$, we see that $|Bh \cap B| \ge t^2|A|/2K$ for all $h \in S_0$, where $S_0 := a'C^{-1} \cup C(a')^{-1} \cup \{\mathrm{id}\}$ is a symmetric subset in $A^2$ containing the identity. From (5.1), we conclude that
$$|A(Bh \cap B)| \ge \left(1 - \frac{1}{100m}\right) f(t)|A|.$$
From (5.2), we conclude that
$$|ABh \cap AB| \ge \left(1 - \frac{1}{100m}\right)|AB|.$$
Using induction (and the hypothesis that $A^{100}$ is well-defined, noting that $B \subset A$ and $S_0 \subset A^2$) we then see that for any $1 \le m' < 100m$, the set $S_0^{m'}$ is well-defined and
$$|ABh \cap AB| \ge \left(1 - \frac{m'}{100m}\right)|AB|$$
for all $h \in S_0^{m'}$, which in particular implies that $S_0^{m'} \subset A^4$. On the other hand, from (5.3) we have $|S_0| \gg_{K,m} |A|$. From Corollary 5.2 we see that $S := S_0^2$ is a $O_{K,m}(1)$-approximate group. Since $S^m = S_0^{2m} \subset A^4$, we obtain the first claim of the lemma. The second claim follows by applying the Ruzsa covering lemma (with $B := S$).

Remark 5.5. Let us pause to note a consequence of this result. We defined multiplicative sets to be ones in which one was at liberty to take up to 100 multiplications (i.e. $A^{100}$ is well-defined), and the associative law would hold to this extent.
Theorem 5.3, or more accurately a close examination of the proof of it, says that if $A$ is an approximate group and a multiplicative set in which merely 8 multiplications are allowed (i.e. $A^8$ is well-defined) then $A$ is $O_{m,K}(1)$-controlled by an $O_{m,K}(1)$-approximate group $A' = S$ in which up to $m$ multiplications are defined and associative. For this reason Theorem 2.10 holds if only 8 multiplications are allowed. We shall not dwell on such details further in this paper, allowing ourselves the luxury of 100 multiplications.

We turn now to proving a “normal” variant of Theorem 5.3. Here, we use the notation
$$a^b := b^{-1}ab$$
and
$$A^B := \{ a^b : a \in A, b \in B \}$$
for elements $a, b$ and subsets $A, B$ of a local group.

Theorem 5.6 (Small normal neighbourhoods). Suppose that $A$ is a $K$-approximate group, and let $m \ge 1$ be an integer. Let $S \subseteq A$ be a $K'$-approximate group with $|S| = \delta|A|$. Then there is an $O_{m,K,K',\delta}(1)$-approximate group $\tilde S$ with $|\tilde S| \gg_{K,K',m,\delta} |A|$ such that $(\tilde S^m)^{A^4} \subseteq S^4$.

Theorem 5.6 will be deduced from Theorem 5.3. To motivate the argument, let us first recall a standard lemma from group theory.

Lemma 5.7. Let $A$ be a finite group, and let $S$ be a subgroup of $A$ with $|S| \ge |A|/K$. Then there exists a further subgroup $\tilde S \subset S$ of $A$ with $|\tilde S| \gg_K |A|$ which is normal in $A$.

Note that this lemma would easily yield Theorem 5.6 from Theorem 5.3 in the special case when $A$ and $S$ are genuine groups and not merely approximate groups.

Proof. Let $x_1, \dots, x_k$ be a complete set of right coset representatives for $S$ in $A$, and set
$$\tilde S = \bigcap_{i=1}^k x_i^{-1} S x_i = \bigcap_{x \in A} x^{-1} S x.$$
All the claims of the lemma are immediate, except for the claim that $|\tilde S| \gg_K |A|$. However, this follows from iterating the fact that if $H_1, H_2 \le G$ are subgroups of small index in a group $G$ then so is $H_1 \cap H_2$; in fact we have the well-known inequality
$$[G : H_1 \cap H_2] \le [G : H_1][G : H_2]. \qquad (5.4)$$

To adapt this argument to the approximate setting we need an analogue of (5.4) for approximate groups.
This is provided by the following lemma.

Lemma 5.8. Suppose that $A$ is a $K$-approximate group and that $A_1, A_2 \subseteq A$ are sets with $|A_i| = \delta_i|A|$. Then $A_1A_1^{-1} \cap A_2A_2^{-1}$ contains a set $BB^{-1}$ with $B \subseteq A$ and $|B| \ge \delta_1\delta_2|A|/K$.

Proof. Since $A_1^{-1}A_2 \subseteq A^2$, we have $|A_1^{-1}A_2| \le K|A|$. It follows that there is some $x$ with at least $\delta_1\delta_2|A|/K$ representations as $a_1^{-1}a_2$. Let $B$ be the set of all values of $a_2$ that appear. Obviously $BB^{-1} \subseteq A_2A_2^{-1}$. Suppose that $a_2, a_2' \in B$. Then there are $a_1, a_1'$ such that $x = a_1^{-1}a_2 = (a_1')^{-1}a_2'$, and so $a_2(a_2')^{-1} = a_1(a_1')^{-1}$. Thus $BB^{-1}$ lies in $A_1A_1^{-1}$ as well.

By iterating the above lemma we obtain the following corollary.

Corollary 5.9. Suppose that $A$ is a $K$-approximate group and that $A_1, \dots, A_k \subseteq A$ are sets with $|A_i| \ge \delta|A|$ for each $i$. Then $\left| \bigcap_{i=1}^k A_iA_i^{-1} \right| \gg_{\delta,k,K} |A|$.

Now we can prove Theorem 5.6.

Proof of Theorem 5.6. By Theorem 5.3, there is an $O_{m,K,K'}(1)$-approximate subgroup
$$S_0 \subseteq S^4, \qquad |S_0| \gg_{m,K,K',\delta} |A|,$$
such that
$$S_0^{4m+4} \subseteq S^4. \qquad (5.5)$$
The Ruzsa covering lemma allows us to do the analogue of picking a complete set of coset representatives in the approximate group setting. Specifically, there are $x_1, \dots, x_k$, $k = O_{m,\delta,K,K'}(1)$, such that
$$A^4 \subseteq \bigcup_{i=1}^k x_i S_0^2. \qquad (5.6)$$
Let us assume without loss of generality that $x_1 = \mathrm{id}$. By Corollary 5.9, the set
$$T := \bigcap_{i=1}^k x_i S_0^2 x_i^{-1}$$
has cardinality
$$|T| \gg_{m,K,K',\delta} |A|.$$
We claim that the set $\tilde S := T^2$ has the required properties. First of all note that, by Corollary 5.2, $\tilde S$ is indeed an $O_{m,K,K',\delta}(1)$-approximate group.

Next observe that, since $x_1 = \mathrm{id}$,
$$x_i^{-1} T x_i \subseteq S_0^2 \qquad (5.7)$$
for each $i$.

Suppose that $x \in A^4$. Then, by (5.6), there is some $i$ with $1 \le i \le k$ and some $s \in S_0^2$ such that $x = x_i s$. It follows from this, (5.7) and (5.5) that
$$x^{-1} \tilde S^m x = x^{-1} T^{2m} x = s^{-1} x_i^{-1} T^{2m} x_i s = s^{-1} (x_i^{-1} T x_i)^{2m} s \subseteq S_0^{4m+4} \subseteq S^4.$$
This concludes the proof.

6. Proof of the Hrushovski Lie model theorem

In this section we establish Theorem 3.10. The reader may wish to reread Section 3, which gave an overview of this theorem. We will deduce this theorem from the following two propositions.

Proposition 6.1 (Locally compact model). Let $A$ be an ultra approximate group. Then $A^4$ admits a model $\pi : A^{32} \to G$ by a metrisable locally compact local group $G$.

Proposition 6.2 (From locally compact models to Lie models). Let $A$ be an ultra approximate group and suppose that $A^4$ admits a model $\pi : A^{32} \to G$ into a locally compact local group $G$. Then there is a large ultra approximate subgroup $\tilde A$ of $A$ (thus $\tilde A^4 \subset A^4$) which admits a model $\tilde\pi : \tilde A^8 \to L$ into a connected, simply-connected Lie group $L$.

It is clear that the above two propositions together imply Theorem 3.10.

We will give a self-contained proof of Proposition 6.1, using the multiplicative combinatorics results of the previous section, together with the countable saturation property of ultraproducts. In contrast, the proof of Proposition 6.2 requires deep material related to (the local version of) Hilbert’s fifth problem, for which we provide suitable references.

Building metrics on local groups. We now begin the proof of Proposition 6.1. Suppose that we have a pseudometric $d : G \times G \to [0, \infty)$ on some local group $G$, that is to say $d$ satisfies the axioms of a metric, except that we may have $d(x,y) = 0$ when $x \ne y$. Then we may of course define the balls $B(\mathrm{id}, \varepsilon) := \{x \in G : d(x, \mathrm{id}) < \varepsilon\}$, and these will be nested in the sense that $B(\mathrm{id}, \varepsilon) \subseteq B(\mathrm{id}, \varepsilon')$ if $\varepsilon < \varepsilon'$.

We now examine ways to reverse this construction, beginning with a quite general way to construct pseudometrics on symmetric local groups; this will be needed to prove Proposition 6.1.

Let $G$ be a symmetric local group. For any function $\psi : G \to \mathbf{R}$ and $g \in G$, we define the shift $T_g\psi : G \to \mathbf{R}$ by setting
$$T_g\psi(x) := \psi(g^{-1}x)$$
if $g^{-1}x$ is well-defined in $G$, and $T_g\psi(x) = 0$ otherwise.
We then define the “derivative” operator ∂ ψ := ψ − T ψ. g g The expression ∂ ψ := sup ∂ ψ(x) g  (G) g x∈G can be viewed heuristically as a “norm” of g relative to ψ , and this makes it natural to consider the function (6.1) d(g, h) := T ψ − T ψ ∞ =∂ −1 ψ ∞ . g h  (G) h g  (G) One can view d as the pullback of the metric on  (G) to G using the translation action g → T ψ of G on ψ . Lemma 6.3 (Using functions to build (pseudo-)metrics). — Let G be a local group, and let A be a symmetric neighbourhood of the identity such that A is well-defined in G.Let ψ : G → R be non-negative and supported on A. 128 2 ∞ ∞ (i) We have ∂ ψ  ψ for all g ∈ A , with equality holding when g ∈ A . g  (G)  (G) (ii) Whenever g, h ∈ A , one has (6.2) ∂ ψ ∞  ∂ ψ ∞ +∂ ψ ∞ . gh  (G) g  (G) h  (G) (iii) For any g ∈ A , we have −1 ∞ ∞ (6.3) ∂ ψ =∂ ψ . g  (G) g  (G) 64 64 + (iv) The function d : A × A → R defined by the formula (6.1) is a left-invariant pseudo- metric on A . Remark 6.4. — To spell out what we mean in (iv), we are asserting that d(g, g) = 0, that d(g, h) = d(h, g),and that d(g, k)  d(g, h) + d(h, k) for all g, h, k ∈ A . Furthermore 64 128 it has the left-invariance property d(gh, gk) = d(h, k) whenever h, k ∈ A , g ∈ A ,and gh, gk ∈ A . Later on, when proving Gleason’s lemmas, we shall require some slightly more exotic properties of these cocycle “norms”, related to commutation and a certain “Taylor expansion”. Proof. — The property (i) is clear from construction. For g, h ∈ A we have the representation property T T ψ = T ψ and hence the cocycle identity g h gh ∂ ψ = ∂ ψ + T ∂ ψ gh g g h 150 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO which gives (6.2). Similarly, for g ∈ A we have the inverse identity −1 −1 ∂ ψ =−T ∂ ψ g g which gives (6.3). The claims in (iv) follow easily from (ii) and (iii). 
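As a numerical sanity check on the construction (6.1), the following sketch evaluates $d(g,h) = \|T_g\psi - T_h\psi\|_{\ell^\infty(G)}$ in a toy setting that is an assumption of this example rather than anything in the paper: the global group $\mathbf{Z}/N\mathbf{Z}$ written additively (so $T_g\psi(x) = \psi(x - g)$ and no well-definedness issues arise), with a hypothetical tent-shaped $\psi$ supported on $\{-3, \dots, 3\}$. All names below (`N`, `psi`, `shift`, `sup_norm`, `derivative_norm`, `d`) are illustrative.

```python
N = 24  # toy global group: the integers modulo N under addition (assumption)


def psi(x: int) -> float:
    """Non-negative 'tent' bump supported on A = {-3, ..., 3} modulo N."""
    r = min(x % N, (-x) % N)   # distance from x to the identity 0 in Z/N
    return max(0, 4 - r) / 4   # values 1, 0.75, 0.5, 0.25 on A, and 0 elsewhere


def shift(g: int, f):
    """Shift operator: T_g f(x) := f(g^{-1} x), which is f(x - g) additively."""
    return lambda x: f((x - g) % N)


def sup_norm(f) -> float:
    """Supremum norm over the whole (finite) group."""
    return max(abs(f(x)) for x in range(N))


def derivative_norm(g: int) -> float:
    """The quantity || psi - T_g psi ||, i.e. the sup norm of the 'derivative'."""
    t_g = shift(g, psi)
    return sup_norm(lambda x: psi(x) - t_g(x))


def d(g: int, h: int) -> float:
    """Pseudometric from (6.1): d(g, h) = || T_g psi - T_h psi ||."""
    t_g, t_h = shift(g, psi), shift(h, psi)
    return sup_norm(lambda x: t_g(x) - t_h(x))
```

In this toy model one can verify exhaustively that `d(g, h)` agrees with `derivative_norm((g - h) % N)`, that `d` is symmetric, translation-invariant and obeys the triangle inequality, and that `derivative_norm(g)` equals the sup norm of `psi` once the supports of $\psi$ and $T_g\psi$ are disjoint (for instance `g = 7`). A finite abelian group cannot exhibit the local-group subtleties that the lemma is designed to handle, so this only illustrates the formal identities.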
In the next lemma we give a variant of the Birkhoff-Kakutani construction [40, §1.22], in which a function $\psi$ is constructed so that the pseudometric $d(g,h) = \|\partial_{h^{-1}g}\psi\|_{\ell^\infty(G)}$ is adapted to a given nested sequence of symmetric sets which are supposed to resemble “balls” in this pseudometric.

Lemma 6.5 (Birkhoff-Kakutani construction). Suppose that $G$ is a local group and that we have a sequence of symmetric neighbourhoods $A_0, A_1, A_2, \dots$ of the identity in $G$ with the nesting property that $A_{i+1}^2 \subseteq A_i$ for $i = 0, 1, 2, \dots$, and with $A_0^{200}$ well-defined. Then there is a pseudometric
$$d : A_0^{64} \times A_0^{64} \to [0,1]$$
such that we have the inclusions
$$\{ g \in A_0^{64} : d(g, \mathrm{id}) < 2^{-k} \} \subseteq A_k \subseteq \{ g \in A_0^{64} : d(g, \mathrm{id}) \le 2 \cdot 2^{-k} \} \qquad (6.4)$$
for all nonnegative integers $k$. In particular $x_n \to x$ in the pseudometric $d$ if and only if, for each $k \in \mathbf{N}$, we have $x^{-1}x_n \in A_k$ for all sufficiently large $n$.

Proof. Suppose that $q = 2^{-i_1} + \cdots + 2^{-i_k}$, $0 < q < 1$, is a dyadic rational (with $i_1 < \cdots < i_k$), and define
$$B_q := A_{i_k} A_{i_{k-1}} \cdots A_{i_1}.$$
Even though the definition uses a potentially large number $k$ of multiplications, the nesting property of the $A_i$ means that these sets $B_q$ are well-defined in the local group $G$.

We claim that $A_k B_q \subseteq B_{q + 2^{-k}}$ whenever $q$ is a dyadic rational with denominator dividing $2^k$; this easily implies that
$$B_q \subseteq B_{q'} \text{ whenever } 0 < q < q' < 1. \qquad (6.5)$$
The claim follows by repeated use of the nesting $A_{i+1}^2 \subseteq A_i$ (the number of times it will be required is the number of carries when $2^{-k}$ is added to $q$ in binary). In particular, $B_q \subseteq A_{i_1 - 1} \subset A_0$.

Define $\psi : A_0 \to [0,1]$ by
$$\psi(x) := \sup\big( \{ 1 - q : 0 < q < 1;\ x \in B_q \} \cup \{0\} \big),$$
and consider the pseudometric $d(g,h) := \|\partial_{h^{-1}g}\psi\|_{\ell^\infty(G)}$ as discussed in Lemma 6.3. Note that for $g, h \in A_0^{64}$, $\partial_{h^{-1}g}\psi$ is supported in $A_0^{192}$, and so one can replace $\ell^\infty(G)$ here with $\ell^\infty(A_0^{192})$ if desired.

If $d(g, \mathrm{id}) < 2^{-k}$ then $|\partial_g\psi(\mathrm{id})| < 2^{-k}$, which implies that $\psi(g) > 1 - 2^{-k}$ and therefore $g \in B_{2^{-k}}$ and hence $g \in A_k$.
Conversely, suppose that $g \in A_k$: we are to show that $d(g, \mathrm{id}) \le 2 \cdot 2^{-k}$. To show this we must confirm that $|\partial_g\psi(h)| \le 2^{1-k}$ for all $h \in G$. As discussed before, we may assume that $h \in A_0^{192}$. Suppose that $h \in B_q$, where $0 < q < 1 - 2^{-k}$ is an integer multiple of $2^{-k}$, but that $h \notin B_{q - 2^{-k}}$. Then $\psi(h) \le 1 - q + 2^{-k}$. On the other hand, $g^{-1}h \in A_k B_q \subseteq B_{q + 2^{-k}}$, by the claim established above, and therefore $\psi(g^{-1}h) \ge 1 - q - 2^{-k}$. It follows that $\partial_g\psi(h) = \psi(h) - \psi(g^{-1}h) \le 2 \cdot 2^{-k}$. Similarly, $\partial_g\psi(h) \ge -2 \cdot 2^{-k}$. Since $h$ was arbitrary it follows that $d(g, \mathrm{id}) = \|\partial_g\psi\|_{\ell^\infty(G)} \le 2 \cdot 2^{-k}$, and the claim follows.

If the sets $A_i$ satisfy a certain normality condition, the group operations are continuous with respect to the pseudometric $d$:

Lemma 6.6 (Normal Birkhoff-Kakutani construction). Suppose that $G$ is a local group and that we have a sequence of symmetric sets $A_0, A_1, A_2, \dots$ in $G$ with $A_0^{200}$ well-defined and with the nesting property that $(A_{i+1}^2)^{A_0} \subseteq A_i$ for $i = 0, 1, \dots$ (and so, in particular, we certainly have the weaker nesting property $A_{i+1}^2 \subseteq A_i$ required by the preceding lemma). Consider the pseudometric $d : A_0^{64} \times A_0^{64} \to [0,1]$ defined in the preceding lemma. Then the product map $\cdot : A_0^{32} \times A_0^{32} \to A_0^{64}$ and the inversion map ${}^{-1} : A_0^{32} \to A_0^{32}$ are both continuous with respect to $d$.

Proof. Suppose that $g_n \to g$ and that $h_n \to h$. We wish to show that $g_nh_n \to gh$, to which end it suffices to establish that $(gh)^{-1}g_nh_n \in A_k$ for all sufficiently large $n$. However, for $n$ sufficiently large in terms of $k$ we have $g^{-1}g_n \in A_{k+2}$, and hence
$$h_n^{-1}g^{-1}g_nh_n \in A_{k+2}^{h_n} \subseteq A_{k+2}^{A_0} \subseteq A_{k+1}.$$
Furthermore, $h^{-1}h_n \in A_{k+1}$ for $n$ sufficiently large, and so
$$(gh)^{-1}g_nh_n = h^{-1}h_n \cdot h_n^{-1}g^{-1}g_nh_n \in A_{k+1}^2 \subseteq A_k,$$
as required.

The statement about the inverse map is easier. Suppose that $g_n \to g$. Then $g^{-1}g_n \in A_{k+1}$ for $n$ sufficiently large, and so
$$g g_n^{-1} = g \, (g_n^{-1} g) \, g^{-1} \in A_{k+1}^{g^{-1}} \subseteq A_{k+1}^{A_0} \subseteq A_k.$$
But this means that $g_n^{-1} \to g^{-1}$ as $n \to \infty$.
The previous lemma showed how to get a local topological group given a sequence of balls satisfying a suitable normalisation condition. The normal variant of the Croot- Sisask-Sanders lemma, Theorem 5.6, allows us to find precisely such a sequence of balls 152 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO given any K-approximate group A. Of course, these balls are just finite sets and, for sufficiently large i,A may well consist only of the identity element e. This will be the case, for example, when A =[−N, N]. However when transferred to the setting of an ultra approximate group A = A , these balls have “finite index” in A, and this n→α ultimately leads to the important conclusion that the metric d gives A the structure of a locally compact local group. Lemma 6.7. —Let A be an ultra approximate group. Then there is a sequence of ultra approx- 4 A imate groups A , A ,... such that A = A , we have the nesting property that (A ) ⊆ A for 0 1 0 i i+1 i = 0, 1,... ,and each A is large in the sense that A can be covered by finitely many left-translates of A . Proof. — By definition, one has A = A for some K-approximate groups A n n n→α and some fixed K. Applying Theorem 5.6 repeatedly we see that there are, for each n, n,0 O (1)-approximate groups S , i = 1, 2, 3 ... ,suchthat S := A and (S ) ⊆ K,i n,i n,0 n n,i+1 4 4 S for each i. Furthermore we have S ⊆ A and |S | |A | for each i. Setting n,i n,i K,i n n,i n A := S , n,i n→α all of the properties except the assertion about covering are immediate. To check that each A is large, we need only check that S is covered by O (1) left-translates of S , i n,0 K,i n,i for each i. This, however, is an immediate consequence of Lemma 5.1 and the lower bound on |S |. n,i Lemma 6.8. —Let A be an ultra approximate group. Consider a sequence of ultra approximate 32 32 groups A , A ,... as found in the preceding lemma, and let d : A × A →[0, 1] be the pseudometric 0 1 associated to these sets as in Lemma 6.5. 
Then $A^{32}$ is locally compact with respect to the topology generated by $d$.

Proof. By the Heine-Borel theorem (which is usually stated for metrics, but which extends without difficulty to pseudometrics [13]) it suffices to show that $A^{32}$ is complete and totally bounded. We deal with the latter task first. From the inclusion $A_k \subseteq \{x : d(x, \mathrm{id}) \le 2 \cdot 2^{-k}\}$ and the left-invariance of $d$, this follows from the fact that $A^{32}$ is covered by finitely many left-translates of $A_k$.

13. Indeed, one can deduce the pseudometric case from the metric case by quotienting out by the equivalence relation $x \sim y$ defined by the equation $d(x,y) = 0$.

We turn now to completeness. Suppose that $(x_n)_{n \in \mathbf{N}}$ is a Cauchy sequence. By refining the sequence if necessary we may assume that it is rapidly Cauchy in the sense that
$$d(x_n, x_m) \le 2^{-n-1}$$
whenever $m > n$. We claim that the sets $x_nA_n$ are nested in the sense that $x_mA_m \subseteq x_nA_n$ whenever $m > n$. To see this note that by left-invariance we have $d(\mathrm{id}, x_n^{-1}x_m) \le 2^{-n-1}$ and hence, by the inclusions of Lemma 6.5, $x_n^{-1}x_m \in A_{n+1}$. Since $A_{n+1}A_m \subseteq A_{n+1}^2 \subseteq A_n$, it follows that $x_n^{-1}x_mA_m \subseteq A_n$, thereby confirming the claim.

Now each set $x_mA_m$ is an ultraproduct $\prod_{n \to \alpha} S_{m,n}$, by construction. The nesting property just established of course implies that, for any positive integer $M$, $\bigcap_{m \le M} x_mA_m \ne \emptyset$. Let $y_M$ be an element of this intersection; this means that there is a set $\Omega_M \in \alpha$ such that $(y_M)_n \in \bigcap_{m \le M} S_{m,n}$ for all $n \in \Omega_M$. By replacing $\Omega_2$ with $\Omega_1 \cap \Omega_2$ if necessary, and so on, and using the basic properties of ultrafilters, we may assume that $\Omega_1 \supseteq \Omega_2 \supseteq \Omega_3 \supseteq \cdots$. By removing 2 from $\Omega_2$, 3 from $\Omega_3$ and so on, if necessary, we may also assume that no integer lies in infinitely many $\Omega_M$.

Now define a sequence $x$ by setting $x_n = (y_M)_n$, where $M$ is the largest integer for which $n \in \Omega_M$. Then, by construction, $x_n \in \bigcap_{m \le M} S_{m,n}$ for all $n \in \Omega_M$, that is to say for a set of $n$ tending to $\alpha$.
This means that x ∈ x A for every M, and hence x ∈ x A . m m m m mM −1 −m In particular we have x x ∈ A for every m and hence d(x, x )  2 · 2 . It follows that m m x → x, thereby confirming that A is complete with the metric d . Remark 6.9. — The last part of this argument, in which an element is found in the infinite intersection x A given that each finite intersection x A is m m m m m mM nonempty, is an instance of the countable saturation property of the ultraproduct construc- tion. The completeness that is afforded by the countable saturation property is one of the main reasons why we work in the ultraproduct setting. Note that a similar complete- ness also appears in the ultralimit (X, d)/ ∼ of bounded metric spaces (X , d ),where n n X := X , d := st lim d ,and ∼ is the equivalence relation defined by setting n n→α n n→α x ∼ y whenever d(x, y) = 0. Indeed, it is not difficult to use countable saturation to verify that such ultralimits are automatically complete, even if the original spaces X are not. Proof of Proposition 6.1. — We have shown that A has the structure of a locally compact local group with respect to the metric d . To complete the proof of Proposition, we need only quotient by the equivalence relation ∼ on A ,definedby x ∼ y if and only if d(x, y) = 0. The quotient L := A / ∼ is then a metrisable, locally compact, local group 32 4 and there is a natural map π : A → L. We must check that L is a good model for A in the sense of Definition 3.5. Property (i) requires us to show that there is some open neighbourhood U of −1 4 4 the identity in L such that π (U ) ⊆ A and U ⊆ π(A ),orinother wordssomeball 0 0 32 4 {x ∈ A : d(x, id)<ε} lies in A . This again follows from (6.4) and the fact that each of the sets A constructed in Lemma 6.7 lies in A . Finally, property (ii) in the definition of good model requires us to show that π(A ) is compact. This is immediate. 
To prove property (iii), we first establish the following weaker property: (iii) : for any open neighbourhood U of the identity in L there is some U ⊆ Uand −1   −1 some ultra finite set A = A with π (U ) ⊆ A ⊆ π (U). n→α n 154 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO −k This is quite easily established: suppose that U contains the ball B(id, 2 ).Thenit follows immediately from the inclusions of Lemma 6.5 that we may take A := A and k+1 −k−1 then U := B(id, 2 ). We now upgrade this to property (iii) in the definition of good model. Suppose that F ⊆ U ⊆ U with F compact and U open. Then there is some open neighbourhood of the identity U such that FU ⊆ U. Applying (iii) , we may locate a further open set U ⊆ U −1   −1 and an ultra finite set A such that π (U ) ⊆ A ⊆ π (U ). By compactness there are elements x ,..., x such that F ⊆ x U ; we may assume that these elements lie 1 M m m=1 −1 in F(U ) = FU ⊆ U ⊆ U , and hence each is of the form x = π(a ) with a ∈ A. To 0 i i i conclude the proof of property (iii) simply take A := a A . This completes the proof m=1 of Proposition 6.1. To complete the proof of Theorem 3.10, we invoke results about Hilbert’s fifth problem, and specifically the structural theorem of Goldbring [23] describing locally compact local groups, which we state as Theorem B.18 in Appendix B. 32 32 Proof of Proposition 6.2. — Suppose that we have a model π : A → Gfrom A to a locally compact local group G, and let U be the open neighbourhood of the identity −1 4 featuring in the definition of good model (Definition 3.5), thus π (U ) ⊂ A and U ⊂ 0 0 π(A ).ByTheorem B.18, there are symmetric neighbourhoods U ⊆ U ⊆ U ⊆ Gwith 2 1 0 U ⊆ U (say) and a compact normal subgroup H of U such that U /H is isomorphic 1 2 1 to a local Lie group L. Let φ : U → U /H be the projection map. 1 1 By property (iii) of Definition 3.5 (applied to π : A → G) there is a symmetric ultra 4 −1 2 −1 3 ˜ ˜ finite set A ⊆ A with π (U ) ⊆ A ⊆ π (U ). 
Certainly, the map π˜ := φ ◦ π is well- 2 2 8 −1 3 4 −1 4 defined and gives a homomorphism from A to L; since π (U ) ⊂ π (U ) ⊂ A ,we 2 2 4 4 ˜ ˜ have A ⊆ A ,and by Remark 3.7, A is an ultra approximate group. We verify that this is a good model by checking (i), (ii) and (iii) of Definition 3.5 in turn. For (i), first note ˜ ˜ that π( ˜ A) contains U := φ(U ) = U H/H ⊆ L, which is an open neighbourhood of the 0 2 2 identity in L since U H ⊆ Gis open. Furthermore we have −1 −1 −1 −1 −1 2 ˜ ˜ π˜ (U ) = π φ φ(U ) ⊆ π (U H) ⊆ π U ⊆ A. 0 2 2 Turning to (ii), π( ˜ A) is contained in the compact set φ(U ). Finally, we check the “approximation by internal sets” property, which is (iii) in Def- −1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ inition 3.5. Suppose that F ⊆ U ⊆ U ,with Fcompact and Uopen. Then φ (F) = FH −1 ˜ ˜ is compact, whilst φ (U) = UH is open. The approximation by internal sets property then follows from that fact that π : A → G is a good model. Finally, we check that A is a large ultra approximate group. To see this note that π( ˜ A ) is contained in a compact subset of L; therefore there are finitely many elements THE STRUCTURE OF APPROXIMATE GROUPS 155 ˜ ˜ x ,..., x such that π( ˜ A ) ⊆ π( ˜ x )U . It follows that 1 k k 0 i=1 k k 2 −1 ˜ ˜ A ⊆ x π˜ U ⊆ x A, k 0 k i=1 i=1 thereby confirming that A is an ultra approximate group. By essentially the same argu- ˜ ˜ ment, A may be covered by finitely many translates of A;thus A is indeed large. We now record some analogues of the above results in the setting of global ultra approximate groups (i.e. ultraproducts of global K-approximate groups for some fixed K), which are closer to the results of Hrushovski [33]. Define a global model π : A → G 8 8 to be the same notion as a good model π : A → G from Definition 3.5,exceptthat A is replaced by the whole group A generated by A, and G is now required to be a global groupratherthanalocalgroup. Proposition 6.10 (Global locally compact model). — Let A be a global ultra approximate group. 
Then $A$ admits a global model $\pi : \langle A \rangle \to G$ by a metrisable locally compact global group $G$.

Proof. — This is obtained by a modification of the proof of Proposition 6.1. The one main change is that the nesting condition $(A_{i+1}^2)^{A} \subseteq A_i$ appearing in Lemma 6.7 needs to be strengthened to $(A_{i+1}^2)^{A^{100(i+1)}} \subseteq A_i$, but this is easily accomplished.

Proposition 6.11 (From locally compact models to Lie models). — Let $A$ be a global ultra approximate group and suppose that $A$ admits a global model $\pi : \langle A \rangle \to G$ into a locally compact global group $G$. Then there is a large ultra approximate group $A'$ of $A$ which admits a global model $\tilde\pi : \langle A' \rangle \to L$ into a connected Lie group $L$.

Proof. — This is obtained by a modification of the proof of Proposition 6.2. The one main change is that one needs to replace Theorem B.18 with Theorem B.17.

Note that, in contrast to Proposition 6.2, we do not assert that the global Lie group $L$ is simply connected (as this is not provided by the global Gleason-Yamabe theorem (Theorem B.17), which only promises connectedness). And indeed, in general we do not have simple connectedness of the model. For instance, if $A = \{-N, \ldots, N\} \subset \mathbf{Z}/100N\mathbf{Z}$ for some unbounded nonstandard natural number $N$, then the obvious global model here is the map $\pi : \mathbf{Z}/100N\mathbf{Z} \to \mathbf{R}/\mathbf{Z}$ defined by $\pi(x) = \mathrm{st}(x/100N) \bmod 1$, and of course the unit circle $\mathbf{R}/\mathbf{Z}$ is not simply connected. On the other hand, $A^{100} = \mathbf{Z}/100N\mathbf{Z}$ is globally modeled by the trivial group; and so one can still recover simple connectedness by passing from $A$ to a suitably large power. See [33, Remark 4.11] for some further discussion of this point, as well as Theorem 10.10 below.

Combining Proposition 6.10 and Proposition 6.11 we obtain the following result, originally due to Hrushovski [33].

Proposition 6.12 (Weak global Lie model theorem). — Suppose that $A$ is a global ultra approximate group.
Then there is a large ultra approximate group A of A which admits a global model π˜ : A → L into a connected Lie group L. We will strengthen this proposition in Theorem 10.10 below. Remark 6.13. —Let π : A → L be a good model for an ultra approximate group A = A by a locally compact local group L, and let U be the neighbourhood in n 0 n→α Definition 3.5.Let U be a symmetric neighbourhood of the identity such that U ⊂ U . 1 0 For any continuous function f : L → R with compact support in U ,wecan definea functional I(f ) by the formula F (a) a∈A I(f ) = inf st |A| + + + where F = lim F is the ultralimit of functions F : A → R, with the nonstandard n→α n n n real F and nonstandard natural number |A| defined in the usual fashion as a∈A + + F (a) := lim F (a ) n→α a∈A a ∈A n n and |A|:= lim |A |, n→α + + and the infimum is over all F for which F (a)  f (π(a)) for all a ∈ A. Using Definition 3.5(iii) it is not difficult to also obtain the equivalent formula F (a) a∈A I(f ) = sup st a∈A − − where the supremum is over all F for which F (a)  f (π(a)) for all a ∈ A. From these two definitions we see that I(f ) is both super-linear and sub-linear, and is thus a contin- uous linear functional on the space C (U ) of continuous compactly supported functions c 1 in U . By the Riesz representation theorem, there thus exists a Radon measure μ on U 1 1 such that I(f ) = fdμ for all f ∈ C (U ). From the translation invariant properties of c 1 I(f ), we see that μ(gE) = μ(E) for any measurable subset E of U ,and any g ∈ Lsuch that gEare defined in U ,and similarlyfor gE replaced by Eg.Thus μ is a bi-invariant −1 Haar measure on U ; since A can be covered by finitely many left-translates of π (F) for any compact neighbourhood F of the identity, we see that μ is non-trivial (which im- plies in particular by bi-invariance that the locally compact local group L is unimodular). 
This Haar measure can then be used to estimate the (nonstandard) cardinality of various THE STRUCTURE OF APPROXIMATE GROUPS 157 nonstandard finite sets that are “close” to A in some sense. Indeed, from the definitions (and the regular nature of Radon measures) we see that μ(F)|A|  A  μ(U)|A| whenever F ⊆ U ⊆ U , F is compact, U is open, and A is a nonstandard set with −1  −1 π (F) ⊂ A ⊆ π (U). We will not use this measure μ in this paper, but see [33] for some further discussion of this measure and its relationship to Kiesler measures from model theory. One can also use μ to relate the volume growth of A to the volume growth of the model group L, giving some rigorous substance to some of the volume growth heuristics invoked in the examples in Section 3, but we will not formalise this relationship here. Remark 6.14. — As remarked in [33], the Lie Model theorem is not only valid in the context of nonstandard finite ultra approximate groups, i.e. the ultraproduct of finite K-approximate groups for a fixed K, but also for “continuous” ultra approximate groups, that is to say the ultraproduct of precompact open subsets of a locally compact local group that obey all of the approximate group axioms other than finiteness. See [52] for the basic theory of such continuous approximate groups. Indeed, one can check that the machinery in Section 5 can be adapted to this setting by replacing the cardinality of finite sets with the Haar measure of various precompact open subsets of a locally compact local group, as in [52]. Some other components of this paper, such as the construction of strong approximate groups and Gleason metrics, can also be extended to this setting after some minor notational changes. However, there will be a key place in the argument in Section 9 in which the (nonstandard) finiteness of the ultra approximate groups is used in an absolutely crucial way, namely to locate an element in such a group element of minimal non-zero “escape norm”. 
As such, the main result of this paper, Theorem 2.10, does not immediately extend to the continuous setting. Indeed, the basic example of a small ball in a Lie group shows that continuous approximate groups need not resemble coset nilprogressions at all. We will not pursue this matter further here.

(Footnote: Another, much more minor, place where ultra finiteness is used is in Remark 6.13 above, as we implicitly used the trivial fact that counting measure is bi-invariant. In general, one can only conclude that the measure associated to a good model is bi-invariant if each of the individual approximate groups in the ultraproduct is also equipped with a finite bi-invariant measure.)

Some finitary consequences of the Lie Model Theorem. — To illustrate the power of the Lie Model Theorem in the analysis of approximate groups, we offer two fairly quick applications. The reader interested in the proof of our main results may skip ahead to the next section.

The first application is a special case of our main theorem (Theorem 2.10), following Hrushovski [33, Corollary 4.18].

Theorem 6.15 (Hrushovski). — Suppose that $G$ is a group of exponent $m$ and suppose that $A \subseteq G$ is a $K$-approximate group. Then $A^4$ contains a genuine subgroup $H$ of $G$ with $|H| \gg_{K,m} |A|$. In particular, by Lemma 5.1, $A$ is covered by $O_{K,m}(1)$ left-translates of $H$.

Remark 6.16. — When $m = 2$ the group $G$ must be abelian, and in this case the theorem is due to Imre Ruzsa [45].

Proof. — Suppose for sake of contradiction that the claim failed. Then we may find fixed $K, m$ and a sequence of $K$-approximate groups $A_n \subseteq G_n$ in groups $G_n$ of exponent $m$, such that for each $n$, $A_n^4$ does not contain a genuine subgroup $H_n$ of cardinality $|H_n| \geq |A_n|/n$. As usual we form the ultra approximate group $A := \prod_{n \to \alpha} A_n$. The ultraproduct group $G := \prod_{n \to \alpha} G_n$ also has exponent $m$, and by Hrushovski's Lie model theorem we can find a large approximate group $A' \subseteq A^4$ with a local Lie model $\pi : (A')^8 \to L$.
By Definition 3.5(i), we may find a neighbourhood $U_0$ of the identity in $L$ such that $\pi^{-1}(U_0) \subseteq A'$ and $U_0 \subseteq \pi(A')$. Using the fact that the exponential map is a homeomorphism near the identity of $L$, we may then find a neighbourhood $U_1$ of the identity with $U_1 \subseteq U_0$ such that $U_1$ contains no elements of order $m$ other than the identity. If $a \in \pi^{-1}(U_1)$, then we conclude that $a^m$ is well-defined in $A'$ with $\pi(a)^m = \pi(a^m) = \mathrm{id}$, and so $\pi(a)$ is trivial, which means that $\pi^{-1}(U_1) = \ker(\pi)$. As $\pi(A')$ is precompact, we conclude that $A'$ is covered by a finite number of translates of $\ker(\pi)$; as $A'$ is large, $A$ is also covered by $M$ such translates for some (standard) finite $M$.

From Definition 3.5(iii), we see that the set $\pi^{-1}(U_1) = \ker(\pi)$ is a nonstandard finite set, and so $\ker(\pi) = \prod_{n \to \alpha} H_n$ for some finite subsets $H_n$ of $G_n$. Since $\ker(\pi) \subseteq A' \subseteq A^4$ is a group and $A$ is covered by $M$ translates of $\ker(\pi)$, we see from Łoś's theorem (Theorem A.6) that for all $n$ sufficiently close to $\alpha$, $H_n \subseteq A_n^4$ is a group and $A_n$ is covered by $M$ translates of $H_n$. However if one takes $n$ larger than $M$ then this contradicts the construction of $A_n$, and the claim follows.

Remark 6.17. — The astute reader will notice that the only properties of the local Lie group $L$ that were really used in the above argument were that $L$ was locally compact and had the NSS (no small subgroups) property. Thus, one could prove Theorem 6.15 using a weaker form of the Gleason-Yamabe theorem (Theorem B.17), in which the model group is merely locally compact NSS rather than Lie. (The machinery of Hilbert's fifth problem implies that these two concepts coincide, but this is considerably deeper.) However, we do not know of a proof of Theorem 6.15 that avoids the machinery of Hilbert's fifth problem completely, and in particular some variant of the Gleason lemmas is required.
Next, we prove (a slight variant of) the main theorem from Hrushovski’s paper [33, Theorem 1.1], which uses the Lie structure (via the Baker-Campbell-Hausdorff formula) more thoroughly than the preceding application. THE STRUCTURE OF APPROXIMATE GROUPS 159 Theorem 6.18 (Hrushovski’s structure theorem). — Let A be a K-approximate group, and let F : N × N → N be a function. Then there exist natural numbers L, M, N with N  F(L, M) and L, M  1, and nested sets K,F {id}⊂ A ⊆ ··· ⊆ A ⊆ A N 1 with the following properties: (i) For each 1  n  N, A is symmetric; (ii) For each 1  n < N, A ⊆ A ; n+1 (iii) For each 1  n  N, A is contained in M left-translates of A ; n n+1 (iv) For 1  n, m, k  N with k < n + m, the set [A , A ]:={[g, h]: g ∈ A , h ∈ A } is n m n m contained in A ; (v) A can be covered by L left-translates of A . Proof. — Suppose this is not the case. Carefully negating all the quantifiers, we (n) conclude that there exist K, F and a sequence A of K-approximate groups, such that for (n) (n) each n and each L, M  n, there does not exist N  F(L, M) and A ,..., A obeying 1 N the conclusions of the theorem. (n) As usual, we form the ultraproduct A := A , which is an ultra approx- n→α imate group. By Theorem 3.10, we may find a large ultra approximate subgroup (n) 8 ˜ ˜ ˜ A = A which has a good model φ : A → Lby a local Lie group. n→α Let l be the Lie algebra of L, and fix an open bounded convex symmetric body B in L.Let ε> 0 be a sufficiently small (standard) real number depending on B, L to be chosen later; in particular we may assume that the exponential map is a homeomorphism from εBtoexp(εB),and that exp(εB) is contained in the neighbourhood U appearing in Definition 3.5. For each standard natural number n  1, we apply Definition 3.5 and Remark 3.7 to find an ultra approximate group A with −1 −n −1 −n π exp 10 εB ⊆ A ⊆ π exp 2 × 10 εB ; In particular we have the nesting ··· ⊆ A ⊆ A ⊆ A . 
2 1 From the Baker-Campbell-Hausdorff formula, we have −n−1 −n exp 2 × 10 εB ⊆ exp 10 εB if ε is small enough, and thus A ⊆ A . In a similar spirit, we can find an M depending n+1 −n only on the dimension of L or l such that each ball 10 εB is covered by at most M trans- −n−1 lates of 4 × 10 εB, which by the Baker-Campbell-Hausdorff formula again implies, for small enough ε,thateach A is covered by at most M left-translates of A . Finally, n n+1 another application of the Baker-Campbell-Hausdorff formula reveals that −n −m −k exp 2 × 10 εB , exp 2 × 10 εB ⊆ exp 10 εB 160 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO whenever k < n + m, and hence [A , A ]⊆ A . n m k Finally, since one can cover π(A) by a finite number of translates of exp(εB),we see that A can be covered by at most L left-translates of A for some standard L ∈ N. (n) (n) Now set A = A for some finite sets A , and set N := F(L, M). Applying n→∞ n Łos’s theorem (Theorem A.6) repeatedly (but only finitely many times), we see that for n (n) (n) (n) sufficiently close to α the sets A ,..., A , A obey all the properties in the conclusion 1 N (n) of the theorem. This contradicts the construction of the A for n larger than L, M, and the claim follows. Remark 6.19. — One can also use the Lie Model Theorem to establish a stronger statement than Theorem 6.18, which roughly speaking asserts that given a (finite) K- approximate group A, one can find a large sub-approximate group A which has an ap- proximate homomorphism π : (A ) → L into a local Lie group L with bounded range that obeys an approximate version of Property (i) in Definition 3.5, where the accuracy of these approximations exceeds the “complexity” of the model by any given function F. The precise formulation of this statement, which is in fact a logically equivalent “finiti- sation” of Theorem 3.10, is somewhat complicated. We will not need it elsewhere in the paper, and so we leave it as an exercise to the reader. 7. 
Strong approximate groups

We now give a combinatorial consequence of the Lie Model Theorem (Theorem 3.10) which will be important later, involving a concept which we will call a strong approximate group.

Definition 7.1 (Strong Approximate Group). — Let $A$ be a $K$-approximate group for some $K \geq 1$. We say that $A$ is a strong $K$-approximate group if it admits a symmetric subset $S$ such that

(7.1) $(S^{A^4})^{1000K^3} \subseteq A$

and for which the following two trapping conditions are satisfied:

(i) (First trapping condition) If $g, g^2, g^3, \ldots, g^{1000} \in A^{100}$ then $g \in A$;
(ii) (Second trapping condition) If $g, g^2, \ldots, g^{10^6 K^3} \in A$ then $g \in S$.

An ultra strong approximate group is an ultraproduct $A = \prod_{n \to \alpha} A_n$ of strong $K$-approximate groups $A_n$, for some $K \geq 1$ independent of $n$.

(Footnote to Remark 6.19: The complexity, which we do not define here, would be some quantity taking account of the dimension and structure constants on the Lie algebra $\mathfrak{l}$ of $L$, the diameter of the range of $\pi$ and the inradius of the neighbourhood $U_0$ appearing in Definition 3.5(i).)

At present this definition will seem somewhat unmotivated, although it can be demystified to some extent by remarking that these properties suggest that $S$ and $A$ are behaving like very small neighbourhoods of the identity in a Lie group $L$, with $S$ much smaller than $A$. This point should become clearer shortly. The reader should not pay too much attention to exponents such as $1000K^3$ or $10^6 K^3$ in the definition; they are chosen for the sake of concreteness.

The main reason for introducing this concept is that we will be able to show, in the next section, that the escape norm $\|g\|_{e,A}$ (defined in Definition 4.3) for an ultra strong approximate group $A$ has the pleasant properties outlined in Section 4. There is scope for varying the parameters in the definition of strong approximate group, but the ones we have given here are strong enough to prove the desired properties of the escape norm.
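To make the conditions of Definition 7.1 concrete, the following sketch checks them in the simplest abelian setting: the interval $A = \{-N, \ldots, N\}$ in the additive group of integers, with $S = \{-N', \ldots, N'\}$. Everything here is an illustrative assumption rather than part of the paper's argument: the function names are hypothetical, "powers" of $g$ become multiples of $g$, and conjugation by $A^4$ is trivial because the group is abelian, so (7.1) reduces to $1000K^3 N' \leq N$.

```python
def powers_stay_inside(g: int, m: int, bound: int) -> bool:
    """True iff g, 2g, ..., m*g all lie in {-bound, ..., bound}.

    In (Z, +) the 'powers' of g are its multiples, and their absolute
    values are increasing, so it suffices to test the last one.
    """
    return m * abs(g) <= bound


def is_strong_interval(N: int, N_prime: int, K: int) -> bool:
    """Check Definition 7.1 for A = {-N..N}, S = {-N'..N'} inside Z."""
    # A must be a K-approximate group: A + A = {-2N..2N} has 4N+1 elements,
    # so it needs ceil((4N+1)/(2N+1)) translates of A to be covered.
    if K < -(-(4 * N + 1) // (2 * N + 1)):
        return False
    # (7.1): Z is abelian, so conjugating S by A^4 does nothing, and the
    # 1000*K^3-fold sumset of S is {-1000*K^3*N', ..., 1000*K^3*N'}.
    if 1000 * K**3 * N_prime > N:
        return False
    # First trapping condition: g, ..., 1000g in A^100 = {-100N..100N}
    # must force g into A. By symmetry it is enough to test g >= 0.
    for g in range(0, 100 * N + 1):
        if powers_stay_inside(g, 1000, 100 * N) and g > N:
            return False
    # Second trapping condition: g, ..., (10^6 K^3) g in A must force g into S.
    for g in range(0, N + 1):
        if powers_stay_inside(g, 10**6 * K**3, N) and g > N_prime:
            return False
    return True
```

With $K = 3$ the threshold $1000K^3$ equals 27000, so `is_strong_interval(27000, 1, 3)` succeeds while `is_strong_interval(27000, 2, 3)` fails condition (7.1).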
It is easy to give examples of strong approximate groups. For instance, if $A = \{-N, \ldots, N\}$ (and $K = 3$) then we may take $S = \{-N', \ldots, N'\}$ with $N' \sim N/1000K^3$. If $A$ is a subgroup, then we may simply take $S = A$. On the other hand, if one randomly removes a small number (e.g. $N^{0.01}$) of elements symmetrically from $\{-N, \ldots, N\}$, the resulting set is likely to remain an $O(1)$-approximate subgroup, but not a strong $O(1)$-approximate subgroup.

The main result of this section implies the following.

Proposition 7.2 (Finding an ultra strong approximate group). — Let $A$ be an ultra approximate group. Then there is a large ultra approximate subgroup $\tilde A$ of $A$ which is a strong ultra approximate group.

For use in Section 9 we will require the following somewhat more precise result.

Proposition 7.3 (Balls are ultra strong approximate groups). — Let $A$ be an ultra approximate group with a good model $\pi : A^8 \to L$ to a local Lie group $L$. Let $B$ be an open bounded convex symmetric subset of the Lie algebra $\mathfrak{l}$ of $L$. Then there exists a standard radius $r_0 > 0$ such that for all $0 < r < r_0$, any symmetric nonstandard finite set $\tilde A$ with

(7.2) $\pi^{-1}(\exp(rB)) \subseteq \tilde A \subseteq \pi^{-1}(\exp(2rB))$

is a large strong ultra approximate subgroup of $A$.

It is clear that Proposition 7.2 follows from Proposition 7.3, Theorem 3.10, and Definition 3.5(iii); we will, however, only need Proposition 7.3 in the sequel.

We now prove Proposition 7.3. Let $r_0 > 0$ be a sufficiently small quantity depending on $A, \pi, L, B$ to be chosen later; in particular, we take $r_0$ so small that the exponential map is a homeomorphism from $2r_0 B$ to $\exp(2r_0 B)$, and $\exp(2r_0 B)$ is contained inside the open neighbourhood $U_0$ of $L$ from Definition 3.5.

Let $\tilde A$ be as in the proposition. In particular $\tilde A \subset A$. By Remark 3.7 we may take $\tilde A$ to be an ultra $K$-approximate group for some $K$, and therefore an ultra approximate subgroup of $A$.
Since π(A) is precompact, it may be covered by finitely many left-translates of exp(rB),and so A can be covered by finitely many left-translates of A. Thus A is a large ultra approximate subgroup of A. It remains to establish that A is a strong ultra approximate subgroup. Suppose that 100 1000 100 g ∈ A is such that g,..., g ∈ A . Applying π , we see that 1000 100 π(g), ...,π(g) ∈ exp(2rB) . Working in exponential coordinates and using the Baker-Campbell-Hausdorff formula we conclude, if r is small enough, that π(g) ∈ exp(rB) and thus g ∈ A.Wehavethus shown the first trapping condition for A. Next, we use Definition 3.5 to find a symmetric nonstandard finite set S with −5 −3 −4 −3 exp 10 K rB ⊆ S ⊆ exp 10 K rB . From the Baker-Campbell-Hausdorff formula we see that 4 3 exp(2rB) 1000K −4 −3 exp 10 K rB ⊆ exp(rB) and thus ˜ 1000K (7.3) (S) ⊆ A. 6 3 10 K ˜ ˜ Finally, suppose that g ∈ A is such that g,..., g ∈ A. Applying π , we conclude that 6 3 10 K π(g), ...,π(g) ∈ exp(2rB). −5 −3 Working in exponential coordinates, we conclude that π(g) ∈ exp(10 K rB) and hence g ∈ S. Thus we have verified the second trapping condition for A. Finally, we need to push the trapping conditions from the ultraproduct A back ˜ ˜ to the finitary setting. Write A = A , A = A and S = S for some n n n n→α n→α n→α finite sets A , A , S . By Łos’s theorem (Theorem A.6), we see that for n sufficiently close n n n 4 4 2 ˜ ˜ ˜ to α, A is symmetric and contains the identity with A ⊂ A ,with A covered by K n n n left-translates of A ,that 4 1000K (S ) ⊆ A , n n ˜ ˜ and that the first and second trapping properties hold for A and S .Thuswesee that A n n n is a strong K-approximate group for n sufficiently close to α. After redefining A suitably for all other values of n, we conclude that A is an ultra strong approximate group as required. This concludes the proof of Proposition 7.3 and hence Proposition 7.2. 
(Footnote: Indeed, by using the Baker-Campbell-Hausdorff formula one can take $K$ to depend only on the dimension of $L$, if $r_0$ is small enough, but we will not need this fact here.)

Remark 7.4. — This proposition represents by far the most serious use of Hrushovski's Lie Model Theorem in our paper. Although we use that theorem elsewhere in the paper, it is only for this proposition that we do not currently have a plausible alternative approach.

8. The escape norm and a Gleason type theorem

In this section we prove a variant of "Gleason's lemmas" in the setting of approximate groups. These show that if $A$ is a strong approximate group then the escape norm has pleasant properties with respect to product, conjugation and commutation. The role of these lemmas was briefly discussed in Section 4. Here is a precise statement.

Theorem 8.1 (Gleason-type theorem). — Suppose that $A$ is a strong $K$-approximate group. Consider the escape norm

$\|g\|_{e,A} := \inf\left\{ \frac{1}{n+1} : n \in \mathbf{N};\ g^i \in A \text{ for all } 0 \leq i \leq n \right\}$,

with the convention that $\|g\|_{e,A} = 1$ when $g$ is undefined. This has the following properties:

(i) (Conjugation) If $g, h \in A^{10}$ then $\|g^h\|_{e,A} \leq 1000 \|g\|_{e,A}$;
(ii) (Product) We have $\|g_1 \cdots g_n\|_{e,A} \leq K^{O(1)} (\|g_1\|_{e,A} + \cdots + \|g_n\|_{e,A})$ if $g_1, \ldots, g_n \in A^{10}$;
(iii) (Commutators) If $g, h \in A^{10}$ then we have $\|[g,h]\|_{e,A} \leq K^{O(1)} \|g\|_{e,A} \|h\|_{e,A}$.

Note that, as a consequence of (i) and (ii), the set of $g \in A$ with $\|g\|_{e,A} = 0$ is a subgroup normalised by $A^{10}$.

Remark 8.2. — Note that this lemma is trivial when the ambient local group is abelian. For that reason, this section can be ignored by those readers interested in seeing our alternative proof of the (abelian) Freiman's theorem.

Motivation and heuristic discussion.
— We will shortly give a self-contained proof of Theorem 8.1, but as motivation we first offer some comments and discussion of the context in which these ideas were first invented: the solution of Hilbert's fifth problem by Gleason, Montgomery-Zippin and Yamabe [20, 21, 39, 40, 60, 61] (see also [23] for the local group analogue of these lemmas).

In that context, the Gleason lemmas show the existence, in an arbitrary locally compact group $G$, of arbitrarily small compact neighborhoods $A$ of the identity whose associated escape norm satisfies properties (i) to (iii) as above. The Gleason lemmas lie at the heart of Hilbert's fifth problem and are used at several places in its proof, both in the reduction step from general locally compact groups to NSS (No Small Subgroups) groups, and in order to deal with NSS groups.

For example, if $G$ is NSS, the Gleason lemmas are needed in order to establish that the set of one-parameter subgroups of $G$ forms a vector space. If $X(t)$ and $Y(t)$ are two one-parameter subgroups, then a natural candidate for $X + Y$ is $\lim_{n \to +\infty} (X(t/n)Y(t/n))^n$. In order to show that such a limit does exist, the bound (ii) on the escape norm of a product is precisely what is needed. For the full story, the reader may wish to consult the classical references [34, 40], the more recent non-standard treatments of the Gleason lemmas by Hirschfeld [32] and by Goldbring and van den Dries [57], or the blog posts of the third author.

To give a flavour of how the Gleason lemmas are proven, let us discuss a simple case of the product estimate, namely

(8.1) $\|uv\|_{e,A} \leq C\left(\|u\|_{e,A} + \|v\|_{e,A}\right)$.

Here, $A$ is a ball $B(\mathrm{id}, 1)$ about the identity in a locally compact group $G$ with the NSS property, where the ball is with respect to some left-invariant distance $d$, and $C$ is some finite quantity depending on $A$.
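As a toy illustration of the product estimate (8.1), one can compute the escape norm exactly in the simplest discrete setting. The sketch below takes $A = \{-N, \ldots, N\}$ inside the additive group of integers, which is an assumption made purely for illustration (the discussion above concerns balls in an NSS locally compact group), and the function names `escape_norm` and `product_estimate_holds` are hypothetical. In this setting the powers of $g$ are its multiples, and a brute-force check over a finite window of pairs suggests that the constant $C = 2$ suffices while $C = 1$ does not.

```python
from fractions import Fraction


def escape_norm(g: int, N: int) -> Fraction:
    """Escape norm of g with respect to A = {-N..N} in (Z, +).

    ||g|| = inf { 1/(n+1) : n >= 0 and i*g in A for all 0 <= i <= n }.
    """
    if g == 0:
        # Every multiple of 0 stays in A, so the infimum is 0.
        return Fraction(0)
    # Largest n such that n*|g| <= N; multiples up to that point stay in A.
    n = N // abs(g)
    return Fraction(1, n + 1)


def product_estimate_holds(N: int, C: int) -> bool:
    """Brute-force check of ||u+v|| <= C (||u|| + ||v||) for |u|, |v| <= 2N."""
    window = range(-2 * N, 2 * N + 1)
    for u in window:
        for v in window:
            lhs = escape_norm(u + v, N)
            rhs = C * (escape_norm(u, N) + escape_norm(v, N))
            if lhs > rhs:
                return False
    return True
```

For instance, with $N = 10$ the elements $u = 6$ and $v = 5$ have escape norms $1/2$ and $1/3$, while $u + v = 11$ already lies outside $A$ and so has escape norm $1$; this is why the constant $C = 1$ fails in this toy model.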
In the discussion below we will make use of the following points concerning this situation: (i) We may construct a distance d with the additional property that d(id, x ) Cd(id, x) for g, x ∈ B(id, 2) (for example by the Birkhoff-Kakutani construction [40, §1.22]). (ii) The balls in G enjoy an escape property quite similar to that in the definition of a strong approximate group. More precisely, given ε> 0 there is an M ∈ N 2 M such that if g, g ,..., g ∈ B(id, 1) then g ∈ B(id,ε). The proof of this is by contradiction—taking a limit of putative “bad” gs, one can contradict the NSS property. The key idea behind the proof of the product estimate (8.1) is to relate the escape −1 norm g to the auxillary quantity ∂  ,where ∂ (x) = (g x) − (x) and  is e,A g ∞ g a non-negative “bump” function supported on B(id, 1), let us say with  = (id) = 1. As noted in Lemma 6.3, such a “norm” automatically satisfies the product inequality (with C = 1), and so we need only show that g ∼∂  in a suitable sense, and for e,A g ∞ asuitable  . In one direction, it is easy to link the two quantities. Indeed if ∂   δ for g ∞ some δ> 0, then a simple telescoping sum argument confirms that (g )> 0, and hence g ∈ A whenever i < 1/δ. Therefore (8.2) g  ∂  . e,A g ∞ http://terrytao.wordpress.com/tag/hilberts-fifth-problem/. THE STRUCTURE OF APPROXIMATE GROUPS 165 2 n Suppose, conversely, that we know that g, g ,..., g ∈ A = B(id, 1). Then certainly, 2 n by the escape property, we have g, g ,..., g ∈ B(id,ε) for some n n.Now if G were aLie group, andif  were smooth with bounded derivatives, we would have (8.3) ∂  ≈ n ∂ , the approximation being better as ε gets smaller. This immediately gives the bound ∂   1/n, and thus we have linked the escape norm and the auxiliary norm g ∞ ε ∂  in both directions. g ∞ Now unfortunately (8.3) is only an approximate identity and, more seriously, G is not known to be a Lie group. 
In fact, as noted above, these Gleason lemmas are required to prove statements of that form. On a more positive note, observe that we only need to bound ∂  above in terms of g when g = u or g = v, and not for all g.Weare at g ∞ e,A liberty to design the auxillary function  with this in mind. Now the exact version of (8.3) is basically Taylor’s formula, and it reads n−1 (8.4) ∂ n  = n∂  + ∂ ∂ i . g g g g i=0 (We replace n by n for ease of notation.) This makes it desirable to bound the second derivatives ∂ ∂ i  . At this point another key idea enters: it is possible to get good con- g g trol on these second derivatives when  = φ ∗ ψ is the convolution of two “Lipschitz” functions, that is −1 (x) = φ ∗ ψ(x) = φ xz ψ(z) dz, the integral being with respect to Haar measure on G. This is because of the formula −1 (8.5) ∂ ∂ (φ ∗ ψ) = ∂ φ(z)∂ ψ z x dz. g h g h To make this useful, φ is chosen to be somewhat Lipschitz with respect to shifts by g = u and g = v,and ψ is chosen to be Lipschitz with respect to the distance d.Weomit the details. Rigorous argument. — We turn now to the details of such a strategy in the discrete setting, that is to say a rigorous proof of Theorem 8.1. Proof of Theorem 8.1. — To simplify the notation, we will abbreviate  in this e,A proof as  . e 166 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO We start with (i), which is a relatively easy consequence of the first trapping prop- erty in the definition of strong approximate group (Definition 7.1). Indeed suppose that 2 n h h 2 h n h 12 g, g ,..., g ∈ Afor some n; then certainly g ,(g ) ,...,(g ) ∈ A ⊆ A .Bythe first h h 2 h n trapping property this implies that g ,(g ) ,...,(g ) ∈ Afor any n  n/1000, and this confirms (i). The proof of (ii) is significantly trickier and is based on the construction of Glea- son that was briefly discussed earlier. 
In order to facilitate a certain technical "bootstrap argument", it will be convenient to temporarily replace the escape norm $\|g\|_e$ by the regularised version $\|g\|_e^{(\varepsilon)} := \|g\|_e + \varepsilon$, where $\varepsilon > 0$ is a small quantity. We shall obtain estimates uniform in $\varepsilon$, and then let $\varepsilon \to 0$. It is natural to introduce the norm-like quantity

$d^{(\varepsilon)}(g) := \inf\Bigl\{ \sum_{i=1}^{n} \|g_i\|_e^{(\varepsilon)} : g = g_1 \cdots g_n,\ n \ge 1 \Bigr\}$.

It is clear that

(8.6) $d^{(\varepsilon)}(g) \le \|g\|_e^{(\varepsilon)}$.

We shall prove an estimate in the opposite direction, namely

(8.7) $\|g\|_e \ll K^{O(1)} d^{(\varepsilon)}(g)$.

The exponent $O(1)$ will be independent of $\varepsilon$. This implies that, for each positive integer $n$ and all $g_1, \dots, g_n$,

$\|g_1 \cdots g_n\|_e \ll K^{O(1)} \bigl( \|g_1\|_e^{(\varepsilon)} + \dots + \|g_n\|_e^{(\varepsilon)} \bigr)$.

Letting $\varepsilon \to 0$, we recover the product estimate (ii).

In order to establish this we shall, as in Gleason's argument, relate $\|g\|_e$ and $d^{(\varepsilon)}(g)$ to an auxiliary quantity $\|\partial_g \Psi\|_\infty$, where $\Psi$ is a certain non-negative "smooth" function supported on $A^4$. We will specify $\Psi$ shortly; as in Gleason's argument it will be constructed as a convolution of two functions $\phi$ and $\psi$. The former is taken to be a kind of smoothed version of $1_A$ defined using the metric $d^{(\varepsilon)}$ and Lipschitz for this metric, and the latter constructed using the set $S$ appearing in the definition of strong approximate group (Definition 7.1) and Lipschitz with respect to the word metric on $S$.

One link between these quantities is relatively easy to establish for any function $\Psi$ with $\Psi(\mathrm{id}) \ge 1$. Indeed suppose that $\|\partial_g \Psi\|_\infty = \delta$ for some $g \in A$. Then certainly $|\Psi(g^i) - \Psi(g^{i+1})| \le \delta$ for all $i$ with $g^i \in A^{100}$, which implies by an easy telescoping sum argument that $\Psi(g^i) \ge 1 - \delta i$ for all $i$. In particular $g^i$ lies in the support of $\Psi$, and hence in $A^4$, for $i < 1/\delta$; note that the hypothesis $g^i \in A^{100}$ can be removed by induction. By the first trapping condition in Definition 7.1 this implies that $g^i \in A$ for $i < 1/(1000\delta)$, and hence $\|g\|_e \le 1000\delta$. Thus

(8.8) $\|g\|_e \le 1000\, \|\partial_g \Psi\|_\infty$

whenever $g \in A$.
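The "easy telescoping sum argument" behind (8.8) can be spelled out as follows. This is a sketch under the assumptions stated in the text: we write $\Psi$ for the auxiliary function, assume $\Psi(\mathrm{id}) \ge 1$ and $\|\partial_g \Psi\|_\infty = \delta$, use the convention $\partial_g \Psi(x) = \Psi(g^{-1}x) - \Psi(x)$, and assume that all the powers $g^j$ involved lie in the domain of $\Psi$.

```latex
\begin{align*}
\Psi(g^{j}) - \Psi(g^{j+1}) &= \partial_g \Psi(g^{j+1}),
  \qquad\text{so}\qquad \bigl|\Psi(g^{j+1}) - \Psi(g^{j})\bigr| \le \delta, \\
\Psi(g^{i}) &= \Psi(\mathrm{id}) + \sum_{j=0}^{i-1} \bigl(\Psi(g^{j+1}) - \Psi(g^{j})\bigr)
  \;\ge\; 1 - \delta i .
\end{align*}
```

In particular $\Psi(g^i) > 0$ whenever $i < 1/\delta$, which is the positivity used to place $g^i$ in the support of $\Psi$.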
To establish (8.7) and hence the product estimate (ii) it therefore suffices to prove a bound

(8.9) $\|\partial_g \Psi\|_\infty \ll K^{O(1)} d^{(\varepsilon)}(g)$

in the opposite direction for all $g \in A^{100}$ (the claim for $g \notin A^{100}$ being an easy consequence). This argument will depend crucially on the specific form of $\Psi$. The following two lemmas describe the construction of the functions $\phi$ and $\psi$.

Lemma 8.3 (Properties of $\phi$). There is a function $\phi^{(\varepsilon)} : A^{1000} \to [0, 1]$ such that
(i) $\phi^{(\varepsilon)}(x) = 1$ for $x \in A$;
(ii) $\phi^{(\varepsilon)}(x) = 0$ if $x \notin A^2$;
(iii) (Lipschitz bound) For all $g \in A$, one has
$\|\partial_g \phi^{(\varepsilon)}\|_\infty \le \dfrac{d^{(\varepsilon)}(g)}{d^{(\varepsilon)}(\mathrm{id}, A^c)}$.
Here $d^{(\varepsilon)}(y, B) := \inf\{ d^{(\varepsilon)}(b^{-1} y) : b \in B \}$, and $A^c$ is the complement of $A$ in $G$.

Proof. Define

$\phi^{(\varepsilon)}(x) := \Bigl( 1 - \dfrac{d^{(\varepsilon)}(x, A)}{d^{(\varepsilon)}(\mathrm{id}, A^c)} \Bigr)_+$.

Note that this is well-defined since $d^{(\varepsilon)}(\mathrm{id}, A^c) \ne 0$; this would be an issue without the fudge factor of $\varepsilon$ that we have introduced.

Obviously $\phi^{(\varepsilon)}(x) = 1$ for $x \in A$. If $\phi^{(\varepsilon)}(x) \ne 0$ then $d^{(\varepsilon)}(\mathrm{id}, x^{-1}A) = d^{(\varepsilon)}(x, A) < d^{(\varepsilon)}(\mathrm{id}, A^c)$, and so $x^{-1}A$ contains a point outside of $A^c$. This implies that $x \in A^2$. The Lipschitz bound is easily established.

Lemma 8.4 (Properties of $\psi$). There is a function $\psi$ with values in $[0, 1]$ such that
(i) $\psi(x) = 1$ for $x \in A$;
(ii) $\psi(x) = 0$ if $x \notin A^2$;
(iii) (Lipschitz bound) $\|\partial_{h^y} \psi\|_\infty \le 1/(10^4 K^3)$ for $h \in S$ and $y \in A^4$.

Proof. Let $Q := S^{A^4}$; recall from the definition of strong approximate group that $Q^N \subseteq A$, where $N := 10^4 K^3$. Define $\psi(g) = 0$ if $g \notin Q^N A$, $\psi(g) = 1$ if $g \in A$, and $\psi(g) = 1 - i/N$ if $g \in Q^{i+1} A \setminus Q^i A$ for $i = 0, 1, \dots, N - 1$. The claimed properties of $\psi$ are easily checked. □

We now define $\Psi$ to be the convolution

$\Psi(x) := \dfrac{1}{|A|} \sum_{y \in A^4} \phi^{(\varepsilon)}(y)\, \psi(y^{-1} x)$

for all $x \in A^{100}$, with the convention that $\Psi(x) = 0$ for $x$ outside $A^{100}$. We note that

$\Psi(\mathrm{id}) = \dfrac{1}{|A|} \sum_{x} \phi^{(\varepsilon)}(x)\, \psi(x^{-1}) \ge \dfrac{1}{|A|} \sum_{x \in A} \phi^{(\varepsilon)}(x)\, \psi(x^{-1}) = 1$,

a property required in the proof of (8.8).
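The layered definition of $\psi$ in the proof of Lemma 8.4 can be illustrated in a toy abelian setting, where conjugation is trivial. The sketch below takes $A = [-N_A, N_A]$ and $S = Q = [-s, s]$ inside the integers, with a small number of layers standing in for $N = 10^4 K^3$. These choices, and all names in the code, are assumptions made purely for illustration and are not the setting of the lemma. The code builds $\psi$ by the same rule (value $1$ on $A$, value $1 - i/N$ on $Q^{i+1}A \setminus Q^iA$, value $0$ outside $Q^N A$) and measures how much a shift by an element of $S$ can change it.

```python
from fractions import Fraction

# Toy parameters (assumptions for illustration only).
N_A = 50        # A = {-N_A, ..., N_A} inside the integers
S_RADIUS = 3    # S = Q = {-S_RADIUS, ..., S_RADIUS}
N_LAYERS = 10   # plays the role of N; N_LAYERS * S_RADIUS <= N_A, so Q^N lies inside A


def layer(x):
    """Smallest i >= 0 with x in Q^i A = [-(N_A + i*S_RADIUS), N_A + i*S_RADIUS]."""
    excess = abs(x) - N_A
    if excess <= 0:
        return 0
    return -(-excess // S_RADIUS)  # ceiling division


def psi(x):
    """psi = 1 on A, 1 - i/N on Q^{i+1}A minus Q^i A (0 <= i < N), 0 outside Q^N A."""
    j = layer(x)
    if j == 0:
        return Fraction(1)
    if j > N_LAYERS:
        return Fraction(0)
    return 1 - Fraction(j - 1, N_LAYERS)


def max_shift_difference():
    """Largest |psi(x - h) - psi(x)| over h in S and x in a window containing the support."""
    bound = N_A + (N_LAYERS + 2) * S_RADIUS
    return max(
        abs(psi(x - h) - psi(x))
        for x in range(-bound, bound + 1)
        for h in range(-S_RADIUS, S_RADIUS + 1)
    )


lipschitz_constant = max_shift_difference()
print(lipschitz_constant)
```

A shift by an element of $S$ moves a point by at most one layer, so the measured constant does not exceed $1/N$ in this toy model, mirroring the Lipschitz bound (iii) of the lemma.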
Note also that since φ and ψ are both at most 1 pointwise and are supported on A we have, for all x such that (x) = 0, 1 1 |A | (ε) −1 (ε) −1 3 (x) = φ (y)ψ y x = φ (y)ψ y x   K , |A| |A| |A| y∈A that is to say (8.10)   K . 100 (ε) c Let g ∈ A . Now since id ∈ A we have the crude bound d (id, A )  ε. It follows (ε) (ε) from Lemma 8.3 that ∂ φ   d (g)/ε. From the identity g ∞ (ε) −1 ∂ (x) = ∂ φ (y)ψ y x , g g |A| y∈A we have that 1 K −1 (ε) ∂ (x)  ∂ φ ψ y x  d (g). g g ∞ |A| ε y∈A This immediately yields the crude bound (ε) (8.11) ∂   d (g) g ∞ in the direction of (8.9), the statement we are trying to prove. Denote by P(X) the bound (8.12) ∂   Xd(g) g ∞ 100 3 O(1) for all g ∈ A . We have just demonstrated P(K /ε), and we wish to prove P(K ), which is (8.9). To this end we will implement a bootstrapping argument, showing that THE STRUCTURE OF APPROXIMATE GROUPS 169 P(X) implies a stronger version of itself, namely P(X ) with some X < X, under appro- priate conditions. The hypothesis P(X) (cf. (8.12)) implies an improved Lipschitz bound on φ.To (ε) see this note that if d (g)< 1/1000X then from assumption P(X) we have ∂  < g ∞ 1/1000 and hence, from (8.8), that g < 1. By definition of the escape norm this implies (ε) c that g ∈ A. Phrased in the contrapositive, it follows that d (id, A )  1/1000X, and therefore the Lipschitz bound in Lemma 8.3 implies that (ε) (8.13) ∂ φ  1000Xd (g). g ∞ The bootstrapping argument hinges on the Taylor expansion identity n−1 ∂ n  = n∂  + ∂ i ∂ , g g g g i=0 n 200 valid whenever g,..., g ∈ A (say). This identity implies, using the triangle inequality and (8.10), that n−1 n−1 1 1 2K 1 n i i (8.14) ∂   ∂  + ∂ ∂   + ∂ ∂  . g ∞ g ∞ g g ∞ g g ∞ n n n n i=0 i=0 To use this, we need to focus attention on the first and second derivatives of .To bound the first derivative we use the identity −1 ∂ (x) = φ(y)∂ ψ y x , h h |A| valid for h ∈ A . 
Since φ  1, this and the Lipschitz bound on ψ given in Lemma 8.4 imply that 4 2 (8.15) ∂   ∂ ψ  1/10 K h ∞ h ∞ |A| y∈A if h ∈ S. We turn to the second derivative ∂ ∂  for g ∈ Aand h ∈ S. Here we use the h g identity −1 ∂ ∂ (x) = (∂ φ)(y)∂ ψ y x . h g g h |A| y 170 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO Recalling that φ, ψ are supported on A and using the Lipschitz bound (8.13)on φ together with the Lipschitz bound on ψ giveninLemma 8.4, we obtain the bound 1 1 (ε) (8.16) ∂ ∂   ∂ φ ∂ ψ  Xd (g) h g ∞ g ∞ h ∞ |A| 10 y∈A if g ∈ Aand h ∈ S. 2 n These bounds are useful in (8.14)providedthat n is such that g, g ,..., g ∈ S. However, the second trapping property in the definition of strong approximate group ensures that this is so for a reasonably large value of n, indeed for n as large as . 6 3 10 K g Taking n this large and substituting into (8.14) yields 7 6 (ε)  (ε) ∂   10 K g + Xd (g)  X g , g ∞ e 7 6 1 where X = 10 K + Xand g ∈ S. The claim also trivially holds when g ∈ S. It is easy to improve this to the stronger statement P(X ) using the triangle inequal- ity ∂   ∂  +∂  , already observed in (6.2). Indeed for every η> 0there gh ∞ g ∞ h ∞ are, by the definition of d , g ,... g such that g = g ... g and 1 n 1 n (ε) (ε) (ε) d (g)> g  +··· +g  − η. 1 n e e Therefore (ε) (ε) ∂   ∂  + ··· + ∂   X g  +··· +g g ∞ g ∞ g ∞ 1 n 1 n e e (ε) X η + d (g) . Since η was arbitrary, we do indeed obtain the bound ∂   X d(g), which is g ∞ the statement P(X ). By repeating this deduction of P(X ) from P(X) many times, we see that the crude 3 9 6 bound P(K /ε), established in (8.11), eventually implies P(10 K ), and hence (8.9). By earlier remarks, this concludes the proof of (ii), the inequality for products. Finally, we turn to the commutator bound (iii). Now that we have the product inequality (ii), we may define a function φ obeying the properties in Lemma 8.3 but using (ε) g instead of the fudged quantity g =g + ε,thatistosay with e e d(g) := inf g  ; g = g ... g , n  1 . 
i e 1 n i=1 c −O(1) This is because (ii) implies the lower bound d(id, A )  K , and in particular d(id, A ) = 0. Moreover we have the Lipschitz bound O(1) (8.17) ∂ φ  K d(g). g ∞ THE STRUCTURE OF APPROXIMATE GROUPS 171 We will use this function φ in establishing (iii), the bound for commutators. Once again we consider an auxillary function , defined now to be the convolution −1 (x) := φ(y)φ y x |A| y∈A again with the convention that  vanishes outside of A .Weobserve theidentity ∂ ∂  − ∂ ∂  =−T ∂ , g h h g hg [g,h] 10 −1 −1 for g, h ∈ A ,where T denotes the shift defined by T f (x) := f (g x) if g x is well- g g defined, and 0 otherwise. It follows that ∂   ∂ ∂  +∂ ∂  . [g,h] ∞ h g ∞ g h ∞ By the first bound in (8.16) (which holds equally well for this )wehave ∂   ∂ φ ∂ φ . [g,h] ∞ g ∞ h ∞ |A| y∈A From (8.17)weobtain # # O(1) y O(1) y # # ∂   K d(g) sup d h  K g sup h . [g,h] ∞ e 4 4 y∈A y∈A By part (i) , this implies O(1) ∂   K g h . [g,h] ∞ e e To conclude, we note that (8.8) holds for this new auxillary function  as well, since the only fact we used in establishing that other than trapping properties of A was the lower bound (id)  1. This, at last, concludes the proof of Theorem 8.1. To conclude this section we assemble the main results of it and the previous section in a portable form. The following is the only result we shall need from Section 7 and the present section going forward to the next (and final) part of the paper. Proposition 8.5. — Suppose that A is an ultra approximate group and that π : A → L is a good model for A into a connected Lie group L with Lie algebra l.Let B be an arbitrary compact convex neighbourhood of 0 in l. 
Then, for sufficiently small r, r with 2r > r > r > 0, we may find a large strong ultra approximate subgroup A of A such that −1  −1 (i) π (exp(rB)) ⊂ A ⊂ π (exp(r B)); (ii) (A ) is well defined; 172 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO (iii) The escape norm g  satisfies e,A 10 −1 (a) (Conjugation) If g, h ∈ (A ) then h gh = O(g ); e e (b) (Product) If n is a nonstandard natural number and g ,..., g ∈ (A ) is a non- 1 n standard finite sequence of elements of (A ) (i.e. an ultraproduct of standard finite sequences, see Section A) then g ... g  = O( g  ); 1 n e i e i=1 (c) (Commutators) If g, h ∈ (A ) then we have [g, h] = O(g h ). e e e (iv) The set H := {g ∈ A ;g = 0} is a global internal subgroup, that is to say it is of the form H = H ,where H ⊂ A contains id and is stable under multiplication and n n n n→α inverse, which is contained in A and is normalised by A . Proof. — The existence of A satisfying (i) and (ii) follows from part (iii) of Definition 3.5.If r, r are small enough then Proposition 7.3 ensures that A is a ultra strong approx- imate group in the sense of Definition 7.1. Properties (iii)(a), (b) and (c) then follow imme- diately from Theorem 8.1 and taking ultraproducts, and (iv) then follows from (iii). Remark 8.6. — Observe that if A is a strong ultra approximate group, that is to say an ultraproduct of K-strong finite approximate groups, and if L is a locally compact model of A as given for example by Proposition 6.1, then from the strong approximate group hypothesis made on A we see that the standard part of the escape norm st(g ) e,A and the escape norm of π(g) ∈ L with respect to the neighborhood of the identity π(A) of L are comparable. Namely π(g)  st(g )  π(g) . 
As a consequence, if e,π(A) e,A e,π(A) we take the standard parts of the escape norm in properties (i) to (iii), then what we obtain is precisely the analogous properties for the escape norm in L with respect to π(A).In that case, the three properties are essentially equivalent to the original Gleason lemmas in the literature on Hilbert’s fifth problem, applied to the locally compact (local) group L. In the sequel however, it will be very important that the three bounds (i) to (iii) obtained in Proposition 8.5 hold at the ultra level in R and not only at the level of standard parts. 9. Proof of the main theorem In this section, we complete the proof of our main theorem, Theorem 4.2.We will do so by first reducing to the case when A has no global internal subgroup. For convenience, we introduce the following definition. Definition 9.1 (No small subgroups). — An ultra approximate group A has the NSS property if A does not contain any non-trivial global internal subgroup. By a global internal subgroup of A = A , we mean a subset of the form n→α H ,where H ⊆ A is a genuine subgroup. Note that A is NSS if and only if, n n n n→α for any g ∈ A\id, the escape norm g is non-zero (though it may be infinitesimal). We e,A remark that an analogous NSS condition for locally compact groups plays a key role in the theory of Hilbert’s fifth problem. THE STRUCTURE OF APPROXIMATE GROUPS 173 Example 22. —Let N ∈ N be an unbounded (nonstandard) integer. Then the interval A := [−N, N] (in the nonstandard integers Z) is NSS. Note that while A contains global subgroups such as Z or {x ∈ Z : x = o(N)}, such subgroups are not internal (they are not the ultralimits of standard sets). Clearly, any ultra approximate subgroup of an NSS ultra approximate group is also an NSS ultra approximate group. Using the Gleason lemmas from Section 8 we can reduce the proof of our main theorem to consideration of the NSS case. Proposition 9.2 (NSS reduction). — Let A be an ultra approximate group. 
Then there exists a 1000 4 large ultra approximate subgroup A of A, with (A ) well-defined and contained in A , and a global internal subgroup H contained in A and normalised by (A ) , such that A /H is an NSS ultra approximate subgroup, which admits a connected Lie group as a good model. We refer the reader to Definition 3.5 for the definition of a good model. Here A /H denotes the quotient local group as defined in Lemma B.12. Proof. — By Proposition 7.2 there is a ultra (strong) approximate group A ⊆ A 10 4 which is large relative to A, for which (A ) is well-defined and contained in A ,and a good model π : (A ) → L, where L is a connected Lie group. Let B be an open bounded convex symmetric neighbourhood of the identity in the Lie algebra of L. Then for suffi- ciently small r > 0, exp(rB) contains no non-trivial subgroups of L. Let H denote the global internal subgroup H ={g ∈ A ;g = 0} given by Propo- sition 8.5. Since H is normalised by A , it is also normalised by (A ) .Wemay then 100  8 apply Lemma B.12 and consider the quotient local group (A ) /H. Then (A ) /H = (A /H) is well-defined. Since A , H are nonstandard finite symmetric sets, A /His also; 2  2 since (A ) can be covered by finitely many left-translates of A, (A /H) can be covered by finitely many left-translates of A /H. We conclude that A /H is an ultra approximate group. Since exp(rB) contains no non-trivial subgroups, the image of H under π has to be trivial, thus the homomorphism π descends to a homomorphism of A /HtoL, which satisfies the conditions for a good model (see Definition 3.5). By construction, every element g ∈ A that is not in H has positive (but nonstandard) escape norm g .If g ∈ A e,A and [g] ⊆ A /H, where [g] is the class of g in A /H, then g ⊆ A . On the other hand A is a strong ultra approximate group, and thus g  is non-zero if and only if g 2 e,A e,A is non-zero. 
This implies that every non-identity element [g] in A /H also has positive escape norm [g]  .Thus A /H is NSS and the claim follows. e,A /H Let us now state Theorem 4.2 in the special case of NSS groups, and show how the general case of Theorem 4.2 follows from it. Theorem 9.3 (NSS approximate groups contain large nilprogressions). — Let A be an NSS ultra approximate group which admits a connected Lie group L as a good model. Then A contains a 174 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO nondegenerate ultra nilprogression P in normal form, which is large relative to A. Furthermore, the rank andstepof P are no greater than the dimension of L. Proof that Theorem 9.3 implies Theorem 4.2.—Let A be an ultra approximate group. We may find a large ultra approximate subgroup A of A which satisfies the conclu- sions of Proposition 9.2. We may then apply Theorem 9.3 to A /Hand find in (A ) /H a nondegenerate ultra nilprogression P in normal form with |P | |A/H|.Wecan 0 0 write P = P (u ,..., u ; N ,..., N ),where N ∈ N are unbounded and u ∈ A /H. We 0 s 1 r 1 r i i may then pick arbitrary lifts u ∈ A and set P = P(u ,..., u ; N ,..., N ).Then HP is i 1 r 1 r 5 4 a nondegenerate ultra coset progression in normal form contained in (A ) ⊆ A ,and |HP|  |H||P | |A| as desired. We turn now to the proof of Theorem 9.3, which will occupy the remainder of this section. We begin with a brief sketch, fleshing out a little more the overview given in Section 4. The proof will proceed by induction on the dimension of the connected Lie group L. The base case of the induction, when dim L = 0, is trivial as in this case the NSS ultra approximate group A is also trivial. To treat the induction step we will consider an element u of A with smallest possible escape norm. The existence of such an element is guaranteed by our standing hypothesis that approximate groups are finite objects, i.e. that each A in A = A is finite. 
Then we will mod out A by the n n n→α geometric P := {u , |n|  1/u },where u is an element of A which the smallest possible e,A escape norm u . The quotient local group A/P (in the sense of Lemma B.12)willbe e,A shown to be both NSS and to admit a Lie group with dimension at most dim L − 1 as a good model. It is at this step that we crucially rely on the fact that we are only quotienting out by a local group, the progression P, rather than a global one such as the group u generated by u. We do this in order to avoid accidentally creating torsion with an excessively large quotient. Indeed, it is because of this component of the induction that it was necessary to cast the entire argument in the setting of local groups rather than global groups, even if one had been willing to restrict the main results of the paper to the global group case. Finally, making key use of the properties of the escape norm given by the Gleason lemmas, we will lift the nilprogression from A/Pto A. Let us turn to the details. Proof of Theorem 9.3.—Let A be an NSS ultra approximate group which admits a connected Lie group L as a good model π : A → L. We proceed by induction on dim L and first dispose of the trivial case when L has dimension zero. As L is connected, it must thus be trivial. Applying Definition 3.5(iii), we conclude that A is a large global internal subgroup of A. Since A is NSS, this kernel must therefore be trivial. Therefore A is trivial. Now suppose that dim L  1, and that the claim has already been proven for con- nected Lie groups of smaller dimension. To complete the proof of Theorem 9.3 it suffices to establish the following lemma. THE STRUCTURE OF APPROXIMATE GROUPS 175 Lemma 9.4 (Induction step). — Suppose that A is an ultra approximate group admitting a connected Lie group L of positive dimension as a good model. Then A contains large ultra approximate subgroups A ⊆ A ⊆ A ⊆ A with the following properties. 
Let u ∈ A be such that u is minimal e,A n  10 and non zero, and set P := {u :|n| < 1/u }.Then P commutes with (A ) and obeys the e,A following properties: (i) the quotient A /P is an ultra approximate group which admits a connected Lie group of dimension dim L − 1 as a good model, whose Lie algebra is formed from the Lie algebra of L by quotienting out by a one-dimensional central subalgebra; (ii) if A is NSS,sois A /P; (iii) to any large ultra nilprogression Q in A /P in normal form, one can associate a large ultra nilprogression Q in A in normal form, whose rank exceeds the rank of Q by at most one, and similarly for the step; and (iv) (A ) ⊆ A . Proof of Theorem 9.3. — Indeed apply the induction hypothesis to A /P, which we can do by (i) and (ii). We may then conclude, using (iv), that A /P contains a large ultra nilprogression. Finally, apply (iii) to conclude. Proof of Lemma 9.4. — Take B to be some small convex neighbourhood of 0 in the Lie algebra l of L. We shall take A , A , A to be such that −1  −1 (9.1) π exp(B) ⊆ A ⊆ π exp(1.001B) and −1  −1 (9.2) π exp(δB) ⊆ A ⊆ π exp(1.001δB) and δ δ −1  −1 (9.3) π exp B ⊆ A ⊆ π exp 1.001 B , 10 10 where δ> 0 is a small (standard) real number to be specified later. It follows from Proposition 8.5 that large ultra approximate subgroups of A exist with these properties, and furthermore that, if B is small enough, the escape norm · e,A satisfies the conjugation, product and commutator inequalities laid out in (iii) of that proposition. Note also that A , A admit L as a good model and have the NSS property. Property (iv) of the lemma is essentially immediate; we turn to the more substantial (i), (ii) and (iii). We begin with the proof of (i). Recall that u ∈ A is chosen so that u  is minimal e,A and nonzero. 
Observing that x  100δ for x ∈ (A ) , it follows from the commuta- e,A tor estimate of Proposition 8.5 (iii)(b) that for such x we have # # # # [x, u] = O x u < u e,A e,A e,A e,A provided that δ is chosen sufficiently small in terms of the implied constant O(·). 176 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO Note that [x, u] lies in A , rather than merely (A ) , since its escape norm is less than 1. From the extremal property of u, it follows that [x, u]= id, that is to say x com- mutes with u, whenever x ∈ (A ) . Recall that we are taking P := {u : n  1/u }. e,A Since (A ) is well defined, we may apply Lemma B.12 and form the quotient local group A /P  A /P , which is clearly also an ultra approximate group. We now n→α n show that A /P admits a proper quotient of L as a good model. To do this, we first verify that π(P) is a non-trivial central one-parameter local subgroup of L. Since dim L  1, the groups A , A are non trivial, and this implies that u  is e,A infinitesimal, i.e. that M := 1/u is unbounded. Let n ∈ N be such that n = o(M ). e,A 0 0 n kn We must have u ∈ ker π , because u ∈ A for all k ∈ N.Define amap φ :[−100, 100]→ L by setting tM (9.4) φ(t) := π u , where · is the (nonstandard) greatest integer function. Then π(u ) = φ(st(n/M )) for all n ∈[−100M , 100M ],and φ is a local homomorphism in the sense that φ(t)φ (s) = 0 0 φ(t + s) whenever t, s, t + s ∈[−100, 100]. Also π(P) = φ([−1, 1]). Finally, we verify that φ is continuous. Because of the local homomorphism property, it is enough to check this k 1 at 0. If t is small, then (φ (t)) = φ(tk) ∈ exp(1.001B) for every integer k ∈[0, ],hence φ(t) ∈ exp(1.001B/k) is close to the identity in L, which gives the desired continuity. As φ is a continuous homomorphism from [−1, 1] to the Lie group L, there exists an element X of the Lie algebra l such that φ(t) = exp(tX) for all t ∈[−1, 1].Moreover X ∈ 1.001B. 
On the other hand by the definition of the escape norm we have u ∈ / A , and hence φ(1) = exp(X)/ ∈ exp(B) and thus X ∈ / B. In particular X is non-zero, and it follows that φ([−1, 1]) is a non-trivial local one-parameter subgroup of L. Finally it is tM central in L, because L is connected and φ(t) = π(u ) commutes with the neighbour- hood of identity π(A ) as shown above. Thus X lies in the centre of the Lie algebra l. If we choose a neighbourhood U of the identity in L small enough, then by Lemma B.12 we may form the quotient space U/φ ([−1, 1]), which one easily verifies to be a local Lie group of dimension dim L − 1, whose Lie algebra is obtained from the Lie algebra of L by quotienting out by a one-dimensional central subalgebra. By Lie’s third theorem every local Lie group is locally identifiable with an open neighbourhood of a global connected Lie group L , which in our case still has dimension dim L − 1. Thus, by shrinking U if necessary, we may find a local homomorphism η : U → L whose kernel lies in φ([−1, 1]). The local homomorphism η ◦ π : (A ) → L then pushes down to a local homomorphism ψ : (A ) /P → L . Choosing δ smaller if necessary, we may assume that ψ is defined on all of (A /P) , thus making L a good model for A /P. Note we may also ensure that π(A ) contains no non-trivial subgroup of L , a property that will be needed in Lemma 9.5 below. This completes the proof of (i). We turn now to (ii), which asserted that A /P isNSS.Infactweshall provethe same statement for A /P, from which the statement for A /P follows (or note that an THE STRUCTURE OF APPROXIMATE GROUPS 177 identical proof works). Key to this endeavour is the following lifting lemma, which we will require again in the proof of (iii). 8  8 Lemma 9.5 (Lifting lemma). — Let g ∈ A /P,and let κ : (A ) → (A ) /P be the projection map. Then there exists g˜ ∈ A such that κ(g˜) = gand ˜ g  = O(g  ). e,A e,A /P Letusfirstremarkonwhy theNSS property of A /P follows quickly from this. 
Indeed suppose that g ∈ A /P is not the identity. Then the element g˜ generated by the above lemma is not the identity either, and hence has positive escape norm since A is NSS. By the lemma, g also has positive escape norm. Since g = id was arbitrary, this establishes the NSS property for A /P. Proof of Lemma 9.5.—Fix g ∈ A /P. Let g˜ be a lift of g in A which minimizes the escape norm ˜ g among all possible lifts of g.If g˜ is trivial, then so is g and there is e,A nothing to prove. Therefore we may assume that g˜ is not the identity and hence, since A is NSS, that it has positive escape norm. Suppose, by way of contradiction, that g = e,A /P o(˜ g ). Our goal will be to reach a contradiction by finding another lift of g with e,A strictly smaller escape norm than g˜. Set M := 1/˜ g ∈ N. e,A We now make an important deduction from our hypothesis. For every n ∈ N such that n = O(M ),wehave g ∈ A /P. In particular, for every (standard) integer k ∈ N, kM  M 1 1 g ∈ A /P. This implies that the group generated by g lies in A /P. However, in projection to the Lie model, A /P gets mapped into a neighbourhood of the identity in L , which we chose small enough so as not to contain any non-trivial subgroup. We M  M 1 1 thus conclude that g maps to the identity in L , and therefore g˜ maps into the local one-parameter subgroup φ([−1, 1]). Now there is another element which maps to φ([−1, 1]),namely u, the element for which u  is minimal. e,A In order to motivate the rest of the argument, let us temporarily work in a heuristic setting (using informal notation such as ≈), returning to tighten the argument rigorously later. Since −1 (9.5) A ≈ π exp(δB) , and since M is the least n for which g˜ escapes A ,wehave (9.6) π g˜ ≈ φ(δ) = exp(δX). Similarly (9.7) π u ≈ φ(δ). 178 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO n n  −1 Now u takes at least as long as g˜ to escape from A ≈ π (exp B). 
Hence (roughly) −1 it takes as least as long to escape from A ≈ π (exp δB) as well, which means that (9.8)M  M . 1 0 We are trying to find a lift of g with smaller escape norm than that of g˜. To do this −m ∗ it is sensible to look for elements of the form h := gu ˜ , m ∈ N.Providedthat m is chosen judiciously, h will also be a lift of g since (by definition) u lies in P. Since, measured in φ([−1, 1]) by applying π , the element u is “shorter” than g˜, it seems reasonable that by an appropriate choice of m we can make h shorter than g˜ as well. Being a little more precise, suppose that m, n ∈ N. Since u is central in A n n −mn n we have h =˜ g u whenever these expressions are well-defined, and hence π(h ) = n −mn π(g˜ )π(u ).From(9.6)and (9.7)wehave n  −mn (9.9) π g˜ ≈ φ st δn/M and π u˜ ≈ φ st −δmn/M . 1 0 These expressions will be legitimate if m, n are chosen so that the arguments of the φ ’s always lie in [−50, 50] (say). It follows that 1 m (9.10) π h ≈ φ st − δn . M M 1 0 However (by the Euclidean algorithm) there is a choice of m ∈ N such that |1/M − m/M |  1/2M . Comparing with (9.10) we see that for n = 1,..., 2M we have 0 0 0 n   −1 π(h ) = φ(δ ) with δ  δ. Since π (φ ([0,δ])) ⊆ A ,wemustraise h to at least the power 2M before it escapes A . Since 2M > M  M , this h is a lift of g with smaller 0 0 0 1 escape norm than g˜. Note that the computations (9.10) are legitimate for this choice of m and for n  2M . We now perform the above argument rigorously. Instead of the heuristic statement (9.5), we must work with the inclusions −1  −1 (9.11) π exp(δB) ⊆ A ⊆ π exp(1.001δB) . To get a precise form of (9.6), note that by definition of the escape norm we M −1  M  M −1 1 1 1 have g˜ ∈ A , whilst g˜ ∈ / A . In particular, as a consequence of (9.11), π(g˜ ) ∈ exp(1.001δB), whilst π(g˜ )/ ∈ exp(δB). Since M is unbounded, the first of these actu- ally implies that π(g˜ ) ∈ exp(1.001δB). M −1 M 0 0 Similarly π(u ) ∈ exp(1.01δB), whilst π(u )/ ∈ exp(δB). 
Once again, the first of these implies that π(u ) ∈ exp(1.001δB). Since B is convex, comparison of these facts shows that π(g˜ ) = φ(t) and π(u ) = φ(t ) with (9.12) t, t ∈[0.9δ, 1.1δ]. THE STRUCTURE OF APPROXIMATE GROUPS 179 Suppose that M and M are the escape times of g˜ and u from A , respectively. Since 1 0 u ∈ A was assumed to have minimal escape norm, M  M . On the other hand (9.11) 1 0 implies that M /M , M /M ∈[0.99δ, 1.01δ],and so 0 0 1 1 (9.13)M  1.1M . 1 0 −m ∗ ∗ As in the heuristic discussion above, take h := gu ˜ ,for some m ∈ N. Let n ∈ N.Then we have n  −mn π g˜ = φ st tn/M and π u = φ −st t mn/M 1 0 provided that the arguments of the φ ’s are in [−50, 50], which will always be the case later on in the argument. Since u is central we have t mt (9.14) π h = φ st − δn . δM δM 1 0 Roughly as before, we use the Euclidean algorithm to find m ∈ N such that |1/M − mt /δM |  t /2δM .By(9.12)and (9.13) it follows that 1 0 0 t mt t t 1 0.9 −  + 1 − < . δM δM 2δM δ M M 1 0 0 1 1 n  n It follows from this and (9.14)that π(h ) ∈ φ([0,δ]) for n  M , and hence h lies in A for these same values of n. As a consequence, h has smaller · escape norm than g˜, e,A contrary to assumption. Finally we prove item (iii) of Lemma 9.4. Suppose then that Q is a nondegen- erate large ultra nilprogression in A /P in normal form; we wish to lift this to a large ultra nilprogression Q in A of at most one higher rank and step, while preserving the nondegeneracy and normal form properties. The main difficulty is that if one lifts the generators of Q arbitrarily then there is no guarantee that the progression they generate, or even a significant part of it, will be contained in A . The key to ensuring that we do achieve this lies in making judicious use of the lifting lemma (Lemma 9.5)and theprod- uct and commutator properties of the escape norm (Proposition 8.5(iii) (b) and (c)). 
At this point we advise the reader to quickly review Definition 2.6 and Appendix C,where nilprogressions in C-normal form are discussed. We may write the non-degenerate ultra nilprogression in normal form as Q = P(u ,..., u ; N ,..., N ), 1 r 1 r where the u are in A /P, the N ∈ N are unbounded, and r is the rank of Q, and some i i standard step s. From the normal form hypothesis (and taking ultraproducts), we have the following properties: 180 EMMANUEL BREUILLARD, BEN GREEN, TERENCE TAO (i) (Upper-triangular form) For every 1  i < j  r and  , ∈{−1, +1}, one has i j N N j+1 r (9.15) u , u ∈ P u ,..., u ; O ,..., O . j+1 r i j N N N N i j i j 1 n (ii) (Local properness) The expressions u ... u for nonstandard integers n ,..., n 1 r 1 r with |n |  N for all 1  i  r are all well-defined and distinct, if C is a suffi- i i ciently large standard real. (iii) (Volume bound) One has N ... N |Q| N ... N . 1 r 1 r (Note that as the N are unbounded, 2N + 1and N are comparable.) i i i Also, since u ∈ Q ⊆ A /Pfor all1  i  r and |n |  N ,wehave i i u   . i e,A /P By Lemma 9.5,wemay findlifts u ∈ A which project to u in the quotient local i i group A /P, and are such that u   = O u i e,A i e,A /P and thus (9.16) u    . i e,A In order to include P in the lifted progression, we set u := u, the generator of P, r+1 and N := 1/u .From(9.4) we see that r+1 e,A (9.17)M  N  M . r+1 0 0 We then define Q := P(u ,..., u , u ; εN ,...,εN ) 1 r r+1 1 r+1 for some sufficiently small standard ε> 0. We claim that Q is well-defined in A as a nondegenerate ultra nilprogression in normal form, of rank (r + 1) andstepatmost s + 1. We begin with the claim that Q is well-defined in A .From(9.16) and Proposition 8.5 one has g  ε e,A for all g ∈ Q, and in particular every product in Q lies in A as required. THE STRUCTURE OF APPROXIMATE GROUPS 181 It is clear that Q is a nondegenerate ultra non-commutative progression of rank (r + 1). 
To show that it is a nilprogression of step at most $s+1$, it suffices to show that any iterated commutator $g$ of length $s+2$ in the generators $u_1^{\pm 1}, \ldots, u_{r+1}^{\pm 1}$ is trivial. Using commutator identities such as the Hall-Witt identity
\[ [z, [x, y]] = \big[[y^{-1}, z^{-1}], x\big]^{zy}\, \big[[z, x^{-1}], y^{-1}\big]^{xy}, \tag{9.18} \]
where $x^y := y^{-1} x y$ (using the unbounded nature of the $N_i$ to justify all operations), we may restrict attention to iterated commutators $g$ of the form $g = [h, u_i^{\pm 1}]$ where $h$ is an iterated commutator of length $s+1$ and $1 \le i \le r+1$. But by projecting down to $A/P$, we know that the image of $h$ vanishes and thus $h \in P$. Since $P$ is central in $A$, the claim follows.

Finally, we need to show that $Q$ is in normal form. We begin by establishing the upper triangular form (2.1), i.e. that
\[ [u_i^{\epsilon_i}, u_j^{\epsilon_j}] \in P\Big(u_{j+1}, \ldots, u_{r+1};\ O\Big(\frac{N_{j+1}}{N_i N_j}\Big), \ldots, O\Big(\frac{N_{r+1}}{N_i N_j}\Big)\Big) \]
whenever $1 \le i < j \le r+1$ and $\epsilon_i, \epsilon_j \in \{-1, +1\}$.

If $j = r+1$, then $u_j = u$ commutes with every element of $A$, and in particular with $u_i$, so the claim follows in this case. Now suppose that $j \le r$. From (9.15) we then have
\[ [\overline{u}_i^{\epsilon_i}, \overline{u}_j^{\epsilon_j}] \in P\Big(\overline{u}_{j+1}, \ldots, \overline{u}_r;\ O\Big(\frac{N_{j+1}}{N_i N_j}\Big), \ldots, O\Big(\frac{N_r}{N_i N_j}\Big)\Big), \]
which lifts to
\[ [u_i^{\epsilon_i}, u_j^{\epsilon_j}] \in P\Big(u_{j+1}, \ldots, u_r;\ O\Big(\frac{N_{j+1}}{N_i N_j}\Big), \ldots, O\Big(\frac{N_r}{N_i N_j}\Big)\Big) \cdot P. \]
Thus we may write $[u_i^{\epsilon_i}, u_j^{\epsilon_j}] = g u^n$, where
\[ g \in P\Big(u_{j+1}, \ldots, u_r;\ O\Big(\frac{N_{j+1}}{N_i N_j}\Big), \ldots, O\Big(\frac{N_r}{N_i N_j}\Big)\Big) \]
and $|n| \le 1/\|u\|_{e,A} \asymp M_0$. From (9.16) and Proposition 8.5 one has
\[ \big\|[u_i^{\epsilon_i}, u_j^{\epsilon_j}]\big\|_{e,A} \ll \frac{1}{N_i N_j}, \]
and likewise
\[ \|g\|_{e,A} \ll \frac{1}{N_i N_j}, \]
and hence
\[ \|u^n\|_{e,A} \ll \frac{1}{N_i N_j}. \tag{9.19} \]
In particular, $\|u^n\|_{e,A}$ is infinitesimal, which implies that $\pi(u^n) = \mathrm{id}$ and hence $n = o(M_0)$ by (9.4). Since $\|u\|_{e,A} = 1/N_{r+1}$, we conclude that
\[ |n| \ll \frac{N_{r+1}}{N_i N_j} \]
and thus
\[ [u_i^{\epsilon_i}, u_j^{\epsilon_j}] \in P\Big(u_{j+1}, \ldots, u_{r+1};\ O\Big(\frac{N_{j+1}}{N_i N_j}\Big), \ldots, O\Big(\frac{N_{r+1}}{N_i N_j}\Big)\Big). \]
Noting that $\varepsilon > 0$ is standard and thus can be absorbed into the $O()$ notation, this gives the desired upper triangular property.

Next, we establish the local properness. Suppose that
\[ u_1^{n_1} \cdots u_{r+1}^{n_{r+1}} = u_1^{n'_1} \cdots u_{r+1}^{n'_{r+1}} \]
for some $|n_i|, |n'_i| \le \varepsilon N_i$. Quotienting by $P$, we conclude that
\[ \overline{u}_1^{n_1} \cdots \overline{u}_r^{n_r} = \overline{u}_1^{n'_1} \cdots \overline{u}_r^{n'_r}. \]
This quotienting can be justified because all products here lie in $Q$ and hence in $A$. By the local properness of $\overline{Q}$, we conclude if $\varepsilon$ is small enough that $n_i = n'_i$ for all $1 \le i \le r$; we may then cancel and conclude that
\[ u^{\,n_{r+1} - n'_{r+1}} = \mathrm{id}. \]
Since $\|u\|_{e,A} = 1/N_{r+1}$ and $|n_{r+1} - n'_{r+1}| < N_{r+1}$, this implies that $n_{r+1} = n'_{r+1}$, giving the desired local properness.

From local properness one immediately has the lower bound
\[ |Q| \gg N_1 \cdots N_r N_{r+1}. \]
Now we establish the matching upper bound
\[ |Q| \ll N_1 \cdots N_r N_{r+1}. \]
We first recall from the normal form of $\overline{Q}$ that
\[ |\overline{Q}| \ll N_1 \cdots N_r. \]
From construction it is also clear that the image of $Q$ under projection by $P$ lies in $\overline{Q}$. It therefore suffices to show that the preimage of any element in $\overline{Q}$ contains at most $O(N_{r+1})$ elements of $Q$. By construction of the quotient map, we see that the preimage is contained in a translate of $P$, and thus has cardinality $O(M_0)$; the claim then follows from (9.17). This concludes the proof of Lemma 9.4 and thus Theorem 9.3. □

We can now conclude the proof of Theorem 2.10, the most basic form of our main theorem.

Proof of the first part of Theorem 2.10. We argue by contradiction. Negating the quantifiers, we see that there exists some $K \ge 1$ and an infinite sequence of local groups $G_n$ and finite $K$-approximate groups $A_n \subseteq G_n$, $n \in \mathbb{N}$, for which the conclusion of the theorem fails, namely for which $A_n^4$ does not contain any coset nilprogression of rank and step at most $n$ in $n$-normal form and of cardinality at least $\frac{1}{n}|A_n|$.

Now form the ultraproduct $A = \prod_{n \to \alpha} A_n$ inside $G = \prod_{n \to \alpha} G_n$. By Łoś's Theorem (Theorem A.6), $G$ is a local group and $A$ an ultra approximate subgroup. We can now apply Theorem 4.2, whose proof we just completed, to conclude that $A^4$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$.
Using Łoś's theorem again we conclude that $P = \prod_{n \to \alpha} P_n$, where for an $\alpha$-large set of $n$, $P_n$ is a $1/c$-proper coset nilprogression contained in $A_n^4$ of rank and step at most $1/c$ and of size at least $c|A_n|$ for some standard positive number $c > 0$. But this contradicts the construction of the $A_n$, thereby yielding the claim.

To conclude this section we record another useful conclusion from the above analysis: Hrushovski's Lie model is nilpotent.

Proposition 9.6 (Nonstandard finite approximate groups have nilpotent Lie models). Suppose that $A$ is an ultra approximate group and that $\pi : A \to L$ is a good model for $A$ into a connected Lie group $L$ with Lie algebra $\mathfrak{l}$. Then $\mathfrak{l}$ and $L$ are nilpotent.

Proof. By Proposition 8.5 we may find a large strong ultra approximate subgroup $A'$ of $A$ obeying the conclusion of that proposition. By quotienting out the elements $H$ of $A'$ of zero escape norm as in the proof of Proposition 9.2, we obtain an NSS ultra approximate group $A'/H$. Now one runs the argument in Theorem 9.3. An inspection of this argument shows that if one unfolds the induction from Lemma 9.4, the Lie algebra $\mathfrak{l}$ of $L$ is repeatedly quotiented out by central subalgebras until it becomes trivial. Thus, $\mathfrak{l}$ can be obtained from the trivial Lie algebra by a finite tower of central extensions and is therefore nilpotent as required. The nilpotence of $L$ is an immediate consequence of this and basic Lie theory.

10. A dimension bound

In this section we prove Theorem 2.12, in which it is shown that the rank of the nilprogression $P$ in the main theorem may be taken to be $O(K^{O(1)})$. We will also show that so long as we work in a global group $G$, and replace $A^4$ with $A^{O_K(1)}$, it may be taken to be $O(\log K)$. By the usual ultraproduct argument, it will suffice to establish the following nonstandard analysis formulation of the theorem.

Theorem 10.1.
Suppose that $A$ is an ultra global $K$-approximate group, thus $A = \prod_{n \to \alpha} A_n$ for some finite $K$-approximate groups $A_n$, each contained in a global group $G_n$. Then $A^{12}$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$ and rank at most $O(K^2 \log K)$. Moreover, there exists a standard natural number $m$ such that $A^m$ contains an ultra coset nilprogression $P$ in normal form with $|P| \gg |A|$ and rank at most $6 \log_2 K$.

Recall that the step of a nilprogression in normal form is always less than or equal to its rank. The derivation of Theorem 2.12 from Theorem 10.1 proceeds analogously to the derivation of Theorem 2.10 from Theorem 4.2 and is omitted. It remains to establish Theorem 10.1.

The arguments here are inspired by some remarks of Hrushovski in [33, §4]. In particular, a key tool will be the following lemma from [33, Lemma 4.9].

Lemma 10.2 (Doubling in a simply connected nilpotent Lie group). Let $G$ be a connected, simply connected nilpotent Lie group of dimension $d$, and let $A$ be a measurable subset of $G$. Let $\mu$ be a Haar measure on $G$ (note that nilpotent groups are automatically unimodular, and so there is no distinction between left and right Haar measure). Then $\mu(A^2) \ge 2^d \mu(A)$.

Proof. We use an argument of Gelander from [33, Lemma 4.9]. As is well known (e.g. see [12]), in a simply connected nilpotent Lie group, the exponential map $\exp : \mathfrak{g} \to G$ is a diffeomorphism, which pushes forward the Lebesgue measure $\mu_{\mathfrak{g}}$ on the $d$-dimensional vector space $\mathfrak{g}$, the Lie algebra of $G$, to the Haar measure $\mu$ on $G$. Thus it will suffice to show that $\mu_{\mathfrak{g}}(\log(A^2)) \ge 2^d \mu_{\mathfrak{g}}(\log A)$, where $\log$ is the inverse of $\exp$. But as $A^2$ contains $\{a^2 : a \in A\}$, $\log(A^2)$ contains the dilate $2 \cdot \log A$ of $\log A$, and the claim follows.

One is tempted to combine this lemma with the Hrushovski Lie Model Theorem directly (i.e. Theorem 3.10), to get some dimensional control on the Lie group $L$.
However, there is a technical obstruction; the Lie model is only available for an ultra approximate subgroup $A'$ of $A$, and the covering parameter $K'$ of this subgroup $A'$ may be much worse than the covering parameter $K$ of the original ultra approximate subgroup $A$. To get around this problem, we need to choose the subgroup $A'$ more carefully. A clue as to how to proceed is provided by the following basic observation (cf. [31, Lemma 7.3]).

Lemma 10.3 (Slicing approximate groups by genuine subgroups). Let $A$ be a (possibly infinite) $K$-approximate group in a global group $G$, and let $G'$ be a genuine subgroup of $G$. Then $A' := A^2 \cap G'$ is a $K^3$-approximate subgroup and $A^4 \cap G'$ can be covered by at most $K^3$ left translates of $A'$.

Proof. Since $(A')^2 \subseteq A^4 \cap G'$, it suffices to show that $A^4 \cap G'$ can be covered by $K^3$ left-translates of $A'$. But $A^4$ can be covered by $K^3$ left-translates of $A$ since $A$ is a $K$-approximate group. Next, observe that if a left-translate $gA$ of $A$ intersects $A^4 \cap G'$ in at least one point $g'$, then
\[ gA \cap A^4 \cap G' \subseteq gA \cap G' \subseteq g'A'. \]
Thus $A^4 \cap G'$ can be covered by $K^3$ left-translates of $A'$, as required.

Lemma 10.3 suggests that we should look for Lie models of approximate groups $A'$ that are formed by slicing $A$ with a genuine subgroup $G'$ of $G$. We turn to the details.

Let $A_n$ be a sequence of $K$-approximate groups in global groups $G_n$, and let $A = \prod_{n \to \alpha} A_n$ be their ultraproduct; thus $A$ is an ultra $K$-approximate group that lies inside an ultra genuine group $\prod_{n \to \alpha} G_n$. By Proposition 6.10, we may find a model $\pi : \langle A \rangle \to G$ of $A$ by a locally compact group $G$.

Let $U_0$ be the neighbourhood in Definition 3.5. We have $\pi^{-1}(U_0) \subseteq A^4$ and $U_0 \subseteq \pi(A^4)$. By Theorem B.17, there is an open subgroup $G'$ of $G$ and a closed subgroup $H$ of $G'$ contained in $U_0$ and normalized by $G'$ such that $L := G'/H$ is a connected Lie group. Let $U_1 \subseteq G'$ be an open subset such that $H \subseteq U_1 \subseteq \overline{U_1} \subseteq U_0$ and let $\phi : G' \to L$ denote the quotient map.

Now set $A' := A^4 \cap \pi^{-1}(G')$.
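Before Lemma 10.3 is applied, it may help to see the slicing statement checked in a toy case. The following Python sketch is only an illustration under assumptions that are not in the text: the ambient group is taken to be $\mathbb{Z}^2$ written additively (so $A^2$ becomes $A + A$), $A$ is a box, which is a 4-approximate group, and the genuine subgroup `in_subgroup` is a hypothetical choice, $\{(x,y) : x + y \equiv 0 \bmod 3\}$. The function follows the proof of the lemma: it covers $4A$ by translates $g + A$ with $g \in X + X + X$, picks one point $g'$ of the slice in each translate that meets it, and checks that the corresponding piece lies in $g' + A'$.

```python
def box(n, m):
    """The box A = [-n, n] x [-m, m] in Z^2 (symmetric, contains 0)."""
    return {(x, y) for x in range(-n, n + 1) for y in range(-m, m + 1)}


def sumset(P, Q):
    """Minkowski sum P + Q in Z^2."""
    return {(p[0] + q[0], p[1] + q[1]) for p in P for q in Q}


def in_subgroup(v):
    """Hypothetical genuine subgroup G' = {(x, y) : x + y = 0 mod 3}."""
    return (v[0] + v[1]) % 3 == 0


def slice_cover(n=4, m=3):
    """Return (K, number of translates of A' used to cover 4A ∩ G')."""
    A = box(n, m)
    # X witnesses that the box is a K-approximate group with K = |X| = 4.
    X = {(sx * n, sy * m) for sx in (-1, 1) for sy in (-1, 1)}
    K = len(X)
    A2 = sumset(A, A)
    assert A2 <= sumset(X, A)

    A4 = sumset(A2, A2)
    X3 = sumset(sumset(X, X), X)          # at most K^3 translates
    assert A4 <= sumset(X3, A)

    A_prime = {v for v in A2 if in_subgroup(v)}   # A' = 2A ∩ G'
    target = {v for v in A4 if in_subgroup(v)}    # 4A ∩ G'

    centres = []
    for g in X3:
        piece = {v for v in sumset({g}, A) if v in target}
        if not piece:
            continue
        g_prime = min(piece)              # any point of the piece will do
        # Key inclusion from the proof: (g + A) ∩ 4A ∩ G' ⊆ g' + A'.
        assert piece <= sumset({g_prime}, A_prime)
        centres.append(g_prime)

    covered = set().union(*(sumset({c}, A_prime) for c in centres))
    assert target <= covered
    return K, len(centres)
```

Calling `slice_cover()` returns the pair `(K, number_of_translates)`; the lemma predicts that the second entry is at most `K**3`.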
From Lemma 10.3 applied to $A^2$, we see that $A'$ is a $K^6$-approximate group. We now also claim that $A'$ is a nonstandard finite set, which would make $A'$ an ultra $K^6$-approximate group. To see this, observe first from Definition 3.5(ii) that $\pi(A^4)$ is contained in some compact set $F$. As $G'$ is an open subgroup of $G$, it is also closed and $F \cap G'$ is compact. We then see from Definition 3.5(iii) that we can find a nonstandard finite set $A_*$ such that $\pi^{-1}(F \cap G') \subseteq A_* \subseteq \pi^{-1}(G')$. Thus $A' = A^4 \cap A_*$, and so $A'$ is a nonstandard finite set as required.

Note that $\pi(A')$ contains the open set $U_0 \cap G'$ and is itself contained in a compact subset of $G'$. Hence the set $E := \phi \circ \pi(A')$ is precompact and contains a neighbourhood of the identity in $L$. Moreover $(\phi \circ \pi)^{-1}(\phi(U_1)) \subseteq A'$, hence it follows that $\phi \circ \pi : A' \to L$ is a good model for $A'$. From Proposition 9.6, we conclude that $L$ is nilpotent. Every connected nilpotent Lie group admits a unique maximal compact subgroup which, moreover, is central. Let $N$ be the maximal compact subgroup of $L$ and $\theta : L \to L/N$ be the quotient map.

We claim that $\dim(L/N) \le 6 \log_2 K$. To see this note that, as $A'$ is a $K^6$-approximate group, $E^2$ is covered by at most $K^6$ left-translates of $E$. Therefore $\theta(E)^2$ can be covered by at most $K^6$ left-translates of $\theta(E)$, and hence $\overline{\theta(E)}^2$ can be covered by at most $K^6$ translates of $\overline{\theta(E)}$, where $\overline{\theta(E)}$ is the topological closure of $\theta(E)$, a compact set with non-empty interior. If we let $\mu$ be a Haar measure on $L/N$, it follows that
\[ \mu\big(\overline{\theta(E)}^2\big) \le K^6 \mu\big(\overline{\theta(E)}\big). \]
On the other hand, from Lemma 10.2 one has
\[ \mu\big(\overline{\theta(E)}^2\big) \ge 2^{\dim(L/N)} \mu\big(\overline{\theta(E)}\big). \]
Since $\theta(E)$ has non-empty interior, $\mu(\overline{\theta(E)}) \ne 0$ and so comparison of these two inequalities implies that
\[ \dim(L/N) \le 6 \log_2 K. \tag{10.1} \]
We now explain how to derive the second part of Theorem 10.1 from the above; we will turn to the first part later. We consider $\phi^{-1}(N)$, which is the kernel of the projection map from $G'$ to $L/N$ $(= (G'/H)/N)$. This is a compact subgroup of $G'$.
Since $\pi(A')$ contains an open neighbourhood of the identity, we conclude that there exists a standard natural number $m$ such that $\phi^{-1}(N) \subseteq \pi(A'^{\,m-1})$, and this implies that $A'^{\,m}$ contains the kernel of $\theta \circ \phi \circ \pi$, which implies that $\theta \circ \phi \circ \pi : A'^{\,8m} \to L/N$ is a good model. By Proposition 9.2 we conclude that $A'^{\,4m}$ contains a large approximate subgroup $A''$ with $(A'')^{1000}$ well-defined and contained in $A'^{\,4m}$, and a global internal subgroup $H'$ of $A''$ such that $A''/H'$ is an NSS approximate subgroup with a connected Lie group as a good model. An inspection of the proof of that proposition reveals that we may take $(A'')^{1000}$ inside $A'^{\,m}$ and that we may take the connected Lie group to be $L/N$, which we have shown to have dimension at most $6 \log_2 K$. Applying Theorem 9.3, we see that $(A''/H')^4$ contains a large ultra nilprogression in normal form of rank at most $6 \log_2 K$, and thus $(A'')^4$ contains a large ultra coset nilprogression in normal form of rank at most $6 \log_2 K$. As $(A'')^4$ is contained in $A'^{\,m}$, which is in turn contained in $A^{4m}$, the second part of Theorem 10.1 follows (after redefining $m$).

We now turn to the first part of Theorem 10.1. As we see from the last paragraph, the difficulty here is that $\phi^{-1}(N)$ may not be contained in $\pi(A')$. We will show that nevertheless $\pi(A')$ still contains a subgroup $\phi^{-1}(N_0)$, where $N_0$ is a closed subgroup of $N$ with small codimension. For this the key is the following lemma, which is potentially of interest in its own right. Here, and below, we write $\mathbb{T}^d := \mathbb{R}^d/\mathbb{Z}^d$ for the $d$-dimensional torus. By a subtorus we mean a closed connected subgroup of $\mathbb{T}^d$.

Lemma 10.4. Let $K, d \ge 1$ and $A$ be a closed $K$-approximate group in $\mathbb{T}^d$ containing a neighbourhood of $0$. That is, $A$ is closed, contains a neighbourhood of $0$, is centrally symmetric, and there is a finite set $X \subseteq \mathbb{T}^d$, $|X| \le K$, such that $A + A \subseteq A + X$. Then $4A := A + A + A + A$ contains a subtorus $T \subseteq \mathbb{T}^d$ of codimension at most $O(K^2 \log K)$.
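To make the hypotheses of Lemma 10.4 concrete, here is a small Python sketch in a discretised toy model. Everything specific in it is an assumption made for illustration and not part of the text: the torus $\mathbb{T}^2$ is replaced by the finite grid $(\mathbb{Z}/N\mathbb{Z})^2$, the set $A$ is a vertical strip $\{(x,y) : |x| \le a\}$, and the covering set is $X = \{(\pm a, 0)\}$, so $K = 2$. The strip is thin in the first coordinate and full in the second, and the code checks that $4A$ contains the grid version of the vertical circle $\{0\} \times \mathbb{T}$ (a subtorus of codimension 1) but not the horizontal one.

```python
def make_strip(N, a):
    """A = {(x, y) in (Z/N)^2 : |x| <= a}, a symmetric set containing 0."""
    return {(x % N, y) for x in range(-a, a + 1) for y in range(N)}


def add_sets(P, Q, N):
    """Sumset P + Q inside (Z/N)^2."""
    return {((p[0] + q[0]) % N, (p[1] + q[1]) % N) for p in P for q in Q}


def strip_example(N=24, a=2):
    """Check the approximate-group property of the strip and inspect 4A."""
    A = make_strip(N, a)
    X = {(a % N, 0), ((-a) % N, 0)}       # |X| = 2, so K = 2 in this model

    two_A = add_sets(A, A, N)
    is_approximate_group = two_A <= add_sets(X, A, N)

    four_A = add_sets(two_A, two_A, N)
    vertical_circle = {(0, y) for y in range(N)}     # models {0} x T
    horizontal_circle = {(x, 0) for x in range(N)}   # models T x {0}

    return {
        "is_approximate_group": is_approximate_group,
        "contains_vertical": vertical_circle <= four_A,
        "contains_horizontal": horizontal_circle <= four_A,
    }
```

In this model the subtorus found inside $4A$ is exactly the direction in which $A$ is already "full"; the content of the lemma is that for a genuine closed $K$-approximate group only boundedly many directions (in terms of $K$) can fail to be of this kind.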
Before proving Lemma 10.4, we explain how to conclude the proof of Theorem 10.1 with Lemma 10.4 in hand. First we observe that, setting $A_0 := A^2 \cap \pi^{-1}(G')$, $\pi(A_0)$ is a neighbourhood of $\mathrm{id}$ in $G$. Indeed $\pi(A^4)$ is, and $A^4 \subseteq XA$ for some finite $X$, so that $\pi(A)$ has non-empty interior, and hence $\pi(A_0)$ is a neighbourhood of $\mathrm{id}$. We now apply the lemma to $A_1 = \phi \circ \pi(A_0)$, and conclude that $\phi^{-1}(T_1) \subseteq \pi(A_0^2) \subseteq \pi(A')$. Writing $\theta_1 : L \to L/T_1$ for the projection map, we see that $A'$ contains the kernel of $\theta_1 \circ \phi \circ \pi$ and this implies that $A'$ admits the connected nilpotent Lie group $L/T_1$ as a good model. Moreover, identifying the maximal compact subgroup $N$ with a torus $\mathbb{T}^d$, we have $\dim L/T_1 = \dim L/\mathbb{T}^d + \dim \mathbb{T}^d/T_1 = O(K^2 \log K)$ by (10.1) and by Lemma 10.4. The rest of the proof is then identical to the previous case: by Proposition 9.2 and its proof we conclude that $A'^{\,3}$ contains a large approximate subgroup $A''$ with $(A'')^{1000}$ well-defined and contained in $A'^{\,3}$, and a global internal subgroup $H'$ of $A''$ such that $A''/H'$ is an NSS ultra approximate subgroup admitting $L/T_1$ as a good model. By Theorem 9.3, we see that $(A''/H')^4$ contains a large ultra nilprogression in normal form of rank at most $O(K^2 \log K)$, and thus $(A'')^4$ contains a large ultra coset nilprogression in normal form with rank at most $O(K^2 \log K)$. As $(A'')^4 \subseteq A'^{\,3} \subseteq A^{12}$, the first part of Theorem 10.1 follows.

We now turn to the proof of Lemma 10.4. Let $\mu$ be the normalized Haar measure on $\mathbb{T}^d$. Note that the group $\widehat{\mathbb{T}^d}$ of characters of $\mathbb{T}^d$ identifies with $\mathbb{Z}^d$. Our main tool is the notion of the ($\alpha$-)large spectrum of an additive set $A$, defined by
\[ \operatorname{Spec}_\alpha(A) := \big\{ \xi \in \mathbb{Z}^d : |\widehat{1_A}(\xi)| \ge \alpha \mu(A) \big\}. \]
See [54, Definition 4.34] for this definition and a further discussion. If $S \subseteq \mathbb{Z}^d$ is a set of characters, we write
\[ S^\perp := \bigcap_{\xi \in S} \ker \xi. \]
Note that $S^\perp$ is a closed subgroup of $\mathbb{T}^d$ with codimension the rank of the subgroup of $\mathbb{Z}^d$ generated by $S$.

To prove Lemma 10.4 we first reduce to the case in which $\mu(A)$ is somewhat large by establishing the following lemma.

Lemma 10.5.
Suppose that $A \subseteq \mathbb{T}^d$ is a closed $K$-approximate group containing a neighbourhood of $0$. Then there is a subtorus $T_0 \subseteq \mathbb{T}^d$ with $\dim(T_0) \ge d - O(\log K)$ and some $x_0 \in \mathbb{T}^d$ such that, writing $\mu_0$ for the Haar measure on $T_0$, we have $\mu_0((A + x_0) \cap T_0) \gg e^{-O(K \log K)}$.

Then we handle the case in which $\mu(A)$ is somewhat large by proving the following, which is a straightforward continuous analogue of the so-called Bogolyubov-Chang lemma [10].

Lemma 10.6. Suppose that $A \subseteq \mathbb{T}^d$ is measurable, that $\mu(A) \ge \alpha$, and that $\mu(2A) \le K\mu(A)$. Then $2A - 2A$ contains a subtorus $T \subseteq \mathbb{T}^d$ of codimension at most $O(K \log(1/\alpha))$.

To deduce Lemma 10.4 from Lemmas 10.5 and 10.6, we proceed as follows. Locate an $x_0$ as in Lemma 10.5, and suppose furthermore that for this $x_0$ the measure of $(A + x_0) \cap T_0$ is close to maximal in the sense that
\[ \mu_0\big((A + x_0) \cap T_0\big) \ge \tfrac{1}{2}\, \mu_0\big((A + x) \cap T_0\big) \tag{10.2} \]
for all $x \in \mathbb{T}^d$. Set $A_1 := (A + x_0) \cap T_0$. Then, since $A + A \subseteq A + X$, we have
\[ A_1 + A_1 \subseteq (A + X + 2x_0) \cap T_0. \]
By (10.2) it follows that $\mu_0(2A_1) \le 2K\mu_0(A_1)$. We are now in a position to apply Lemma 10.6 to $A_1$, with $\alpha = e^{-O(K \log K)}$. We conclude that there is a further subtorus $T \subseteq T_0$ of codimension $O(K^2 \log K)$ inside $2A_1 - 2A_1$. Since $2A_1 - 2A_1 \subseteq 4A$, this concludes the proof of Lemma 10.4.

For the proofs of both Lemmas 10.5 and 10.6 we will require the following lemma of Bogolyubov type.

Lemma 10.7 (Bogolyubov-type lemma). Let $A \subseteq \mathbb{T}^d$ have positive measure and let $k \ge 2$ be a natural number. Suppose that
\[ \delta \le \big(\mu(A)/2\mu(kA)\big)^{1/(2k-2)}. \]
Then $kA - kA$ contains $(\operatorname{Spec}_\delta(A))^\perp$.

Proof. It suffices (in fact, it is equivalent) to show that if $x \in (\operatorname{Spec}_\delta(A))^\perp$ then $f(x) > 0$, where $f = 1_A^{*k} * 1_{-A}^{*k} = 1_A * \cdots * 1_A * 1_{-A} * \cdots * 1_{-A}$ is the convolution of $k$ copies of $1_A$ and $k$ copies of $1_{-A}$.
Now by the Fourier inversion formula we have
\[ f(x) = \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k}\, \xi(x) \ge \sum_{\xi \in \operatorname{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} - \sum_{\xi \notin \operatorname{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} = \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k} - 2 \sum_{\xi \notin \operatorname{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k}, \tag{10.3} \]
where we have used the fact that $\xi(x) = 1$ if $\xi \in \operatorname{Spec}_\delta(A)$ and $x \in (\operatorname{Spec}_\delta(A))^\perp$. Now Parseval's identity and the Cauchy-Schwarz inequality imply that
\[ \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2k} = \int_{\mathbb{T}^d} 1_A^{*k}(x)^2 \, d\mu(x) \ge \frac{1}{\mu(kA)} \Big( \int_{\mathbb{T}^d} 1_A^{*k}(x) \, d\mu(x) \Big)^2 = \frac{\mu(A)^{2k}}{\mu(kA)}. \]
On the other hand, by a second application of Parseval's identity, we have
\[ \sum_{\xi \notin \operatorname{Spec}_\delta(A)} |\widehat{1_A}(\xi)|^{2k} < \delta^{2k-2} \mu(A)^{2k-2} \sum_{\xi \in \mathbb{Z}^d} |\widehat{1_A}(\xi)|^{2} = \delta^{2k-2} \mu(A)^{2k-1}. \]
Substituting these inequalities into (10.3) yields
\[ f(x) > \frac{\mu(A)^{2k}}{\mu(kA)} - 2\delta^{2k-2} \mu(A)^{2k-1} = \frac{\mu(A)^{2k-1}}{\mu(kA)} \big( \mu(A) - 2\delta^{2k-2} \mu(kA) \big). \]
The lemma follows immediately. □

Lemma 10.6 is an immediate consequence of the case $k = 2$ of this lemma and (the continuous variant of) "Chang's lemma" [10], which is the following statement. For a proof, see [54, Lemma 4.36].

Lemma 10.8 (Chang's lemma). Suppose that $\alpha < 1/2$ and that $A \subseteq \mathbb{T}^d$ is a measurable set with $\mu(A) \ge \alpha$. Then $\operatorname{Spec}_\delta(A)$ generates a subgroup of $\mathbb{Z}^d$ of rank at most $O(\delta^{-2} \log(1/\alpha))$.

Proof of Lemma 10.6. Noting that $\mu(A)/\mu(2A) \ge 1/K$, the lemma follows from Lemma 10.7 with $k = 2$ and $\delta := 1/2\sqrt{K}$ followed by an application of Lemma 10.8.

To prove Lemma 10.5 we will apply Lemma 10.7 with a much larger value of $k$, as well as the following result.

Lemma 10.9. There is an absolute constant $c > 0$ with the following property. Suppose that $A \subseteq \mathbb{T}^d$ is a closed $K$-approximate group containing a neighbourhood of $0$. Then $\operatorname{Spec}_{1 - c/\log K}(A)$ generates a subgroup of $\mathbb{Z}^d$ of rank $O(\log K)$.

Proof. Let $\varepsilon = c/\log K$, where $c > 0$ is to be chosen later. Suppose that $\xi \in \operatorname{Spec}_{1-\varepsilon}(A)$ and that $\xi \ne 0$.
Let $\eta : \mathbb{T}^d \to \mathbb{R}/\mathbb{Z}$ be such that $\xi(x) = e^{2i\pi\eta(x)}$. Then $\int 1_A(x)\xi(x)\, d\mu(x)$ is real, since $A$ is symmetric, and at least $(1 - \varepsilon)\mu(A)$. Thus
\[ \int 1_A(x) \cos(2\pi\eta(x)) \, d\mu(x) \ge (1 - \varepsilon)\mu(A), \]
and hence the (symmetric) subset $A'(\xi) \subseteq A$, consisting of those $x$ for which $\cos(2\pi\eta(x)) \ge 99/100$, has measure $\mu(A'(\xi)) \ge (1 - 100\varepsilon)\mu(A)$. In particular $\|\eta(x)\| < \frac{1}{10}$ whenever $x \in A'(\xi)$, where $\|\theta\| := \inf_{z \in \mathbb{Z}} |\theta - z|$.

Suppose now that $\operatorname{Spec}_{1-\varepsilon}(A)$ contains elements $\xi_1, \ldots, \xi_m$ which are linearly independent over $\mathbb{Q}$, and let $\eta_1, \ldots, \eta_m$ be the corresponding $\mathbb{R}/\mathbb{Z}$-valued characters. Consider the set $A' := \bigcap_{i=1}^m A'(\xi_i)$; provided that $m \le 1/200\varepsilon$, this will have $\mu(A') \ge \frac{1}{2}\mu(A)$. Note that $A' = -A'$.

Consider now the homomorphism $\psi : \mathbb{T}^d \to \mathbb{T}^m$ given by $x \mapsto (\eta_1(x), \ldots, \eta_m(x))$. The image of $A'$ under $\psi$ lies in a box of diameter $1/5$.

Now any subset $U$ of $\mathbb{T}^m = (\mathbb{R}/\mathbb{Z})^m$ which lies in a box of diameter $< \frac{1}{2}$ is Freiman 2-isomorphic to an open subset of $\mathbb{R}^m$, and thus by the abelian case of Gelander's Lemma 10.2 (which, in this case, is just a very simple case of the Brunn-Minkowski inequality), we have
\[ \mu_m(2U) \ge 2^m \mu_m(U), \]
where $\mu_m$ is the normalized Haar measure on $\mathbb{T}^m = (\mathbb{R}/\mathbb{Z})^m$. However, we have $\mu(5A') \le \mu(5A) \le K^4\mu(A) \le 2K^4\mu(A')$, and an application of the Ruzsa covering lemma (here our Lemma 5.2) shows that $2A'$ is a $4K^4$-approximate group, and consequently so is $U := \psi(2A')$. Therefore, noting that $\mu_m(U) \ne 0$ since $\mu(A') \ge \frac{1}{2}\mu(A) > 0$, we obtain
\[ 2^m \le 4K^4 \le (2K)^8, \]
a contradiction if $m > 8 \log_2 2K$. Such a choice of $m$ is acceptable if $\varepsilon < c/\log K$ with $c$ sufficiently small, and so we are forced to conclude that $\xi_1, \ldots, \xi_m$ cannot exist. The lemma follows.

Proof of Lemma 10.5. Note that $kA \subseteq (k-1)X + A$, and that $|mX| \le m^{|X|} = m^K$ for all natural numbers $m$. It follows that $\mu(kA) \le k^K\mu(A)$, and so Lemma 10.7 is applicable with $\delta = 1 - c/\log K$ for some $k \le K^{O(1)}$.
The conclusion is that $2kA$ contains the subtorus $T_0 := (\operatorname{Spec}_{1 - c/\log K}(A))^\perp$ which, by Lemma 10.9, has codimension $O(\log K)$. However $2kA$ is covered by at most $(2k)^K = e^{O(K \log K)}$ translates of $A$, and so one of these translates has $\mu_0((A + x_0) \cap T_0) \gg e^{-O(K \log K)}$, which was precisely what we claimed.

To conclude this section we record the observation that the above arguments also yield the following more precise version of Proposition 6.12, the weak global Lie model theorem. This builds upon a previous result in this direction by Hrushovski: see [33, Theorem 4.2] and the discussion before [33, Lemma 4.9].

Theorem 10.10 (Strong global Lie Model Theorem). Suppose that $A$ is a global ultra $K$-approximate group. Then there is a large ultra approximate subgroup $A'$ of $A^m$ for some standard $m \ge 1$ which admits a global model $\tilde\pi : \langle A' \rangle \to L$ into a connected, simply connected nilpotent Lie group $L$ of dimension at most $6 \log_2 K$. Furthermore, there exists a large ultra approximate group $A''$ of $A$ which admits a global model $\pi : \langle A'' \rangle \to L'$, a connected nilpotent Lie group, whose maximal (central) compact subgroup $N$ verifies $L'/N \simeq L$.

11. Applications to growth in groups and geometry

In this section we collect a variety of applications of our main results, in particular proving the various results stated in the introduction.

As an application of his method Hrushovski [33] established the following strengthening of Gromov's theorem on groups with polynomial growth.

Theorem 11.1. Let $G$ be a finitely generated group and let $K \ge 1$. Suppose $G = \bigcup_{n \ge 1} A_n$, where $(A_n)_{n \ge 1}$ is an increasing sequence of finite subsets of $G$ such that $|A_n^2| \le K|A_n|$ for all $n \ge 1$. Then $G$ is virtually nilpotent.

This is indeed a strengthening of Gromov's theorem because if $G$ has polynomial growth with respect to some generating set $S$ then the $A_n$ may be taken to be some subsequence of the word metric balls relative to $S$. Unsurprisingly, our main theorem also admits an application of this kind.
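The remark that word metric balls supply the sets $A_n$ can be explored numerically. The following Python sketch is an illustration only, under assumptions not made in the text: it uses the discrete Heisenberg group with the coordinate model $(a,b,c)(a',b',c') = (a+a', b+b', c+c'+ab')$ and the generating set $S = \{1, x^{\pm 1}, y^{\pm 1}\}$, with $\mathbb{Z}^2$ included as a sanity check because its ball sizes $2n^2 + 2n + 1$ are known in closed form. It computes $|S^n|$ by breadth-first search and reports the ratios $|S^{2n}|/|S^n|$, which stay bounded along a subsequence for a group of polynomial growth.

```python
def heis_mul(g, h):
    """Multiplication in the discrete Heisenberg group, coordinates (a, b, c)."""
    a, b, c = g
    a2, b2, c2 = h
    return (a + a2, b + b2, c + c2 + a * b2)


def z2_mul(g, h):
    """The group law of Z^2, written as a 'multiplication' for uniformity."""
    return (g[0] + h[0], g[1] + h[1])


HEIS_GENS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
Z2_GENS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def ball_sizes(mul, gens, identity, radius):
    """Return [|S^0|, |S^1|, ..., |S^radius|] for a symmetric S containing 1."""
    ball = {identity}
    frontier = {identity}
    sizes = [1]
    for _ in range(radius):
        # New elements of S^(n+1) can only arise from the sphere of radius n.
        new = {mul(g, s) for g in frontier for s in gens} - ball
        ball |= new
        frontier = new
        sizes.append(len(ball))
    return sizes


def doubling_ratios(sizes):
    """Ratios |S^(2n)| / |S^n| for every n with 2n within the computed range."""
    return [sizes[2 * n] / sizes[n] for n in range(1, (len(sizes) - 1) // 2 + 1)]


if __name__ == "__main__":
    sizes = ball_sizes(heis_mul, HEIS_GENS, (0, 0, 0), 16)
    print(sizes)
    print(doubling_ratios(sizes))
```

Running the script prints the ball sizes up to radius 16 and the corresponding doubling ratios; any subsequence of balls with bounded ratio is an admissible choice of $A_n$ in Theorem 11.1.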
The following is a corollary of Theorem 2.10 and subsumes Theorem 11.1 above.

Corollary 11.2 (Gromov-type theorem). Let $K \ge 1$. Then there is some $K'$, depending on $K$, such that the following holds. Assume $G$ is a group generated by a finite symmetric set $S$ containing the identity. Let $A$ be a finite subset of $G$ such that $|A^2| \le K|A|$ and $S^{K'} \subseteq A$. Then there is a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that
(i) $G_1$ has index $O_K(1)$ in $G$;
(ii) $G_1/N$ is nilpotent of step and rank $O_K(1)$.
In particular $G$ is virtually nilpotent.

Proof of Corollary 11.2. First we make the following simple observation. Suppose $G$ is a group generated by a finite symmetric set $S$ and let $G_0$ be a subgroup of index $n = [G : G_0]$. Then for every $k < n$ the ball $S^k$ meets at least $k+1$ different left cosets of $G_0$ in $G$. Indeed if not then by the pigeonhole principle we have $S^i G_0 = S^{i+1} G_0$ for some $i < k$, and so by multiplying on the left with $S$ it follows that $S^k G_0 = S^{k+1} G_0$. Multiplying on the left by further copies of $S$ implies that $S^k G_0 = \langle S \rangle G_0 = G$, and so $G_0$ has index at most $k$ in $G$, contrary to assumption.

Now, we apply Corollary 1.7. Thus there exists a subgroup $G_0$ of $G$ and a normal subgroup $H$ of $G_0$ such that $A$ may be covered by $K'$ left-translates of $G_0$ for some $K' = O_K(1)$ depending only on $K$, and $G_0/H$ is nilpotent of step and rank $O_K(1)$. In particular, $G_0$ is finite-by-nilpotent. Using this value of $K'$, we see by assumption that $S^{K'}$ is contained in $A$ and thus $S^{K'}$ is covered by at most $K'$ cosets of $G_0$. From our initial observation we conclude that $[G : G_0] \le K'$.

Note that for some $s = O_K(1)$ the $s$-th term of the central descending series $C^s(G_0)$ is contained in $H$. Moreover, $G_1 := \bigcap_{g \in G} g G_0 g^{-1}$ is a normal subgroup of $G$ with index at most $O_K(1)$ contained in $G_0$. Hence $N := C^s(G_1)$ is a normal subgroup of $G$ contained in $H$. On the other hand, $G_1/N$ is nilpotent of complexity bounded in terms of $K$ only and it has index $O_K(1)$ in $G/N$.
To conclude from this that $G$ is virtually nilpotent, it suffices to show that $G_1$ is. However $G_1$ is actually finite-by-nilpotent (the finite group being $N$) and any such group is virtually nilpotent. To see this note that the kernel of the action by conjugation on $N$ is a nilpotent subgroup of finite index.

Remark 11.3. Recall that the condition $|A^2| \le K|A|$ implies the existence of an $O(K^{O(1)})$-approximate group $Z$ of size $O(K^{O(1)}|A|)$ and of $O(K^{O(1)})$ left translates of $Z$ which cover $A$ (see [52, Theorem 4.6]). Using Remark 1.9 and Theorem 2.12, we then see that $G_1$ can be taken so that $G_1/N$ is $O(\log K)$-nilpotent in the sense that it admits a generating set $u_1, \ldots, u_\ell$ with $\ell = O(\log K)$ such that $[u_i, u_j] \in \langle u_{j+1}, \ldots, u_\ell \rangle$ for all $i < j$. In particular such a group admits a normal series with cyclic factors of length at most $O(\log K)$.

Remark 11.4. If one assumes that $A$ is a $K$-approximate group instead of the doubling condition $|A^2| \le K|A|$ in Corollary 11.2, then we may also conclude from Theorem 1.6 that $N$ and a generating set of $G_1$ are contained in $A^4$. Using Theorem 2.12 we see that if one additionally wishes to ensure the logarithmic bound as in the previous remark, then one can only guarantee that $N$ lies inside $A^{O_K(1)}$.

The following corollary is reminiscent of Gromov's theorem but it involves a weaker type of polynomial growth condition in which the generating set may be arbitrarily large. Furthermore it requires this condition at only one scale.

Corollary 11.5. Let $d > 0$. Then there is $R(d) > 0$ such that the following holds. Suppose that $G$ is generated by a finite symmetric set $S$ and that there is some scale $r > R(d)$ such that $|S^r| \le r^d|S|$. Then there is a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that
(i) $N \subseteq S^r$;
(ii) $G_1$ has index $O_d(1)$ in $G$;
(iii) $G_1/N$ is $O(d)$-nilpotent (see Remark 1.9 for a definition).

Proof.
Our assumption is that $|S^r| \le r^d|S|$. Let $K = 2 \cdot 10^d$ and $C_K$ be such that, in the last part of Remark 11.4, $N$ lies in $A^{C_K}$. We claim that there is some $r_0$, $\sqrt{r} \le r_0 \le r/2C_K$, such that $A := S^{r_0}$ has $|A^5| \le 10^d|A|$. Note that $A^2$ is then a $K$-approximate group with $K = 2 \cdot 10^d$ (see Lemma 5.2). Applying Corollary 11.2 and Remark 11.4 and ensuring that $R(d)$ is so large that $R(d) > (K')^2$ ($K'$ being the quantity in Corollary 11.2), we obtain a finite normal subgroup $N \lhd G$ and a subgroup $G_1 \le G$ containing $N$ such that $G_1$ has index $O_d(1)$ in $G$ and $G_1/N$ is $O(\log K) = O(d)$-nilpotent. Furthermore $N$ and a set of generators for $G_1$ are contained in $A^{2C_K} = S^{2r_0 C_K} \subseteq S^r$.

It remains to justify the claim. If it is false then $|S^{5^{i+1}\sqrt{r}}| > 10^d |S^{5^i\sqrt{r}}|$ whenever $5^i\sqrt{r} < r/10C_K$, and in particular $|S^r| \ge (10^d)^{\log_5(\sqrt{r}/10C_K) - 1} |S^{\sqrt{r}}|$. If $r$ is greater than some absolute constant, this is greater than $r^d|S|$, contrary to assumption.

Remark 11.6. Note that there is no bound on the size of $N$. Indeed, if $G$ is a large finite simple group and $S = G$ then $N$ must equal $G$, which shows that $|N|$ can be arbitrarily large compared to $d, r$.

In [51] Y. Shalom and the third author gave a quantitative refinement of Gromov's theorem inspired by Kleiner's recent new proof (see also [38, Corollary 4.2] for an earlier result in that direction). A consequence of their result is that a polynomial growth condition at one large scale is enough to guarantee virtual nilpotence. We take the opportunity to record that this follows easily from Corollary 11.2.

Corollary 11.7. Let $d > 0$. Then there is $R(d) > 0$ such that the following holds. Suppose that $G$ is generated by a finite symmetric set $S$ containing the identity and that there is some scale $r > R(d)$ such that $|S^r| \le r^d$. Then $G$ contains a subgroup $G_2$, where
(i) $G_2$ has index $O_d((r^d)!)$ in $G$;
(ii) $G_2$ is nilpotent with step $O_d(1)$.

Proof. We apply Corollary 11.5 to obtain groups $N, G_1$ with the properties stated
As N is contained in S , it has cardinality at most r .The group G acts on N by conjugation; since the permutation group of N has cardinality at most (r )!, we conclude that the stabiliser G of this action has index at most (r )! in G .As G /N is nilpotent of 1 1 step O (1), we conclude that G is nilpotent of step O (1) + 1 = O (1), and the claim d d d follows. Remark 11.8. — As observed in the last section of Gromov’s original paper [27], Gromov’s theorem on polynomial growth already easily implies a weaker result of this r d kind in which the hypothesis is that |S |  r for all r = 1, 2,..., R(d). Note that this result of Gromov (and, a fortiori, Corollary 11.7) have content even when the group G is r d finite. Another weakening of the above result appears in [58], where |S |  r is assumed for infinitely many r rather than for all r. Corollary 11.7 is stronger than the results in [38, 51] in the sense that the bounds do not depend on the cardinality |S| of S. On the other hand, the results in [38, 51], which follow a strategy close to that of Kleiner’s work [37], yield more effective quanti- tative control on the index and step of G , especially in the case when S is of bounded cardinality. Another consequence of our main theorem is that polynomial growth in the sense of Corollary 11.5 at one large scale implies polynomial growth at all subsequent scales. Corollary 11.9. —Let d > 0. Then there is R (d)> 0 such that the following holds. Suppose r d that G is generated by a finite symmetric set S and that |S |  r |S| for some r  R (d). Then r  O (1) |S |  (r ) |S| for all r  r. Proof. — A simple modification of the proof of Corollary 11.5 shows that there is 5r r d 0 0 some r , r  r  r/6, such that |S |  K|S | where K = 100 (say). Applying Corol- 0 0 r 4r 0 0 lary 11.2 with A := S (as before) we obtain a normal subgroup H ⊆ S such that G/H is virtually nilpotent with the index, step and number of generators of the nilpotent sub- group G /H all being O (1). 
Now by Corollary 5.2, $A^2 = S^{2r_0}$ is a $2K$-approximate group. This means that there is some set $X$, $|X| \le 2K$, such that $S^{4r_0} \subseteq XS^{2r_0}$. From this it follows that
\[ S^{2mr_0} \subseteq X^m S^{2r_0} \tag{11.1} \]
for every positive integer $m$. Let $\pi : G \to G/H$ be the quotient homomorphism. We have
\[ |S^{2mr_0}| \le |H|\, |\pi(S^{2mr_0})|, \tag{11.2} \]
since the cardinality of any fibre is at most $|H|$. From (11.1) and the fact that $\pi$ is a homomorphism we have
\[ \pi(S^{2mr_0}) \subseteq \pi(X)^m \pi(S^{2r_0}). \tag{11.3} \]
On the other hand, since $H \subseteq S^{4r_0}$, we have
\[ |H|\, |\pi(S^{2r_0})| \le |S^{6r_0}|. \tag{11.4} \]
Moreover, since $\sqrt{r} \le r_0 \le r/6$, we have
\[ |S^{6r_0}| \le |S^r| \le r^d|S| \le r_0^{2d}|S|. \tag{11.5} \]
Putting (11.2), (11.3), (11.4) and (11.5) together gives
\[ |S^{2mr_0}| \le |\pi(X)^m|\, r_0^{2d}|S|. \tag{11.6} \]
Now $\pi(X)$ is a set of size $O_d(1)$, contained in a virtually nilpotent group in which the index and step of the nilpotent subgroup are $O_d(1)$. Every such group is a quotient of one fixed virtually nilpotent group with number of generators, index and step of the nilpotent subgroup also $O_d(1)$ and whose generators are lifts of the elements in $\pi(X)$. Hence there is a bound of the form
\[ |\pi(X)^m| \le m^{O_d(1)} \]
for all $m > 1$. Comparing this with (11.6) confirms that
\[ |S^{r'}| \le (r')^{O_d(1)}|S| \]
whenever $r'$ is a multiple $2mr_0$ with $m > 1$. It is not hard to see that the same estimate therefore holds for all $r' \ge r$, at the expense of increasing the exponent $O_d(1)$ if necessary.

A consequence/reformulation of the preceding result is the following.

Corollary 11.10. Let $\alpha > 0$. Then there are $r_0 \in \mathbb{N}$ and $\beta > 0$ with $\lim_{\alpha \to 0} \beta(\alpha) = 0$ such that the following holds. Let $G$ be a finite group generated by a symmetric set $S$ and assume that the diameter of the associated Cayley graph satisfies $\operatorname{diam}_S(G) \le (|G|/|S|)^\alpha$. Then
\[ |S^r| \ge \min\{ r^{1/\beta}|S|, |G| \} \]
if $r \ge r_0(\alpha)$.

Proof. If this does not hold for some $r$, $r_0$ and $\beta$ then as soon as $r_0$ is large enough (in terms of $\beta$) Corollary 11.9 applies and yields $|S^n| \le n^e|S|$ for all $n \ge r$ and some $e = e(\beta) > 0$.
In particular, when $n$ reaches the diameter of $G$, we obtain $S^n = G$ so $|G| \leq (\operatorname{diam}_S(G))^e |S|$. This contradicts our hypothesis if $e < 1/\alpha$.

We shall apply Corollary 11.10 later on to deduce an isoperimetric inequality; see Corollary 11.15.

Finally we show that by repeatedly applying Corollary 11.2 we can obtain the following more precise result, which says something non-trivial for finite groups as well. We say that a polycyclic group has length at most $L$ if it is obtained from the trivial group by at most $L$ successive extensions by a cyclic group.

Corollary 11.11. — Let $G$ be a group which has a left-invariant metric $d : G \times G \to [0, \infty)$ satisfying the following conditions for some $K \geq 1$:
(i) (Uniform doubling property) We have $|B(2r)| \leq K|B(r)|$ for every $r > 0$;
(ii) (Finiteness condition) There are at most $K$ different subgroups of the form $\langle B(r) \rangle$ as $r$ ranges over $(0, \infty)$.
Then $G$ has a subgroup of index at most $O_K(1)$ which is polycyclic of length $O_K(1)$.

Proof. — Given $d \in \mathbf{N}$ and $R \geq 0$ we claim that if there are at most $d$ groups of the form $\langle B(r) \rangle$ for $r \leq R$, then $\langle B(R) \rangle$ contains a polycyclic subgroup of index $O_{K,d}(1)$. This is clearly enough to establish the corollary.

To prove the claim, we proceed by induction on $d$. It is clear for $d = 1$, since $\langle B(R) \rangle$ is then the trivial group. Let $R_0$ be the upper bound of those $R' \geq 0$ such that there are at most $d - 1$ groups of the form $\langle B(r) \rangle$ for $r \leq R'$. Without loss of generality $0 < R_0 \leq R$. Then $\langle B(r) \rangle = \langle B(R) \rangle$ whenever $R_0 \leq r \leq R$. By the induction hypothesis, $\langle B(R_0/2) \rangle$ contains a polycyclic subgroup $P$ of index $O_{K,d}(1)$ and length $O_{K,d}(1)$.

Let $K' = O_K(1)$ be the constant obtained in Corollary 11.2. Setting $S = B(R_0)$ and $A = B(K'R_0)$, we may apply Corollary 11.2 and conclude that $\langle B(R_0) \rangle$ contains a subgroup $G_1$ of index $O_K(1)$ such that $G_1$ has a normal subgroup $N \subset B(4K'R_0)$ with $G_1/N$ nilpotent with step and number of generators $O_K(1)$. It is enough to show that $G_1$ has a polycyclic subgroup of index $O_{K,d}(1)$, because then so will $\langle B(R) \rangle$.
By the uniform doubling assumption and a covering argument, $B(4K'R_0)$ can be covered by $O_K(1)$ translates of $B(R_0/2)$. (Recall that $B(R)$ is the closed ball $\{g \in G; d(1, g) \leq R\}$.) It follows that $N$ can be covered by $O_{K,d}(1)$ translates of $P$, and in particular $[N : N \cap P] = O_{K,d}(1)$. Now $N \cap P$ is a subgroup of $P$ and hence is also polycyclic of length $O_{K,d}(1)$; in particular, it is generated by $O_{K,d}(1)$ elements. Therefore so is $N$, and hence $N_0$, the intersection of all subgroups of $N$ of index at most $[N : N \cap P]$, has index $O_{K,d}(1)$ in $N$. (To see this recall Schreier’s theorem that if $S$ is a symmetric generating set for a group $\Gamma$, and if $\Gamma' \leq \Gamma$ has index $k$, then $S^{2k-1}$ contains a set of generators for $\Gamma'$.) The group $N_0$, being a subgroup of $N \cap P$, is polycyclic. It is also characteristic in $N$ and hence, since $N$ is normal in $G_1$, $N_0$ is also normal in $G_1$.

However $G_1$ acts by conjugation on $N/N_0$, and the kernel of this action is a subgroup $G_1'$ of $G_1$ with index $O_{K,d}(1)$. Now $(N \cap G_1')/N_0$ is central in $G_1'/N_0$ and of size $O_{K,d}(1)$. We thus have $N_0 \trianglelefteq N \cap G_1' \trianglelefteq G_1'$, where each successive quotient is polycyclic of length $O_{K,d}(1)$. It follows that $G_1'$ is polycyclic of length $O_{K,d}(1)$, which is what we wanted to establish.

Remark 11.12. — There are examples of groups which satisfy the assumptions of Corollary 11.11 yet have no nilpotent subgroup of index $O_K(1)$. For instance, let $p$ be a large prime and set $G := (\mathbf{Z}/p\mathbf{Z})^2 \rtimes \mathbf{Z}$, where the action is by an element of $\mathrm{SL}_2(\mathbf{Z}/p\mathbf{Z})$ which is a diagonal matrix $\gamma$ of the form $\gamma := \operatorname{diag}(x, x^{-1})$, where $x \in \mathbf{F}_p^{*}$ is a generator of the multiplicative group of $\mathbf{F}_p$. Then no subgroup of $G$ of index less than $p - 1$ is nilpotent (note that such a subgroup must contain $(\mathbf{Z}/p\mathbf{Z})^2$ and be the preimage of the subgroup of $\mathbf{Z}$ with that index). However we can endow $G$ with a uniformly doubling weighted word metric (with 3 generators) by letting the two standard generators of $(\mathbf{Z}/p\mathbf{Z})^2$ each have weight and $\gamma$ have weight 1.
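The nilpotency claim in Remark 11.12 can be checked numerically for small primes. The following Python sketch is an illustration only, not part of the original argument. The helper names `multiplicative_order`, `primitive_root` and `smallest_nilpotent_index` are hypothetical. It assumes $p$ is an odd prime, and it relies on the observation that the subgroup $(\mathbf{Z}/p\mathbf{Z})^2 \rtimes k\mathbf{Z}$ is nilpotent exactly when $\gamma^k = \operatorname{diag}(x^k, x^{-k})$ is the identity, because a diagonal matrix acts unipotently only when it is trivial.

```python
def multiplicative_order(x, p):
    """Order of x in the multiplicative group of F_p (x must be nonzero mod p)."""
    order, power = 1, x % p
    while power != 1:
        power = (power * x) % p
        order += 1
    return order


def primitive_root(p):
    """Smallest generator of the multiplicative group of F_p, for an odd prime p."""
    for x in range(2, p):
        if multiplicative_order(x, p) == p - 1:
            return x
    raise ValueError("no primitive root found; p must be an odd prime")


def smallest_nilpotent_index(p):
    """Least k >= 1 such that gamma^k = diag(x^k, x^-k) is the identity mod p.

    Under the assumption stated in the lead-in, this is the least index k for
    which the subgroup (Z/pZ)^2 x| kZ of the group in Remark 11.12 is nilpotent.
    """
    x = primitive_root(p)
    x_inv = pow(x, p - 2, p)  # inverse of x mod p, by Fermat's little theorem
    k = 1
    while not (pow(x, k, p) == 1 and pow(x_inv, k, p) == 1):
        k += 1
    return k


if __name__ == "__main__":
    for p in (5, 11, 101):
        print(p, smallest_nilpotent_index(p))
```

Since $x$ generates $\mathbf{F}_p^{*}$, the returned index is always $p - 1$, in line with the statement that no subgroup of index less than $p - 1$ is nilpotent.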
We turn now to some geometric applications of the above results.

Manifolds with a lower bound on Ricci Curvature. — A. Petrunin suggested to us some years ago (see also http://mathoverflow.net/questions/11091) that a result such as Corollary 11.5 would give a purely group-theoretical proof of a theorem of Fukaya and Yamaguchi [16] according to which fundamental groups of almost non-negatively curved manifolds are virtually nilpotent. Recall that a closed manifold $M$ is said to be almost non-negatively curved if one can find a sequence of Riemannian metrics on it for which $\operatorname{diam}(M) \leq 1$ while $K_M \geq -1/n$, where $K_M$ is the sectional curvature. Indeed, a simple application of the Bishop-Gromov inequalities combined with Corollary 11.5 yields the following improvement assuming only a condition on the Ricci curvature.

Corollary 11.13 (Ricci gap). — Given $d \in \mathbf{N}$, there is $\varepsilon(d) > 0$ such that the following holds. Let $M$ be a $d$-dimensional compact Riemannian manifold with Ricci curvature bounded below by $-\varepsilon$ and diameter at most 1. Then $\pi_1(M)$ has a normal subgroup of index $O_d(1)$, which is finite-by-($O(d)$-nilpotent). In particular $\pi_1(M)$ is virtually nilpotent.

Proof. — Fix a base point $x_0$ on the universal cover $\tilde{M}$ and let $F$ be a Dirichlet fundamental domain based at $x_0$ for the action of $\Gamma := \pi_1(M)$: that is,
\[ F := \{ p \in \tilde{M} : d(x_0, p) \leq d(\gamma \cdot x_0, p) \text{ for all } \gamma \in \Gamma \}. \]
Set $S := \{\gamma \in \Gamma : d(\gamma \cdot x_0, x_0) \leq 3\}$. Note that $\operatorname{diam}(F) \leq 1$ and that $S$ is symmetric and contains 1. Observe further that $S$ generates $\Gamma$ and that for every integer $r \geq 1$ we have $B(x_0, r) \subset S^r \cdot F \subset B(x_0, 3r + 1)$, where $B(x_0, r)$ is the ball of radius $r$ on $\tilde{M}$ for the Riemannian metric lifted from $M$. It follows that
\[ \frac{|S^r|}{|S|} \leq \frac{|B(x_0, 3r + 1)|}{|B(x_0, 1)|}. \tag{11.7} \]
From the assumed Ricci curvature bound and the Bishop-Gromov volume comparison estimates (see [17, Theorem 4.19]) we have the bound
\[ \frac{|B(x_0, r)|}{|B(x_0, 1)|} \leq \frac{|B_{-\varepsilon}(r)|}{|B_{-\varepsilon}(1)|}, \]
where $B_{-\varepsilon}(r)$ is a metric ball in the comparison model space with constant curvature $-\varepsilon$ and dimension $d$. The volume of this ball is
\[ |B_{-\varepsilon}(r)| = \frac{1}{(\sqrt{\varepsilon})^d} |B_{-1}(\sqrt{\varepsilon}\, r)| = c_d \int_0^r \left( \frac{\sinh(\sqrt{\varepsilon}\, t)}{\sqrt{\varepsilon}} \right)^{d-1} dt, \]
where $c_d > 0$ is the volume of the $(d-1)$-dimensional unit sphere (see [17, p. 138] for this volume computation). As $\varepsilon$ tends to 0, this tends to $c_d r^d / d$. Combining this with (11.7) we obtain that for every $R_0 \geq 1$ there is some $\varepsilon_0 = \varepsilon_0(d, R_0)$ such that
\[ |S^r| \leq 2(3r + 1)^d |S| \]
for all $r \leq R_0$ provided that $0 < \varepsilon < \varepsilon_0$. Letting $R_0 = R_0(2d)$ be as in Corollary 11.5, we obtain the existence of some $\varepsilon = \varepsilon(d) > 0$ for which the conclusion of that statement holds. This completes the proof.

Remark 11.14. — The fact that $\pi_1(M)$ is virtually nilpotent under the above Ricci bound assumptions was obtained by Cheeger and Colding in [11] (and had been conjectured earlier by Gromov). Their proof was recently completed and extended by Kapovitch and Wilking [36], who also established that the index of the nilpotent subgroup is uniformly bounded by a constant depending on the dimension $d$ only, an improvement which seems beyond the scope of our methods. This extended earlier work of Kapovitch, Petrunin and Tuschmann in [35] which proved the same result under sectional curvature bounds instead of Ricci. The work of these authors is, unlike our work, differential-geometric in nature. The linear dependence in $d$ of the nilpotency length proven in our Corollary 11.13 seems new however.

An isoperimetric inequality. — It has been well-known since the work of Varopoulos on Kesten’s conjecture ([42, 59]) that isoperimetric inequalities on Cayley graphs are closely related to lower bounds on the volume growth.
Using this idea and Corollary 11.10, we can derive the following property of finite Cayley graphs with a polynomial upper bound on the diameter.

Corollary 11.15 (Isoperimetric inequality on finite groups). — Let $\alpha > 0$. Then there are $r \in \mathbf{N}$ and $\beta > 0$ with $\lim_{\alpha \to 0} \beta(\alpha) = 0$ such that the following holds. Let $G$ be a finite group generated by a symmetric set $S$ and assume that the diameter of the associated Cayley graph satisfies $\operatorname{diam}_S(G) \leq (|G|/|S|)^{\alpha}$. Then for every subset $E$ in $G$ with $\tfrac{1}{2} r \leq |E| \leq \tfrac{1}{2}|G|$,
\[ |\partial E| \geq \tfrac{1}{8} |S|^{\beta} |E|^{1-\beta}. \]

Proof. — This follows almost immediately from Corollary 11.10 and the following well-known lemma, which may be found in [28, Chapter 5] or [42] and references therein. For the convenience of the reader we offer a self-contained proof.

Lemma 11.16 (Isoperimetry versus growth). — Let $G$ be a group and $S$ some finite symmetric generating set containing 1. Let $B(r) = S^r$ be the word ball of radius $r$ in the word metric. Let $\partial E = SE \setminus E$ be the boundary of a subset $E \subset G$. If $E \subseteq G$ is a set, write $r(E)$ for the infimum of those $r$ for which $|B(r)| \geq 2|E|$. Then for all $E$ with $|E| < |G|/2$ we have $|E| \leq 4r(E)|\partial E|$.

Proof. — Let $f = 1_E$ be the indicator function of the set $E$, and let
\[ f_r := \frac{1}{|B(r)|} \sum_{g \in B(r)} g \cdot f \]
be the average of $f$ over balls of radius $r$. By the triangle inequality we have $\|g \cdot f - f\|_1 \leq |g| \cdot \max_{s \in S} \|s \cdot f - f\|_1$, where $|g|$ is the distance to the identity in the word metric. Moreover $\|s \cdot f - f\|_1 = |sE \,\Delta\, E| \leq 2|\partial E|$ for every $s \in S$. Hence $\|f_r - f\|_1 \leq 2r|\partial E|$. On the other hand for every $x \in E$, there are at most $|E|$ elements $g \in B(r)$ such that $g \cdot 1_E(x) \neq 0$. Therefore if $|B(r)| \geq 2|E|$ then $f_r(x) \leq \tfrac{1}{2}$ and hence $\|f_r - f\|_1 \geq \tfrac{1}{2}|E|$. The claim follows.

In [1], Benjamini and Kozma conjecture that one can take $\beta = \alpha$ in Corollary 11.15 (at the expense of introducing a possible multiplicative constant $c$ in place of $|S|^{\beta}/8$ in (ii)). This, however, is beyond the scope of our method.
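As a sanity check on Lemma 11.16, the inequality $|E| \leq 4r(E)|\partial E|$ can be tested by brute force in a small cyclic group. The Python sketch below is illustrative only. The function names are hypothetical, the group $\mathbf{Z}/n\mathbf{Z}$ is written additively, and the generating set `gens` is assumed to be symmetric, to contain 0, and to generate the group.

```python
from itertools import combinations


def word_ball_sizes(n, gens):
    """Sizes |B(r)| = |S^r| for r = 0, 1, 2, ... in Z/nZ, until the ball is everything."""
    if 0 not in gens:
        raise ValueError("gens must contain the identity 0")
    ball, sizes = {0}, [1]
    while len(ball) < n:
        grown = {(g + s) % n for g in ball for s in gens}
        if grown == ball:
            raise ValueError("gens does not generate Z/nZ")
        ball = grown
        sizes.append(len(ball))
    return sizes


def boundary(subset, n, gens):
    """The boundary SE \\ E of a subset E of Z/nZ."""
    return {(s + e) % n for s in gens for e in subset} - set(subset)


def worst_ratio(n, gens):
    """Maximum of |E| / (4 r(E) |dE|) over all nonempty E with |E| < n/2."""
    sizes = word_ball_sizes(n, gens)
    worst = 0.0
    for size in range(1, (n - 1) // 2 + 1):
        # r(E) is the least r with |B(r)| >= 2|E|; it depends only on |E|.
        radius = next(r for r, b in enumerate(sizes) if b >= 2 * size)
        for subset in combinations(range(n), size):
            d = len(boundary(subset, n, gens))
            worst = max(worst, size / (4 * radius * d))
    return worst


if __name__ == "__main__":
    # Z/12Z with the symmetric generating set {0, +-1, +-3}.
    print(worst_ratio(12, (0, 1, 11, 3, 9)))
```

The lemma predicts that `worst_ratio` never exceeds 1. In a group this small the bound is far from sharp, so the check confirms consistency rather than tightness.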
We would like to thank Itai Benjamini for drawing our attention to the work [1] and its connection to Gromov-type theorems.

A generalized Margulis lemma. — In hyperbolic geometry, the Margulis lemma asserts that there is a constant $\varepsilon = \varepsilon(n) > 0$, the Margulis constant, such that for any discrete subgroup $\Gamma$ of isometries of the hyperbolic $n$-space $\mathbf{H}^n$, and any point $x \in \mathbf{H}^n$, the almost stabiliser $\Gamma_\varepsilon(x) := \langle \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\} \rangle$ is virtually cyclic. This lemma is important for describing the geometry of cusps in hyperbolic manifolds, or for establishing volume lower bounds (see e.g. [55]). Various generalisations of this lemma have been established in the past for more general Riemannian manifolds under curvature upper and lower bounds (e.g. [9, Chapter 6]). Typically in these results, unless the manifold has strictly negative curvature, “virtually cyclic” in the conclusion of the lemma must be replaced by “virtually nilpotent”.

In [28, §5.F] Gromov raises the issue of establishing a generalized Margulis lemma under very weak assumptions on the metric space and he proposes a conjectural statement in this direction. Below we answer Gromov’s question affirmatively.

A metric space $X$ is said to have bounded packing with packing constant $K$ if there is $K > 0$ such that every ball of radius 4 in $X$ can be covered by at most $K$ balls of radius 1. Say that a subgroup $\Gamma$ of isometries of $X$ acts discretely on $X$ if every orbit is discrete in the sense that $\{\gamma \in \Gamma : \gamma \cdot x \in \Omega\}$ is finite for every $x \in X$ and for every bounded set $\Omega \subseteq X$.

Corollary 11.17 (Generalized Margulis Lemma). — Let $K \geq 1$ be a parameter. Then there is some $\varepsilon(K) > 0$ such that the following is true. Suppose that $X$ is a metric space with packing constant $K$, and that $\Gamma$ is a subgroup of isometries of $X$ which acts discretely. Then for every $x \in X$ the “almost stabiliser” $\Gamma_\varepsilon(x) = \langle S_\varepsilon(x) \rangle$, where $S_\varepsilon(x) := \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\}$, is virtually nilpotent.

Proof.
— Each set $S_\varepsilon(x)$ is symmetric and contains the identity. Now by the assumption on $X$ the ball $B(x, 4)$ can be covered by a collection of balls $B(x_i, 1)$, $i = 1, 2, \ldots, K$. Suppose that for $i = 1, 2, \ldots, k$ there is at least one element $\gamma_i \in S_4(x)$ with $\gamma_i \cdot x \in B(x_i, 1)$. Suppose now that $\gamma \in S_4(x)$ is arbitrary; then there is some $i \in \{1, 2, \ldots, k\}$ such that $\gamma \cdot x \in B(x_i, 1)$. But this means that $d(\gamma \cdot x, \gamma_i \cdot x) < 2$, and therefore $\gamma_i^{-1}\gamma \in S_2(x)$. This implies that $S_4(x) \subseteq \bigcup_{i=1}^{k} \gamma_i S_2(x)$, which yields (since $S_2(x)^2 \subseteq S_4(x)$) the doubling estimate $|S_2(x)^2| \leq K|S_2(x)|$.

Let $K' = K'(K) > 0$ be the constant from Corollary 11.2. Set $\varepsilon := 2/K'$, $S = S_\varepsilon(x)$ and $A = S_2(x)$. A direct application of Corollary 11.2 shows that $\Gamma_\varepsilon(x) = \langle S \rangle$ is virtually nilpotent.

Remark 11.18. — This confirms Gromov’s conjecture, which suggested the same conclusion under the slightly stronger hypotheses that every ball of radius $R$ in $X$ can be covered by at most $C(R/r)^m$ balls of radius $r$ for all $0 < r < R \leq 1$ and some fixed constants $C, m > 0$.

The assumptions of this generalized Margulis lemma are satisfied for example if $X$ is a complete Riemannian manifold with a lower bound on its Ricci curvature, by an immediate application of the Bishop-Gromov volume comparison estimates. In this case, the result was proved by Cheeger-Colding [11] and Kapovitch-Wilking [36], namely:

Corollary 11.19. — Let $d \geq 1$ be an integer. Then there is $\varepsilon = \varepsilon(d) > 0$ with the following property. Suppose that $M$ is a $d$-dimensional complete Riemannian manifold with a Ricci curvature lower bound $\mathrm{Ric} \geq -(d - 1)$ and that $\Gamma$ is a subgroup of $\mathrm{Isom}(M)$ which acts properly discontinuously by isometries on $M$. Then for every $x \in M$ the “almost stabiliser” $\Gamma_\varepsilon(x) := \langle \{\gamma \in \Gamma : d(\gamma \cdot x, x) < \varepsilon\} \rangle$ is virtually nilpotent.

In fact the result in [36, Theorem 1] is a stronger version of Corollary 11.19, establishing that $\Gamma_\varepsilon(x)$ has a nilpotent subgroup of index $O_d(1)$.
This stronger result seems to be beyond the scope of our method.

We also note that Corollary 11.11 applies to the Margulis lemma in the context of Riemannian $d$-manifolds with a lower bound on sectional curvature, because then the Gromov short basis has bounded cardinality from Toponogov’s theorem (see for instance [9, 37.3]). We thus get this way an alternate proof of the Fukaya-Yamaguchi theorem [16] according to which almost non-negatively curved $n$-manifolds have $O_n(1)$-virtually polycyclic fundamental group. Again, by [35] we know better, namely that they are $O_n(1)$-virtually nilpotent, but once again this seems beyond the scope of our method.

Finally we would like to remark that the usual proofs of the classical Margulis lemma bear some resemblance to the proof of our main theorem inasmuch as they use a similar “shrinking commutator trick” to establish nilpotence. While we proved this shrinking commutator estimate for the escape norm associated to an approximate group as part of the Gleason lemmas (Theorem 8.1), in the Margulis lemma one proves a similar estimate for the norm $\|\gamma\| = d(\gamma \cdot x, x)$ by a Riemannian geometric argument using the assumed curvature bounds. This “shrinking commutator trick” dates back at least to Bieberbach [2] in his proof of Jordan’s theorem on finite linear groups.

Acknowledgements

EB is supported in part by the ERC starting grant 208091-GADA. He also acknowledges support from MSRI where part of this work was finalized. TT is supported by a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and by the NSF Waterman award. The first author would like to thank E. Lindenstrauss from whom he first learned about these questions and for several related discussions. We also acknowledge the huge intellectual debt we owe to prior work of Hrushovski [33], without which we would not have started this project.
We are grateful to him for several enlightening discussions regarding his work and the subject matter of the present paper. We also thank I. Goldbring and L. van den Dries for showing us a preliminary version of their notes on Hilbert’s fifth problem and its local versions, B. Hayes for corrections, and J. Lott for help with the references. Finally, all three authors would like to thank T. Sanders for a number of valuable discussions concerning this work.

Appendix A: Basic theory of ultralimits and ultraproducts

In this appendix we review the machinery of ultralimits and ultraproducts. We will borrow some terminology from nonstandard analysis in order to do this, although we will not rely too heavily on nonstandard machinery in this paper.

We will assume the existence of a standard universe $\mathfrak{U}$ which contains all the objects and spaces that one is interested in (such as the natural numbers $\mathbf{N}$, the real numbers $\mathbf{R}$, the classical Lie groups, etc.). The precise construction of this universe is not particularly important for our purposes, so long as it forms a set. We refer to objects and spaces inside the standard universe as standard objects and standard spaces, with the latter being sets whose elements are in the former category.

We will rely heavily on the existence of a nonprincipal ultrafilter.

Lemma A.1 (Ultrafilter lemma). — There exists a collection $\alpha$ of subsets of the natural numbers $\mathbf{N}$ with the following properties:
(i) (Monotonicity) If $A \in \alpha$ and $B \supseteq A$, then $B \in \alpha$.
(ii) (Closure under intersection) If $A, B \in \alpha$, then $A \cap B \in \alpha$.
(iii) (Maximality) If $A \subseteq \mathbf{N}$, then either $A \in \alpha$ or $\mathbf{N} \setminus A \in \alpha$, but not both.
(iv) (Non-principality) If $A \in \alpha$, and $A'$ is formed from $A$ by adding or deleting finitely many elements to or from $A$, then $A' \in \alpha$.
We refer to a collection $\alpha$ obeying the above axioms as a nonprincipal ultrafilter.

Proof.
— The collection of cofinite subsets of $\mathbf{N}$ already obeys the monotonicity, closure under intersection, and non-principality properties. Using Zorn’s lemma, one can enlarge this collection to a maximal collection which, it may be verified, has all the required properties.

(By using this lemma, our results thus rely on the axiom of choice, which we will of course assume throughout this paper. On the other hand, it is possible to rephrase the purely combinatorial results in this paper, such as Theorem 2.10, in the language of Peano arithmetic. Applying a famous theorem of Gödel [22], we then conclude that Theorem 2.10 is provable in ZFC if and only if it is provable in ZF. In fact it is possible, with significant effort, to directly translate these ultrafilter arguments to a much lengthier argument in which neither ultrafilters nor the axiom of choice are used. However, this would require one to “finitise” or “proof-mine” such infinitary results as the Heine-Borel theorem or Theorem B.18, and this in turn would require finitisations of the construction of Haar measure and the Peter-Weyl theorem. This would lead to a vastly messier argument.)

Throughout the paper, we fix a non-principal ultrafilter $\alpha$. A property $P(n)$ depending on a natural number $n$ is said to hold for $n$ sufficiently close to $\alpha$ if the set of $n$ for which $P(n)$ holds lies in $\alpha$. A set of natural numbers lying in $\alpha$ will also be called an $\alpha$-large set.

Once we have fixed this ultrafilter, we can define nonstandard objects and spaces.

Definition A.2 (Nonstandard objects). — Given a sequence $(x_n)_{n \in \mathbf{N}}$ of standard objects in $\mathfrak{U}$, we define their ultralimit $\lim_{n \to \alpha} x_n$ to be the equivalence class of all sequences $(y_n)_{n \in \mathbf{N}}$ of standard objects in $\mathfrak{U}$ such that $x_n = y_n$ for $n$ sufficiently close to $\alpha$. Note that the ultralimit $\lim_{n \to \alpha} x_n$ can also be defined even if $x_n$ is only defined for $n$ sufficiently close to $\alpha$.

An ultralimit of standard natural numbers is known as a nonstandard natural number, an ultralimit of standard real numbers is known as a nonstandard real number, and so on.

For any standard object $x$, we identify $x$ with its own ultralimit $\lim_{n \to \alpha} x$. Thus, every standard natural number is a nonstandard natural number, etc.

Any operation or relation on standard objects can be extended to nonstandard objects in the obvious manner. Indeed, if $O$ is a $k$-ary operation, we define
\[ O\Big( \lim_{n \to \alpha} x_n^1, \ldots, \lim_{n \to \alpha} x_n^k \Big) := \lim_{n \to \alpha} O(x_n^1, \ldots, x_n^k) \]
and if $R$ is a $k$-ary relation, we define $R(\lim_{n \to \alpha} x_n^1, \ldots, \lim_{n \to \alpha} x_n^k)$ to be true iff $R(x_n^1, \ldots, x_n^k)$ is true for all $n$ sufficiently close to $\alpha$. One easily verifies that these nonstandard extensions of $O$ and $R$ are well-defined.

Example 23. — The sum of two nonstandard real numbers $\lim_{n \to \alpha} x_n$, $\lim_{n \to \alpha} y_n$ is the nonstandard real number
\[ \lim_{n \to \alpha} x_n + \lim_{n \to \alpha} y_n = \lim_{n \to \alpha} (x_n + y_n), \]
and the statement $\lim_{n \to \alpha} x_n < \lim_{n \to \alpha} y_n$ means that $x_n < y_n$ for all $n$ sufficiently close to $\alpha$.

Definition A.3 (Ultraproducts). — Let $(X_n)_{n \in \mathbf{N}}$ be a sequence of standard spaces $X_n$ in $\mathfrak{U}$ indexed by the natural numbers. The ultraproduct $\prod_{n \to \alpha} X_n$ of the $X_n$ is defined to be the space of all ultralimits $\lim_{n \to \alpha} x_n$, where $x_n \in X_n$ for all $n$. We refer to the ultraproduct of standard sets as a nonstandard set; in a similar vein, an ultraproduct of standard groups is a nonstandard group, and an ultraproduct of standard finite sets is a nonstandard finite set.

We refer to ${}^{*}X := \prod_{n \to \alpha} X$ as the ultrapower of a standard set $X$; the identification of $x$ with $\lim_{n \to \alpha} x$ causes $X$ to be identified with a subset of ${}^{*}X$. We will refer to the ultrapower ${}^{*}\mathfrak{U}$ of the standard universe $\mathfrak{U}$ as the nonstandard universe.

Remark A.4.
— Nonstandard sets in nonstandard analysis behave analogously in some ways to measurable sets in measure theory; for instance, the union or intersection of two nonstandard sets is again a nonstandard set. (Actually, the notion of an elementary set, e.g. a finite union of intervals, would be an even closer analogy here than the notion of a measurable set.) Also, just as a subset of a measurable set need not be measurable, a subset of a nonstandard set need not be another nonstandard set. For instance, the nonstandard natural numbers ${}^{*}\mathbf{N}$ is a nonstandard set (being the ultraproduct of the sequence $\mathbf{N}, \mathbf{N}, \ldots$), but the standard natural numbers $\mathbf{N}$, despite being a subset of ${}^{*}\mathbf{N}$, is not a nonstandard set.

A fundamental property of ultralimits is that they preserve first-order statements and predicates, a fact known as Łoś’s theorem. Here is one formalisation of this theorem.

Theorem A.5 (Łoś’s theorem with parameters). — Let $m$ be a standard natural number, and for each $1 \leq i \leq m$, let $x_i = \lim_{n \to \alpha} x_{i,n}$ be a nonstandard object. If $P(y_1, \ldots, y_m)$ is a predicate, then $P(x_1, \ldots, x_m)$ is true (as quantified over the nonstandard universe ${}^{*}\mathfrak{U}$) if and only if $P(x_{1,n}, \ldots, x_{m,n})$ is true for all $n$ sufficiently close to $\alpha$ (as quantified over the standard universe $\mathfrak{U}$).

Proof. — (Sketch) By definition, Łoś’s theorem is true for “primitive” predicates which take the form $R(x_1, \ldots, x_k)$ for some primitive $k$-ary relation $R$ and objects $x_1, \ldots, x_k$, or of the form $x_{k+1} = O(x_1, \ldots, x_k)$ for some primitive $k$-ary operator $O$. From the ultrafilter axioms, we also see that Łoś’s theorem is closed with respect to boolean operations; for instance, if Theorem A.5 holds for $P(x_1, \ldots, x_m)$ and $Q(x_1, \ldots, x_m)$, then it also holds for $\neg P$ or $P \wedge Q$.
Now, we claim that if Łoś’s theorem holds for the predicate $P(x_1, \ldots, x_m)$, then it also holds for the quantified predicates $\exists x_m : P(x_1, \ldots, x_m)$ and $\forall x_m : P(x_1, \ldots, x_m)$ (where now there are only $m - 1$ free variables $x_1, \ldots, x_{m-1}$, with $x_m$ being bound). We show this just for the existential quantifier $\exists$, as the case of the universal quantifier $\forall$ is similar (and can be deduced from the existential case by negation). Suppose first that $\exists x_m : P(x_1, \ldots, x_m)$ is true in ${}^{*}\mathfrak{U}$. Then there exists $x_m = \lim_{n \to \alpha} x_{m,n}$ such that $P(x_1, \ldots, x_m)$ holds; by hypothesis, this implies that $P(x_{1,n}, \ldots, x_{m,n})$ holds for $n$ sufficiently close to $\alpha$, and thus $\exists x_m : P(x_{1,n}, \ldots, x_{m-1,n}, x_m)$ holds in $\mathfrak{U}$ for $n$ sufficiently close to $\alpha$ as desired. Conversely, if $\exists x_m : P(x_{1,n}, \ldots, x_{m-1,n}, x_m)$ holds in $\mathfrak{U}$ for $n$ sufficiently close to $\alpha$, then by the axiom of (countable) choice, we may find $x_{m,n} \in \mathfrak{U}$ for such $n$ such that $P(x_{1,n}, \ldots, x_{m-1,n}, x_{m,n})$ holds. Setting $x_m := \lim_{n \to \alpha} x_{m,n}$, we conclude that $P(x_1, \ldots, x_m)$ holds, and the claim follows.

The above discussion yields Łoś’s theorem for any predicate that can be built out of primitive predicates by a finite number of boolean operations and quantifications. However, it is easy to see that all predicates are logically equivalent to a predicate of this form. For instance,
\[ \forall a \forall b \forall c : (a + b) + c = a + (b + c) \]
is equivalent to
\[ \forall a \forall b \forall c \exists d \exists e \exists f : (d = a + b) \wedge (e = b + c) \wedge (f = d + c) \wedge (f = a + e). \]
This completes the proof.

In applications, we will actually use a slight generalisation of Łoś’s theorem.

Theorem A.6 (Łoś’s theorem with parameters and ultraproducts). — Let $m, k$ be standard natural numbers. For each $1 \leq i \leq m$, let $x_i = \lim_{n \to \alpha} x_{i,n}$ be a nonstandard object, and for each $1 \leq j \leq k$, let $A_j = \prod_{n \to \alpha} A_{j,n}$ be a nonstandard set.
If $P(y_1, \ldots, y_m; B_1, \ldots, B_k)$ is a predicate over $m$ objects and $k$ sets, with the sets $B_1, \ldots, B_k$ only appearing in $P$ through the membership predicate $x \in B_j$ for various $j$ and various objects $x$, then $P(x_1, \ldots, x_m; A_1, \ldots, A_k)$ is true (as quantified over the nonstandard universe ${}^{*}\mathfrak{U}$) if and only if $P(x_{1,n}, \ldots, x_{m,n}; A_{1,n}, \ldots, A_{k,n})$ is true for all $n$ sufficiently close to $\alpha$ (as quantified over the standard universe $\mathfrak{U}$).

Proof. — We replace each appearance of $x \in B_j$ in $P$ with a new primitive relation $R_j(x, n)$, which is interpreted in $\mathfrak{U}$ as $x \in A_{j,n}$. This replaces the predicate $P(y_1, \ldots, y_m; B_1, \ldots, B_k)$ by a predicate $Q(y_1, \ldots, y_m, n)$, with $P(x_{1,n}, \ldots, x_{m,n}; A_{1,n}, \ldots, A_{k,n})$ logically equivalent to $Q(x_{1,n}, \ldots, x_{m,n}, n)$. One easily verifies that $P(x_1, \ldots, x_m; A_1, \ldots, A_k)$ is logically equivalent to $Q(x_1, \ldots, x_m, \lim_{n \to \alpha} n)$, and the claim now follows from Theorem A.5.

Example 24. — Any ultraproduct $G := \prod_{n \to \alpha} G_n$ of groups $G_n$ is again a group, because one can write the property of $G$ being a group as a predicate $P(G)$ that involves membership in $G$ (as well as the constant $\mathrm{id}$ and the group operations $\cdot$, $()^{-1}$, of course). Conversely, if $G = \prod_{n \to \alpha} G_n$ is a group, then $G_n$ is a group for all $n$ sufficiently close to $\alpha$.

Example 25. — Let $G = \prod_{n \to \alpha} G_n$ be an ultraproduct of groups (and thus also a group), and let $A = \prod_{n \to \alpha} A_n$ and $B = \prod_{n \to \alpha} B_n$ be subsets of $G$ that are nonstandard sets. Then, for $n$ sufficiently close to $\alpha$, $A_n$ and $B_n$ are subsets of $G_n$ (because this statement can be written as a predicate involving membership in $A_n$, $B_n$, $G_n$). In a similar (but more complicated) spirit, for any standard $K \in \mathbf{N}$, $A$ can be covered by $K$ left-translates of $B$ if and only if, for $n$ sufficiently close to $\alpha$, $A_n$ can be covered by $K$ left-translates of $B_n$.

A nonstandard real number $x \in {}^{*}\mathbf{R}$ is said to be bounded if one has $|x| \leq C$ for some standard $C > 0$, and unbounded otherwise.
Similarly, we say that $x$ is infinitesimal if $|x| \leq c$ for all standard $c > 0$; in the former case we write $x = O(1)$, and in the latter $x = o(1)$. For every bounded real number $x \in {}^{*}\mathbf{R}$ there is a unique standard real number $\mathrm{st}(x) \in \mathbf{R}$, called the standard part of $x$, such that $x = \mathrm{st}(x) + o(1)$, or equivalently that $\mathrm{st}(x) - \varepsilon \leq x \leq \mathrm{st}(x) + \varepsilon$ for all standard $\varepsilon > 0$. Indeed, one can set $\mathrm{st}(x)$ to be the supremum of all the real numbers $y$ such that $x > y$ (or equivalently, the infimum of all the real numbers $y$ such that $x < y$). We write $X = O(Y)$, $X \ll Y$, or $Y \gg X$ if we have $X \leq CY$ for some standard $C$.

Given a sequence $f_n : X_n \to Y_n$ of standard functions between standard sets $X_n$, $Y_n$, one can form the ultralimit $f := \lim_{n \to \alpha} f_n$, which is a function from the ultraproduct $X := \prod_{n \to \alpha} X_n$ to the ultraproduct $Y := \prod_{n \to \alpha} Y_n$ defined by the formula
\[ f\Big( \lim_{n \to \alpha} x_n \Big) := \lim_{n \to \alpha} f_n(x_n). \]
Such ultralimits will be called nonstandard functions (and are also known as internal functions in the nonstandard analysis literature). In particular, since standard finite sequences $(a_n)_{n=1}^{N}$ of standard reals $a_n \in \mathbf{R}$ with some standard length $N \in \mathbf{N}$ can be viewed as a function $n \mapsto a_n$ from $\{1, \ldots, N\}$ to $\mathbf{R}$, one can thus define nonstandard finite sequences $(a_n)_{n=1}^{N}$ of nonstandard reals $a_n \in {}^{*}\mathbf{R}$ with some nonstandard length $N \in {}^{*}\mathbf{N}$ as an ultralimit of standard finite sequences $(a_{n',n})_{n'=1}^{N_n}$, thus $N = \lim_{n \to \alpha} N_n$ and
\[ a_{\lim_{n \to \alpha} n'_n} = \lim_{n \to \alpha} a_{n'_n, n}. \]
One can then transplant various operations on standard finite sequences to their nonstandard counterparts, and can in particular define the sum $\sum_{n=1}^{N} a_n \in {}^{*}\mathbf{R}$ of a nonstandard finite sequence $(a_n)_{n=1}^{N} = \lim_{n \to \alpha} (a_{n',n})_{n'=1}^{N_n}$ by the formula
\[ \sum_{n=1}^{N} a_n := \lim_{n \to \alpha} \sum_{n'=1}^{N_n} a_{n',n}. \]

Appendix B: Local groups

In this appendix we recall the basic definitions and notations of (symmetric) local group theory, following Goldbring [24].

Definition B.1 (Local group).
— A symmetric local group $G = (G, \mathrm{id}, \cdot, ()^{-1})$ is a topological space $G$ with a distinguished element $\mathrm{id} \in G$ (the identity element), together with a globally defined inversion map $()^{-1} : G \to G$ and a partially defined product map $\cdot : \Omega \to G$, obeying the following axioms:
(i) (Partial closure) $\Omega$ is an open neighbourhood of $(G \times \{1\}) \cup (\{1\} \times G)$ in $G \times G$.
(ii) (Continuity) The maps $()^{-1} : x \mapsto x^{-1}$ and $\cdot : (x, y) \mapsto x \cdot y$ are continuous on $G$ and $\Omega$ respectively.
(iii) (Local associativity) If $g, h, k \in G$ are such that $(g \cdot h) \cdot k$ and $g \cdot (h \cdot k)$ are well-defined (thus $(g, h), (g \cdot h, k), (h, k), (g, h \cdot k)$ all lie in $\Omega$), then $(g \cdot h) \cdot k = g \cdot (h \cdot k)$.
(iv) (Identity) For any $g \in G$, one has $\mathrm{id} \cdot g = g \cdot \mathrm{id} = g$.
(v) (Invertibility) If $g \in G$, then $g \cdot g^{-1}$ and $g^{-1} \cdot g$ are well-defined (i.e. $(g, g^{-1}), (g^{-1}, g) \in \Omega$) and are equal to $\mathrm{id}$.
If necessary, we will write $\mathrm{id}$, $\Omega$ as $\mathrm{id}_G$, $\Omega_G$ to reduce confusion. If $\Omega = G \times G$, we call $G$ a global group or a topological group.

If $G$ has the structure of a smooth finite-dimensional real manifold, and the inversion map $()^{-1}$ and product map $\cdot$ are smooth maps, we say that $G$ is a local Lie group.

Remark B.2. — One can also consider non-symmetric local groups, in which the inversion map $()^{-1}$ is only defined on an open neighbourhood of the identity. However, the theory of non-symmetric local groups contains some minor additional technicalities caused by the existence of non-invertible elements which we wish to avoid here. As we will not consider non-symmetric local groups anywhere in this paper, we will often omit the adjective “symmetric” from the term “local group” when there is no chance of confusion.

Following [24], we do not explicitly assume that $G$ is Hausdorff. In practice, though, one can reduce to the Hausdorff case because the closure of the identity element will turn out to be a closed normal subgroup that one can quotient out by.

Example 26.
— If $G$ is a symmetric local group and $U$ is a symmetric open neighbourhood of the identity (thus $g^{-1} \in U$ whenever $g \in U$), then $U$ can also be viewed as a symmetric local group, by restricting the domain $\Omega$ of the product maps to $\{(g, h) \in \Omega \cap (U \times U) : g \cdot h \in U\}$ (and also restricting the topological structure of $G$ to $U$). We will sometimes write this symmetric local group as $G|_U$ to emphasise that it is the restriction of $G$ to $U$. In particular, an important source of local groups comes from restricting a global group to an open symmetric neighbourhood of the identity. One can also restrict $G$ to non-open symmetric neighbourhoods of the identity, but the resulting object obtained is not necessarily a symmetric local group (see e.g. Example 28 below).

We say that two symmetric local groups $G$, $G'$ are locally identical if they have a common restriction, thus there exists a $U$ which is an open symmetric neighbourhood of the identity $1 = 1_G = 1_{G'}$ in both $G$ and $G'$ for which the group operations on $G$ and $G'$, when restricted to $U$, agree completely (in particular, they have the same domain and range). This is an equivalence relation, and we will focus on those properties of symmetric local groups that are preserved up to local identity. In a similar spirit, we say that two subsets $A$, $B$ of a symmetric local group $G$ are locally identical if there exists an open neighbourhood $U$ of the identity in $G$ such that $A \cap U = B \cap U$. For instance, all neighbourhoods of the identity are locally identical. (Note that every open neighbourhood of the identity contains an open symmetric neighbourhood, so we can assume here that $U$ is symmetric without loss of generality.)

Remark B.3. — Symmetric local groups are defined as topological groups, but if one wishes, one can restrict attention to discrete symmetric local groups, in which every set is open. In this case, all references to continuity, openness, and the Hausdorff property in Definition B.1 can be omitted as being automatically satisfied.
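For a concrete discrete instance of the restriction construction in Example 26, one can restrict the global group $(\mathbf{Z}, +)$ to $U = \{-n, \ldots, n\}$ and record the domain of the partial product explicitly. The Python sketch below is illustrative only. The names `restrict_integers` and `is_symmetric_local_group` are hypothetical, and only the algebraic axioms (iii) to (v) of Definition B.1 are checked, since the topological ones are automatic in the discrete setting.

```python
from itertools import product


def restrict_integers(n):
    """Restriction of (Z, +) to U = {-n, ..., n}: returns (elements, product domain)."""
    elements = list(range(-n, n + 1))
    # The product g + h is declared well-defined only when it stays inside U.
    domain = {(g, h) for g, h in product(elements, repeat=2) if -n <= g + h <= n}
    return elements, domain


def is_symmetric_local_group(elements, domain):
    """Check axioms (iii), (iv), (v) of Definition B.1 for a restriction of (Z, +)."""
    members = set(elements)
    for g in elements:
        # (iv) Identity: 0 + g and g + 0 must be defined.
        if (0, g) not in domain or (g, 0) not in domain:
            return False
        # (v) Invertibility: -g lies in U and both products with g are defined.
        if -g not in members or (g, -g) not in domain or (-g, g) not in domain:
            return False
    for g, h, k in product(elements, repeat=3):
        # (iii) Local associativity, tested only where both bracketings are defined.
        both_defined = (
            (g, h) in domain and (g + h, k) in domain
            and (h, k) in domain and (g, h + k) in domain
        )
        if both_defined and (g + h) + k != g + (h + k):
            return False
    return True


if __name__ == "__main__":
    elements, domain = restrict_integers(1)
    print(is_symmetric_local_group(elements, domain))
    print((1, 1) in domain, (1, -1) in domain)
```

With $n = 1$ the pair $(1, 1)$ is outside the product domain while $(1, -1)$ is inside it, which is the sense in which the product is only partially defined.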
On the other hand, all discrete local groups are locally equivalent to the trivial local group $\{\mathrm{id}\}$.

Example 27. If $\mathfrak{g}$ is a (finite-dimensional) Lie algebra, and $B$ is a sufficiently small symmetric open neighbourhood of the identity in $\mathfrak{g}$, then $\exp(B)$ is a symmetric local group, with the multiplication law given by the Baker-Campbell-Hausdorff formula.

Example 28. The closed interval $[-1, 1]$ in $\mathbb{R}$ with the addition operation is not a symmetric local group, because the set $\{(x, y) \in [-1, 1] \times [-1, 1] : x + y \in [-1, 1]\}$ is not open in $[-1, 1] \times [-1, 1]$. However, the open interval $(-1, 1)$ is a symmetric local group.

Given any finite number of elements $g_1, \dots, g_m$ in a global group $G$, one can use the associativity axiom to unambiguously define the product $g_1 \dots g_m$. In a symmetric local group, one can only define this product $g_1 \dots g_m$ locally. We formalise this as a definition:

Definition B.4 (Finite products). Let $g_1, \dots, g_m$ be a finite number of elements in a symmetric local group $G$. We say that the product $g_1 \dots g_m$ is well-defined in $G$ (or well-defined for short) if, for each $1 \le i \le j \le m$, we can find a group element $g_{[i,j]} \in G$ with the following properties:
• For each $1 \le i \le m$, we have $g_{[i,i]} = g_i$.
• If $1 \le i \le j < k \le m$, the product $g_{[i,j]} \cdot g_{[j+1,k]}$ is well-defined (i.e. $(g_{[i,j]}, g_{[j+1,k]}) \in \Omega$) and equal to $g_{[i,k]}$.
By induction we see that if these group elements $g_{[i,j]}$ exist, then they are unique. We then define $g_1 \dots g_m := g_{[1,m]}$. If $g_1 = \dots = g_m = g$, we abbreviate $g_1 \dots g_m$ as $g^m$. By abuse of notation, we also write $g_1 \dots g_m \in G$ to denote the assertion that $g_1 \dots g_m$ is defined in $G$. We adopt the convention that $g_1 \dots g_m = \mathrm{id}$ when $m = 0$.

An easy induction using the local associativity axiom shows that if $g_1, \dots, g_m \in G$ are such that $g_i \dots g_j$ is well-defined whenever $1 \le i < j \le m$ with $(i, j) \ne (1, m)$, and $(g_i \dots g_j) \cdot (g_{j+1} \dots g_k)$ is well-defined whenever $1 \le i \le j < k \le m$, then $g_1 \dots g_m$ is well-defined, and we have
$(g_i \dots g_k) = (g_i \dots g_j) \cdot (g_{j+1} \dots g_k)$
for all $1 \le i \le j < k \le m$.

Remark B.5. It is worth pointing out one subtlety here: in order for $g_1 \dots g_m$ to be well-defined, it is necessary that all possible ways of decomposing this $m$-fold product into pairwise products be well-defined. For instance, for $g_1 g_2 g_3$ to be well-defined, both $(g_1 \cdot g_2) \cdot g_3$ and $g_1 \cdot (g_2 \cdot g_3)$ need to be well-defined. Similarly, if $g_1, g_2, g_3, g_4$ are such that $g_1 g_2 g_3$, $(g_1 g_2 g_3) \cdot g_4$, $g_2 g_3 g_4$, and $g_1 \cdot (g_2 g_3 g_4)$ are well-defined, this is not yet sufficient to deduce that $g_1 g_2 g_3 g_4$ is well-defined, because $(g_1 g_2) \cdot (g_3 g_4)$ need not be well-defined. For instance, in the (additive) local group $\{-1, 0, +1\}$, the expression $(+1) + (-1) + (-1) + (+1)$ is not well-defined, because $(-1) + (-1)$ is not well-defined.

Related to this is the well-known fact that local associativity does not imply global associativity: it is possible for two different ways of decomposing an $m$-fold product into pairwise products to both exist, but give distinct values; see [41] for further discussion. For instance, there exists a local group $G$ and elements $g_1, g_2, g_3, g_4 \in G$ such that $((g_1 \cdot g_2) \cdot g_3) \cdot g_4$ and $g_1 \cdot (g_2 \cdot (g_3 \cdot g_4))$ both exist, but are not equal to one another. Of course, in this case, we do not consider $g_1 g_2 g_3 g_4$ to be well-defined.

Another easy induction also shows that for each $m \ge 1$, the set of tuples $(g_1, \dots, g_m) \in G^m$ for which $g_1 \dots g_m$ is well-defined is an open subset of $G^m$.

Now we extend the notion of products and inverses from individual group elements to sets of such elements.

Definition B.6. Let $G$ be a symmetric local group. A subset $A$ of $G$ is said to be symmetric if the set $A^{-1} := \{g^{-1} : g \in A\}$ is contained in $A$. If $A_1, \dots, A_m$ are subsets of $G$, we say that $A_1 \dots A_m$ is well-defined in $G$ (or well-defined for short) if $g_1 \dots g_m$ is well-defined for all $g_1 \in A_1, \dots, g_m \in A_m$, in which case we write $A_1 \dots A_m := \{g_1 \dots g_m : g_1 \in A_1, \dots, g_m \in A_m\}$. If $A_1 = \dots = A_m = A$, we abbreviate $A_1 \dots A_m$ as $A^m$. By abuse of notation, we write $A_1 \dots A_m \subset G$ for the assertion that $A_1 \dots A_m$ is well-defined in $G$. We adopt the convention that $A_1 \dots A_m = \{\mathrm{id}\}$ when $m = 0$. In particular, $A^0 = \{\mathrm{id}\}$ for any $A \subset G$.

An easy induction (see [24, Lemma 2.5]) shows that for any local group $G$ and any open neighbourhood $U$ of the identity, there exists a nested sequence $U \supset U_0 \supset U_1 \supset U_2 \supset \cdots$ of symmetric open neighbourhoods of the identity such that $U_{m+1}^2 \subset U_m$ for every $m \ge 0$, which in particular implies that $U_m^m$ is well-defined in $U_0$, and thus $A_1 \dots A_m$ is well-defined in $U_0$ whenever $A_1, \dots, A_m \subset U_m$.

We make the trivial remark that multiplication of sets is associative: if $A_1 \dots A_m$ is well-defined, then for any $1 \le i \le j < k \le m$, $(A_i \dots A_j) \cdot (A_{j+1} \dots A_k)$ and $A_i \dots A_k$ are well-defined and equal to each other.

By passing to neighbourhoods such as $U_m$, one can improve the group-like properties of a local group. To illustrate this principle, let us first introduce the following definition.

Definition B.7 (Cancellative local groups). A symmetric local group $G$ is said to be cancellative if the following assertions hold:
(i) Whenever $g, h, k \in G$ are such that $gh$ and $gk$ are well-defined and equal to each other, then $h = k$. (Note that this implies in particular that $(g^{-1})^{-1} = g$.)
(ii) Whenever $g, h, k \in G$ are such that $hg$ and $kg$ are well-defined and equal to each other, then $h = k$.
(iii) Whenever $g, h \in G$ are such that $gh$ and $h^{-1} g^{-1}$ are well-defined, then $(gh)^{-1} = h^{-1} g^{-1}$. (In particular, if $U \subset G$ is symmetric and $U^m$ is well-defined in $G$ for some $m \ge 1$, then $U^m$ is also symmetric.)

Clearly all global groups are cancellative.
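Definition B.4 and axiom (i) of Definition B.7 can be checked by brute force in small discrete examples. The following Python sketch is an illustration only and is not taken from the source: it models the additive local group $\{-n, \dots, n\}$ (the case $n = 1$ is the local group of Remark B.5), with `None` standing for an undefined product. The helper names `make_interval_local_group`, `well_defined_product` and `is_left_cancellative` are hypothetical.

```python
from itertools import product


def make_interval_local_group(n):
    """Additive local group {-n, ..., n}: x + y is defined only if |x + y| <= n."""
    elements = list(range(-n, n + 1))

    def mul(x, y):
        s = x + y
        return s if -n <= s <= n else None  # None marks "not defined"

    return elements, mul


def well_defined_product(gs, mul, identity=0):
    """Return g_1 ... g_m if it is well-defined in the sense of Definition B.4.

    Returns None otherwise. table[(i, k)] plays the role of g_[i,k]
    (with 0-based indices).
    """
    m = len(gs)
    if m == 0:
        return identity  # convention for the empty product
    table = {(i, i): gs[i] for i in range(m)}
    for length in range(2, m + 1):
        for i in range(m - length + 1):
            k = i + length - 1
            values = set()
            for j in range(i, k):
                value = mul(table[(i, j)], table[(j + 1, k)])
                if value is None:
                    return None  # some pairwise product is undefined
                values.add(value)
            if len(values) != 1:
                return None  # two decompositions disagree
            table[(i, k)] = values.pop()
    return table[(0, m - 1)]


def is_left_cancellative(elements, mul):
    """Brute-force check of axiom (i) of Definition B.7."""
    for g, h, k in product(elements, repeat=3):
        gh, gk = mul(g, h), mul(g, k)
        if gh is not None and gh == gk and h != k:
            return False
    return True
```

With $n = 1$, the call `well_defined_product([1, -1, -1, 1], mul)` returns `None`, in agreement with Remark B.5, whereas the same word is well-defined (with value 0) in the larger local group with $n = 2$.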
A local group need not be cancellative everywhere; however, we can restrict to a large subset on which it is cancellative, by using the following proposition.

Proposition B.8. Let $G$ be a symmetric local group, and let $U$ be an open symmetric neighbourhood of the identity in $G$ such that $U^6$ is well-defined. Then the restriction of $G$ to $U$ is cancellative.

In particular, the restriction of $G$ to the open symmetric neighbourhood $U_6$ discussed earlier is cancellative. We shall see later that the property of being cancellative is hereditary in that it is inherited by passing to subgroups and quotients, and because of this we will be able to easily restrict attention to the cancellative case in our arguments.

Proof. If $g, h \in U$, then $(gh)^{-1} g h h^{-1} g^{-1}$ is well-defined in $G$. By evaluating this well-defined expression in two different ways we conclude property (iii). In a similar spirit, by evaluating $g^{-1} g h$ and $g^{-1} g k$ for $g, h, k \in U$ in two different ways, we obtain (i); and similarly for (ii).

Lemma B.9. Let $G$ be a symmetric local group, and let $U, V$ be open sets with $\mathrm{id} \in V$. Then $\overline{U} \subset U \cdot V$ if $U \cdot V$ is well-defined, and similarly $\overline{U} \subset V \cdot U$ if $V \cdot U$ is well-defined.

Proof. We prove the first claim only, as the second is similar. Suppose that $g$ is an adherent point of $U$. By continuity, we can find an open neighbourhood $W$ of $g$ and an open neighbourhood $Y$ of the identity such that $g \cdot g^{-1} \cdot W \cdot Y^{-1}$ is well-defined and $Y^{-1} \subset V$. By continuity, the set $\{h \in W : g^{-1} h \in Y\}$ is an open neighbourhood of $g$, and thus contains an element $h$ of $U$. Writing $v := g^{-1} h$ and expanding out $g \cdot g^{-1} \cdot h \cdot v^{-1}$ in two different ways, we conclude that $g = h v^{-1}$, and thus $g \in U \cdot V$ as required.

We can give the class of local groups the structure of a category by defining the notion of a (continuous) homomorphism.

Definition B.10 (Homomorphisms). Let $G, H, K$ be symmetric local groups.
A continuous homomorphism $\phi : G \to H$ is a continuous map from $G$ to $H$ with the following properties:
(i) $\phi$ maps the identity of $G$ to the identity of $H$: $\phi(\mathrm{id}_G) = \mathrm{id}_H$.
(ii) For every $g \in G$, we have $\phi(g)^{-1} = \phi(g^{-1})$.
(iii) If $g, h \in G$ are such that $g \cdot h$ is well-defined, then $\phi(g) \cdot \phi(h)$ is well-defined and is equal to $\phi(g \cdot h)$.
We will often omit the adjective "continuous" when $G$ is discrete.
A local homomorphism from $G$ to $H$ is a continuous homomorphism $\phi : U \to H$ from a symmetric open neighbourhood $U$ of the identity of $G$ to $H$, where of course we give $U$ the structure of the restricted local group $G|_U$ from Example 26. Two local homomorphisms $\phi : U \to H$, $\phi' : U' \to H$ are equivalent if there exists a neighbourhood $V$ of the identity contained in both $U$ and $U'$ such that $\phi$ and $\phi'$ agree on $V$; this is an equivalence relation. A local morphism is an equivalence class of local homomorphisms.
Given two local homomorphisms $\phi : U \to H$ and $\psi : V \to K$ from $G$ to $H$ and $H$ to $K$ respectively, we define the composition map $\psi \circ \phi : U' \to K$ by $\psi \circ \phi(g) := \psi(\phi(g))$, where $U' := \{g \in U : \phi(g) \in V\}$. This allows one to define a composition of two local morphisms in the obvious manner.

Example 29. There are no non-trivial global morphisms from the unit circle $\mathbb{R}/\mathbb{Z}$ to $\mathbb{R}$. However, there do exist non-trivial local morphisms, such as (the equivalence class of) the map $\phi$ from $(-1/4, 1/4) \bmod 1$ to $\mathbb{R}$ defined by setting $\phi(x \bmod 1) := x$ for all $x \in (-1/4, 1/4)$. The concept of a local homomorphism is closely related to that of a Freiman homomorphism in additive combinatorics, as discussed for example in [54].

One easily verifies that continuous homomorphisms and local morphisms both obey the axioms of a category; in particular, the composition of two continuous homomorphisms is a continuous homomorphism, and the composition of two local morphisms is again a local morphism.
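Example 29 can be tested numerically. The sketch below is an illustration under stated assumptions, not part of the source: points of $\mathbb{R}/\mathbb{Z}$ are represented by exact `Fraction` values, the hypothetical helper `lift` picks the representative in $[-1/2, 1/2)$, and the grid size 64 is arbitrary. It checks axiom (iii) of Definition B.10 for $\phi$ on the restricted local group $(-1/4, 1/4) \bmod 1$, and shows that the same lift fails to be additive on the whole circle.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)


def lift(t):
    """Representative of t mod 1 in the interval [-1/2, 1/2)."""
    t = Fraction(t) % 1  # reduce to [0, 1)
    return t - 1 if t >= Fraction(1, 2) else t


def in_U(t):
    """Membership in U = (-1/4, 1/4) mod 1."""
    return abs(lift(t)) < QUARTER


def phi(t):
    """The map of Example 29: phi(x mod 1) = x for x in (-1/4, 1/4)."""
    if not in_U(t):
        raise ValueError("phi is only defined on U")
    return lift(t)


def is_local_homomorphism(grid=64):
    """Check phi(g) + phi(h) == phi(g + h) whenever g, h and g + h lie in U.

    In the restricted local group on U, the product g + h is well-defined
    exactly when g + h mod 1 lies in U again.
    """
    points = [Fraction(k, grid) for k in range(-grid // 4 + 1, grid // 4)]
    for g in points:
        for h in points:
            if in_U(g + h) and phi(g) + phi(h) != phi(g + h):
                return False
    return True


def lift_is_additive_at(g, h):
    """Test additivity of the lift at one pair of points of the full circle."""
    return lift(g) + lift(h) == lift(g + h)
```

On the grid, `is_local_homomorphism()` holds, while `lift_is_additive_at(Fraction(3, 8), Fraction(3, 8))` fails because $3/8 + 3/8 = 3/4$ wraps around to $-1/4$ on the circle. This is the obstruction that disappears once one restricts to a small enough neighbourhood of the identity.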
As usual in category theory, we can now say that two local groups $G, G'$ are locally isomorphic if there exists a local morphism $\phi$ from $G$ to $G'$ with an inverse $\phi^{-1}$ from $G'$ to $G$ which is also a local morphism, such that the compositions $\phi \circ \phi^{-1}$ and $\phi^{-1} \circ \phi$ are equivalent to the identity. Thus, for instance, the unit circle $\mathbb{R}/\mathbb{Z}$ and the line $\mathbb{R}$ are locally isomorphic. This notion of local isomorphism generalises the notion of local identity from Example 26.

Definition B.11 (Sub-local groups [24]). Given two symmetric local groups $G'$ and $G$, we say that $G'$ is a sub-local group of $G$ if $G'$ is the restriction of $G$ to a symmetric neighbourhood of the identity, and there exists an open neighbourhood $V$ of $G'$ with the property that whenever $g, h \in G'$ are such that $gh$ is defined in $V$, then $gh \in G'$; we refer to $V$ as an associated neighbourhood for $G'$. If $G'$ is also a global group, we say that $G'$ is a subgroup of $G$.
If $G'$ is a sub-local group of $G$, we say that $G'$ is normal if there exists an associated neighbourhood $V$ for $G'$ with the additional property that whenever $g' \in G'$, $h \in V$ are such that $h g' h^{-1}$ is well-defined and lies in $V$, then $h g' h^{-1} \in G'$. We call $V$ a normalising neighbourhood of $G'$.

Example 30. If $G, G'$ are the (additive) local groups $G := \{-2, -1, 0, +1, +2\}$ and $G' := \{-1, 0, +1\}$, then $G'$ is a sub-local group of $G$ (with associated neighbourhood $V = G'$). Note that this is despite $G'$ not being closed with respect to addition in $G$; thus we see why it is necessary to allow the associated neighbourhood $V$ to be strictly smaller than $G$. In a similar vein, the open interval $(-1, 1)$ is a sub-local group of $(-2, 2)$. The interval $(-1, 1) \times \{0\}$ is also a sub-local group of $\mathbb{R}^2$; here, one can take for instance $(-1, 1)^2$ as the associated neighbourhood. As all these examples are abelian, they are clearly normal.

Example 31. Let $T : V \to V$ be a linear transformation on a finite-dimensional vector space $V$, and let $G := \mathbb{Z} \ltimes_T V$ be the associated semi-direct product.
Let $G' := \{0\} \times W$, where $W$ is a subspace of $V$ that is not preserved by $T$. Then $G'$ is not a normal subgroup of $G$, but it is a normal sub-local group of $G$, where one can take $\{0\} \times V$ as a normalising neighbourhood of $G'$.

Observe that any sub-local group of a cancellative local group is again a cancellative local group. One also easily verifies that if $\phi : U \to H$ is a local homomorphism from $G$ to $H$ for some open neighbourhood $U$ of the identity in $G$, then $\ker(\phi)$ is a normal sub-local group of $U$, and hence of $G$. Note that the kernel of a local morphism is well-defined up to local identity. If $H$ is Hausdorff, then the kernel $\ker(\phi)$ will also be closed.

Conversely, normal sub-local groups give rise to local homomorphisms into quotient spaces.

Lemma B.12 (Quotient spaces [24]). Let $G$ be a cancellative local group, and let $H$ be a normal sub-local group with normalising neighbourhood $V$. Let $W$ be a symmetric open neighbourhood of the identity such that $W^6 \subset V$. Then there exists a cancellative local group $W/H$ and a surjective continuous homomorphism $\phi : W \to W/H$ such that, for any $g, h \in W$, one has $\phi(g) = \phi(h)$ if and only if $g h^{-1} \in H$, and for any $E \subset W/H$, one has $E$ open if and only if $\phi^{-1}(E)$ is open.

Proof. We define an equivalence relation $\sim$ on $W$ by declaring $g \sim h$ if $g h^{-1} \in H$. Using the cancellative properties of $V$ (and hence of $W^6$) we see that this is indeed an equivalence relation. We let $W/H := \{[g]_\sim : g \in W\}$ be the set of equivalence classes $[g]_\sim := \{h \in W : g \sim h\}$, with the obvious projection map $\pi : W \to W/H$. We define an inversion relation on $W/H$ by setting $[g]_\sim^{-1} := [g^{-1}]_\sim$, and a product operation by setting $[g]_\sim [h]_\sim$ to equal $[g' h']_\sim$ if $g' h' \in W$ for at least one representative $g', h'$ of $[g]_\sim, [h]_\sim$ respectively.
We now verify that these relations are well-defined. To make the inversion relation well-defined, we need to verify that if $g \sim h$, then $g^{-1} \sim h^{-1}$.
But from the cancellative properties of $W^6$, we have $g^{-1} (h^{-1})^{-1} = g^{-1} (g h^{-1})^{-1} g$, and the claim follows as $W^6$ is a normalising neighbourhood for $H$. Similarly, to make the multiplication relation well-defined, we need to verify that if $g, g', h, h'$ are such that $g \sim g'$, $h \sim h'$, and $gh, g'h' \in W$, then $gh \sim g'h'$. But $(gh)(g'h')^{-1} = (g (g')^{-1}) \, g' \, (h (h')^{-1}) \, (g')^{-1}$, and the claim follows as $W^6$ is a normalising neighbourhood for $H$. Similar arguments (which we omit) show that $W/H$ obeys the identity, inverse, and local associativity axioms.
Next, we give $W/H$ the quotient topology, declaring a set $E$ in $W/H$ open iff its inverse image $\pi^{-1}(E)$ is open in $W$ (or equivalently, in $G$). One easily verifies that $W/H$ becomes a symmetric local group, and the claim follows.

Example 32. Let $G$ be the additive local group $G := (-2, 2)^2$, and let $H$ be the sub-local group $H := \{0\} \times (-1, 1)$, with normalising neighbourhood $V := (-1, 1)^2$. If we then set $W := (-0.1, 0.1)^2$, then the hypotheses of Lemma B.12 are obeyed, and $W/H$ can be identified with $(-0.1, 0.1)$, with the projection map $\phi : (x, y) \mapsto x$.

Example 33. Let $G$ be the torus $(\mathbb{R}/\mathbb{Z})^2$, and let $H$ be the sub-local group $H = \{(x, \alpha x) \bmod \mathbb{Z}^2 : x \in (-0.1, 0.1)\}$, where $0 < \alpha < 1$ is an irrational number, with normalising neighbourhood $(-0.1, 0.1)^2 \bmod \mathbb{Z}^2$. Set $W := (-0.01, 0.01)^2 \bmod \mathbb{Z}^2$. Then the hypotheses of Lemma B.12 are again obeyed, and $W/H$ can be identified with the interval $I := (-0.01(1 + \alpha), 0.01(1 + \alpha))$, with the projection map $\phi : (x, y) \bmod \mathbb{Z}^2 \mapsto y - \alpha x$ for $(x, y) \in (-0.01, 0.01)^2$. Note, in contrast, that if one quotiented $G$ by the global group $\langle H \rangle = \{(x, \alpha x) \bmod \mathbb{Z}^2 : x \in \mathbb{R}\}$ generated by $H$, the quotient would be a non-Hausdorff space (and would also contain a dense set of torsion points, in contrast to the interval $I$ which is "locally torsion free").
It is because of this pathological behaviour of quotienting by global groups that we need to work with local group quotients instead.

Remark B.13. As we have seen in the above discussion, many familiar concepts in (global) group theory have analogues in the local group setting. We will however mention one important global group-theoretic concept that does not have a convenient local analogue, and that is the notion of the global group $\langle A \rangle$ generated by a set $A$ of generators. The problem is that this global group $\langle A \rangle$ consists of words in $A$ of arbitrary length, whereas in a local group one can typically only multiply together a bounded number of elements of $A$. However, sets such as $A^m$ or $(A \cup A^{-1} \cup \{\mathrm{id}\})^m$ for various choices of exponent $m$ can sometimes serve as a partial substitute for this concept in local group theory, though one of course has to keep track of the precise value of $m$ throughout the argument.

Locally compact local groups. Recall that a topological space $X$ is said to be locally compact if every point in $X$ has a compact neighbourhood. In particular, one can speak of a locally compact symmetric local group. To verify local compactness of a symmetric local group, it suffices to do so at the identity.

Lemma B.14. Let $G$ be a symmetric local group. Then $G$ is locally compact if and only if there is a compact symmetric neighbourhood of the identity.

Proof. [24, Lemma 2.16] The "only if" part is clear (since $\mathrm{id}$ already has a compact neighbourhood). Now we turn to the "if" part. Let $K$ be a compact symmetric neighbourhood of the identity, and let $g \in G$ be arbitrary. By continuity, there exists an open neighbourhood $V$ of $g$ such that $g^{-1} \cdot V \cdot V^{-1} \cdot g$ is well-defined and $g^{-1} \cdot V \cdot V^{-1} \cdot g \subset K$. In particular, $h \mapsto g^{-1} h$ is a homeomorphism from $V \cdot V^{-1} \cdot g$ to $g^{-1} \cdot V \cdot V^{-1} \cdot g$ which is inverted by the map $k \mapsto gk$. By Lemma B.9, we conclude that $h \mapsto g^{-1} h$ is also a homeomorphism from $\overline{V}$ to $\overline{g^{-1} \cdot V} = g^{-1} \cdot \overline{V}$.
In particular, since $\overline{g^{-1} \cdot V}$ is a closed subset of $K$, it is compact, and so $\overline{V}$ is compact also. Thus $g$ has a precompact neighbourhood as required.

Corollary B.15. If $G$ is a locally compact symmetric local group, and $U$ is a symmetric open neighbourhood of the identity, then $U$ is also a locally compact local group.

Proof. By Lemma B.14, $G$ contains a symmetric precompact open neighbourhood $V$ of the identity. By continuity, one can find a symmetric open neighbourhood $W$ of the identity such that $W \cdot W$ is well-defined in $V \cap U$. By Lemma B.9, we conclude that the closure of $W$ in $U$ is the same as the closure $\overline{W}$ of $W$ in $G$; as it is contained in the precompact set $V$, it is thus precompact. The claim then follows from another application of Lemma B.14.

An important subclass of the locally compact local groups are the (symmetric) local Lie groups, defined as those (symmetric) local groups which are also smooth finite-dimensional real manifolds, such that the group operations are smooth on their domain of definition. We have the following basic theorem.

Theorem B.16 (Lie's third theorem). Every local Lie group is locally isomorphic to a global Lie group. Furthermore, one can take the global Lie group to be both connected and simply connected.

See e.g. [50] for a proof.

We have the following deep structure theorem for locally compact global groups, due to Gleason and Yamabe [61].

Theorem B.17 (Gleason-Yamabe). Suppose that $G$ is a locally compact global group. Then there is an open subgroup $G'$ of $G$ with the following property: inside any neighbourhood of the identity $U \subseteq G'$, there is a compact normal subgroup $H$ such that $G'/H$ is isomorphic to a connected global Lie group.

The analogous theorem for locally compact local groups was established more recently by Goldbring.

Theorem B.18 (Goldbring). Suppose that $G$ is a locally compact local group. Then some restriction $G'$ of $G$ to a symmetric neighbourhood of the identity has the following property.
Inside any neighbourhood of the identity $U \subseteq G'$, there is a compact normal subgroup $H$ such that $G'/H$ is isomorphic to a local Lie group.

Proof. The only self-contained proof of Theorem B.18 in the literature is in the thesis [23], where it follows from a combination of Section 4.5 and [23, Proposition 4.7.1]. A more easily accessible account of essentially the same material follows by combining [56, Proposition 4.1] (reduction to the NSS case) with [24, §8] (treatment of the NSS case). Alternatively (though ultimately more circuitously) one may apply the main result of [56], which shows that $G$ has a restriction in common with a global locally compact group, followed by Theorem B.17.

For our applications, we only need to apply Theorem B.18 when $G$ is metrisable, although the general case can be deduced from the metrisable case without much effort.

Appendix C: Nilprogressions and related objects

In this appendix we prove two basic facts about coset nilprogressions in normal form, namely that after shrinking the length parameter slightly they are approximate groups, and are globalisable: that is to say isomorphic to subsets of a global group. The proofs of these facts are quite short due to the strength of the normal form axioms. One can establish similar assertions without the normal form hypothesis, but the arguments are much more complicated in that they require one to work with an explicit basis for the free nilpotent group. They are not needed in this paper.

Lemma C.1. Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form. Then for all $\varepsilon > 0$ that are sufficiently small depending on $r, C$, one has
(C.1) $(1 + N_1) \cdots (1 + N_r) |H| \ll_{\varepsilon,C,r} |P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)| \ll_C (1 + N_1) \cdots (1 + N_r) |H|$
and hence, by the volume bounds on $P$,
$|P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)| \gg_{\varepsilon,C,r} |P|$.
Furthermore, $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is a $O_{\varepsilon,C,r}(1)$-approximate group.

Proof.
By quotienting out the finite group $H$, which is normalised by $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ (say) if $\varepsilon$ is small enough, we may assume that $H$ is trivial. The upper bound in (C.1) is then immediate from the upper bound in (2.2), while the lower bound follows from the local properness axiom in Definition 2.6. From (C.1) and the Ruzsa covering lemma we see that for $\varepsilon$ small enough, $P_H(u_1, \dots, u_r; 2\varepsilon N_1, \dots, 2\varepsilon N_r)$ is covered by $O_{\varepsilon,C,r}(1)$ translates of $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$, and so the final claim follows from Lemma 5.1.

Remark C.2. It is in fact possible to show that $|P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)|$ decays at a polynomial rate in $\varepsilon$, and that $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is a $O_{C,r}(1)$-approximate group uniformly in $\varepsilon$, but we will not need these stronger conclusions here.

Lemma C.3. Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form. Then for all $\varepsilon > 0$ that are sufficiently small depending on $r, C$, the set $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is isomorphic to a subset of a global group $G$.

From this lemma (and Lemma C.1) we see that Theorem 2.13 follows immediately from Theorem 2.10.

Proof. We first establish the claim under the additional hypothesis that the $N_1, \dots, N_r$ are sufficiently large depending on $r, C$; we will remove this hypothesis at the end of the argument.
Let $v_1, \dots, v_r$ be lifts of the generators $u_1, \dots, u_r$ of $P/H$ to $P$. By Definition 2.6 and the normality of $H$, one has
(C.2) $[v_i, v_j] \in P_H\left(v_{j+1}, \dots, v_r; O_C\left(\frac{N_{j+1}}{N_i N_j}\right), \dots, O_C\left(\frac{N_r}{N_i N_j}\right)\right)$
for all $1 \le i < j \le r$; note that the hypothesis that the $N_i$ are large ensures that the right-hand side is well-defined in $P$.
Consider a word in $P\left(v_{j+1}, \dots, v_r; O_C\left(\frac{N_{j+1}}{N_i N_j}\right), \dots, O_C\left(\frac{N_r}{N_i N_j}\right)\right)$, which therefore contains $O_C\left(\frac{N_{j+1}}{N_i N_j}\right)$ copies of $v_{j+1}^{\pm 1}$, $O_C\left(\frac{N_{j+2}}{N_i N_j}\right)$ copies of $v_{j+2}^{\pm 1}$, and so forth.
Let us take the leftmost copy of $v_{j+1}^{\pm 1}$ and move it all the way to the left. Each time it passes through a $v_k^{\pm 1}$ for some $j + 1 < k \le r$, we use (C.2) and create $O_C\left(\frac{N_l}{N_{j+1} N_k}\right)$ new copies of $v_l^{\pm 1}$ for each $l > k$, plus an element of $H$ which can be pushed all the way to the right using the normality of $H$. Thus, if one initially had $a_k \frac{N_k}{N_i N_j}$ copies of $v_k^{\pm 1}$ for each $j + 1 < k \le r$ before one started moving the leftmost $v_{j+1}^{\pm 1}$ to the left, then by the end of the move, one would have
(C.3) $a_l \frac{N_l}{N_i N_j} + O_C\left( \sum_{j+1 < k < l} a_k \frac{N_k}{N_i N_j} \cdot \frac{N_l}{N_{j+1} N_k} \right)$
copies of $v_l^{\pm 1}$ for each $j + 1 < l \le r$. We may simplify the expression (C.3) as
$\frac{N_l}{N_i N_j} \left( a_l + O_C\left( \frac{1}{N_{j+1}} \sum_{j+1 < k < l} a_k \right) \right)$.
Thus we have effectively replaced the sequence $(a_k)_{j+1 < k \le r}$ by the sequence
$\left( a_l + O_C\left( \frac{1}{N_{j+1}} \sum_{j+1 < k < l} a_k \right) \right)_{j+1 < l \le r}$.
We iterate this process $O_C\left(\frac{N_{j+1}}{N_i N_j}\right) = O_C(N_{j+1})$ times, and note that the $a_k$ were initially of size $O_C(1)$, and end up at a sequence, all of whose entries are of size $O_{C,r}(1)$. In other words, after moving all copies of $v_{j+1}^{\pm 1}$ to the left, and all copies of $H$ to the right, we end up with $O_{C,r}\left(\frac{N_k}{N_i N_j}\right)$ copies of $v_k^{\pm 1}$ in the middle for each $j + 1 < k \le r$. We conclude that
$[v_i, v_j] \in v_{j+1}^{n_{i,j,j+1}} P_H\left(v_{j+2}, \dots, v_r; O_{C,r}\left(\frac{N_{j+2}}{N_i N_j}\right), \dots, O_{C,r}\left(\frac{N_r}{N_i N_j}\right)\right)$
for some $n_{i,j,j+1} = O_C\left(\frac{N_{j+1}}{N_i N_j}\right)$; note that as long as the $N_i$ are large enough, all words that appear in this reorganisation will lie inside $P$ and so the algebraic manipulations can be justified. Iterating this procedure $r - j$ times (which will be justified if the $N_i$ are large enough) we see that
(C.4) $[v_i, v_j] = v_{j+1}^{n_{i,j,j+1}} \cdots v_r^{n_{i,j,r}} h_{i,j}$
for some $n_{i,j,k} = O_{C,r}\left(\frac{N_k}{N_i N_j}\right)$ and $h_{i,j} \in H$. Also, one has
(C.5) $v_i h v_i^{-1} = \phi_i(h)$
for some (outer) automorphism $\phi_i : H \to H$ of $H$.
Now let $G$ be the global group generated by $H$ and formal generators $e_1, \dots, e_r$, subject to the relations
(C.6) $[e_i, e_j] = e_{j+1}^{n_{i,j,j+1}} \cdots e_r^{n_{i,j,r}} h_{i,j}$
and
(C.7) $e_i h e_i^{-1} = \phi_i(h)$
for $1 \le i < j \le r$. We claim that for $\varepsilon$ small enough, there is an injective homomorphism from $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ to $G$, which will give the claim.
To see this, first observe from the normality of $H$ that
$P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) = P(v_1, \dots, v_r; \varepsilon N_1, \dots, \varepsilon N_r) H$.
Organising the words in $P(v_1, \dots, v_r; \varepsilon N_1, \dots, \varepsilon N_r)$ by moving all occurrences of $v_1$ to the left (using (C.4)) and all occurrences of $H$ to the right (using the normality of $H$) we then have
(C.8) $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) \subseteq P(v_1; \varepsilon N_1) \, P_H\left(v_2, \dots, v_r; O_{C,r}(\varepsilon N_2), \dots, O_{C,r}(\varepsilon N_r)\right)$
assuming $\varepsilon$ is small enough in order to justify all the algebraic manipulations. Iterating this we see that
(C.9) $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r) \subseteq P\left(v_1; O_{C,r}(\varepsilon N_1)\right) \cdots P\left(v_r; O_{C,r}(\varepsilon N_r)\right) H$.
Thus it suffices to establish an injective homomorphism $\phi$ from the set
(C.10) $\left\{ v_1^{n_1} \cdots v_r^{n_r} h : n_i = O_{C,r}(\varepsilon N_i); h \in H \right\}$
to $G$. From the local properness property in Definition 2.6, all the products in (C.10) are distinct if $\varepsilon$ is small enough. We may thus define $\phi$ by the formula
$\phi\left(v_1^{n_1} \cdots v_r^{n_r} h\right) := e_1^{n_1} \cdots e_r^{n_r} h$.
Next, we show that $\phi$ is injective. Indeed, suppose that there exist $n_i, n_i' = O_{C,r}(\varepsilon N_i)$ and $h, h' \in H$ with
$\phi\left(v_1^{n_1} \cdots v_r^{n_r} h\right) = \phi\left(v_1^{n_1'} \cdots v_r^{n_r'} h'\right)$
and thus
$e_1^{n_1} \cdots e_r^{n_r} h = e_1^{n_1'} \cdots e_r^{n_r'} h'$.
By the universal properties of $G$, there is a homomorphism from $G$ to $\mathbb{Z}$ that maps $e_1$ to $1$ and annihilates the other $e_i$ and $H$. This implies that $n_1 = n_1'$. We can then eliminate $n_1, n_1'$ and work with the subgroup $G_2$ of $G$ generated by $e_2, \dots, e_r$ and $H$. From abstract nonsense we see that $G_2$ is universal with respect to the constraints (C.6), (C.7) for $i \ge 2$, and that $G$ is the semidirect product of $G_2$ with $\mathbb{Z}$ using the conjugation action of $e_1$ on $G_2$ defined using (C.6), (C.7) for $i = 1$.
In particular, there is a homomorphism from $G_2$ to $\mathbb{Z}$ that maps $e_2$ to $1$ and annihilates the $e_i$ and $H$ for $i > 2$. This gives $n_2 = n_2'$. Continuing in this fashion we see that $n_i = n_i'$ for all $i$ and hence $h = h'$, which establishes injectivity.
Finally, we need to show that $\phi$ is a homomorphism. It suffices to show that if $n_i, n_i', n_i'' = O_{C,r}(\varepsilon N_i)$ and $h, h', h'' \in H$ are such that
(C.11) $v_1^{n_1} \cdots v_r^{n_r} h \, v_1^{n_1'} \cdots v_r^{n_r'} h' = v_1^{n_1''} \cdots v_r^{n_r''} h''$
then
(C.12) $e_1^{n_1} \cdots e_r^{n_r} h \, e_1^{n_1'} \cdots e_r^{n_r'} h' = e_1^{n_1''} \cdots e_r^{n_r''} h''$.
To see this, we rearrange the word on the left-hand side of (C.11) by moving all occurrences of $v_1$ to the left, and all occurrences of elements of $H$ to the right, using (C.4) and (C.5); if $\varepsilon$ is small enough, then all manipulations take place inside $P$ and can thus be justified. Iterating this process, we must eventually be able to express this word in the form $v_1^{\tilde n_1} \cdots v_r^{\tilde n_r} \tilde h$ for some $\tilde n_i = O_{C,r}(\varepsilon N_i)$ and $\tilde h \in H$. By injectivity, we then have $\tilde n_i = n_i''$ and $\tilde h = h''$. But then if one formally replaces all the $v_i$ by $e_i$ and uses (C.6), (C.7) in place of (C.4), (C.5) in the rearrangement procedure just described, we conclude (C.12), and the claim follows.
Now we remove the hypothesis that the $N_1, \dots, N_r$ are sufficiently large depending on $r, C$. Let $F : \mathbb{R}^+ \to \mathbb{R}^+$ be a function depending on $r, C$ to be chosen later. By the pigeonhole principle, we can find a threshold $M \ge 1$ with $M = O_{F,r}(1)$ such that every length $N_i$ is either less than $M$, or larger than $F(M)$. If we let $1 \le i_1 < \cdots < i_{r'} \le r$ be those indices $i$ with $N_i > F(M)$, then we see (if $F$ is sufficiently rapidly growing) that $P_H(u_{i_1}, \dots, u_{i_{r'}}; N_{i_1}, \dots, N_{i_{r'}})$ will be a coset nilprogression in $O_{C,r,M}(1)$-normal form. For $F$ sufficiently rapidly growing, the preceding argument then applies to conclude that $P_H(u_{i_1}, \dots, u_{i_{r'}}; \varepsilon N_{i_1}, \dots, \varepsilon N_{i_{r'}})$ is isomorphic to a subset of a global group if $\varepsilon$ is small enough depending on $C, r, M$, and the claim follows.
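The rearrangement used in this proof (moving all copies of one generator to the left, at the cost of creating commutators further down the series) can be seen in the simplest nontrivial case. The following Python sketch is an illustration chosen here and is not an example from the source: it assumes $H$ trivial, $r = 3$, and the integer Heisenberg group with $v_1 = x$, $v_2 = y$, $v_3 = z$, realised as triples $(a, b, c)$ standing for upper unitriangular $3 \times 3$ matrices. The commutator convention $[g, h] = g^{-1} h^{-1} g h$ and all function names are assumptions made for the example.

```python
IDENTITY = (0, 0, 0)
X, Y, Z = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def mul(g, h):
    """Product of the unitriangular matrices encoded by (a, b, c)."""
    a, b, c = g
    a2, b2, c2 = h
    return (a + a2, b + b2, c + c2 + a * b2)


def inv(g):
    a, b, c = g
    return (-a, -b, -c + a * b)


def power(g, n):
    """g**n for any integer n, by repeated multiplication."""
    base = g if n >= 0 else inv(g)
    result = IDENTITY
    for _ in range(abs(n)):
        result = mul(result, base)
    return result


def commutator(g, h):
    """[g, h] = g^-1 h^-1 g h (convention assumed for this sketch)."""
    return mul(mul(inv(g), inv(h)), mul(g, h))


def evaluate(word):
    """Multiply out a word given as a list of group elements."""
    result = IDENTITY
    for letter in word:
        result = mul(result, letter)
    return result


def normal_form(g):
    """Exponents (n1, n2, n3) with g = x^n1 y^n2 z^n3, the analogue of (C.13)."""
    a, b, c = g
    return (a, b, c - a * b)


def from_normal_form(n1, n2, n3):
    return mul(mul(power(X, n1), power(Y, n2)), power(Z, n3))
```

Here `commutator(X, Y)` equals `Z`, and the exponents returned by `normal_form` determine the group element uniquely, which is the analogue of the distinctness of the products in (C.10). Collecting the word $y^5 x^5$ gives `normal_form(evaluate([Y] * 5 + [X] * 5)) == (5, 5, -25)`: moving five copies of $x$ past five copies of $y$ creates $25$ copies of $z^{-1}$, so the exponent of $v_3$ grows like the product of the lengths in the first two directions.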
Remark C.4. From (C.9) we see that every element in $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ takes the form
(C.13) $v_1^{a_1} \cdots v_r^{a_r} h$
for some integers $a_1, \dots, a_r$ with $a_i = O_{C,r}(\varepsilon N_i)$ and $h \in H$. Conversely, it is clear that if $|a_i| \le \varepsilon N_i$ then all expressions of the form (C.13) lie in $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$. Informally, we thus see that the nilprogression $P_H(u_1, \dots, u_r; \varepsilon N_1, \dots, \varepsilon N_r)$ is comparable in some sense to the nilbox
$\left\{ v_1^{a_1} \cdots v_r^{a_r} h : |a_i| \le \varepsilon N_i; h \in H \right\}$.
We will however not exploit this description of nilprogressions in this paper.

A variant of the above analysis also gives polynomial growth of progressions in $C$-normal form in the global case.

Proposition C.5 (Polynomial growth). Let $P = P_H(u_1, \dots, u_r; N_1, \dots, N_r)$ be a coset nilprogression in $C$-normal form in a global group. Then for all $m \ge 1$, one has $|P^m| \ll_{C,r} m^{O_{C,r}(1)} |P|$.

Proof. We allow all implied constants to depend on $C, r$. As $H$ is normalised by $P$, we may quotient out by $H$ and reduce to the case when $H$ is trivial. Then
$P^m \subseteq P(u_1, \dots, u_r; m N_1, \dots, m N_r)$
and so it suffices (by the volume bound (2.2)) to show that
$|P(u_1, \dots, u_r; m N_1, \dots, m N_r)| \ll m^{O(1)} (N_1 + 1) \cdots (N_r + 1)$.
By modifying the proof of (C.8), one easily verifies that
$P(u_1, \dots, u_r; m N_1, \dots, m N_r) \subseteq P(v_1; m N_1) \, P\left(v_2, \dots, v_r; O(m^2 N_2), \dots, O(m^2 N_r)\right)$;
iterating this, one sees that
$P(u_1, \dots, u_r; m N_1, \dots, m N_r) \subseteq P\left(v_1; O(m^{O(1)} N_1)\right) \cdots P\left(v_r; O(m^{O(1)} N_r)\right)$,
and the claim follows.

REFERENCES
1. I. BENJAMINI and G. KOZMA, A resistance bound via an isoperimetric inequality, Combinatorica, 25 (2005), 645–650.
2. L. BIEBERBACH, Über einen Satz des Herrn C. Jordan in der Theorie der endlichen Gruppen linearer Substitutionen, Sitzber. Preuss. Akad. Wiss, Berlin, 1911.
3. Y. BILU, Addition of sets of integers of positive density, J. Number Theory, 64 (1997), 233–275.
4. Y. BILU, Structure of sets with small sumset, Astérisque, 258 (1999), 77–108. Structure theory of set addition.
5. E. BREUILLARD and B. GREEN, Approximate groups. I: the torsion-free nilpotent case, J. Inst. Math. Jussieu, 10 (2011), 37–57.
6. E. BREUILLARD and B. GREEN, Approximate groups. II: the solvable linear case, Q. J. of Math., Oxf., 62 (2011), 513–521.
7. E. BREUILLARD and B. GREEN, Approximate groups. III: the unitary case, Turk. J. Math., 36 (2012), 199–215.
8. E. BREUILLARD, B. GREEN, and T. TAO, Approximate subgroups of linear groups, Geom. Funct. Anal., 21 (2011), 774–
9. Y. D. BURAGO and V. A. ZALGALLER, Geometric Inequalities, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 285, Springer, Berlin, 1988. Translated from the Russian by A. B. Sosinskiĭ, Springer Series in Soviet Mathematics.
10. M.-C. CHANG, A polynomial bound in Freiman's theorem, Duke Math. J., 113 (2002), 399–419.
11. J. CHEEGER and T. H. COLDING, Lower bounds on Ricci curvature and the almost rigidity of warped products, Ann. Math., 144 (1996), 189–237.
12. L. J. CORWIN and F. GREENLEAF, Representations of Nilpotent Lie Groups and Their Applications, CUP, Cambridge, 1990.
13. E. CROOT and O. SISASK, A probabilistic technique for finding almost-periods of convolutions, Geom. Funct. Anal., 20 (2010), 1367–1396.
14. D. FISHER, N. H. KATZ, and I. PENG, Approximate multiplicative groups in nilpotent Lie groups, Proc. Am. Math. Soc., 138 (2010), 1575–1580.
15. G. A. FREIMAN, Foundations of a Structural Theory of Set Addition, American Mathematical Society, Providence, 1973. Translated from the Russian, Translations of Mathematical Monographs, vol. 37.
16. K. FUKAYA and T. YAMAGUCHI, The fundamental groups of almost non-negatively curved manifolds, Ann. Math., 136 (1992), 253–333.
17. S. GALLOT, D. HULIN, and J. LAFONTAINE, Riemannian Geometry, Universitext, Springer, Berlin, 1987.
18. N. GILL and H. HELFGOTT, Growth in solvable subgroups of $\mathrm{GL}_r(\mathbb{Z}/p\mathbb{Z})$, preprint (2010), arXiv:1008.5264.
19. N. GILL and H. HELFGOTT, Growth of small generating sets in $\mathrm{SL}_n(\mathbb{Z}/p\mathbb{Z})$, Int. Math. Res. Not., 18 (2011), 4226–4251.
20. A. M. GLEASON, The structure of locally compact groups, Duke Math. J., 18 (1951), 85–104.
21. A. M. GLEASON, Groups without small subgroups, Ann. Math., 56 (1952), 193–212.
22. K. GÖDEL, Consistency of the axiom of choice and of the generalized continuum-hypothesis with the axioms of set theory, Proc. Natl. Acad. Sci., 24 (1938), 556–557.
23. I. GOLDBRING, Nonstandard Methods in Lie Theory, Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2009.
24. I. GOLDBRING, Hilbert's fifth problem for local groups, Ann. Math., 172 (2010), 1269–1314.
25. B. GREEN and I. Z. RUZSA, Freiman's theorem in an arbitrary abelian group, J. Lond. Math. Soc., 75 (2007), 163–175.
26. B. GREEN and T. TAO, Compressions, convex geometry and the Freiman-Bilu theorem, Q. J. Math., 57 (2006), 495–
27. M. GROMOV, Groups of polynomial growth and expanding maps, Publ. Math. IHÉS, 53 (1981), 53–73.
28. M. GROMOV, Metric Structures for Riemannian and Non-Riemannian Spaces, Modern Birkhäuser Classics, Birkhäuser, Boston, 2007. Based on the 1981 French original, With appendices by M. Katz, P. Pansu and S. Semmes, Translated from the French by Sean Michael Bates.
29. M. HALL Jr., The Theory of Groups, Chelsea Publishing Co., New York, 1976. Reprinting of the 1968 edition.
30. H. A. HELFGOTT, Growth and generation in $\mathrm{SL}_2(\mathbb{Z}/p\mathbb{Z})$, Ann. Math., 167 (2008), 601–623.
31. H. A. HELFGOTT, Growth in $\mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$, J. Eur. Math. Soc., 13 (2011), 761–851.
32. J. HIRSCHFELD, The nonstandard treatment of Hilbert's fifth problem, Trans. Am. Math. Soc., 321 (1990), 379–400.
33. E. HRUSHOVSKI, Stable group theory and approximate subgroups, J. Am. Math. Soc., 25 (2012), 189–243.
34. I. KAPLANSKY, Lie Algebras and Locally Compact Groups, The University of Chicago Press, Chicago, 1971.
35. V. KAPOVITCH, A. PETRUNIN, and W. TUSCHMANN, Nilpotency, almost nonnegative curvature, and the gradient flow on Alexandrov spaces, Ann. Math., 171 (2010), 343–373.
36. V. KAPOVITCH and B. WILKING, Structure of fundamental groups of manifolds with Ricci curvature bounded below, preprint (2011), arXiv:1105.5955.
37. B. KLEINER, A new proof of Gromov's theorem on groups of polynomial growth, J. Am. Math. Soc., 23 (2010), 815–829.
38. J. LEE and Y. MAKARYCHEV, Eigenvalue multiplicity and volume growth, preprint (2008), arXiv:0806.1745.
39. D. MONTGOMERY and L. ZIPPIN, Small subgroups of finite-dimensional groups, Ann. Math., 56 (1952), 213–241.
40. D. MONTGOMERY and L. ZIPPIN, Topological Transformation Groups, Interscience Publishers, New York, 1955.
41. P. J. OLVER, Non-associative local Lie groups, J. Lie Theory, 6 (1996), 23–51.
42. C. PITTET and L. SALOFF-COSTE, A survey on the relationships between volume growth, isoperimetry, and the behavior of simple random walk on Cayley graphs, with examples, survey, preprint (2000).
43. L. PYBER and E. SZABÓ, Growth in finite simple groups of Lie type of bounded rank, preprint (2010), arXiv:1005.1858.
44. I. Z. RUZSA, Generalized arithmetical progressions and sumsets, Acta Math. Hung., 65 (1994), 379–388.
45. I. Z. RUZSA, An analog of Freiman's theorem in groups, Astérisque, 258 (1999), 323–326.
46. T. SANDERS, From polynomial growth to metric balls in monomial groups, preprint (2009), arXiv:0912.0305.
47. T. SANDERS, On a non-abelian Balog-Szemerédi-type lemma, J. Aust. Math. Soc., 89 (2010), 127–132.
48. T. SANDERS, On the Bogolyubov-Ruzsa lemma, Anal. Partial Differ. Equ. (2010), to appear, arXiv:1011.0107.
49. T. SANDERS, A quantitative version of the non-abelian idempotent theorem, Geom. Funct. Anal., 21 (2011), 141–221.
50. J.-P. SERRE, Lie Algebras and Lie Groups, Lecture Notes in Mathematics, vol. 1500, Springer, Berlin, 2006.
1964 lectures given at Harvard University, Corrected fifth printing of the second (1992) edition. 51. Y. SHALOM and T. TAO, A finitary version of Gromov’s polynomial growth theorem, Geom. Funct. Anal., 20 (2010), 1502–1547. 52. T. TAO, Product set estimates for non-commutative groups, Combinatorica, 28 (2008), 547–594. 53. T. TAO, Freiman’s theorem for solvable groups, Contrib. Discrete Math., 5 (2010), 137–184. 54. T. TAO and V. VU, Additive Combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2006. 55. W. P. THURSTON, Three-Dimensional Geometry and Topology, vol. 1, Princeton Mathematical Series, vol. 35, Princeton University Press, Princeton, 1997. Edited by Silvio Levy. 56. L. van den DRIES and I. GOLDBRING, Globalizing locally compact local groups, J. Lie Theory, 20 (2010), 519–524. 57. L. van den DRIES and I. GOLDBRING, Seminar notes on Hilbert’s 5th problem, preprint (2010). 58. L. van den DRIES and A. J. WILKIE, Gromov’s theorem on groups of polynomial growth and elementary logic, J. Algebra, 89 (1984), 349–374. 59. N. T. VAROPOULOS,L.SALOFF-COSTE,and T. COULHON, Analysis and Geometry on Groups, Cambridge Tracts in Mathe- matics, vol. 100, Cambridge University Press, Cambridge, 1992. 60. H. YAMABE, A generalization of a theorem of Gleason, Ann. Math., 58 (1953), 351–365. 61. H. YAMABE, On the conjecture of Iwasawa and Gleason, Ann. Math., 58 (1953), 48–54. E. Breuillard Laboratoire de Mathématiques, Bâtiment 425, Université Paris Sud 11, 91405 Orsay, France emmanuel.breuillard@math.u-psud.fr B. Green Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, England b.j.green@dpmms.cam.ac.uk THE STRUCTURE OF APPROXIMATE GROUPS 221 T. Tao Department of Mathematics, UCLA, 405 Hilgard Ave, Los Angeles, CA 90095, USA tao@math.ucla.edu Manuscrit reçu le 7 novembre 2011 Manuscrit accepté le 18 septembre 2012 publié en ligne le 19 octobre 2012.

Journal

Publications mathématiques de l'IHÉS, Springer Journals

Published: Oct 19, 2012
