Measuring Intelligence and Growth Rate: Variations on Hibbard’s Intelligence Measure

S. Alexander & B. Hibbard

Abstract. In 2011, Hibbard suggested an intelligence measure for agents who compete in an adversarial sequence prediction game. We argue that Hibbard's idea should actually be considered as two separate ideas: first, that the intelligence of such agents can be measured based on the growth rates of the runtimes of the competitors that they defeat; and second, one specific (somewhat arbitrary) method for measuring said growth rates. Whereas Hibbard's intelligence measure is based on the latter growth-rate-measuring method, we survey other methods for measuring function growth rates, and exhibit the resulting Hibbard-like intelligence measures and taxonomies. Of particular interest, we obtain intelligence taxonomies based on Big-O and Big-Theta notation systems, which taxonomies are novel in that they challenge conventional notions of what an intelligence measure should look like. We discuss how intelligence measurement of sequence predictors can indirectly serve as intelligence measurement for agents with Artificial General Intelligence (AGIs).

1. Introduction

In his insightful paper, Hibbard (2011) introduces a novel intelligence measure (which we will here refer to as the original Hibbard measure) for agents who play a game of adversarial sequence prediction (Hibbard, 2008) "against a hierarchy of increasingly difficult sets of" evaders (environments that attempt to emit 1s and 0s in such a way as to evade prediction). The levels of Hibbard's hierarchy are labelled by natural numbers, and an agent's original Hibbard measure is the maximum $n \in \mathbb{N}$ such that said agent learns to predict all the evaders in the $n$th level of the hierarchy, or implicitly an agent's original Hibbard measure is $\infty$ if said agent learns to predict all the evaders in all levels of Hibbard's hierarchy. [Footnote 1: Hibbard does not explicitly include the $\infty$ case in his definition, but in his Proposition 3 he refers to agents having "finite intelligence", and it is clear from context that by this he means agents who fail to predict some evader somewhere in the hierarchy.] The hierarchy which Hibbard uses to measure intelligence is based on the growth rates of the runtimes of evaders.

We will argue that Hibbard's idea is really a combination of two orthogonal ideas. First: that in some sense the intelligence of a predicting agent can be measured based on the growth rates of the runtimes of the evaders whom that predictor learns to predict. Second: Hibbard proposed one particular method for measuring said growth rates. The growth rate measurement which Hibbard proposed yields a corresponding intelligence measure for these agents. We will argue that any method for measuring growth rates of functions yields a corresponding adversarial sequence prediction intelligence measure (or ASPI measure for short), provided the underlying number system provides a way of choosing canonical bounds for bounded sets. If the underlying number system does not provide a way of choosing canonical bounds for bounded sets, the growth-rate measure will instead yield a corresponding ASPI taxonomy (like the Big-O taxonomy of asymptotic complexity).

The particular method which Hibbard used to measure function growth rates is not very standard. We will survey other ways of measuring function growth rates, and these will yield corresponding ASPI measures and taxonomies.

The structure of the paper is as follows. In Section 2, we review the original Hibbard measure.
In Section 3, we argue that any method of measuring growth rates of functions yields an ASPI measure or taxonomy, and that the original Hibbard measure is just a special case resulting from one particular method of measuring function growth rate. In Section 4, we consider Big-O notation and Big-$\Theta$ notation and define corresponding ASPI taxonomies. In Section 5, we consider solutions to the problem of measuring growth rates of functions using majorization hierarchies, and define corresponding ASPI measures. In Section 6, we consider solutions to the problem of measuring growth rates of functions using more abstract number systems, namely the hyperreal numbers and the surreal numbers. We do not assume previous familiarity with these number systems. In Section 7, we give pros and cons of different ASPI measures and taxonomies. In Section 8, we summarize and make concluding remarks.

2. Hibbard's original measure

Hibbard proposed an intelligence measure for measuring the intelligence of agents who compete to predict evaders in a game of adversarial sequence prediction (we define this formally below). A predictor $p$ (whose intelligence we want to measure) competes against evaders $e$. In each step of the game, both predictor and evader simultaneously choose a binary digit, 1 or 0. Only after both of them have made their choice do they see which choice the other one made, and then the game proceeds to the next step. The predictor's goal in each round is to choose the same digit that the evader will choose; the evader's goal is to choose a different digit than the predictor. The predictor wins the game (and is said to learn to predict $e$, or simply to learn $e$) if, after finitely many initial steps, eventually the predictor always chooses the same digit as the evader.

Definition 1. By $B$, we mean the binary alphabet $\{0, 1\}$. By $B^*$, we mean the set of all finite binary sequences. By $\langle\rangle$ we mean the empty binary sequence.

Definition 2 (Predictors and evaders).

1. By a predictor, we mean a Turing machine $p$ which takes as input a finite (possibly empty) binary sequence $(x_1, \ldots, x_n) \in B^*$ (thought of as a sequence of evasions) and outputs 0 or 1 (thought of as a prediction), which output we write as $p(x_1, \ldots, x_n)$.

2. By an evader, we mean a Turing machine $e$ which takes as input a finite (possibly empty) binary sequence $(y_1, \ldots, y_n) \in B^*$ (thought of as a sequence of predictions) and outputs 0 or 1 (thought of as an evasion), which output we write as $e(y_1, \ldots, y_n)$.

3. For any predictor $p$ and evader $e$, the result of $p$ playing the game of adversarial sequence prediction against $e$ (or more simply, the result of $p$ playing against $e$) is the infinite binary sequence $(x_1, y_1, x_2, y_2, \ldots)$ defined as follows:

   (a) The first evasion $x_1 = e(\langle\rangle)$ is the output of $e$ when run on the empty prediction-sequence.
   (b) The first prediction $y_1 = p(\langle\rangle)$ is the output of $p$ when run on the empty evasion-sequence.
   (c) For all $n > 0$, the $(n+1)$th evasion $x_{n+1} = e(y_1, \ldots, y_n)$ is the output of $e$ on the sequence of the first $n$ predictions.
   (d) For all $n > 0$, the $(n+1)$th prediction $y_{n+1} = p(x_1, \ldots, x_n)$ is the output of $p$ on the sequence of the first $n$ evasions.

4. Suppose $r = (x_1, y_1, x_2, y_2, \ldots)$ is the result of a predictor $p$ playing against an evader $e$. For every $n \geq 1$, we say the predictor wins round $n$ in $r$ if $x_n = y_n$; otherwise, the evader wins round $n$ in $r$.
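The game in Definition 2 is easy to simulate. The following Python sketch is our own illustration, not part of the paper: predictors and evaders are modelled as plain functions rather than Turing machines, and the names play, alternating_evader and parity_predictor are ours.

```python
from typing import Callable, List, Tuple

Digit = int  # 0 or 1
Predictor = Callable[[List[Digit]], Digit]  # sees the evasions so far
Evader = Callable[[List[Digit]], Digit]     # sees the predictions so far

def play(p: Predictor, e: Evader, rounds: int) -> List[Tuple[Digit, Digit]]:
    """Play the game of Definition 2 for a given number of rounds.

    Returns the list of (x_n, y_n) pairs; the predictor wins round n iff x_n == y_n.
    """
    evasions: List[Digit] = []     # x_1, x_2, ...
    predictions: List[Digit] = []  # y_1, y_2, ...
    results = []
    for _ in range(rounds):
        x = e(predictions)  # x_{n+1} = e(y_1, ..., y_n)
        y = p(evasions)     # y_{n+1} = p(x_1, ..., x_n); chosen "simultaneously"
        evasions.append(x)
        predictions.append(y)
        results.append((x, y))
    return results

# A toy evader that ignores its input (so it is essentially a fixed sequence 0,1,0,1,...),
# and a toy predictor that happens to output the same parity, so it wins every round.
def alternating_evader(predictions: List[Digit]) -> Digit:
    return len(predictions) % 2

def parity_predictor(evasions: List[Digit]) -> Digit:
    return len(evasions) % 2

if __name__ == "__main__":
    for n, (x, y) in enumerate(play(parity_predictor, alternating_evader, 6), start=1):
        print(f"round {n}: evasion={x} prediction={y} winner={'p' if x == y else 'e'}")
```

In this toy run the predictor agrees with the evader in every round, so it learns the evader in the sense made precise next (which only requires agreement from some round onward).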
We say that $p$ learns to predict $e$ (or simply that $p$ learns $e$) if there is some $N \in \mathbb{N}$ such that for all $n > N$, $p$ is the winner of round $n$ in $r$.

[Footnote 2: The measures we introduce in this paper would also work if we defined predictors as not-necessarily-computable functions $B^* \to B$, but this would not add much insight. We prefer to emphasize the duality between predictors and evaders when each is a Turing machine.]

Note that if $e$ simply ignores its inputs $(y_1, \ldots, y_n)$ and instead computes $e(y_1, \ldots, y_n)$ based only on $n$, then $e$ is essentially a sequence. Thus Definition 2 is a generalization of sequence prediction, which many authors have written about (such as Legg (2006), who gives many references). In the future, it could be interesting to consider variations of the game involving probability in various ways, for example, where the predictor wins if his guesses have a greater than 50% win rate, or where the predictor states how confident he is about each guess, or other such variations.

In the following definition, we differ from Hibbard's original paper because of a minor (and fortunately, easy-to-fix) error there.

Definition 3. Suppose $e$ is an evader. For each $n \in \mathbb{N}$, let $t_e(n)$ be the maximum number of steps that $e$ takes to run on any length-$n$ sequence of binary digits. In other words, $t_e(0)$ is the number of steps $e$ takes to run on $\langle\rangle$, and for all $n > 0$,
$$t_e(n) = \max_{b_1, \ldots, b_n \in \{0,1\}} (\text{number of steps } e \text{ takes to run on } (b_1, \ldots, b_n)).$$

Example 4. Let $e$ be an evader. Then $t_e(2)$ is equal to the number of steps $e$ takes to run on input $(0,0)$, or to run on input $(0,1)$, or to run on input $(1,0)$, or to run on input $(1,1)$, whichever of these four possibilities is largest.

Definition 5. Suppose $f : \mathbb{N} \to \mathbb{N}$ and $g : \mathbb{N} \to \mathbb{N}$. We say $f$ majorizes $g$, written $f \succ g$, if there is some $n_0 \in \mathbb{N}$ such that for all $n > n_0$, $f(n) > g(n)$.

Definition 6. Suppose $f : \mathbb{N} \to \mathbb{N}$. We define $E_f$ to be the set of all evaders $e$ such that $f \succ t_e$.

Definition 7 (The original Hibbard measure). Let $g_1, g_2, \ldots$ be the enumeration of the primitive recursive functions given by Liu (1960). For each $m > 0$, define $f_m : \mathbb{N} \to \mathbb{N}$ by
$$f_m(k) = \max_{0 < i \leq m} \, \max_{j \leq k} \, g_i(j).$$
For any predictor $p$, we define the original Hibbard intelligence of $p$ to be the maximum $m > 0$ such that $p$ learns to predict $e$ for every $e \in E_{f_m}$ (or 0 if there is no such $m$, or $\infty$ if $p$ learns to predict $e$ for every $e \in E_{f_m}$ for every $m > 0$).

The following result shows that the measure in Definition 7 does not overshoot the agents being measured.

Proposition 8. For any integer $m \geq 1$, there is a predictor $p$ with original Hibbard measure $\geq m$.

Proof. This is part of Proposition 3 of Hibbard (2011), but we give a self-contained proof here because we will state similar results about other measures below. For each computable $f : \mathbb{N} \to \mathbb{N}$, let $p_f$ be the predictor who proceeds as follows when given evasion-sequence $(x_1, \ldots, x_n) \in B^*$ as input.

First, by calling itself recursively on inputs $\langle\rangle, (x_1), (x_1, x_2), \ldots, (x_1, \ldots, x_{n-1})$, $p_f$ determines the prediction-sequence $(y_1, \ldots, y_n)$ as in Definition 2.

Next, $p_f$ considers the first $n$ Turing machines, $T_1, \ldots, T_n$. For $1 \leq i \leq n$, say that $T_i$ is an $n$th-order $e$-lookalike if the following requirements hold:

- If $T_i$ halts in $\leq f(0)$ steps on input $\langle\rangle$, then $T_i$ outputs $x_1$ on that input.
- If $T_i$ halts in $\leq f(1)$ steps on input $(y_1)$, then $T_i$ outputs $x_2$ on that input.
- If $T_i$ halts in $\leq f(2)$ steps on input $(y_1, y_2)$, then $T_i$ outputs $x_3$ on that input.
- ...
- If $T_i$ halts in $\leq f(n-1)$ steps on input $(y_1, \ldots, y_{n-1})$, then $T_i$ outputs $x_n$ on that input.
- $T_i$ halts in $\leq f(n)$ steps on input $(y_1, \ldots, y_n)$, with some output $X_{i,n}$.

By simulating $T_1, \ldots, T_n$ as needed (which only requires finitely many steps), $p_f$ determines if any $T_i$ is an $n$th-order $e$-lookalike. If so, $p_f$ outputs $p_f(x_1, \ldots, x_n) = X_{i,n}$ for the minimal such $i$. If not, $p_f$ outputs 0.

Claim: For every computable $f : \mathbb{N} \to \mathbb{N}$, for every evader $e$, if $(x_1, y_1, x_2, y_2, \ldots)$ is the result of $p_f$ playing against $e$, then for all-but-finitely-many $n \in \mathbb{N}$, if $f(n) > t_e(n)$ then $x_{n+1} = y_{n+1}$.

Let $f$, $e$, $(x_1, y_1, \ldots)$ be as in the claim and assume $f(n) > t_e(n)$. Being an evader, $e$ is a Turing machine, say, the $k$th Turing machine. Since $f(n) > t_e(n)$, it follows that $T_k$ is an $n$th-order $e$-lookalike. It follows that, on input $(x_1, \ldots, x_n)$, $p_f$ will play the output given by some $n$th-order $e$-lookalike $T_{k'}$, $k' \leq k$.

For any $k' < k$, say that $p_f$ is tricked by $T_{k'}$ on input $(x_1, \ldots, x_n)$ if, on said input, $p_f$ identifies $T_{k'}$ as the first $n$th-order $e$-lookalike and so plays $y_{n+1} = X_{k',n}$ but $X_{k',n} \neq x_{n+1}$ (loosely speaking: $p_f$ is led to believe the evader is $T_{k'}$, and this false belief causes $p_f$ to incorrectly predict the evader's next digit). It follows that $p_f$ will not identify $T_{k'}$ as an $n'$th-order $e$-lookalike when run on $(x_1, \ldots, x_{n'})$ for any $n' > n$. Thus, $p_f$ can only be tricked at most once by $T_{k'}$ for any particular $k' < k$. If $p_f$ is not so tricked, then either $p_f$ identifies $T_{k'}$ as the first $n$th-order $e$-lookalike for some $k' < k$ (in which case $p_f$ plays $y_{n+1} = X_{k',n} = x_{n+1}$, lest $p_f$ be tricked by $T_{k'}$), or else $p_f$ identifies $T_k$ as the first $n$th-order $e$-lookalike, in which case $p_f$ plays $y_{n+1} = X_{k,n} = x_{n+1}$ since $e = T_k$. Either way, after possibly finitely many exceptions caused by being tricked, $p_f$ always plays $y_{n+1} = x_{n+1}$ whenever $f(n) > t_e(n)$, proving the claim.

We claim $p_{f_m}$ has original Hibbard measure $\geq m$. To see this, let $e \in E_{f_m}$; we must show $p_{f_m}$ learns $e$. Let $(x_1, y_1, \ldots)$ be the result of $p_{f_m}$ playing against $e$. Since $e \in E_{f_m}$, $f_m \succ t_e$, so for all-but-finitely-many $n \in \mathbb{N}$, $f_m(n) > t_e(n)$. And, by the above Claim, with at most finitely many exceptions, whenever $f_m(n) > t_e(n)$, $x_{n+1} = y_{n+1}$. It follows that $p_{f_m}$ learns $e$, as desired.

Unfortunately, the original Hibbard measure is not computable (unless the background model of computation is contrived), as the following proposition shows. [Footnote 3: In fact, assuming a non-contrived background model of computation, for any strictly increasing total computable function $f$, even the following can be shown to be non-computable: given an evader $e$, to determine whether or not $f \succ t_e$.]

Proposition 9. Assume the background model of computation is well-behaved enough that there is an evader $e_0$ which always outputs 0 and whose runtime $t_{e_0}$ is bounded by some primitive recursive function. Then the original Hibbard measure is not computable: there is no effectively computable procedure, given a predictor, to compute its original Hibbard measure. In fact, there is not even an effectively computable procedure to tell if one given predictor has a higher original Hibbard measure than another given predictor.

Proof. Let $p_1$ be a predictor which always outputs 1, and let $m_1$ be its original Hibbard measure. By the existence of $e_0$, it follows that $m_1 < \infty$, since certainly $p_1$ does not learn $e_0$. Let $p_2$ be a predictor with original Hibbard measure $> m_1$ (Proposition 8). If the proposition were false, then we could solve the halting problem as follows. Given any Turing machine $M$, in order to determine whether or not $M$ halts, proceed as follows. Let $p_M$ be the predictor which, on input $(x_1, \ldots, x_n)$, outputs $p_2(x_1, \ldots, x_n)$ if $M$ halts in $\leq n$ steps, or 1 otherwise. Clearly, $M$ halts if and only if $p_M$ has a higher original Hibbard measure than $p_1$.

Likewise, the variations on Hibbard's measure which we present in this paper are also non-computable. To quantify their precise degrees of computability (e.g., where they fall within the arithmetical hierarchy) would be beyond the scope of this paper. We will, however, state one conjecture. If we modified the original Hibbard measure by replacing Liu's enumeration of the primitive recursive functions by an enumeration of all total computable functions, then we conjecture the resulting measure would be strictly higher in the arithmetical hierarchy (i.e., would require strictly stronger oracles to compute), essentially because the set of total computable functions is not computably enumerable, whereas the primitive recursive functions are.

2.1 Predictor intelligence and AGI intelligence

Definition 7, and similar measures and taxonomies which we will define later, quantify the intelligence of predictors in the game of adversarial sequence prediction. But any method for quantifying the intelligence of such predictors can also approximately quantify the intelligence of (suitably idealized) agents with Artificial General Intelligence (that is, the intelligence of AGIs). The idealized AGIs we have in mind should be capable of understanding, and obedient in following or trying to follow, commands issued in everyday human language (this is not to say that all AGIs must necessarily be obedient, merely that for the purposes of this paper we restrict our attention to obedient AGIs). For example, if such an idealized AGI were commanded, "until further notice, compute and list the digits of pi," it would be capable of understanding that command, and would obediently compute said digits until commanded otherwise. [Footnote 4: It is somewhat unclear how explicitly an AGI would obey certain commands. To use an example of Yampolskiy (2020), if we asked a car-driving AGI to stop the car, would the AGI stop the car in the middle of traffic, or would it pull over to the side first? We assume this ambiguity does not apply when we ask the AGI to perform tasks of a sufficiently abstract and mathematical nature.]

It is unclear how an AGI ought to respond if given an impossible command, such as "write a computer program that solves the halting problem", or Yampolskiy's "Disobey!" (Yampolskiy, 2020). But an AGI should be capable of understanding and attempting to obey an open-ended command, provided it is not impossible. For example, we could command an AGI to "until further notice, write an endless poem about trees," and the AGI should be able to do so, writing said poem line-by-line until we tell it to stop. This is despite the fact that the command is open-ended and under-determined (there are many decisions involved in writing a poem about trees, and we have left all these decisions to the AGI's discretion). The AGI's ability to obey such open-ended and under-determined commands exemplifies its ability to "adapt with insufficient knowledge and resources" (Wang, 2019). One well-known example of an open-ended command which an AGI should be perfectly
capable of attempting to obey (perhaps at peril to us) is Bostrom's "manufacture as many paperclips as possible" (Bostrom, 2003). [Footnote 5: Our thinking here is reminiscent of some remarks of Yampolskiy (2013).]

In particular, such an idealized AGI $X$ should be capable of obeying the following command: "Act as a predictor in the game of adversarial sequence prediction". By giving $X$ this command, and then immediately filtering out all of $X$'s sensory input except only for input about the digits chosen by an evader, we would obtain a formal predictor in the sense of Definition 2. This predictor might be called "the predictor generated by $X$". Strictly speaking, if the command is given to $X$ at time $t$, then it would be more proper to call the resulting predictor "the predictor generated by $X$ at time $t$": up until time $t$, the observations $X$ makes about the universe might have an effect on the strategy $X$ chooses to take once commanded to act as a predictor; but as long as we filter $X$'s sensory input immediately after giving $X$ the command, no further such observations can so alter $X$'s strategy. In short, to use Yampolskiy's terminology (Yampolskiy, 2012), the act of trying to predict adversarial sequence evaders is AI-easy.

Thus, any intelligence measure (or taxonomy) for predictors also serves as an intelligence measure (or taxonomy) for suitably idealized AGIs. Namely: the intelligence level of an AGI $X$ is equal to the intelligence level of $X$'s predictor. Of course, a priori, $X$ might be very intelligent at various other things while being poor at sequence prediction, or vice versa, so this only approximately captures $X$'s true intelligence. Of course, the same could be said for any competency measure on any task: we do not make any claims that when we measure $X$'s intelligence via $X$'s predictor's performance, this is in any sense "the one true intelligence measure". One could just as well measure $X$'s intelligence in terms of the Elo ranking $X$ would obtain if one ordered $X$ to compete at chess. We would offer two motivations to consider adversarial sequence prediction ability as a particularly interesting proxy for AGI intelligence measurement:

1. There seem to be high-level connections between intelligence and prediction in general (Hutter, 2004), of which adversarial sequence prediction is an elegant and parsimonious example.

2. Adversarial sequence prediction ability is not bounded above, in the sense that for any particular predictor $p$, one can easily produce a predictor $p'$ that learns all the evaders which $p$ learns and at least one additional evader.

3. Quantifying growth rates of functions

The following is a general and open-ended problem.

Problem 10. Quantify the growth rate of functions from $\mathbb{N}$ to $\mathbb{N}$.

The definition of the original Hibbard measure (Definition 7) can be thought of as implicitly depending on a specific solution to Problem 10, which we make explicit in the following definition.

Definition 11. For each $m > 0$, let $f_m$ be as in Definition 7. For each $f : \mathbb{N} \to \mathbb{N}$, we define the original Hibbard growth rate $H(f)$ to be $\min\{m > 0 : f_m \succ f\}$ if there is any such $m > 0$, and otherwise $H(f) = \infty$.

In order to generalize the original Hibbard definition in a uniform way, we will rearrange notation somewhat. We have, in some sense, more notation than necessary, namely the notation in (Hibbard, 2011) and synonymous notation which is modified to generalize more readily.

Lemma 12. For every natural $m > 0$ and every $f : \mathbb{N} \to \mathbb{N}$, $H(f) \leq m$ if and only if $f_m \succ f$.
Proof. Straightforward.

Definition 13. For every $m \in \mathbb{N}$, let $E'_m$ be the set of all evaders $e$ such that $H(t_e) \leq m$.

Lemma 14. For every natural $m > 0$, $E'_m = E_{f_m}$.

Proof. Let $e$ be an evader. By Definition 13, $e \in E'_m$ if and only if $H(t_e) \leq m$. By Lemma 12, $H(t_e) \leq m$ if and only if $f_m \succ t_e$. But by Definition 6, this is the case if and only if $e \in E_{f_m}$.

Corollary 15. For every predictor $p$, the original Hibbard measure of $p$ is equal to the maximum natural $m > 0$ such that $p$ learns $e$ whenever $e \in E'_m$, or is equal to 0 if there is no such $m$, or is equal to $\infty$ if $p$ learns $e$ whenever $e \in E'_m$ for all $m > 0$.

Proof. Immediate by Lemma 14 and Definition 7.

In other words, if $S$ is the set of all the $m$ as in Corollary 15, then the original Hibbard measure of $p$ is the "canonical upper bound" of $S$, where by the "canonical upper bound" of a set of natural numbers we mean the maximum element of that set (or $\infty$ if that set is unbounded).

Remark 16. Corollary 15 shows that the definition of the original Hibbard measure can be rephrased in such a way as to show that it depends in a uniform way on a particular solution to Problem 10, namely on the solution proposed by Definition 11. For any solution $H'$ to Problem 10, we could define corresponding evader-sets in a similar way to Definition 13, and, by copying Corollary 15, we could obtain a corresponding intelligence measure given by $H'$ (provided there be some way of choosing canonical bounds of bounded sets in the underlying number system; if not, we would have to be content with a taxonomy rather than a measure, a predictor's intelligence falling into many nested taxa corresponding to many different upper bounds, just as in Big-O notation a function can simultaneously be $O(n^2)$ and $O(n^3)$). This formalizes what we claimed in the Introduction, that Hibbard's idea can be decomposed into two sub-ideas: first, that a predictor's intelligence can be classified in terms of the growth rates of the runtimes of the evaders it learns; and second, a particular method (Definition 11) of measuring those growth rates (i.e., a particular solution to Problem 10).

3.1 A theoretical note on the difficulty of Problem 10

In this subsection, we will argue that in order for a solution to Problem 10 to be much good, it should probably measure growth rates using some alternative number system to the real numbers. Essentially, this is because the real numbers have the Archimedean property (the property that for any positive real $r > 0$ and any real $y$, there is some $n \in \mathbb{N}$ such that $nr > y$), a constraint which does not apply to function growth rates.

Definition 17. Let $\mathbb{N}^{\mathbb{N}}$ be the set of all functions $\mathbb{N} \to \mathbb{N}$. A well-behaved real-measure of $\mathbb{N}^{\mathbb{N}}$ is a function $F : \mathbb{N}^{\mathbb{N}} \to \mathbb{R}$ satisfying the following requirements.
1. (Monotonicity) For each $f, g : \mathbb{N} \to \mathbb{N}$, if $f \succ g$, then $F(f) > F(g)$.
2. (Nontriviality) For each $r \in \mathbb{R}$, there is some $f : \mathbb{N} \to \mathbb{N}$ such that $F(f) > r$.

Theorem 18. There is no well-behaved real-measure of $\mathbb{N}^{\mathbb{N}}$.

Proof. Assume $F : \mathbb{N}^{\mathbb{N}} \to \mathbb{R}$ is a well-behaved real-measure of $\mathbb{N}^{\mathbb{N}}$. By Nontriviality, there are $f_0, f_1, \ldots : \mathbb{N} \to \mathbb{N}$ such that each $F(f_n) > n$. Define $g : \mathbb{N} \to \mathbb{N}$ by $g(n) = f_0(n) + \cdots + f_n(n)$. Clearly, for every $n \in \mathbb{N}$, $g \succ f_n$. By the Archimedean property of the real numbers, there is some $n \in \mathbb{N}$ such that $n > F(g)$. By Monotonicity, $F(g) > F(f_n)$, but by choice of $f_n$, $F(f_n) > n > F(g)$, a contradiction.
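As a concrete illustration of the diagonal construction in the proof of Theorem 18, the following small Python sketch (our own, using an arbitrary toy family $f_n$) builds $g$ and checks numerically that it eventually dominates each $f_n$. It is only an illustration of the argument, not a proof.

```python
# Toy family of functions; any family would do, this one is just for illustration.
def f(n):
    return lambda x: (n + 1) * x + n

# The diagonal function from the proof of Theorem 18: g(x) = f_0(x) + ... + f_x(x).
def g(x):
    return sum(f(i)(x) for i in range(x + 1))

# g majorizes each f_n: for all sufficiently large x, g(x) > f_n(x).
for n in range(5):
    fn = f(n)
    print(n, all(g(x) > fn(x) for x in range(n + 1, 50)))  # prints True for each n
```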
In light of Theorem 18, we are motivated to investigate solutions to Problem 10 using alternatives to the real numbers, which will yield ASPI measures (or taxonomies) in terms of those alternatives.

An informal argument could be made that real numbers might be inadequate for measuring AGI intelligence in general. [Footnote 6: This argument was pointed out by Alexander (2019b), and by Alexander (2020b) again, the latter amidst a wider discussion of Archimedean and non-Archimedean measures.] It at least seems plausible that there are AGIs $X_1, X_2, \ldots$ such that each $X_{i+1}$ is significantly more intelligent than $X_i$, and another AGI $Y$ such that $Y$ is more intelligent than each $X_i$. At least, if this is not true, it is not obvious that it is not true, and it seems like it would be nontrivial to argue that it is not true. [Footnote 7: To do so would require arguing that for all $X_1, X_2, \ldots$, if each $X_{i+1}$ is significantly more intelligent than $X_i$, then for all $Y$ there exists $i$ such that $Y$ is at most as intelligent as $X_i$.] Now, if "significantly more intelligent" implies "at least +1 more intelligent", then it follows that the intelligence levels of $Y$ and of $X_1, X_2, \ldots$ could not all be real numbers, or else one of the $X_i$ would necessarily be more intelligent than $Y$.

If, as the above argument suggests, the real numbers might potentially be too constrained to perfectly measure intelligence, what next? How could we measure intelligence other than by real numbers? A key motivation for the measures and taxonomies we will come up with below is to provide examples of intelligence measurement using alternative number systems. It is for this reason that we do not, in this paper, consider variations on Hibbard's intelligence measure that arise from simply replacing Liu's enumeration of the primitive recursive functions with various other $\mathbb{N}$-indexed lists of functions (for example, the list of all total computable functions).

4. Big-O and Big-$\Theta$ intelligence

One of the most standard solutions to Problem 10 in computer science is to categorize growth rates of arbitrary functions by comparing them to more familiar functions using Big-O notation or Big-$\Theta$ notation. Knuth (1976) defines these as follows (we modify the definition slightly because we are only concerned here with functions from $\mathbb{N}$ to $\mathbb{N}$).

Definition 19. Suppose $f : \mathbb{N} \to \mathbb{N}$. We define the following function-sets.
- $O(f(n))$ is the set of all $g : \mathbb{N} \to \mathbb{N}$ such that there is some real $C > 0$ and some $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, $g(n) \leq C f(n)$.
- $\Theta(f(n))$ is the set of all $g : \mathbb{N} \to \mathbb{N}$ such that there are some reals $C > 0$ and $C' > 0$ and some $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, $C f(n) \leq g(n) \leq C' f(n)$.

Note that Definition 19 does not measure growth rates, but rather categorizes growth rates into Big-O and Big-$\Theta$ taxonomies. For example, the same function can be both $O(n^2)$ and $O(n^3)$, the former taxon being nested within the latter. By Remark 16, Definition 19 yields the following elegant taxonomy of predictor intelligence.

Definition 20. Suppose $p$ is a predictor, and suppose $f : \mathbb{N} \to \mathbb{N}$. We say $p$ has Big-O ASPI measure $O(f(n))$ if $p$ learns every evader $e$ such that $t_e$ is $O(f(n))$. We say $p$ has Big-$\Theta$ ASPI measure $\Theta(f(n))$ if $p$ learns every evader $e$ such that $t_e$ is $\Theta(f(n))$.

Proposition 21. For any computable function $f : \mathbb{N} \to \mathbb{N}$, there is a predictor $p$ with Big-O ASPI measure $O(f(n))$ and Big-$\Theta$ ASPI measure $\Theta(f(n))$.

Proof. Define $g : \mathbb{N} \to \mathbb{N}$ by $g(n) = n f(n) + 1$, and let $p_g$ be as in the proof of Proposition 8. We claim $p_g$ has Big-O ASPI measure $O(f(n))$.
To see this, let $e$ be any evader such that $t_e$ is $O(f(n))$. Thus there is some $C \in \mathbb{R}$ such that for all-but-finitely-many $n \in \mathbb{N}$, $t_e(n) \leq C f(n)$. It follows that $g \succ t_e$. By the Claim in the proof of Proposition 8, for all-but-finitely-many $n \in \mathbb{N}$, if $g(n) > t_e(n)$ then $x_{n+1} = y_{n+1}$, where $(x_1, y_1, \ldots)$ is the result of $p_g$ playing against $e$. So in all, with only finitely many exceptions, each $x_{n+1} = y_{n+1}$, as desired. The proof that $p_g$ has Big-$\Theta$ ASPI measure $\Theta(f(n))$ is similar.

5. ASPI measures based on majorization hierarchies

Majorization hierarchies (Weiermann, 2002) provide ordinal-number-valued measures for the growth rates of certain functions. A majorization hierarchy depends on many infinite-dimensional parameters. We will describe two majorization hierarchies up to the ordinal $\epsilon_0$, using standard choices for the parameters, and the ASPI measures which they produce.

Definition 22 (Classification of ordinal numbers). Ordinal numbers are divided into three types:
1. Zero: The ordinal 0.
2. Successor ordinals: Ordinals of the form $\alpha + 1$ for some ordinal $\alpha$.
3. Limit ordinals: Ordinals which are neither successor ordinals nor 0.

For example, the smallest infinite ordinal, $\omega$, is a limit ordinal. It is not zero (because zero is finite), nor can it be a successor ordinal, because if it were a successor ordinal, say, $\alpha + 1$, then $\alpha$ would be finite (since $\omega$ is the smallest infinite ordinal), but then $\alpha + 1$ would be finite as well.

Ordinal numbers have an arithmetical structure: two ordinals $\alpha$ and $\beta$ have a sum $\alpha + \beta$, a product $\alpha \cdot \beta$, and a power $\alpha^\beta$. It would be beyond the scope of this paper to give the full definition of these operations. We will only remark that some care is needed because although ordinal arithmetic is associative (e.g., $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$, and similarly for multiplication), it is not generally commutative: $\alpha + \beta$ is not always equal to $\beta + \alpha$, and $\alpha \cdot \beta$ is not always equal to $\beta \cdot \alpha$. For this reason, one often sees products like $\omega \cdot 2$, which are not necessarily equivalent to the more familiar $2 \cdot \omega$.

The ordinal $\epsilon_0$ is the smallest ordinal bigger than the ordinals $\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots$. It satisfies the equation $\epsilon_0 = \omega^{\epsilon_0}$ and can be intuitively thought of as
$$\epsilon_0 = \omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}.$$
Ordinals below $\epsilon_0$ include such ordinals as $\omega$, $\omega^\omega + \omega^5 + 3$, $\omega^{\omega^\omega} + \omega^{\omega \cdot 2 + 1} + \omega^4 + \omega^3 \cdot 8 + 1$, and so on. Any ordinal below $\epsilon_0$ can be uniquely written in the form
$$\omega^{\lambda_1} + \omega^{\lambda_2} + \cdots + \omega^{\lambda_k}$$
where $\lambda_1 \geq \cdots \geq \lambda_k$ are smaller ordinals below $\epsilon_0$; this form for an ordinal below $\epsilon_0$ is called its Cantor normal form. For example, the Cantor normal form for $\omega^{\omega \cdot 2} \cdot 2 + \omega \cdot 3 + 2$ is
$$\omega^{\omega \cdot 2} \cdot 2 + \omega \cdot 3 + 2 = \omega^{\omega \cdot 2} + \omega^{\omega \cdot 2} + \omega^1 + \omega^1 + \omega^1 + \omega^0 + \omega^0.$$

Definition 23 (Standard fundamental sequences for limit ordinals $\leq \epsilon_0$). Suppose $\lambda$ is a limit ordinal $\leq \epsilon_0$. We define a fundamental sequence for $\lambda$, written $(\lambda[0], \lambda[1], \lambda[2], \ldots)$, inductively as follows.
- If $\lambda = \epsilon_0$, then $\lambda[0] = \omega$, $\lambda[1] = \omega^\omega$, $\lambda[2] = \omega^{\omega^\omega}$, and so on.
- If $\lambda$ has Cantor normal form $\omega^{\lambda_1} + \cdots + \omega^{\lambda_k}$ where $k > 1$, then each $\lambda[i] = \omega^{\lambda_1} + \cdots + \omega^{\lambda_{k-1}} + (\omega^{\lambda_k})[i]$.
- If $\lambda$ has Cantor normal form $\omega^{\alpha + 1}$, then each $\lambda[i] = \omega^{\alpha} \cdot i$.
- If $\lambda$ has Cantor normal form $\omega^{\lambda_0}$ where $\lambda_0$ is a limit ordinal, then each $\lambda[i] = \omega^{\lambda_0[i]}$.

Example 24 (Fundamental sequence examples).
- The fundamental sequence for $\lambda = \omega = \omega^{0+1}$ is $\omega^0 \cdot 0, \omega^0 \cdot 1, \omega^0 \cdot 2, \ldots$, i.e., $0, 1, 2, \ldots$.
- The fundamental sequence for $\lambda = \omega^5$ is $0, \omega^4, \omega^4 \cdot 2, \omega^4 \cdot 3, \ldots$.
- The fundamental sequence for $\lambda = \omega^\omega$ is $\omega^0, \omega^1, \omega^2, \ldots$.
- The fundamental sequence for $\lambda = \omega^\omega + \omega$ is $\omega^\omega + 0, \omega^\omega + 1, \omega^\omega + 2, \ldots$.
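Definition 23 is entirely syntactic, so fundamental sequences for ordinals below $\epsilon_0$ can be computed mechanically once the ordinals are written in Cantor normal form. The following Python sketch is our own illustration (the representation and function names are ours, not the paper's); the $\epsilon_0$ clause is omitted, since $\epsilon_0$ itself has no finite Cantor normal form of this kind.

```python
# An ordinal below epsilon_0 in Cantor normal form is represented as a tuple of
# exponents in non-increasing order: alpha = w^e1 + w^e2 + ... + w^ek.
# The empty tuple () is 0; exponents are themselves ordinals in the same representation.
# (Well-formedness of the non-increasing order is assumed, not checked.)

ZERO = ()
ONE = (ZERO,)      # w^0
OMEGA = (ONE,)     # w^1
W_TO_W = (OMEGA,)  # w^w

def is_successor(a):
    return a != ZERO and a[-1] == ZERO

def times_nat(exponent, i):
    """w^exponent * i, as a CNF tuple (i copies of the exponent)."""
    return (exponent,) * i

def fund_seq(lam, i):
    """lam[i], the i-th element of the standard fundamental sequence (Definition 23),
    for a limit ordinal lam < epsilon_0 given in Cantor normal form."""
    assert lam != ZERO and not is_successor(lam), "lam must be a limit ordinal"
    head, last = lam[:-1], lam[-1]
    if len(lam) > 1:
        # w^l1 + ... + w^l(k-1) + (w^lk)[i]
        return head + fund_seq((last,), i)
    if is_successor(last):
        # lam = w^(a+1): lam[i] = w^a * i
        return times_nat(last[:-1], i)
    # lam = w^l0 with l0 a limit ordinal: lam[i] = w^(l0[i])
    return (fund_seq(last, i),)

# Checks against Example 24:
print(fund_seq(OMEGA, 3) == times_nat(ZERO, 3))              # True: omega[3] = 3
print(fund_seq(W_TO_W, 2) == (times_nat(ZERO, 2),))          # True: (w^w)[2] = w^2
print(fund_seq((OMEGA, ONE), 2) == (OMEGA,) + times_nat(ZERO, 2))  # True: (w^w + w)[2] = w^w + 2
```

The same representation can drive the slow- and fast-growing hierarchies defined next, by recursing on the zero, successor, and limit clauses of Definitions 25 and 30.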
Definition 25 (The standard slow-growing hierarchy up to $\epsilon_0$). We define functions $g_\alpha : \mathbb{N} \to \mathbb{N}$ (for all ordinals $\alpha \leq \epsilon_0$) by transfinite induction as follows.
- $g_0(n) = 0$.
- $g_{\alpha+1}(n) = g_\alpha(n) + 1$ if $\alpha + 1 \leq \epsilon_0$.
- $g_\lambda(n) = g_{\lambda[n]}(n)$ if $\lambda \leq \epsilon_0$ is a limit ordinal.

Here are some early levels in the slow-growing hierarchy, spelled out in detail.

Example 26 (Early examples of functions in the slow-growing hierarchy).
1. $g_1(n) = g_{0+1}(n) = g_0(n) + 1 = 0 + 1 = 1$.
2. $g_2(n) = g_{1+1}(n) = g_1(n) + 1 = 1 + 1 = 2$.
3. More generally, for all $m \in \mathbb{N}$, $g_m(n) = m$.
4. $g_\omega(n) = g_{\omega[n]}(n) = g_n(n) = n$.
5. $g_{\omega+1}(n) = g_\omega(n) + 1 = n + 1$.
6. More generally, for all $m \in \mathbb{N}$, $g_{\omega+m}(n) = n + m$.
7. $g_{\omega \cdot 2}(n) = g_{(\omega \cdot 2)[n]}(n) = g_{\omega+n}(n) = n + n = n \cdot 2$.

Following Example 26, the reader should be able to fill in the details in the following example.

Example 27 (More examples from the slow-growing hierarchy).
1. $g_{\omega^2}(n) = n^2$.
2. $g_{\omega^3}(n) = n^3$.
3. $g_{\omega^{\omega \cdot 3 + 1}}(n) = n^{3n+1}$.
4. $g_{\omega^{\omega \cdot 3 + 1} + \omega + 5}(n) = n^{3n+1} + n + 5$.
5. $g_{\omega^\omega}(n) = n^n$.

What about $g_{\epsilon_0}$? Thinking of $\epsilon_0$ as $\omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$, one might expect $g_{\epsilon_0}(n)$ to be $n^{n^{n^{\cdot^{\cdot^{\cdot}}}}}$; but such an infinite tower of natural number exponents makes no sense if $n > 1$. Instead, the answer defies familiar mathematical notation.

Example 28 (Level $\epsilon_0$ in the slow-growing hierarchy). The values of $g_{\epsilon_0}$ are as follows:
- $g_{\epsilon_0}(0) = 0$.
- $g_{\epsilon_0}(1) = 1^1$.
- $g_{\epsilon_0}(2) = 2^{2^2}$.
- $g_{\epsilon_0}(3) = 3^{3^{3^3}}$.
- And so on.

Examples 26-28 illustrate how the slow-growing hierarchy systematically provides a family of reference functions against which any particular function can be compared. This yields a solution to Problem 10: we can declare the growth rate of an arbitrary function $f : \mathbb{N} \to \mathbb{N}$ to be the smallest ordinal $\alpha < \epsilon_0$ such that $g_\alpha \succ f$ (or $\infty$ if there is no such $\alpha$). For any bounded set $S$ of ordinals, there is a canonical upper bound for $S$, namely, the supremum of $S$. Thus we obtain an ASPI measure (not just a taxonomy).

Definition 29. If $p$ is a predictor, the ASPI measure of $p$ given by the standard slow-growing hierarchy up to $\epsilon_0$ is defined to be the supremum of $S$ (or $\infty$ if $\epsilon_0 \in S$), where $S$ is the set of all ordinals $\alpha \leq \epsilon_0$ such that the following condition holds: for every evader $e$, if $g_\alpha \succ t_e$, then $p$ learns $e$.

In Definition 25, in the successor ordinal case, we chose to define $g_{\alpha+1}(n) = g_\alpha(n) + 1$. The resulting majorization hierarchy is referred to as slow-growing because in some sense this makes $g_{\alpha+1}$ just barely faster-growing than $g_\alpha$. Different definitions of $g_{\alpha+1}$ would yield different majorization hierarchies, such as the following.

Definition 30 (The standard fast-growing hierarchy up to $\epsilon_0$, also known as the Wainer hierarchy). We define functions $h_\alpha : \mathbb{N} \to \mathbb{N}$ (for all ordinals $\alpha \leq \epsilon_0$) by transfinite induction as follows.
- $h_0(n) = n + 1$.
- $h_{\alpha+1}(n) = h_\alpha^n(n)$, where $h_\alpha^n$ is the $n$th iterate of $h_\alpha$ (so $h_\alpha^1(x) = h_\alpha(x)$, $h_\alpha^2(x) = h_\alpha(h_\alpha(x))$, $h_\alpha^3(x) = h_\alpha(h_\alpha(h_\alpha(x)))$, and so on).
- $h_\lambda(n) = h_{\lambda[n]}(n)$ if $\lambda$ is a limit ordinal $\leq \epsilon_0$.

The functions in the fast-growing hierarchy grow quickly as $\alpha$ grows. It can be shown (Wainer and Buchholz, 1987) that for every computable function $f$ whose totality can be proven from the axioms of Peano arithmetic, there is some $\alpha < \epsilon_0$ such that $h_\alpha \succ f$.

Definition 31. If $p$ is a predictor, the ASPI measure of $p$ given by the standard fast-growing hierarchy up to $\epsilon_0$ is defined to be the supremum of $S$ (or $\infty$ if $\epsilon_0 \in S$), where $S$ is the set of all ordinals $\alpha \leq \epsilon_0$ such that the following condition holds: for every evader $e$, if $h_\alpha \succ t_e$, then $p$ learns $e$.

Proposition 32. For each $\alpha < \epsilon_0$, there is a predictor $p$ (resp. $q$) whose ASPI measure given by the standard slow-growing (resp.
fast-growing) hierarchy up to $\epsilon_0$ is $\geq \alpha$.

Proof. Similar to the proof of Proposition 8.

Between Definitions 29 and 31, the former offers a finer-granularity intelligence measure for the predictors to which it assigns non-$\infty$ intelligence, but the latter assigns non-$\infty$ intelligence to more intelligent predictors. [Footnote 8: Remarkably, the slow-growing hierarchy eventually catches up with the fast-growing hierarchy if both hierarchies are extended to sufficiently large ordinals (Wainer, 1989; Girard, 1981), a beautiful illustration of how counter-intuitive large ordinal numbers can be.]

Definitions 25 and 30 are only two examples of majorization hierarchies. Both the slow- and fast-growing hierarchies can be extended by extending the fundamental sequences of Definition 23 to larger ordinals $\lambda$; however, the larger the ordinals become, the more difficult it is to do this, and especially the less clear it is how to do it in any sort of canonical way. There are also other choices for how to proceed at successor ordinal stages besides $g_{\alpha+1}(n) = g_\alpha(n) + 1$ or $h_{\alpha+1}(n) = h_\alpha^n(n)$; for example, one of the oldest majorization hierarchies is the Hardy hierarchy (Hardy, 1904), where $H_{\alpha+1}(n) = H_\alpha(n+1)$. And even for ordinals up to $\epsilon_0$, there are other ways to choose fundamental sequences besides how we defined them in Definition 23; choosing non-canonical fundamental sequences can drastically alter the resulting majorization hierarchy (Weiermann, 1997). All these different majorization hierarchies yield different ASPI measures.

5.1 A remark about ASPI measures and AGI intelligence

All the ASPI measures and taxonomies we have defined so far double as indirect intelligence measures and taxonomies for an AGI, by the argument we made in Subsection 2.1. For a given AGI $X$, a priori, we cannot say much about the predictor which $X$ would act as, if $X$ were commanded to act as a predictor. But there is one particularly elegant and parsimonious strategy which $X$ might use, a brute force strategy, namely: Enumerate all the computable functions $f$ which $X$ knows to be total, and for each one, attempt to predict the evader $e$ by assuming that the evader's runtime $t_e$ satisfies $f \succ t_e$. If the evader proves not to be so majorized (by differing from every computable function whose runtime is so majorized), then move on to the next known total function $f$, and continue the process.

We do not know for certain which predictor $X$ would imitate when commanded to act as a predictor, but it seems plausible that $X$ would use this brute force strategy or something equivalent. For an AGI $X$ who uses the above brute force strategy, ASPI measures of $X$'s intelligence would be determined by $X$'s knowledge, namely, by the runtime complexity of the computable functions which $X$ knows to be total. Furthermore, the most natural way for $X$ to know totality of functions with large runtime complexity is for $X$ to know fundamental sequences for large ordinal numbers, and produce said functions by means of majorization hierarchies. [Footnote 9: It may be possible for an AGI to be contrived to know totality of functions that are larger than the functions produced by majorization hierarchies up to ordinals the AGI knows about, but we conjecture that that is not the case for AGIs not so deliberately contrived.] This suggests a connection between (1) ASPI measures like that of Definition 31, and (2) intelligence measures based on which ordinals the AGI knows (Alexander, 2019b). Indeed, Alexander (2020a) has argued that the task of notating large ordinals is one which spans the entire range of intelligence. This is reminiscent of Chaitin's proposal to use ordinal notation as a goal intended to facilitate evolution ("and the larger the ordinal, the fitter the organism" (Chaitin, 2011)), and Good's observation (Good, 1969) that iterated Lucas-Penrose contests boil down to contests to name the larger ordinal.

6. Hyperreal numbers and surreal numbers

In this section, we will exhibit an abstract ASPI taxonomy based on hyperreal numbers and an abstract ASPI measure based on surreal numbers. We do not assume previous familiarity with either of these number systems.

6.1 The hyperreal ASPI taxonomy

In this subsection, we will begin by considering growth rate comparison, which is a strictly simpler problem than growth rate measurement (our proposed solution will then lead to a numerical growth rate measure anyway). Given two functions $f$ and $g$, does $f$ outgrow $g$ or does $f$ not outgrow $g$? We would like to say that $f$ outgrows $g$ if and only if $f(n) > g(n)$ for "a majority of" $n \in \mathbb{N}$, but it is not clear what "majority" should mean. Certainly if $f(n) > g(n)$ for all but finitely many $n \in \mathbb{N}$, it should be safe to say $f$ outgrows $g$, and if $f(n) \leq g(n)$ for all but finitely many $n \in \mathbb{N}$, it should be safe to say $f$ does not outgrow $g$. But what if there are infinitely many $n \in \mathbb{N}$ such that $f(n) > g(n)$, and infinitely many $n \in \mathbb{N}$ such that $f(n) \leq g(n)$?

Adapting the key insight from Alexander (2019a), consider each $n \in \mathbb{N}$ to be a voter in an election to determine whether or not $f$ outgrows $g$. Each $n$ votes based on whether or not $f(n) > g(n)$. For example, 532 is a voter in this election. If $f(532) > g(532)$, then 532
In terms of the outgrowth relation, this would amount to declaring that f outgrows g if and only if f (n ) > g(n ). That would be 0 0 a poor method of comparing growth rates. Thus, we should insist that fn g is not a winning bloc for any n 2 N. Is it possible to satisfy all the above requirements, or are they too demanding? It turns out it is possible. In fact, the above requirements are exactly the requirements of a free ultra lter, an important device from mathematical logic. De nition 33 An ultra lter on N (or more simply an ultra lter) is a set U of subsets of N such that: 1. ; 62 U . 2. For every N 2 U , for every N  N, if N  N , then N 2 U . 1 2 1 2 2 3. For every N  N, either N 2 U or N 2 U . 4. (\-closure) For every N ; N 2 U , N \ N 2 U . 1 2 1 2 16 Measuring Intelligence and Growth Rate An ultra lter is free if it does not contain any singleton fn g (n 2 N). 0 0 Clearly a free ultra lter is exactly a notion of winning blocs meeting all our requirements. The following theorem is well-known in mathematical logic, and we state it without proof. Theorem 34 Free ultra lters exist. Theorem 34 is profound because it is counter-intuitive that there should be a non- dictatorial method of determining election winners satisfying \-closure. To see how counter- intuitive \-closure is, suppose that in 2021 the Dog party wins the presidency and in 2022 the Cat party wins the presidency (with the same voters every year and no other parties). Call a voter a \Dog-to-Cat switcher" if they vote Dog in 2021 and Cat in 2022. The \- closure property says in order to win in 2023, it would be enough to get just the Dog-to-Cat switchers' votes and no others. For more on in nite-voter elections and free ultra lters, and especially their interplay with Arrow's impossibility theorem, see Kirman and Sondermann (1972). For the remainder of the section, let U be a free ultra lter. Unfortunately, logicians have shown that, though free ultra lters exist, it is impossible to concretely exhibit one. More precisely, all known proofs of Theorem 34 are non-constructive (using non-constructive set- theoretic axioms such as the Axiom of Choice) and logicians have proven that Theorem 34 cannot be proved constructively. De nition 35 Suppose f; g : N ! R. We say f > g if fn 2 N : f (n) > g(n)g 2 U: In other words: if U is thought of as a black box deciding which subsets of N are winning blocs, then f > g if and only if \f outgrows g" wins the election when each n 2 N votes for \f outgrows g" or \f does not outgrow g" depending whether f (n) > g(n) or f (n)  g(n) respectively. Lemma 36 > is transitive. Proof Suppose f; g; h : N ! R are such that f > g and g > h, we must show f > h. U U U Let N = fn 2 N : f (n) > g(n)g; fg N = fn 2 N : g(n) > h(n)g; and gh N = fn 2 N : f (n) > h(n)g: fh Since f > g, N 2 U . Since g > h, N 2 U . By \-closure, N \ N 2 U . Clearly U fg U gh fg gh N \ N  N , so, by (2) of De nition 33, N 2 U , that is, f > h. fg gh fh fh U We will now explain how De nition 35 leads to a numerical growth rate measure and, in turn, an ASPI taxonomy. We will show that by coming up with De nition 35, we have actually done much of the work of the so-called ultrapower construction of the hyperreal number system, studied in the eld of non-standard analysis (Robinson, 1974; Goldblatt, 2012). 17 S. Alexander & B. Hibbard De nition 37 (Compare De nition 35) Suppose f; g : N ! R. 
We say $f \approx_{\mathcal{U}} g$ if
$$\{n \in \mathbb{N} : f(n) = g(n)\} \in \mathcal{U}.$$
In other words, $f \approx_{\mathcal{U}} g$ if "$f = g$" wins the election (as decided by $\mathcal{U}$) when each $n \in \mathbb{N}$ votes for "$f = g$" or "$f \neq g$" depending whether $f(n) = g(n)$ or $f(n) \neq g(n)$ respectively.

Lemma 38. The relation $\approx_{\mathcal{U}}$ (from Definition 37) is an equivalence relation.

Proof. Symmetry and reflexivity are trivial. The proof of transitivity is similar to Lemma 36.

Definition 39. The hyperreal numbers, written ${}^*\mathbb{R}$, are the equivalence classes of $\approx_{\mathcal{U}}$. For every $f : \mathbb{N} \to \mathbb{R}$, write $\hat{f}$ for the hyperreal number (i.e., the $\approx_{\mathcal{U}}$-equivalence class) containing $f$. We endow ${}^*\mathbb{R}$ with arithmetic and order as follows (where $f, g : \mathbb{N} \to \mathbb{R}$):
- We define addition on ${}^*\mathbb{R}$ by declaring that $\hat{f} + \hat{g} = \hat{h}$ where $h(n) = f(n) + g(n)$.
- We define multiplication on ${}^*\mathbb{R}$ by declaring that $\hat{f} \cdot \hat{g} = \hat{h}$ where $h(n) = f(n)g(n)$.
- We order ${}^*\mathbb{R}$ by declaring that $\hat{f} > \hat{g}$ if and only if $f >_{\mathcal{U}} g$ (Definition 35).

The following is well-known and we state it without proof.

Theorem 40. The addition, multiplication, and ordering in Definition 39 are well-defined, and they make ${}^*\mathbb{R}$ an ordered field.

With this machinery, we now have a trivial hyperreal solution to Problem 10.

Definition 41 (The hyperreal solution to Problem 10). For any function $f : \mathbb{N} \to \mathbb{N}$, the hyperreal growth rate of $f$ is the hyperreal number $\hat{f}$.

Because of the non-constructive nature of free ultrafilters, the following notions are even less practical than the measures in the previous sections. However, they could potentially be useful for proving theoretical properties about the intelligence of predictors.

Definition 42. Suppose $p$ is a predictor and $e$ is an evader. Let $(x_1, y_1, x_2, y_2, \ldots)$ be the result of $p$ playing against $e$. We say $p$ $\mathcal{U}$-learns $e$ if
$$\{n \in \mathbb{N} : p(x_1, \ldots, x_n) = e(y_1, \ldots, y_n)\} \in \mathcal{U}$$
(or equivalently: $\{n \in \mathbb{N} : y_{n+1} = x_{n+1}\} \in \mathcal{U}$). In other words, $p$ $\mathcal{U}$-learns $e$ if "$p$ learns $e$" wins the election (according to $\mathcal{U}$) when every $n \in \mathbb{N}$ votes for "$p$ learns $e$" or "$p$ does not learn $e$" depending whether or not $p(x_1, \ldots, x_n) = e(y_1, \ldots, y_n)$.

In the following definition, rather than assigning a particular hyperreal number intelligence to every predictor, we instead categorize predictors into a taxonomy. This is necessary because there is no way of choosing canonical bounds of bounded sets of hyperreal numbers in general. For lack of a way of choosing a particular bound, we are forced to consider many taxa corresponding to many bounds.

Definition 43 (The hyperreal ASPI taxonomy). Let $p$ be a predictor and let $\hat{f}$ be a hyperreal number. We say that $p$ has hyperreal ASPI intelligence at least $\hat{f}$ if and only if the following condition holds: for every evader $e$, if the hyperreal growth rate of $t_e$ is $< \hat{f}$, then $p$ $\mathcal{U}$-learns $e$.

Now we would like to state an analog of Proposition 8 for hyperreal ASPI intelligence, but before we can do that, we need to state the following lemma. This lemma is well-known so we state it without proof.

Lemma 44. For any $N \in \mathcal{U}$, for any finite $N_0 \subseteq \mathbb{N}$, the difference $N \setminus N_0$ is in $\mathcal{U}$.

Proposition 45. For any computable function $f : \mathbb{N} \to \mathbb{N}$, there is a predictor which has hyperreal ASPI intelligence at least $\hat{f}$.

Proof. Let $p_f$ be as in the proof of Proposition 8. We claim $p_f$ has hyperreal ASPI intelligence at least $\hat{f}$. To see this, assume $e$ is an evader such that the hyperreal growth rate of $t_e$ is $< \hat{f}$; we will show $p_f$ $\mathcal{U}$-learns $e$. Let $(x_1, y_1, x_2, y_2, \ldots)$ be the result of $p_f$ playing against $e$.
Let
$$N_1 = \{n \in \mathbb{N} : f(n) > t_e(n)\},$$
$$N_2 = \{n \in N_1 : p_f(x_1, \ldots, x_n) = e(y_1, \ldots, y_n)\},$$
$$N_3 = \{n \in \mathbb{N} : p_f(x_1, \ldots, x_n) = e(y_1, \ldots, y_n)\}.$$
By the Claim in the proof of Proposition 8, for all-but-finitely-many $n \in \mathbb{N}$, if $f(n) > t_e(n)$ then $x_{n+1} = y_{n+1}$, i.e., $p_f(x_1, \ldots, x_n) = e(y_1, \ldots, y_n)$. In other words, $N_2 = N_1 \setminus N_0$ for some finite $N_0 \subseteq \mathbb{N}$. Since the hyperreal growth rate of $t_e$ is $< \hat{f}$, $N_1 \in \mathcal{U}$. Since $N_2 = N_1 \setminus N_0$, by Lemma 44 we have $N_2 \in \mathcal{U}$. Since $N_2 \subseteq N_3$, it follows that $N_3 \in \mathcal{U}$, that is, $p_f$ $\mathcal{U}$-learns $e$, as desired.

Whereas we showed in Proposition 9 that the original Hibbard measure is already non-computable, we conjecture that the hyperreal ASPI taxonomy is even harder (in a computability-theoretical sense). We mention in passing that there are other ways to approach non-standard analysis (Hrbacek and Katz, 2020), where the free ultrafilter dependency is replaced by a dependency on a model of a certain set of axioms.

6.2 The surreal ASPI measure

In Definition 43, we had to content ourselves with an ASPI taxonomy rather than an ASPI measure, because there is no canonical way of choosing a preferred bound for a bounded set of hyperreals. In this section, we will consider another number system, the surreal numbers (Conway, 2000; Knuth, 1974; Ehrlich, 2012), in which the hyperreal numbers can be embedded. By embedding the hyperreals within the surreals, our hyperreal solution to Problem 10 yields a surreal solution to Problem 10, via the embedding. Unlike the hyperreal numbers, the surreal numbers do admit a canonical way of choosing preferred bounds for bounded sets. Thus, our surreal solution to Problem 10 will yield an ASPI measure, not just a taxonomy.

Formally defining the surreal numbers involves nuances touching the foundations of mathematics, so we will only sketch the definition here. Suppose we take the following as the guiding principle in creating a number system: For every set $L$ of lower bounds in our number system, and every set $R$ of upper bounds in our number system, with $L < R$ (by which we mean $\ell < r$ for all $\ell \in L$, $r \in R$), we want there to exist a canonical number $L < x < R$ (i.e., a canonical number $x$ such that $\ell < x < r$ for all $\ell \in L$, $r \in R$).

To simplify things, since this guiding principle only requires the existence of these canonical numbers, we can assume no other numbers exist, only such canonical numbers. Thus, every number in the resulting system can, recursively, be identified with a pair $(L, R)$ of sets of numbers, $L < R$. Notationally, we use $\{L | R\}$ as a name for the canonical number $L < \{L|R\} < R$ between $L$ and $R$, whenever $L < R$ are sets of numbers. There are two tricky issues with this idea:

1. A number in this number system we are building might have multiple names, so we cannot simply take the numbers to be their names. Instead, it is necessary to take the numbers to be equivalence classes of names. What should it mean for two names to be equivalent?

2. If $\{x_L | x_R\}$ and $\{y_L | y_R\}$ are (names of) two numbers $x, y$ (respectively) in this number system we are building (so $x_L < x_R$ and $y_L < y_R$ are sets of numbers in said number system), what does it mean to say $x < y$? This question seems somewhat circular, because in order for $\{x_L | x_R\}$ to be a valid name in the first place already requires that $x_L < x_R$ (i.e., that $\ell < r$ whenever $\ell \in x_L$, $r \in x_R$).
To answer issue 2, we ask ourselves: what does it mean for the canonical number $x$ between $x_L$ and $x_R$ to be less than the canonical number $y$ between $y_L$ and $y_R$? With some thought, it seems a natural way to (recursively) answer this question is to declare $x < y$ if and only if one of the following holds:
- There is some $y' \in y_L$ such that $x \leq y'$, or
- There is some $x' \in x_R$ such that $x' \leq y$.

Having answered (2), we can answer (1) by declaring that two names $\{x_L | x_R\}$ and $\{y_L | y_R\}$ (of $x$ and $y$ respectively) are equivalent if $x \not< y$ and $y \not< x$.

Carrying out the above definition in full formality is tricky, because the $<$ relation and the equivalence relation have to be defined by simultaneous recursion in terms of each other. To give the formal definition here would be beyond the scope of this paper. [Footnote 10: Technically, $<$ and the equivalence relation are not actually relations at all, because their universes are not sets. For this and other reasons, the formal construction of the surreals is usually carried out in transfinitely many stages, such that at any particular stage, the surreals constructed so far form a set.] The equivalence classes of the above equivalence relation are called surreal numbers. It is possible to define addition and multiplication on them in such a way that, along with the ordering $<$ already defined, the surreal numbers satisfy the axioms of an ordered field.

Example 46 (Examples of surreal numbers).
- Taking $L = R = \emptyset$ yields the surreal number named $\{\emptyset | \emptyset\}$, usually abbreviated $\{|\}$. It can be shown this is the surreal 0 (i.e., the unique surreal additive identity).
- Let $L = \{0\}$, $R = \emptyset$, where 0 is as above. This yields the surreal named $\{\{0\}|\}$. It can be shown this is the surreal 1 (i.e., the unique multiplicative identity).
- Let $L = \{1\}$, $R = \emptyset$, where 1 is as above. This yields the surreal number named $\{\{1\}|\}$. It can be shown this is 2 (i.e., $1 + 1$). In the same way, one can obtain surreal numbers $3, 4, 5, \ldots$.
- Let $L = \{0, 1, 2, \ldots\}$ (as above), $R = \emptyset$. This yields the surreal named $\{\{0, 1, 2, \ldots\}|\}$. This surreal represents the smallest infinite ordinal number, $\omega$, considered as a surreal.
- With 0 and 1 as above, $\{\{0\}|\{1\}\}$ names a surreal strictly between 0 and 1. It can be shown to be $\frac{1}{2}$ (i.e., that its product with 2 is 1).
- With 0 and $\frac{1}{2}$ as above, $\{\{0\}|\{\frac{1}{2}\}\}$ names a surreal strictly between 0 and $\frac{1}{2}$. It can be shown to be $\frac{1}{4}$.
- Similarly, one can construct surreals $\frac{1}{8}, \frac{1}{16}, \ldots$.
- With 0 and $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$ as above, $\{\{0\}|\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots\}\}$ names a surreal larger than 0 but smaller than every $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$. This shows that the surreals include infinitesimals. This particular infinitesimal can be shown to be $1/\omega$, in the sense that when multiplied by $\omega$, the result is 1.

The reader might object that if we let $L$ be the set of all surreals, then $\{L|\}$ seems to name a surreal larger than all surreals, which would be absurd. This paradox is avoided because in fact the class of all surreals is not a set, but a proper class.

It can be shown that for any free ultrafilter $\mathcal{U}$, the hyperreals constructed using $\mathcal{U}$ can be embedded into the surreals. This allows us to transform our hyperreal solution to Problem 10 into a surreal solution. Unfortunately, there are many different ways to embed the hyperreals into the surreals, none of them canonical, so while our hyperreal solution already depends arbitrarily on a free ultrafilter, our surreal solution further depends on an arbitrary embedding.
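The recursive comparison sketched earlier in this subsection can be run directly on names built from finite sets. The following Python sketch is our own illustration, not part of the paper: it implements $<$ and the induced equivalence for finite names, and checks a few of the identifications from Example 46, for example that $\{\{0,1\}|\}$ and $\{\{1\}|\}$ name the same surreal, namely 2.

```python
from functools import lru_cache

# A surreal "name" is a pair (L, R) of tuples of names; validity (every member of L
# below every member of R) is assumed, not checked.
ZERO = ((), ())            # { | }
ONE  = ((ZERO,), ())       # { {0} | }
TWO  = ((ONE,), ())        # { {1} | }
HALF = ((ZERO,), (ONE,))   # { {0} | {1} }

@lru_cache(maxsize=None)
def lt(x, y):
    """x < y, per the recursive clause in Section 6.2:
    x < y iff some y' in y_L has x <= y', or some x' in x_R has x' <= y."""
    xL, xR = x
    yL, yR = y
    return any(le(x, yp) for yp in yL) or any(le(xp, y) for xp in xR)

def le(x, y):
    """x <= y, taken here to mean: not (y < x)."""
    return not lt(y, x)

def equivalent(x, y):
    """Two names are equivalent iff neither is < the other."""
    return not lt(x, y) and not lt(y, x)

print(lt(ZERO, ONE), lt(ONE, TWO))            # True True
print(lt(ZERO, HALF), lt(HALF, ONE))          # True True: 0 < {0|1} < 1
print(equivalent(((ZERO, ONE), ()), TWO))     # True: {0,1|} is another name for 2
```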
Definition 47 (The surreal solution to Problem 10). Suppose $\mathcal{U}$ is a free ultrafilter and ${}^*\mathbb{R}$ is the corresponding hyperreal number system (Definition 39). Let $\phi$ be an embedding of ${}^*\mathbb{R}$ into the surreals. For any function $f : \mathbb{N} \to \mathbb{N}$, the surreal growth rate of $f$ (given by $\mathcal{U}$ and $\phi$) is $\phi(\hat{f})$, where $\hat{f}$ is the hyperreal growth rate of $f$ (Definition 41).

The payoff of doing this is that there is a canonical way to pick a particular bound of a bounded set of surreals, so the surreal solution to Problem 10 provides an ASPI measure, not just an ASPI taxonomy.

[Footnote 11, to the earlier remark that the class of all surreals is a proper class: It can be shown that the ordinal numbers can be embedded in the surreals, and so the non-sethood of the class of surreals is strictly weaker than the non-sethood of the class of ordinals. The latter non-sethood is referred to as the Burali-Forti paradox.]

Definition 48 (The surreal ASPI measure). Let $\mathcal{U}$ be a free ultrafilter, let ${}^*\mathbb{R}$ be the corresponding hyperreal number system, and let $\phi$ be an embedding of ${}^*\mathbb{R}$ into the surreals. Let $\phi({}^*\mathbb{R})$ be the range of $\phi$. For every predictor $p$, the surreal ASPI measure of $p$ (given by $\mathcal{U}$ and $\phi$) is defined to be the surreal with name $\{L|\}$, where $L$ is the set of all surreal numbers $\ell \in \phi({}^*\mathbb{R})$ such that the following condition holds: for every evader $e$, if the surreal growth rate of $t_e$ is $< \ell$, then $p$ $\mathcal{U}$-learns $e$.

Note that in Definition 48, we require $\ell \in \phi({}^*\mathbb{R})$ because otherwise $L$ would not be a set.

7. Pros and cons of different ASPI measures and taxonomies

Here are pros and cons of the ASPI measures and taxonomies which arise from different solutions to the problem (Problem 10) of measuring the growth rate of functions.

The original Hibbard measure (Definition 7), which arises by measuring growth rate by comparing a function with Liu's enumeration (Liu, 1960) of the primitive recursive functions:
- Pro: Relatively concrete.
- Pro: Measures intelligence using a familiar number system (the natural numbers).
- Con: The numbers which the measure outputs are not very meaningful, in that predictor $p$ having a measure +1 higher than predictor $q$ tells us little about how much more computationally complex the evaders which $p$ learns are, versus the evaders which $q$ learns.
- Con: Only distinguishes sufficiently non-intelligent predictors; all sufficiently intelligent predictors receive measure $\infty$.

Big-O/Big-$\Theta$ (Definition 20), in which, rather than directly measuring the intelligence of a predictor, we would instead talk of a predictor's intelligence being $O(f(n))$ or $\Theta(f(n))$ for various functions $f : \mathbb{N} \to \mathbb{N}$:
- Pro: Nearly perfect granularity (slightly coarser than perfect granularity because of the constants $C, C'$ in Definition 19).
- Pro: Computer scientists already use Big-O/Big-$\Theta$ routinely and are comfortable with them.
- Con: A non-numerical taxonomy.

Intelligence based on a majorization hierarchy such as the standard slow- or fast-growing hierarchy up to $\epsilon_0$ (Definitions 29 and 31):
- Pro: A numerical measure, albeit less granular than the Big-O/Big-$\Theta$ taxonomies.
- Pro: Relatively concrete.
- Pro: The numbers which the measure outputs are meaningful, in the sense that the degree to which a predictor $p$ is more intelligent than a predictor $q$ is reflected in the degree to which $p$'s intelligence-measure is larger than $q$'s.
- Con: The numbers which the measure outputs are ordinal numbers, which may be unfamiliar to some users.
- Con: Only distinguishes sufficiently non-intelligent predictors; for any particular majorization hierarchy, all sufficiently intelligent predictors receive measure $\infty$.
- Hyperreal intelligence (Definition 43):
  – Pro: A taxonomy like Big-O/Big-Θ, but with the added benefit that the taxa are numerical.
  – Pro: Perfect granularity.
  – Con: Depends on a free ultrafilter (free ultrafilters exist but cannot be concretely exhibited).

- Surreal intelligence (Definition 48):
  – Pro: An actual numerical measure (not just a taxonomy), with perfect granularity.
  – Con: The numbers which the measure outputs are surreal numbers, which are relatively new and thus unfamiliar, and are difficult to work with in practice.
  – Con: Depends on both a free ultrafilter and also an embedding of the resulting hyperreals into the surreals.

8. Conclusion

To summarize: Hibbard (2011) proposed an intelligence measure for predictors in games of adversarial sequence prediction. We argued that Hibbard's idea actually splits into two orthogonal sub-ideas. First: that intelligence can be measured via the growth-rates of the run-times of evaders that a predictor can learn to predict. Second: that such growth-rates can be measured in one specific way (involving an enumeration of the primitive recursive functions). We argued that there are many other ways to measure growth-rates, and that each method of measuring growth-rates yields a corresponding adversarial sequence prediction intelligence (ASPI) measure or taxonomy. We considered several specific ways of measuring growth-rates of functions, and exhibited corresponding ASPI measures and taxonomies. The growth-rate-measuring methods which we considered were: Big-O/Big-Θ notation; majorization hierarchies; hyperreal numbers; and surreal numbers. We also discussed how the intelligence of adversarial sequence predictors can be considered as an approximation of the intelligence of idealized AGIs.

Acknowledgments

We acknowledge Bryan Dawson for feedback on Section 6.1. We acknowledge Philip Ehrlich for correcting a mistake. We acknowledge Mikhail Katz and Roman Yampolskiy for providing literature references. We acknowledge the editor and the reviewers for much generous feedback and suggestions.

References

Alexander, S. A. 2019a. Intelligence via ultrafilters: structural properties of some intelligence comparators of deterministic Legg-Hutter agents. Journal of Artificial General Intelligence 10(1):24–45.
Alexander, S. A. 2019b. Measuring the intelligence of an idealized mechanical knowing agent. In CIFMA.
Alexander, S. A. 2020a. AGI and the Knight-Darwin Law: why idealized AGI reproduction requires collaboration. In ICAGI.
Alexander, S. A. 2020b. The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI. Journal of Artificial General Intelligence 11(1):70–85.
Bostrom, N. 2003. Ethical issues in advanced artificial intelligence. In Schneider, S., ed., Science fiction and philosophy: from time travel to superintelligence. John Wiley and Sons. 277–284.
Chaitin, G. 2011. Metaphysics, Metamathematics and Metabiology. In Zenil, H., ed., Randomness through computation. World Scientific.
Conway, J. H. 2000. On Numbers and Games. CRC Press, 2nd edition.
Ehrlich, P. 2012. The absolute arithmetic continuum and the unification of all numbers great and small. Bulletin of Symbolic Logic 18:1–45.
Girard, J.-Y. 1981. Π¹₂-logic, Part 1: Dilators. Annals of Mathematical Logic 21(2-3):75–219.
Goldblatt, R. 2012. Lectures on the hyperreals: an introduction to nonstandard analysis. Springer.
Good, I. J. 1969. Gödel's theorem is a red herring. The British Journal for the Philosophy of Science 19(4):357–358.
Hardy, G. H. 1904. A theorem concerning the infinite cardinal numbers. Quarterly Journal of Mathematics 35:87–94.
Hibbard, B. 2008. Adversarial sequence prediction. In ICAGI, 399–403.
Hibbard, B. 2011. Measuring agent intelligence via hierarchies of environments. In ICAGI, 303–308.
Hrbacek, K., and Katz, M. G. 2020. Infinitesimal analysis without the axiom of choice. Preprint.
Hutter, M. 2004. Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer.
Kirman, A. P., and Sondermann, D. 1972. Arrow's theorem, many agents, and invisible dictators. Journal of Economic Theory 5(2):267–277.
Knuth, D. E. 1974. Surreal numbers: a mathematical novelette. Addison-Wesley.
Knuth, D. E. 1976. Big Omicron and big Omega and big Theta. ACM Sigact News 8(2):18–24.
Legg, S. 2006. Is there an elegant universal theory of prediction? In International Conference on Algorithmic Learning Theory, 274–287. Springer.
Liu, S.-C. 1960. An enumeration of the primitive recursive functions without repetition. Tohoku Mathematical Journal 12(3):400–402.
Robinson, A. 1974. Non-standard analysis. Princeton University Press.
Wainer, S., and Buchholz, W. 1987. Provably computable functions and the fast growing hierarchy. In Simpson, S. G., ed., Logic and Combinatorics. AMS.
Wainer, S. 1989. Slow growing versus fast growing. The Journal of Symbolic Logic 54(2):608–.
Wang, P. 2019. On Defining Artificial Intelligence. Journal of Artificial General Intelligence 10(2):1–37.
Weiermann, A. 1997. Sometimes slow growing is fast growing. Annals of Pure and Applied Logic 90(1-3):91–99.
Weiermann, A. 2002. Slow versus fast growing. Synthese 133:13–29.
Yampolskiy, R. V. 2012. AI-complete, AI-hard, or AI-easy: classification of problems in AI. In The 23rd Midwest Artificial Intelligence and Cognitive Science Conference.
Yampolskiy, R. V. 2013. Turing test as a defining feature of AI-completeness. In Artificial intelligence, evolutionary computing and metaheuristics. Springer. 3–17.
Yampolskiy, R. V. 2020. On Controllability of Artificial Intelligence. Technical report.

In 2011, Hibbard suggested an intelligence measure for agents who compete in an adversarial sequence prediction game. We argue that Hibbard's idea should actually be considered as two separate ideas: rst, that the intelligence of such agents can be measured based on the growth rates of the runtimes of the competitors that they defeat; and second, one speci c (somewhat arbitrary) method for measuring said growth rates. Whereas Hibbard's intelligence measure is based on the latter growth-rate-measuring method, we survey other methods for measuring function growth rates, and exhibit the resulting Hibbard-like intelligence measures and taxonomies. Of particular interest, we obtain intelligence taxonomies based on Big-O and Big-Theta notation systems, which taxonomies are novel in that they challenge conventional notions of what an intelligence measure should look like. We discuss how intelligence measurement of sequence predictors can indirectly serve as intelligence measurement for agents with Arti cial General Intelligence (AGIs). 1. Introduction In his insightful paper, Hibbard (2011) introduces a novel intelligence measure (which we will here refer to as the original Hibbard measure ) for agents who play a game of adversarial sequence prediction (Hibbard, 2008) \against a hierarchy of increasingly dicult sets of " evaders (environments that attempt to emit 1s and 0s in such a way as to evade prediction). The levels of Hibbard's hierarchy are labelled by natural numbers, and an agent's original Hibbard measure is the maximum n 2 N such that said agent learns to predict all the evaders in the nth level of the hierarchy, or implicitly an agent's original Hibbard measure is 1 if said agent learns to predict all the evaders in all levels of Hibbard's hierarchy. The hierarchy which Hibbard uses to measure intelligence is based on the growth rates of the runtimes of evaders. We will argue that Hibbard's idea is really a combination of two orthogonal ideas. First: that in some sense the intelligence of a predicting agent can be measured based on the growth rates of the runtimes of the evaders whom that predictor 1. Hibbard does not explicitly include the 1 case in his de nition, but in his Proposition 3 he refers to agents having \ nite intelligence", and it is clear from context that by this he means agents who fail to predict some evader somewhere in the hierarchy. 1 S. Alexander & B. Hibbard learns to predict. Second: Hibbard proposed one particular method for measuring said growth rates. The growth rate measurement which Hibbard proposed yields a corresponding intelligence measure for these agents. We will argue that any method for measuring growth rates of functions yields a corresponding adversarial sequence prediction intelligence measure (or ASPI measure for short) provided the underlying number system provides a way of choosing canonical bounds for bounded sets. If the underlying number system does not provide a way of choosing canonical bounds for bounded sets, the growth-rate-measure will yield a corresponding ASPI taxonomy (like the big-O taxonomy of asymptotic complexity). The particular method which Hibbard used to measure function growth rates is not very standard. We will survey other ways of measuring function growth rates, and these will yield corresponding ASPI measures and taxonomies. The structure of the paper is as follows. In Section 2, we review the original Hibbard measure. 
In Section 3, we argue that any method of measuring growth rates of functions yields an ASPI measure or taxonomy, and that the original Hibbard measure is just a special case resulting from one particular method of measuring function growth rate. In Section 4, we consider Big-O notation and Big- notation and de ne corresponding ASPI taxonomies. In Section 5, we consider solutions to the problem of measuring growth rates of functions using majorization hierarchies, and de ne corresponding ASPI measures. In Section 6, we consider solutions to the problem of measuring growth rates of functions using more abstract number systems, namely the hyperreal numbers and the surreal numbers. We do not assume previous familiarity with these number systems. In Section 7, we give pros and cons of di erent ASPI measures and taxonomies. In Section 8, we summarize and make concluding remarks. 2. Hibbard's original measure Hibbard proposed an intelligence measure for measuring the intelligence of agents who compete to predict evaders in a game of adversarial sequence prediction (we de ne this formally below). A predictor p (whose intelligence we want to measure) competes against evaders e. In each step of the game, both predictor and evader simultaneously choose a binary digit, 1 or 0. Only after both of them have made their choice do they see which choice the other one made, and then the game proceeds to the next step. The predictor's goal in each round is to choose the same digit that the evader will choose; the evader's goal is to choose a di erent digit than the predictor. The predictor wins the game (and is said to learn to predict e, or simply to learn e) if, after nitely many initial steps, eventually the predictor always chooses the same digit as the evader. De nition 1 By B, we mean the binary alphabet f0; 1g. By B , we mean the set of all nite binary sequences. By hi we mean the empty binary sequence. 2 Measuring Intelligence and Growth Rate De nition 2 (Predictors and evaders) 1. By a predictor, we mean a Turing machine p which takes as input a nite (possibly empty) binary sequence (x ; : : : ; x ) 2 B (thought of as a sequence of evasions) and 1 n outputs 0 or 1 (thought of as a prediction), which output we write as p(x ; : : : ; x ). 1 n 2. By an evader, we mean a Turing machine e which takes as input a nite (possibly empty) binary sequence (y ; : : : ; y ) 2 B (thought of as a sequence of predictions) 1 n and outputs 0 or 1 (thought of as an evasion), which output we write as e(y ; : : : ; y ). 1 n 3. For any predictor p and evader e, the result of p playing the game of adversarial sequence prediction against e (or more simply, the result of p playing against e) is the in nite binary sequence (x ; y ; x ; y ; : : :) de ned as follows: 1 1 2 2 (a) The rst evasion x = e(hi) is the output of e when run on the empty prediction- sequence. (b) The rst prediction y = p(hi) is the output of p when run on the empty evasion- sequence. (c) For all n > 0, the (n + 1)th evasion x = e(y ; : : : ; y ) is the output of e on n+1 1 n the sequence of the rst n predictions. (d) For all n > 0, the (n + 1)th prediction y = p(x ; : : : ; x ) is the output of p on n+1 1 n the sequence of the rst n evasions. 4. Suppose r = (x ; y ; x ; y ; : : :) is the result of a predictor p playing against an evader 1 1 2 2 e. For every n  1, we say the predictor wins round n in r if x = y ; otherwise, the n n evader wins round n in r. 
We say that p learns to predict e (or simply that p learns e) if there is some N 2 N such that for all n > N , p is the winner of round n in r. Note that if e simply ignores its inputs (y ; : : : ; y ) and instead computes e(y ; : : : ; y ) 1 n 1 n based only on n, then e is essentially a sequence. Thus De nition 2 is a generalization of sequence prediction, which many authors have written about (such as Legg (2006), who gives many references). In the future, it could be interesting to consider variations of the game involving probability in various ways, for example, where the predictor wins if his guesses have > 50% win rate, or where the predictor states how con dent he is about each guess, or other such variations. In the following de nition, we di er from Hibbard's original paper because of a minor (and fortunately, easy-to- x) error there. De nition 3 Suppose e is an evader. For each n 2 N, let t (n) be the maximum number of steps that e takes to run on any length-n sequence of binary digits. In other words, t (0) is the number of steps e takes to run on hi, and for all n > 0, t (n) = max (number of steps e takes to run on (b ; : : : ; b )): e 1 n b ;:::;b 2f0;1g 1 n 2. The measures we introduce in this paper would also work if we de ned predictors as not-necessarily- computable functions B ! B, but this would not add much insight. We prefer to emphasize the duality between predictors and evaders when each is a Turing machine. 3 S. Alexander & B. Hibbard Example 4 Let e be an evader. Then t (2) is equal to the number of steps e takes to run on input (0; 0), or to run on input (0; 1), or to run on input (1; 0), or to run on input (1; 1)|whichever of these four possibilities is largest. De nition 5 Suppose f : N ! N and g : N ! N. We say f majorizes g, written f  g, if there is some n 2 N such that for all n > n , f (n) > g(n). 0 0 De nition 6 Suppose f : N ! N. We de ne E to be the set of all evaders e such that f  t . De nition 7 (The original Hibbard measure) Let g ; g ; : : : be the enumeration of the 1 2 primitive recursive functions given by Liu (1960). For each m > 0, de ne f : N ! N by f (k) = max max g (j): m i 0<im jk For any predictor p, we de ne the original Hibbard intelligence of p to be the maximum m > 0 such that p learns to predict e for every e 2 E (or 0 if there is no such m, or 1 if p learns to predict e for every e 2 E for every m > 0). The following result shows that the measure in De nition 7 does not overshoot the agents being measured. Proposition 8 For any integer m  1, there is a predictor p with original Hibbard measure m. Proof This is part of Proposition 3 of Hibbard (2011), but we give a self-contained proof here because we will state similar results about other measures below. For each computable f : N ! N, let p be the predictor who proceeds as follows when given evasion-sequence (x ; : : : ; x ) 2 B as input. 1 n First, by calling itself recursively on inputs hi; (x ); (x ; x ); : : : ; (x ; : : : ; x ); 1 1 2 1 n1 p determines the prediction-sequence (y ; : : : ; y ) as in De nition 2. 1 n Next, p considers the rst n Turing machines, T ; : : : ; T . For 1  i  n, say that T is f 1 n i an nth-order e-lookalike if the following requirements hold: If T halts in  f (0) steps on input hi, then T outputs x on that input. i i 1 If T halts in  f (1) steps on input (y ), then T outputs x on that input. i 1 i 2 If T halts in  f (2) steps on input (y ; y ), then T outputs x on that input. 
i 1 2 i 3 : : : If T halts in  f (n 1) steps on input (y ; : : : ; y ), then T outputs x on that i 1 n1 i n input. T halts in  f (n) steps on input (y ; : : : ; y ) with some output X . i 1 n i;n 4 Measuring Intelligence and Growth Rate By simulating T ; : : : ; T as needed (which only requires nitely many steps), p determines 1 n f if any T is an nth-order e-lookalike. If so, p outputs p (x ; : : : ; x ) = X for the minimal i f f 1 n i;n such i. If not, p outputs 0. Claim: For every computable f : N ! N, for every evader e, if (x ; y ; x ; y ; : : :) is the 1 1 2 2 result of p playing against e, then for all-but- nitely-many n 2 N, if f (n) > t (n) then f e x = y . n+1 n+1 Let f; e; (x ; y ; : : :) be as in the claim and assume f (n) > t (n). Being an evader, e is a 1 1 e Turing machine, say, the kth Turing machine. Since f (n) > t (n), it follows that T is an e k nth-order e-lookalike. It follows that, on input (x ; : : : ; x ), p will play the output given 1 n by some nth-order e-lookalike T 0 , k  k. For any k < k, say that p is tricked by T 0 on input (x ; : : : ; x ) if, on said input, p f k 1 n 0 0 0 identi es T as the rst nth-order e-lookalike and so plays y = X but X 6= x k n+1 k ;n k ;n n+1 (loosely speaking: p is led to believe the evader is T 0 , and this false belief causes p to f k f incorrectly predict the evader's next digit). It follows that p will not identify T as an f k 0 0 n th-order e-lookalike when run on (x ; : : : ; x 0 ) for any n > n. Thus, p can only be 1 n f tricked at most once by T for any particular k < k. If p is not so tricked, then either k f p identi es T 0 as the rst nth-order e-lookalike for some k < k (in which case p plays f k f 0 0 y = X = x , lest p be tricked by T ), or else p identi es T as the rst nth-order n+1 n+1 k ;n f k f k e-lookalike, in which case p plays y = X = x since e = T . Either way, after f n+1 k;n n+1 k possibly nitely many exceptions caused by being tricked, p always plays y = x f n+1 n+1 whenever f (n) > t (n), proving the claim. We claim p has original Hibbard measure  m. To see this, let e 2 E , we must f f m m show p learns e. Let (x ; y ; : : :) be the result of p playing against e. Since e 2 E , 1 1 f f m m f  t , so for-all-but- nitely-many n 2 N, f (n) > t (n). And, by the above Claim, with m e m e at most nitely many exceptions, whenever f (n) > t (n), x = y . It follows that m e n+1 n+1 p learns e, as desired. Unfortunately, the original Hibbard measure is not computable (unless the background model of computation is contrived), as the following proposition shows . Proposition 9 Assume the background model of computation is well-behaved enough that there is an evader e which always outputs 0 and whose runtime t is bounded by some 0 e primitive recursive function. Then the original Hibbard measure is not computable: there is no e ectively computable procedure, given a predictor, to compute its original Hibbard measure. In fact, there is not even an e ectively computable procedure to tell if one given predictor has a higher original Hibbard measure than another given predictor. Proof Let p be a predictor which always outputs 1, and let m be its original Hibbard measure. By the existence of e , it follows that m < 1 since certainly p does not learn e . 0 1 0 Let p be a predictor with original Hibbard measure > m (Proposition 8). If the proposition were false, then we could solve the halting problem as follows. Given any Turing machine M , in order to determine whether or not M halts, proceed as follows. 3. 
In fact, assuming a non-contrived background model of computation, for any strictly increasing total computable function f , even the following can be shown to be non-computable: given an evader e, to determine whether or not f  t . 5 S. Alexander & B. Hibbard Let p be the predictor which, on input (x ; : : : ; x ), outputs p(x ; : : : ; x ) if M halts in M 1 n 1 n n steps, or 1 otherwise. Clearly, M halts if and only if p has a higher original Hibbard measure than p . Likewise, the variations on Hibbard's measure which we present in this paper are also non-computable. To quantify their precise degrees of computability (e.g., where they fall within the arithmetical hierarchy) would be beyond the scope of this paper. We will, however, state one conjecture. If we modi ed the original Hibbard measure by replacing Liu's enumeration of the primitive recursive functions by an enumeration of all total computable functions, then we conjecture the resulting measure would be strictly higher in the arithmetical hierarchy (i.e., would require strictly stronger oracles to compute), essentially because the set of total computable functions is not computably enumerable, whereas the primitive recursive functions are. 2.1 Predictor intelligence and AGI intelligence De nition 7, and similar measures and taxonomies which we will de ne later, quantify the intelligence of predictors in the game of adversarial sequence prediction. But any method for quantifying the intelligence of such predictors can also approximately quantify the intelligence of (suitably idealized) agents with Arti cial General Intellience (that is, the intelligence of AGIs). The idealized AGIs we have in mind should be capable of understanding, and obedient in following or trying to follow, commands issued in everyday human language (this is not to say that all AGIs must necessarily be obedient, merely that for the purposes of this paper we restrict our attention to obedient AGIs). For example, if such an idealized AGI were commanded, \until further notice, compute and list the digits of pi," it would be capable of understanding that command, and would obediently compute said digits until commanded otherwise . It is unclear how an AGI ought to respond if given an impossible command, such as \write a computer program that solves the halting problem", or Yampolskiy's \Disobey!" (Yampolskiy, 2020). But an AGI should be capable of understanding and attempting to obey an open-ended command, provided it is not impossible. For example, we could command an AGI to \until further notice, write an endless poem about trees," and the AGI should be able to do so, writing said poem line-by-line until we tell it to stop. This is despite the fact that the command is open-ended and under-determined (there are many decisions involved in writing a poem about trees, and we have left all these decisions to the AGI's discretion). The AGI's ability to obey such open-ended and under-determined commands exempli es its ability to \adapt with insucient knowledge and resources" (Wang, 2019). One well-known example of an open-ended command which an AGI should be perfectly 4. It is somewhat unclear how explicitly an AGI would obey certain commands. To use an example of Yampolskiy (2020), if we asked a car-driving AGI to stop the car, would the AGI stop the car in the middle of trac, or would it pull over to the side rst? We assume this ambiguity does not apply when we ask the AGI to perform tasks of a suciently abstract and mathematical nature. 5. 
Our thinking here is reminiscent of some remarks of Yampolskiy (2013). 6 Measuring Intelligence and Growth Rate capable of attempting to obey (perhaps at peril to us) is Bostrom's \manufacture as many paperclips as possible" (Bostrom, 2003). In particular, such an idealized AGI X should be capable of obeying the following command: \Act as a predictor in the game of adversarial sequence prediction". By giving X this command, and then immediately ltering out all X 's sensory input except only for input about the digits chosen by an evader, we would obtain a formal predictor in the sense of De nition 2. This predictor might be called \the predictor generated by X ". Strictly speaking, if the command is given to X at time t, then it would be more proper to call the resulting predictor \the predictor generated by X at time t": up until time t, the observations X makes about the universe might have an e ect on the strategy X chooses to take once commanded to act as a predictor; but as long as we lter X 's sensory input immediately after giving X the command, no further such observations can so alter X 's strategy. In short, to use Yampolskiy's terminology (Yampolskiy, 2012), the act of trying to predict adversarial sequence evaders is AI-easy. Thus, any intelligence measure (or taxonomy) for predictors also serves as an intelligence measure (or taxonomy) for suitably idealized AGIs. Namely: the intelligence level of an AGI X is equal to the intelligence level of X 's predictor. Of course, a priori, X might be very intelligent at various other things while being poor at sequence prediction, or vice versa, so this only approximately captures X 's true intelligence. Of course, the same could be said for any competency measure on any task: we do not make any claims that when we measure X 's intelligence via X 's predictor's performance, that this is in any sense \the one true intelligence measure". One could just as well measure X 's intelligence in terms of the Elo ranking X would obtain if one ordered X to compete at chess. We would o er two motivations to consider adversarial sequence prediction ability as a particularly interesting proxy for AGI intelligence measurement: 1. There seem to be high-level connections between intelligence and prediction in general (Hutter, 2004), of which adversarial sequence prediction is an elegant and parsimonious example. 2. Adversarial sequence prediction ability is not bounded above, in the sense that for any particular predictor p, one can easily produce a predictor p that learns all the evaders which p learns and at least one additional evader. 3. Quantifying growth rates of functions The following is a general and open-ended problem. Problem 10 Quantify the growth-rate of functions from N to N. The de nition of the original Hibbard measure (De nition 7) can be thought of as implicitly depending on a speci c solution to Problem 10, which we make explicit in the following de nition. De nition 11 For each m > 0, let f be as in De nition 7. For each f : N ! N, we de ne the original Hibbard growth rate H (f ) to be minfm > 0 : f  fg if there is any such m > 0, and otherwise H (f ) = 1. 7 S. Alexander & B. Hibbard In order to generalize the original Hibbard de nition in a uniform way, we will rearrange notation somewhat. We have, in some sense, more notation than necessary, namely the notation in (Hibbard, 2011) and synonymous notation which is modi ed to generalize more readily. Lemma 12 For every natural m > 0 and every f : N ! N, H (f )  m if and only if f  f . 
Proof Straightforward. De nition 13 For every m 2 N, let E be the set of all evaders e such that H (t )  m. Lemma 14 For every natural m > 0, E = E . m m Proof Let e be an evader. By De nition 13, e 2 E if and only if H (t )  m. By Lemma 12, H (t )  m if and only if f  t . But by De nition 6, this is the case if and only if e m e e 2 E . Corollary 15 For every predictor p, the original Hibbard measure of p is equal to the maximum natural m > 0 such that p learns e whenever e 2 E , or is equal to 0 if there is no such m, or is equal to 1 if p learns e whenever e 2 E for all m > 0. Proof Immediate by Lemma 14 and De nition 7. In other words, if S is the set of all the m as in Corollary 15, then the original Hibbard measure of p is the \canonical upper bound" of S, where by the \canonical upper bound" of a set of natural numbers we mean the maximum element of that set (or 1 if that set is unbounded). Remark 16 Corollary 15 shows that the de nition of the original Hibbard measure can be rephrased in such a way as to show that it depends in a uniform way on a particular solution to Problem 10, namely on the solution proposed by De nition 11. For any solution 0 H H to Problem 10, we could de ne evader-sets E in a similar way to De nition 13, and, by copying Corollary 15, we could obtain a corresponding intelligence measure given by H (provided there be some way of choosing canonical bounds of bounded sets in the underlying number system|if not, we would have to be content with a taxonomy rather than a measure, a predictor's intelligence falling into many nested taxa corresponding to many di erent upper 2 3 bounds, just as in Big-O notation a function can simultaneously be O(n ) and O(n )). This formalizes what we claimed in the Introduction, that Hibbard's idea can be decomposed into two sub-ideas, rstly, that a predictor's intelligence can be classi ed in terms of the growth rates of the runtimes of the evaders it learns, and secondly, a particular method (De nition 11) of measuring those growth rates (i.e., a particular solution to Problem 10). 8 Measuring Intelligence and Growth Rate 3.1 A theoretical note on the diculty of Problem 10 In this subsection, we will argue that in order for a solution to Problem 10 to be much good, it should probably measure growth rates using some alternative number system to the real numbers. Essentially, this is because the real numbers have the Archimedean property (the property that for any positive real r > 0 and any real y, there is some n 2 N such that nr > y), a constraint which does not apply to function growth rates. De nition 17 Let N be the set of all functions N ! N. A well-behaved real-measure of N N N is a function F : N ! R satisfying the following requirements. 1. (Monotonicity) For each f; g : N ! N, if f  g, then F (f ) > F (g). 2. (Nontriviality) For each r 2 R, there is some f : N ! N such that F (f ) > r. Theorem 18 There is no well-behaved real-measure of N . N N Proof Assume F : N ! R is a well-behaved real-measure of N . By Nontriviality, there are f ; f ; : : : : N ! N such that each F (f ) > n. De ne g : N ! N by 0 1 n g(n) = f (n) +  + f (n). Clearly, for every n 2 N, g  f . By the Archimedean 0 n n property of the real numbers, there is some n 2 N such that n > F (g). By Monotonicity, F (g) > F (f ), but by choice of f , F (f ) > n > F (g), a contradiction. 
n n n In light of Theorem 18, we are motivated to investigate solutions to Problem 10 using alternatives to the real numbers, which will yield ASPI measures (or taxonomies) in terms of those alternatives. An informal argument could be made that real numbers might be inadequate for measuring AGI intelligence in general . It at least seems plausible that there are AGIs X ; X ; : : : such that each X is signi cantly more intelligent than X , and another AGI 1 2 i+1 i Y such that Y is more intelligent than each X . At least, if this is not true, it is not obvious that it is not true, and it seems like it would be nontrivial to argue that it is not true . Now, if \signi cantly more intelligent" implies \at least +1 more intelligent", then it follows that the intelligence levels of Y and of X ; X ; : : : could not all be real numbers, or else one of 1 2 the X would necessarily be more intelligent than Y . If, as the above argument suggests, the real numbers might potentially be too constrained to perfectly measure intelligence, what next? How could we measure intelligence other than by real numbers? A key motivation for the measures and taxonomies we will come up with below is to provide examples of intelligence measurement using alternative number systems. It is for this reason that we do not, in this paper, consider variations on Hibbard's intelligence measure that arise from simply replacing Liu's enumeration of the primitive recursive functions with various other N-indexed lists of functions (for example, the list of all total computable functions). 6. This argument was pointed out by Alexander (2019b), and by Alexander (2020b) again, the latter amidst a wider discussion of Archimedean and non-Archimedean measures. 7. To do so would require arguing that 8X ; X ; : : :, if each X is signi cantly more intelligent than X , 1 2 i+1 i then 8Y , 9i such that Y is at most as intelligent as X . 9 S. Alexander & B. Hibbard 4. Big-O and Big- intelligence One of the most standard solutions to Problem 10 in computer science is to categorize growth rates of arbitrary functions by comparing them to more familiar functions using Big-O notation or Big- notation. Knuth (1976) de nes these as follows (we modify the de nition slightly because we are only concerned here with functions from N to N). De nition 19 Suppose f : N ! N. We de ne the following function-sets. O(f (n)) is the set of all g : N ! N such that there is some real C > 0 and some n 2 N such that for all n  n , g(n)  Cf (n). 0 0 (f (n)) is the set of all g : N ! N such that there are some real C > 0 and C > 0 and some n 2 N such that for all n  n , Cf (n)  g(n)  C f (n). 0 0 Note that De nition 19 does not measure growth rates, but rather categorizes growth rates into Big-O and Big- taxonomies. For example, the same function can be both O(n ) and O(n ), the former taxon being nested within the latter. By Remark 16, De nition 19 yields the following elegant taxonomy of predictor intelligence. De nition 20 Suppose p is a predictor, and suppose f : N ! N. We say p has Big-O ASPI measure O(f (n)) if p learns every evader e such that t is O(f (n)). We say p has Big- ASPI measure (f (n)) if p learns every evader e such that t is (f (n)). Proposition 21 For any computable function f : N ! N, there is a predictor p with Big-O ASPI measure O(f (n)) and Big- ASPI measure (f (n)). Proof De ne g : N ! N by g(n) = nf (n) + 1, and let p be as in the proof of Proposition 8. We claim p has Big-O ASPI measure O(f (n)). 
To see this, let e be any evader such that t is O(f (n)). Thus there is some C 2 R such that for all-but- nitely-many n 2 N, t (n)  Cf (n). It follows that g  t . By the Claim in the proof of Proposition 8, for all- e e but- nitely-many n 2 N, if g(n) > t (n) then x = y , where (x ; y ; : : :) is the result e n+1 n+1 1 1 of p playing against e. So in all, with only nitely many exceptions, each x = y , as g n+1 n+1 desired. The proof that p has Big- ASPI measure (f (n)) is similar. 5. ASPI measures based on majorization hierarchies Majorization hierarchies (Weiermann, 2002) provide ordinal-number-valued measures for the growth rates of certain functions. A majorization hierarchy depends on many in nite- dimensional parameters. We will describe two majorization hierarchies up to the ordinal , using standard choices for the parameters, and the ASPI measures which they produce. 10 Measuring Intelligence and Growth Rate De nition 22 (Classi cation of ordinal numbers) Ordinal numbers are divided into three types: 1. Zero: The ordinal 0. 2. Successor ordinals: Ordinals of the form + 1 for some ordinal . 3. Limit ordinals: Ordinals which are not successor ordinals nor 0. For example, the smallest in nite ordinal, !, is a limit ordinal. It is not zero (because zero is nite), nor can it be a successor ordinal, because if it were a successor ordinal, say, + 1, then would be nite (since ! is the smallest in nite ordinal), but then + 1 would be nite as well. Ordinal numbers have an arithmetical structure: two ordinals and have a sum + , a product  , and a power . It would be beyond the scope of this paper to give the full de nition of these operations. We will only remark that some care is needed because although ordinal arithmetic is associative|e.g., ( + ) + = + ( + ), and similarly for multiplication|it is not generally commutative: + is not always equal to + , and is not always equal to  . For this reason, one often sees products like  2, which are not necessarily equivalent to the more familiar 2 . ! ! The ordinal  is the smallest ordinal bigger than the ordinals !; ! ; ! ; : : :. It satis es the equation  = ! and can be intuitively thought of as = ! : !+1 ! 5 Ordinals below  include such ordinals as !, ! + ! + ! + 3, ! ! ! ! !2+1 4 5 3 ! ! +! +! +3 ! +! 8 ! + ! + ! + ! + 1; and so on. Any ordinal below  can be uniquely written in the form 1 2 k ! + ! + + ! where    are smaller ordinals below  |this form for an ordinal below  is 1 k 0 0 !2 called its Cantor normal form. For example, the Cantor normal form for !  2 + ! 3 + 2 is !2 !2 !2 1 1 1 0 0 !  2 + !  3 + 2 = ! + ! + ! + ! + ! + ! + ! : De nition 23 (Standard fundamental sequences for limit ordinals   ) Suppose  is a limit ordinal   . We de ne a fundamental sequence for , written ([0]; [1]; [2]; : : :), inductively as follows. ! ! If  =  , then [0] = !, [1] = ! , [2] = ! , and so on. 1 k If  has Cantor normal form ! + + ! where k > 1, then each 1 k1 k [i] = ! + + ! + (! [i]): 11 S. Alexander & B. Hibbard If  has Cantor normal form ! , then each [i] = !  i. [i] 0 0 If  has Cantor normal form ! where  is a limit ordinal, then each [i] = ! . Example 24 (Fundamental sequence examples) 0+1 0 0 0 The fundamental sequence for  = ! = ! is !  0; !  1; !  2; : : :, i.e., 0; 1; 2; : : :. 5 4 4 4 The fundamental sequence for  = ! is 0; ! ; !  2; !  3; : : :. ! 0 1 2 The fundamental sequence for  = ! is ! ; ! ; ! ; : : :. ! ! ! ! The fundamental sequence for  = ! + ! is ! + 0; ! + 1; ! + 2; : : :. 
De nition 25 (The standard slow-growing hierarchy up to  ) We de ne functions g : N ! N (for all ordinals   ) by trans nite induction as follows. g (n) = 0. g (n) = g (n) + 1 if + 1   . +1 0 g (n) = g (n) if    is a limit ordinal. [n] Here are some early levels in the slow-growing hierarchy, spelled out in detail. Example 26 (Early examples of functions in the slow-growing hierarchy) 1. g (n) = g (n) = g (n) + 1 = 0 + 1 = 1. 1 0+1 0 2. g (n) = g (n) = g (n) + 1 = 1 + 1 = 2. 2 1+1 1 3. More generally, for all m 2 N, g (n) = m. 4. g (n) = g (n) = g (n) = n. ! n ![n] 5. g (n) = g (n) + 1 = n + 1. !+1 ! 6. More generally, for all m 2 N, g (n) = n + m. !+m 7. g (n) = g (n) = g (n) = n + n = n 2. !2 !+n (!2)[n] Following Example 26, the reader should be able to ll in the details in the following example. Example 27 (More examples from the slow-growing hierarchy) 1. g 2 (n) = n . 2. g 3 (n) = n . 3. g (n) = n . 3n+1 4. g !3+1 (n) = n + n + 5. ! +!+5 12 Measuring Intelligence and Growth Rate 5. g (n) = n . What about g ? Thinking of  as ! ; one might expect g (n) to be n ; but such an in nite tower of natural number exponents makes no sense if n > 1. Instead, the answer de es familiar mathematical notation. Example 28 (Level  in the slow-growing hierarchy) The values of g are as follows: g (0) = 0. g (1) = 1 . g (2) = 2 . g (3) = 3 . And so on. Examples 26{28 illustrate how the slow-growing hierarchy systematically provides a family of reference functions against which any particular function can be compared. This yields a solution to Problem 10: we can declare the growth rate of an arbitrary function f : N ! N to be the smallest ordinal <  such that g  f (or 1 if there is no such ). For any bounded set S of ordinals, there is a canonical upper bound for S, namely, the supremum of S. Thus we obtain an ASPI measure (not just a taxonomy). De nition 29 If p is a predictor, the ASPI measure of p given by the standard slow- growing hierarchy up to  is de ned to be the supremum of S (or 1 if  2 S), where S is 0 0 the set of all ordinals   such that the following condition holds: for every evader e, if g  t , then p learns e. In De nition 25, in the successor ordinal case, we chose to de ne g (n) = g (n) + 1. The resulting majorization hierarchy is referred to as slow-growing because in some sense this makes g just barely faster-growing than g . Di erent de nitions of g would yield +1 +1 di erent majorization hierarchies, such as the following. De nition 30 (The standard fast-growing hierarchy up to  , also known as the Wainer hierarchy) We de ne functions h : N ! N (for all ordinals   ) by trans nite induction as follows. h (n) = n + 1. 13 S. Alexander & B. Hibbard n n 1 2 h (n) = h (n), where h is the nth iterate of h (so h (x) = h (x), h (x) = h (h (x)), h (x) = h (h (h (x))), and so on). h (n) = h (n) if  is a limit ordinal   . [n] 0 The functions in the fast-growing hierarchy grow quickly as grows. It can be shown (Wainer and Buchholz, 1987) that for every computable function f whose totality can be proven from the axioms of Peano arithmetic, there is some <  such that h  f . De nition 31 If p is a predictor, the ASPI measure of p given by the standard fast-growing hierarchy up to  is de ned to be the supremum of S (or 1 if  2 S), where S is the set of 0 0 all ordinals   such that the following condition holds: for every predictor e, if h  t , 0 e then p learns e. Proposition 32 For each <  , there is a predictor p (resp. q) whose ASPI measure given by the standard slow-growing (resp. 
fast-growing) hierarchy up to  is  . Proof Similar to the proof of Proposition 8. Between De nitions 29 and 31, the former o ers a ner granularity intelligence measure for the predictors to which it assigns non-1 intelligence, but the latter assigns non-1 intelligence to more intelligent predictors. De nitions 25 and 30 are only two examples of majorization hierarchies. Both the slow- and fast-growing hierarchies can be extended by extending the fundamental sequences of De nition 23 to larger ordinals , however, the larger the ordinals become, the more dicult it is to do this, and especially the less clear it is how to do it in any sort of canonical way. There are also other choices for how to proceed at successor ordinal stages besides g (n) = g (n) + 1 or h (n) = h (n)|for example, one of the oldest majorization hierarchies is the Hardy hierarchy (Hardy, 1904), where H (n) = H (n + 1). And even for ordinals up to  , there are other ways to choose fundamental sequences besides how we de ned them in De nition 23|choosing non-canonical fundamental sequences can drastically alter the resulting majorization hierarchy (Weiermann, 1997). All these di erent majorization hierarchies yield di erent ASPI measures. 5.1 A remark about ASPI measures and AGI intelligence All the ASPI measures and taxonomies we have de ned so far double as indirect intelligence measures and taxonomies for an AGI, by the argument we made in Subsection 2.1. For a given AGI X , a priori, we cannot say much about the predictor which X would act as if X were commanded to act as a predictor. But there is one particularly elegant and parsimonious strategy which X might use, a brute force strategy, namely: Enumerate all the computable functions f which X knows to be total, and for each one, attempt to predict the evader e by assuming that the evader's runtime t satis es 8. Remarkably, the slow-growing hierarchy eventually catches up with the fast-growing hierarchy if both hierarchies are extended to suciently large ordinals (Wainer, 1989; Girard, 1981), a beautiful illustration of how counter-intuitive large ordinal numbers can be. 14 Measuring Intelligence and Growth Rate f  t . If the evader proves not to be so majorized (by di ering from every computable function whose runtime is so majorized), then move on to the next known total function f , and continue the process. We do not know for certain which predictor X would imitate when commanded to act as a predictor, but it seems plausible that X would use this brute force strategy or something equivalent. For an AGI X who uses the above brute force strategy, ASPI measures of X 's intelligence would be determined by X 's knowledge, namely, by the runtime complexity of the computable functions which X knows to be total. Furthermore, the most natural way for X to know totality of functions with large runtime complexity, is for X to know fundamental sequences for large ordinal numbers, and produce said functions by means of majorization hierarchies . This suggests a connection between 1. ASPI measures like that of De nition 31, and 2. intelligence measures based on which ordinals the AGI knows (Alexander, 2019b). Indeed, Alexander (2020a) has argued that the task of notating large ordinals is one which spans the entire range of intelligence. 
This is reminiscent of Chaitin's proposal to use ordinal notation as a goal intended to facilitate evolution|\and the larger the ordinal, the tter the organism" (Chaitin, 2011)|and Good's observation (Good, 1969) that iterated Lucas-Penrose contests boil down to contests to name the larger ordinal. 6. Hyperreal numbers and surreal numbers In this section, we will exhibit an abstract ASPI taxonomy based on hyperreal numbers and an abstract ASPI measure based on surreal numbers. We do not assume previous familiarity with either of these number systems. 6.1 The hyperreal ASPI taxonomy In this subsection, we will begin by considering growth rate comparison, which is a strictly simpler problem than growth rate measurement (our proposed solution will then lead to a numerical growth rate measure anyway). Given two functions f and g, does f outgrow g or does f not outgrow g? We would like to say that f outgrows g if and only if f (n) > g(n) for \a majority of " n 2 N, but it is not clear what \majority" should mean. Certainly if f (n) > g(n) for all but nitely many n 2 N, it should be safe to say f outgrows g, and if f (n)  g(n) for all but nitely many n 2 N, it should be safe to say f does not outgrow g. But what if there are in nitely many n 2 N such that f (n) > g(n), and in nitely many n 2 N such that f (n)  g(n)? Adapting the key insight from Alexander (2019a), consider each n 2 N to be a voter in an election to determine whether or not f outgrows g. Each n votes based on whether or not f (n) > g(n). For example, 532 is a voter in this election. If f (532) > g(532), then 532 9. It may be possible for an AGI to be contrived to know totality of functions that are larger than the functions produced by majorization hierarchies up to ordinals the AGI knows about, but we conjecture that that is not the case for AGIs not so deliberately contrived. 15 S. Alexander & B. Hibbard casts her vote for \f outgrows g"; otherwise, 532 casts her vote for \f does not outgrow g". This reduces the outgrowth problem to an election decision problem: f shall be considered to outgrow g if and only if \f outgrows g" gets a winning bloc of votes. We need to decide what it means for a set N  N to constitute a winning bloc of votes. We reason as follows. ; should not be a winning bloc: if no-one votes for you, you lose. If N is a winning bloc and N  N , then N should be a winning bloc: if you were 1 1 2 2 already winning, and additional voters switch their votes to you, you should still win. For any N  N, either N should be a winning bloc, or its complement N = NnN should be a winning bloc: however the election goes, either you win or your opponent wins. We should insist on the outgrowth relation being transitive: if f outgrows g and g outgrows h, then f should outgrow h. Suppose that N = fn 2 N : f (n) > g(n)g; fg N = fn 2 N : g(n) > h(n)g; and gh N = fn 2 N : f (n) > h(n)g: fh Clearly N \ N  N but, a priori, we cannot say more: one can nd functions fg gh fh f; g; h such that N \ N = N . Thus, in order to ensure transitivity of the fg gh fh outgrowth relation, we should insist on the following requirement. Whenever N and N are winning blocs, then N \ N should be a winning bloc. 2 1 2 We could trivially satisfy the above requirements, namely: we could choose some n 2 N and declare that N  N is a winning bloc if and only if n 2 N . In electoral 0 0 terms, this would amount to making n a dictator, whose vote decides the election regardless how anyone else votes. 
In terms of the outgrowth relation, this would amount to declaring that f outgrows g if and only if f (n ) > g(n ). That would be 0 0 a poor method of comparing growth rates. Thus, we should insist that fn g is not a winning bloc for any n 2 N. Is it possible to satisfy all the above requirements, or are they too demanding? It turns out it is possible. In fact, the above requirements are exactly the requirements of a free ultra lter, an important device from mathematical logic. De nition 33 An ultra lter on N (or more simply an ultra lter) is a set U of subsets of N such that: 1. ; 62 U . 2. For every N 2 U , for every N  N, if N  N , then N 2 U . 1 2 1 2 2 3. For every N  N, either N 2 U or N 2 U . 4. (\-closure) For every N ; N 2 U , N \ N 2 U . 1 2 1 2 16 Measuring Intelligence and Growth Rate An ultra lter is free if it does not contain any singleton fn g (n 2 N). 0 0 Clearly a free ultra lter is exactly a notion of winning blocs meeting all our requirements. The following theorem is well-known in mathematical logic, and we state it without proof. Theorem 34 Free ultra lters exist. Theorem 34 is profound because it is counter-intuitive that there should be a non- dictatorial method of determining election winners satisfying \-closure. To see how counter- intuitive \-closure is, suppose that in 2021 the Dog party wins the presidency and in 2022 the Cat party wins the presidency (with the same voters every year and no other parties). Call a voter a \Dog-to-Cat switcher" if they vote Dog in 2021 and Cat in 2022. The \- closure property says in order to win in 2023, it would be enough to get just the Dog-to-Cat switchers' votes and no others. For more on in nite-voter elections and free ultra lters, and especially their interplay with Arrow's impossibility theorem, see Kirman and Sondermann (1972). For the remainder of the section, let U be a free ultra lter. Unfortunately, logicians have shown that, though free ultra lters exist, it is impossible to concretely exhibit one. More precisely, all known proofs of Theorem 34 are non-constructive (using non-constructive set- theoretic axioms such as the Axiom of Choice) and logicians have proven that Theorem 34 cannot be proved constructively. De nition 35 Suppose f; g : N ! R. We say f > g if fn 2 N : f (n) > g(n)g 2 U: In other words: if U is thought of as a black box deciding which subsets of N are winning blocs, then f > g if and only if \f outgrows g" wins the election when each n 2 N votes for \f outgrows g" or \f does not outgrow g" depending whether f (n) > g(n) or f (n)  g(n) respectively. Lemma 36 > is transitive. Proof Suppose f; g; h : N ! R are such that f > g and g > h, we must show f > h. U U U Let N = fn 2 N : f (n) > g(n)g; fg N = fn 2 N : g(n) > h(n)g; and gh N = fn 2 N : f (n) > h(n)g: fh Since f > g, N 2 U . Since g > h, N 2 U . By \-closure, N \ N 2 U . Clearly U fg U gh fg gh N \ N  N , so, by (2) of De nition 33, N 2 U , that is, f > h. fg gh fh fh U We will now explain how De nition 35 leads to a numerical growth rate measure and, in turn, an ASPI taxonomy. We will show that by coming up with De nition 35, we have actually done much of the work of the so-called ultrapower construction of the hyperreal number system, studied in the eld of non-standard analysis (Robinson, 1974; Goldblatt, 2012). 17 S. Alexander & B. Hibbard De nition 37 (Compare De nition 35) Suppose f; g : N ! R. 
We say f  g if fn 2 N : f (n) = g(n)g 2 U: In other words, f  g if \f = g" wins the election (as decided by U ) when each n 2 N votes for \f = g" or \f 6= g" depending whether f (n) = g(n) or f (n) 6= g(n) respectively. Lemma 38 The relation  (from De nition 37) is an equivalence relation. Proof Symmetry and re exivity are trivial. The proof of transitivity is similar to Lemma De nition 39 The hyperreal numbers, written R, are the equivalence classes of . For every f : N ! R, write f for the hyperreal number (i.e., the -equivalence class) containing f . We endow R with arithmetic and order as follows (where f; g : N ! R): ^ ^ We de ne addition on R by declaring that f + g ^ = h where h(n) = f (n) + g(n). ^ ^ We de ne multiplication on R by declaring that f  g ^ = h where h(n) = f (n)g(n). We order R by declaring that f > g ^ if and only if f > g (De nition 35). The following is well-known and we state it without proof. Theorem 40 The addition, multiplication, and ordering in De nition 39 are well-de ned, and they make R an ordered eld. With this machinery, we now have a trivial hyperreal solution to Problem 10. De nition 41 (The hyperreal solution to Problem 10) For any function f : N ! N, the hyperreal growth rate of f is the hyperreal number f . Because of the non-constructive nature of free ultra lters, the following notions are even less practical than the measures in the previous sections. However, they could potentially be useful for proving theoretical properties about the intelligence of predictors. De nition 42 Suppose p is a predictor and e is an evader. Let (x ; y ; x ; y ; : : :) be the 1 1 2 2 result of p playing against e. We say p U -learns e if fn 2 N : p(x ; : : : ; x ) = e(y ; : : : ; y )g 2 U 1 n 1 n (or equivalently: fn 2 N : y = x g 2 U ). In other words, p U -learns e if \p learns e" n+1 n+1 wins the election (according to U ) when every n 2 N votes for \p learns e" or \p does not learn e" depending whether or not p(x ; : : : ; x ) = e(y ; : : : ; y ). 1 n 1 n In the following de nition, rather than assigning a particular hyperreal number intelligence to every predictor, rather, we categorize predictors into a taxonomy. This is necessary because there is no way of choosing canonical bounds of bounded sets of hyperreal numbers in general. For lack of a way of choosing a particular bound, we are forced to consider many taxa corresponding to many bounds. 18 Measuring Intelligence and Growth Rate De nition 43 (The hyperreal ASPI taxonomy) Let p be a predictor and let f be a hyperreal number. We say that p has hyperreal ASPI intelligence at least f if and only if the following condition holds: for every evader e, if the hyperreal growth rate of t is < f , then p U -learns e. Now we would like to state an analog of Proposition 8 for hyperreal ASPI intelligence, but before we can do that, we need to state the following lemma. This lemma is well-known so we state it without proof. Lemma 44 For any N 2 U , for any nite N  N, the di erence NnN is in U . 0 0 Proposition 45 For any computable function f : N ! N, there is a predictor which has hyperreal ASPI intelligence at least f . Proof Let p be as in the proof of Proposition 8. We claim p has hyperreal ASPI f f intelligence at least f . To see this, assume e is an evader such that the hyperreal growth rate of t is < f , we will show p U -learns e. e f Let (x ; y ; x ; y ; : : :) be the result of p playing against e. 
Let 1 1 2 2 f N = fn 2 N : f (n) > t (n)g; 1 e N = fn 2 N : p (x ; : : : ; x ) = e(y ; : : : ; y )g; 2 1 f 1 n 1 n N = fn 2 N : p (x ; : : : ; x ) = e(y ; : : : ; y )g: 3 f 1 n 1 n By the Claim in the proof of Proposition 8, for all-but- nitely-many n 2 N, if f (n) > t (n), then x = y , i.e., p (x ; : : : ; x ) = e(y ; : : : ; y ). In other words, N = N nN for n+1 n+1 f 1 n 1 n 2 1 0 some nite N  N. Since the hyperreal growth rate of t is < f , N 2 U . Since N = N nN , by Lemma 44 e 1 2 1 0 we have N 2 U . Since N  N , it follows that N 2 U , that is, p U -learns e, as desired. 2 2 3 3 f Whereas we showed in Proposition 9 that the original Hibbard measure is already non-computable, we conjecture that the hyperreal ASPI taxonomy is even harder (in a computability theoretical sense). We mention in passing that there are other ways to approach non-standard analysis (Hrbacek and Katz, 2020), where the free ultra lter dependency is replaced by a dependency on a model of a certain set of axioms. 6.2 The surreal ASPI measure In De nition 43, we had to content ourselves with an ASPI taxonomy rather than an ASPI measure, because there is no canonical way of choosing a preferred bound for a bounded set of hyperreals. In this section, we will consider another number system, the surreal numbers (Conway, 2000; Knuth, 1974; Ehrlich, 2012), in which the hyperreal numbers can be embedded. By embedding the hyperreals within the surreals, our hyperreal solution to Problem 10 yields a surreal solution to Problem 10, via the embedding. Unlike the hyperreal numbers, the surreal numbers do admit a canonical way of choosing preferred bounds for 19 S. Alexander & B. Hibbard bounded sets. Thus, our surreal solution to Problem 10 will yield an ASPI measure, not just a taxonomy. Formally de ning the surreal numbers involves nuances touching the foundations of mathematics, so we will only sketch the de nition here. Suppose we take the following as the guiding principle in creating a number system: For every set L of lower bounds in our number system, and every set R of upper bounds in our number system, with L < R (by which we mean ` < r for all ` 2 L; r 2 R), we want there to exist a canonical number L < x < R (i.e., a canonical number x such that ` < x < r for all ` 2 L; r 2 R). To simplify things, since this guiding principle only requires the existence of these canonical numbers, we can assume no other numbers exist, only such canonical numbers. Thus, every number in the resulting system can, recursively, be identi ed with a pair (L; R) of sets of numbers, L < R. Notationally, we use fLjRg as a name for the canonical number L < fLjRg < R between L and R, whenever L < R are sets of numbers. There are two tricky issues with this idea: 1. A number in this number system we are building might have multiple names, so we cannot simply take the numbers to be their names. Instead, it is necessary to take the numbers to be equivalence classes of names. What should it mean for two names to be equivalent? 2. If fx jx g and fy jy g are (names of ) two numbers x; y (respectively) in this number L R L R system we are building (so x < x and y < y are sets of numbers in said number L R L R system), what does it mean to say x < y? This question seems somewhat circular, because in order for fx jx g to be a valid name in the rst place already requires L R that x < x (i.e., that ` < r whenever ` 2 x ; r 2 x ). 
To answer issue 2, we ask ourselves: what does it mean for the canonical number $x$ between $x_L$ and $x_R$ to be less than the canonical number $y$ between $y_L$ and $y_R$? With some thought, it seems a natural way to (recursively) answer this question is to declare $x < y$ if and only if one of the following holds:

- There is some $y' \in y_L$ such that $x \leq y'$, or
- There is some $x' \in x_R$ such that $x' \leq y$.

Having answered (2), we can answer (1) by declaring that two names $\{x_L \mid x_R\}$ and $\{y_L \mid y_R\}$ (of $x$ and $y$ respectively) are equivalent if $x \not< y$ and $y \not< x$.

Carrying out the above definition in full formality is tricky, because the $<$ relation and the equivalence relation have to be defined by simultaneous recursion in terms of each other. To give the formal definition here would be beyond the scope of this paper. The equivalence classes of the above equivalence relation are called surreal numbers. It is possible to define addition and multiplication on them in such a way that, along with the ordering $<$ already defined, the surreal numbers satisfy the axioms of an ordered field.

10. Technically, $<$ and the equivalence relation are not actually relations at all, because their universes are not sets. For this and other reasons, the formal construction of the surreals is usually carried out in transfinitely many stages, such that at any particular stage, the surreals constructed so far form a set.

Example 46 (Examples of surreal numbers).

- Taking $L = R = \emptyset$ yields the surreal number named $\{\emptyset \mid \emptyset\}$, usually abbreviated $\{\mid\}$. It can be shown that this is the surreal $0$ (i.e., the unique surreal additive identity).
- Let $L = \{0\}$, $R = \emptyset$, where $0$ is as above. This yields the surreal named $\{\{0\} \mid\}$. It can be shown that this is the surreal $1$ (i.e., the unique multiplicative identity).
- Let $L = \{1\}$, $R = \emptyset$, where $1$ is as above. This yields the surreal number named $\{\{1\} \mid\}$. It can be shown that this is $2$ (i.e., $1 + 1$). In the same way, one can obtain surreal numbers $3, 4, 5, \ldots$.
- Let $L = \{0, 1, 2, \ldots\}$ (as above), $R = \emptyset$. This yields the surreal named $\{\{0, 1, 2, \ldots\} \mid\}$. This surreal represents the smallest infinite ordinal number, $\omega$, considered as a surreal.
- With $0$ and $1$ as above, $\{\{0\} \mid \{1\}\}$ names a surreal strictly between $0$ and $1$. It can be shown to be $\frac{1}{2}$ (i.e., its product with $2$ is $1$).
- With $0$ and $\frac{1}{2}$ as above, $\{\{0\} \mid \{\frac{1}{2}\}\}$ names a surreal strictly between $0$ and $\frac{1}{2}$. It can be shown to be $\frac{1}{4}$.
- Similarly, one can construct surreals $\frac{1}{8}, \frac{1}{16}, \ldots$.
- With $0$ and $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$ as above, $\{\{0\} \mid \{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots\}\}$ names a surreal larger than $0$ but smaller than every $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$. This shows that the surreals include infinitesimals. This particular infinitesimal can be shown to be $1/\omega$, in the sense that when multiplied by $\omega$, the result is $1$.

The reader might object that if we let $L$ be the set of all surreals, then $\{L \mid\}$ seems to name a surreal larger than all surreals, which would be absurd. This paradox is avoided because in fact the class of all surreals is not a set, but a proper class.

It can be shown that for any free ultrafilter $U$, the hyperreals constructed using $U$ can be embedded into the surreals. This allows us to transform our hyperreal solution to Problem 10 into a surreal solution. Unfortunately, there are many different ways to embed the hyperreals into the surreals, none of them canonical, so while our hyperreal solution already depends arbitrarily on a free ultrafilter, our surreal solution further depends on an arbitrary embedding.
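Before turning to the surreal solution to Problem 10, it may help to see the above recursion in executable form. The following sketch (in Python) is only an illustration, not part of the formal development: it handles only finite, explicitly constructed names (so $\omega$ and $1/\omega$ from Example 46 are out of reach), it closes the simultaneous recursion by reading $x \leq y$ as an abbreviation for "not $y < x$" (a standard convention, but an assumption on our part), and the identifiers Name, lt, le, and equivalent are ours rather than drawn from the literature.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Name:
        """A finite surreal name {L | R}: two frozensets of previously built names."""
        L: frozenset = frozenset()
        R: frozenset = frozenset()

    def lt(x: Name, y: Name) -> bool:
        """x < y iff some y' in y.L has x <= y', or some x' in x.R has x' <= y."""
        return any(le(x, yp) for yp in y.L) or any(le(xp, y) for xp in x.R)

    def le(x: Name, y: Name) -> bool:
        """Close the recursion by reading 'x <= y' as 'not (y < x)'."""
        return not lt(y, x)

    def equivalent(x: Name, y: Name) -> bool:
        """Two names denote the same surreal iff neither is < the other."""
        return not lt(x, y) and not lt(y, x)

    # The finite names from Example 46, built bottom-up.
    ZERO = Name()                                           # { | }
    ONE  = Name(L=frozenset({ZERO}))                        # {0 | }
    TWO  = Name(L=frozenset({ONE}))                         # {1 | }
    HALF = Name(L=frozenset({ZERO}), R=frozenset({ONE}))    # {0 | 1}

    assert lt(ZERO, ONE) and lt(ONE, TWO)       # 0 < 1 < 2
    assert lt(ZERO, HALF) and lt(HALF, ONE)     # 0 < 1/2 < 1
    # A second name for the same surreal: {1/2 | } is equivalent to {0 | } = 1.
    assert equivalent(Name(L=frozenset({HALF})), ONE)

The final assertion checks that $\{\{\frac{1}{2}\} \mid\}$ and $\{\{0\} \mid\}$ are equivalent names (both denote the surreal $1$), illustrating why the numbers must be taken to be equivalence classes of names rather than the names themselves.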
Definition 47 (The surreal solution to Problem 10). Suppose $U$ is a free ultrafilter and ${}^*\mathbb{R}$ is the corresponding hyperreal number system (Definition 39). Let $\varphi$ be an embedding of ${}^*\mathbb{R}$ into the surreals. For any function $f : \mathbb{N} \to \mathbb{N}$, the surreal growth rate of $f$ (given by $U$ and $\varphi$) is $\varphi(\hat{f})$, where $\hat{f}$ is the hyperreal growth rate of $f$ (Definition 41).

The payoff of doing this is that there is a canonical way to pick a particular bound of a bounded set of surreals, so the surreal solution to Problem 10 provides an ASPI measure, not just an ASPI taxonomy.

11. It can be shown that the ordinal numbers can be embedded in the surreals, and so the non-sethood of the class of surreals is strictly weaker than the non-sethood of the class of ordinals. The latter non-sethood is referred to as the Burali-Forti paradox.

Definition 48 (The surreal ASPI measure). Let $U$ be a free ultrafilter, let ${}^*\mathbb{R}$ be the corresponding hyperreal number system, and let $\varphi$ be an embedding of ${}^*\mathbb{R}$ into the surreals. Let $\varphi({}^*\mathbb{R})$ be the range of $\varphi$. For every predictor $p$, the surreal ASPI measure of $p$ (given by $U$ and $\varphi$) is defined to be the surreal with name $\{L \mid\}$, where $L$ is the set of all surreal numbers $\ell \in \varphi({}^*\mathbb{R})$ such that the following condition holds: for every evader $e$, if the surreal growth rate of $t_e$ is $< \ell$, then $p$ $U$-learns $e$.

Note that in Definition 48, we require $\ell \in \varphi({}^*\mathbb{R})$ because otherwise $L$ would not be a set.

7. Pros and cons of different ASPI measures and taxonomies

Here are pros and cons of the ASPI measures and taxonomies which arise from the different solutions to the problem (Problem 10) of measuring the growth rate of functions.

The original Hibbard measure (Definition 7), which arises from measuring growth rate by comparing a function with Liu's enumeration (Liu, 1960) of the primitive recursive functions:
  - Pro: Relatively concrete.
  - Pro: Measures intelligence using a familiar number system (the natural numbers).
  - Con: The numbers which the measure outputs are not very meaningful, in that predictor $p$ having a measure one higher than predictor $q$ tells us little about how much more computationally complex the evaders which $p$ learns are, versus the evaders which $q$ learns.
  - Con: Only distinguishes sufficiently non-intelligent predictors; all sufficiently intelligent predictors receive measure $\infty$.

Big-O/Big-$\Theta$ (Definition 20), in which, rather than directly measuring the intelligence of a predictor, we instead speak of a predictor's intelligence being $O(f(n))$ or $\Theta(f(n))$ for various functions $f : \mathbb{N} \to \mathbb{N}$:
  - Pro: Nearly perfect granularity (slightly coarser than perfect granularity because of the constants $C$, $C'$ in Definition 19).
  - Pro: Computer scientists already use Big-O/Big-$\Theta$ routinely and are comfortable with them.
  - Con: A non-numerical taxonomy.

Intelligence based on a majorization hierarchy such as the standard slow- or fast-growing hierarchy up to $\epsilon_0$ (Definitions 29 and 31):
  - Pro: A numerical measure, albeit less granular than the Big-O/Big-$\Theta$ taxonomies.
  - Pro: Relatively concrete.
  - Pro: The numbers which the measure outputs are meaningful, in the sense that the degree to which a predictor $p$ is more intelligent than a predictor $q$ is reflected in the degree to which $p$'s intelligence measure is larger than $q$'s.
  - Con: The numbers which the measure outputs are ordinal numbers, which may be unfamiliar to some users.
  - Con: Only distinguishes sufficiently non-intelligent predictors; for any particular majorization hierarchy, all sufficiently intelligent predictors receive measure $\infty$.
Hyperreal intelligence (Definition 43):
  - Pro: A taxonomy like Big-O/Big-$\Theta$, but with the added benefit that the taxa are numerical.
  - Pro: Perfect granularity.
  - Con: Depends on a free ultrafilter (free ultrafilters exist but cannot be concretely exhibited).

Surreal intelligence (Definition 48):
  - Pro: An actual numerical measure (not just a taxonomy), with perfect granularity.
  - Con: The numbers which the measure outputs are surreal numbers, which are relatively new and thus unfamiliar, and are difficult to work with in practice.
  - Con: Depends on both a free ultrafilter and an embedding of the resulting hyperreals into the surreals.

8. Conclusion

To summarize: Hibbard (2011) proposed an intelligence measure for predictors in games of adversarial sequence prediction. We argued that Hibbard's idea actually splits into two orthogonal sub-ideas. First: that intelligence can be measured via the growth rates of the runtimes of the evaders that a predictor can learn to predict. Second: that such growth rates can be measured in one specific way (involving an enumeration of the primitive recursive functions). We argued that there are many other ways to measure growth rates, and that each method of measuring growth rates yields a corresponding adversarial sequence prediction intelligence (ASPI) measure or taxonomy. We considered several specific ways of measuring the growth rates of functions, and exhibited the corresponding ASPI measures and taxonomies. The growth-rate-measuring methods we considered were: Big-O/Big-$\Theta$ notation; majorization hierarchies; hyperreal numbers; and surreal numbers. We also discussed how the intelligence of adversarial sequence predictors can be considered as an approximation of the intelligence of idealized AGIs.

Acknowledgments

We acknowledge Bryan Dawson for feedback on Section 6.1. We acknowledge Philip Ehrlich for correcting a mistake. We acknowledge Mikhail Katz and Roman Yampolskiy for providing literature references. We acknowledge the editor and the reviewers for much generous feedback and suggestions.

References

Alexander, S. A. 2019a. Intelligence via ultrafilters: structural properties of some intelligence comparators of deterministic Legg-Hutter agents. Journal of Artificial General Intelligence 10(1):24-45.

Alexander, S. A. 2019b. Measuring the intelligence of an idealized mechanical knowing agent. In CIFMA.

Alexander, S. A. 2020a. AGI and the Knight-Darwin Law: why idealized AGI reproduction requires collaboration. In ICAGI.

Alexander, S. A. 2020b. The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI. Journal of Artificial General Intelligence 11(1):70-85.

Bostrom, N. 2003. Ethical issues in advanced artificial intelligence. In Schneider, S., ed., Science fiction and philosophy: from time travel to superintelligence. John Wiley and Sons. 277-284.

Chaitin, G. 2011. Metaphysics, Metamathematics and Metabiology. In Zenil, H., ed., Randomness through computation. World Scientific.

Conway, J. H. 2000. On Numbers and Games. CRC Press, 2nd edition.

Ehrlich, P. 2012. The absolute arithmetic continuum and the unification of all numbers great and small. Bulletin of Symbolic Logic 18:1-45.

Girard, J.-Y. 1981. $\Pi^1_2$-logic, Part 1: Dilators. Annals of Mathematical Logic 21(2-3):75-219.

Goldblatt, R. 2012. Lectures on the hyperreals: an introduction to nonstandard analysis. Springer.

Good, I. J. 1969. Gödel's theorem is a red herring. The British Journal for the Philosophy of Science 19(4):357-358.

Hardy, G. H. 1904.
A theorem concerning the infinite cardinal numbers. Quarterly Journal of Mathematics 35:87-94.

Hibbard, B. 2008. Adversarial sequence prediction. In ICAGI, 399-403.

Hibbard, B. 2011. Measuring agent intelligence via hierarchies of environments. In ICAGI, 303-308.

Hrbacek, K., and Katz, M. G. 2020. Infinitesimal analysis without the axiom of choice. Preprint.

Hutter, M. 2004. Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer.

Kirman, A. P., and Sondermann, D. 1972. Arrow's theorem, many agents, and invisible dictators. Journal of Economic Theory 5(2):267-277.

Knuth, D. E. 1974. Surreal numbers: a mathematical novelette. Addison-Wesley.

Knuth, D. E. 1976. Big Omicron and big Omega and big Theta. ACM SIGACT News 8(2):18-24.

Legg, S. 2006. Is there an elegant universal theory of prediction? In International Conference on Algorithmic Learning Theory, 274-287. Springer.

Liu, S.-C. 1960. An enumeration of the primitive recursive functions without repetition. Tohoku Mathematical Journal 12(3):400-402.

Robinson, A. 1974. Non-standard analysis. Princeton University Press.

Wainer, S., and Buchholz, W. 1987. Provably computable functions and the fast growing hierarchy. In Simpson, S. G., ed., Logic and Combinatorics. AMS.

Wainer, S. 1989. Slow growing versus fast growing. The Journal of Symbolic Logic 54(2):608-614.

Wang, P. 2019. On Defining Artificial Intelligence. Journal of Artificial General Intelligence 10(2):1-37.

Weiermann, A. 1997. Sometimes slow growing is fast growing. Annals of Pure and Applied Logic 90(1-3):91-99.

Weiermann, A. 2002. Slow versus fast growing. Synthese 133:13-29.

Yampolskiy, R. V. 2012. AI-complete, AI-hard, or AI-easy: classification of problems in AI. In The 23rd Midwest Artificial Intelligence and Cognitive Science Conference.

Yampolskiy, R. V. 2013. Turing test as a defining feature of AI-completeness. In Artificial intelligence, evolutionary computing and metaheuristics. Springer. 3-17.

Yampolskiy, R. V. 2020. On Controllability of Artificial Intelligence. Technical report.
