The Kelly Criterion— Maximizing a Gambler's or Investor's Most-Likely Final Amount of Wealth

A Case for Kelly

There is, for example, such a thing as a “listed stock option”, of which there are two types. For present purposes we could say that a stock option is a bet that the price of a particular stock will be either above or below a stated price, the “strike price”, by a given expiration date. If the option pays off when the stock finishes above the strike price then it's a “call” option; “put” options are bearish bets in the other direction. If at expiration the option is a winner, that is, if it finishes “in the money”, it returns 100 times the absolute value of the difference between the strike price and the price of the stock on that day (customarily each option contract pertains to 100 shares of stock, hence the multiplier of 100). The options marketplace sets the price of each option, the amount bet, which is called the “premium”.
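As a concrete illustration of the payoff arithmetic just described, here is a minimal sketch; the function and the sample prices are mine for illustration only, not part of any exchange's specification, and the premium paid is ignored.

```python
def option_payoff_at_expiration(option_type, strike, stock_price, contracts=1):
    """Dollar payoff of a listed option at expiration, ignoring the premium that was paid.

    Each contract customarily covers 100 shares, hence the multiplier of 100.
    """
    if option_type == "call":
        intrinsic = max(stock_price - strike, 0.0)   # pays only if the stock finishes above the strike
    elif option_type == "put":
        intrinsic = max(strike - stock_price, 0.0)   # pays only if the stock finishes below the strike
    else:
        raise ValueError("option_type must be 'call' or 'put'")
    return 100 * intrinsic * contracts

# A call struck at 50 with the stock finishing at 53 returns 100 * 3 = 300 dollars;
# the same call with the stock at 48 finishes out of the money and returns nothing.
print(option_payoff_at_expiration("call", 50, 53))   # 300.0
print(option_payoff_at_expiration("call", 50, 48))   # 0.0
```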

Now there are some options that don't expire for, say, two years. And “warrants” are very similar to call options and at issuance they may be set to expire more than a decade hence. But the most liquid options contracts are those that expire within about three months, and most of the action in the options marketplace is with options whose strike prices are not very different from the current market price. That means that the odds of winning at least something or of getting nothing back for the premium on any given bet are usually something like 50-50, most often not outside of, say, 70-30 either way. Given the short times until expiration and those odds, an options “investor” could hypothetically make many, many such bets in a career, each of them posing a very substantial risk of getting back nothing for the premium.

We immediately see the problem. If some night at the casino you want to guarantee that you'll be retiring early you can just put everything on black at the roulette table and let it ride. You'll have lost all in at most a few spins of the wheel. The stock option investor can't very well “let it ride”, put up everything on the option contract each time and hope to survive. So how much should the investor be willing to pay out as premium each time? What fraction of his capital? Well, the famous “Kelly criterion” determines a formula for the optimum size of each bet in a given set of gaming circumstances with the goal being to maximize the growth rate of the accumulated wealth over the long run. The purpose of this article is to explain it.

And to understand what the Kelly criterion is all about is to understand that if the most-likely rate of growth of an investor's equity is to be maximized then the fraction of his equity that should be risked at any one time is often, especially in the circumstances of retail investors, considerably less than 1.0. That is not what, say, mutual fund managers typically do. They are usually obliged to remain as close to 100% invested as possible.

To be sure, the same sort of concern about how many eggs to put in a basket arises when simply holding stocks and even with the now-popular approach of investing in Exchange Traded Funds (ETFs). True, it's a much milder concern, particularly with the ETFs because with such securities there is generally no chance of ever finishing absolutely out of the money, of suffering anything like a total loss of the amount put up. But the fact remains that the compounding of returns on investments that are not risk-free does not proceed in quite the same way as the compounding of risk-free investments and that untoward outcomes might be ameliorated by paying some attention to the mathematics of compounding.

In part because it emphasizes, and applies only to, outcomes after many, many trials (which generally means long-term outcomes), the Kelly criterion is hardly in use by investment advisors and portfolio managers who allocate money to stocks, bonds, ETFs and the like. Not only are they held accountable for their performance annually, not over the long term, and not only do their clients have limited time horizons, but we'll also see that in order to apply the criterion effectively some statistics on the future performance of the securities must be rather well known in advance. And those statistics are never well known (possibly they are if the game is Blackjack, but not if it's stock market investing). Most such advisors and managers therefore rightfully disregard Kelly's observation entirely. In all, although professionals who conduct many, many transactions through the years may benefit, it's really true that the criterion will not bring you fortune, not if you are a retail investor. However, what the Kelly mathematics has to say about whether or not everyone should always be fully invested, holding essentially nothing in cash, is of practical importance. Please read on!


Kelly's Criterion

It's not too much of a reach to refer to the purchasing of a stock option as a “bet”. Kelly was not a gambler, and he developed his formula while working constructively on information theory for the improvement of electronic communications, but his published article did in fact demonstrate an application to gambling— to parimutuel betting on horse races. So the Kelly criterion has also been applied to gambling, and the greatest need for knowing about it is with regard to all such risky endeavors.

The Kelly criterion requires the computation of an “expectation value”. Where a quantity has a set of possible values the expectation value of that quantity is the arithmetic average of those possible outcomes, weighted in proportion to the theoretically-known likelihood of occurrence of each. In general, some particular value might be vastly more likely to occur than any others yet not be so much as close to, let alone equal to, the expectation value. We'll return to this matter of the expectation value not necessarily being representative of anything that you are likely to experience and the built-in Kelly criterion workaround for that later, on page 3 of this article.
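For a quick illustration, with numbers chosen just for this example and not taken from anything later in the article: if a quantity can come out as 0 with probability 0.5, as 1 with probability 0.3, or as 5 with probability 0.2, then its expectation value is \(0.5\cdot 0 + 0.3\cdot 1 + 0.2\cdot 5 = 1.3\), even though the single most likely outcome is 0 and no possible outcome equals 1.3.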

Let us consider that an investor begins with a given starting wealth and does nothing else with it but use it to repeatedly assume a position of some size in a given security. For example, the investor could at regular intervals— e.g., every week, month or year— adjust the amount committed to the security with the rest being held as cash. Or, a gambler could repeatedly bet on a particular game of chance. And let us further assume that the individual persists through such a large number of “trials”— that's what the statisticians call them— as to, in effect, fully encounter the entire distribution of possible outcomes for each trial, possibly many times over.

And for each trial we can compute an overall return ratio: the total amount of wealth at the end of the trial divided by the total at the beginning, a definition that does not preclude the funds put at risk each time being, by choice, only a fraction of the available wealth. The Kelly criterion determines the fraction of the wealth at the beginning of each trial that must be committed each time in order to maximize the most likely final wealth amount after many trials. We will see that this maximization amounts to maximizing the expectation value of the logarithm of the final wealth amount (not of the final wealth amount itself), with the expectation value being taken using the probabilities of occurrence of the theoretically-expected distribution of the return ratios. Again, the rationale is that under the specified imagined circumstance of an ultimately large number of trials the distribution of the return ratios that would thereby hypothetically be realized would thoroughly exhaust, and therefore replicate, the entire theoretically-expected distribution. “Taking the expectation value” amounts to substituting the latter distribution for the former, the latter being the one that is assumed to be known (though in reality it may be poorly known). In the United States we used to have an idiomatic expression, a colloquialism, for that: “buying the average”. That's what it meant. In academic circles it's called the “law of large numbers”; the observation is attributed to Gerolamo Cardano and was first proved by Jacob Bernoulli.

Yes! The logarithm. The real Kelly purpose is to come as close as possible to simply maximizing the most likely final amount of wealth— should that be what you are determined to do, the attendant risk notwithstanding. In the process the rate of growth of the wealth is naturally also maximized and that is often stated as the Kelly purpose since no investor is going to last for an eternity (though we do really seem to come perilously close to assuming that in order to justify implementing the criterion). It will turn out that the expectation value is representative of the outcome that we are likely to get if it is applied to the logarithm. The Kelly criterion is really a natural thing, no voodoo about it, just math; the logarithms are not there as a contrivance, not there in the guise of a “utility function”, the favorite tool of the welfare economist Paul Samuelson, but are demonstrably mathematically necessary to simply maximize the most likely amount of final wealth. But we're getting ahead of ourselves. The logarithms and the rest of the mathematics are derived on page 3 of this article under “A Bit of the Mathematics”.

I am aware of this article by Samuelson and Merton. The latter had been Samuelson's student and later became a promoter and director of Long-Term Capital Management, which he helped “blow up”. The article is an attack on the use of the Kelly criterion as a potential cornerstone of portfolio management, with the real concern evidently being about guidance for retirement plan portfolios and the like; the authors are not talking about whether or not it could be OK for some venturous hedge fund to commit some portions of the assets of their “accredited investors” to strategies that are in some way modulated by the use of the Kelly criterion. I have only skimmed the article and do not intend to finish reading it, as early on the authors seem to commit to imposing utility functions on investors so as to compel them to assert their own risk tolerance in particular ways. It would not be surprising to find, as these authors do, that merging the Kelly math with their own particular utility function math might produce untoward outcomes. So they do not criticize the Kelly math per se, as it is indisputably correct; they are just concerned about its misapplication and the general impossibility of it being made to comport to their favorite principles of portfolio management. I might have gotten further in the article had I not encountered, on the fourth page, a “thoughtful person” invoked as a component of the argument. But the authors do, commendably, in their second paragraph, admit to the gross failures of “mean-variance” models, which are today known as modern portfolio theory (MPT) and which are still foisted off on investors by many firms and advisors.

William T. Ziemba has responded to Samuelson's various concerns at length here. The article is also generally informative about the use of the Kelly criterion and the list of references is extensive.

With schemes that fully implement the criterion come levels of risk that can be formidable, especially in the early going. That does not refer to the early going of your efforts to understand and correctly apply Kelly. Rather, with Kelly perfectly applied, account equity can dive towards zero before recovering. But you can moderate the risk by being less aggressive and accepting sub-optimal rates of growth. And we'll soon see how the mathematics of Kelly helps us with that decision.

When it comes to developing expectations for real-life circumstances such as actually trading in stocks or stock options, nothing can be done that is very accurate and so great care must be taken to assess whether or not the resultant trading scheme is likely to have any reliability to it at all. You have to do proper backtesting, which happens to be the business of Retail Backtest. “Wheels of Fortune” are depicted on this page. A truly random wheel of fortune game doesn't present any of the complications of securities and the theoretically-expected distribution that we need to know in order to implement the Kelly criterion is printed on its face. That's what we'll consider next.



Mike O'Connor is a physicist who now develops and tests computerized systems for optimizing portfolio performance.

[Interactive content appears here on the web page: the chart “A Poundstone Wheel History”, the conjoined charts “Geometric Mean & Deviation Therefrom v. Betting Fraction”, and the clickable wheels, “Poundstone's Wheels of Fortune— Click to Spin”. Click-drag to zoom in; double-click to zoom out; shift-click-drag (quickly) to pan.]

Note: The wheels are used with the kind permission of Mr. Poundstone.

(Continued...)

The Book “Fortune's Formula”

It's by William Poundstone and it's about the Kelly criterion and the characters who gave it life. The title is from an article by Edward O. Thorp, a renowned mathematician and hedge fund manager who used the Kelly criterion in both gambling and investing with great success. It's a worthwhile book overall, a lively one, one for which this web article is no substitute (owing in part to its utter failure to reference gangsters and ponies). But I'd have to say that the book isn't quite going to suffice if you want to learn the mathematics of the Kelly criterion so as to be able to apply it to anything: the book is written so as to be readable by the general public; for full understanding integral and differential calculus is needed, albeit mainly just calculus of a single variable. Thorp's articles are the main place to go for the mathematics but you will find an introduction to the math, one that avoids most of the difficulties, on page 3 of this article under “A Bit of the Mathematics”.

This article is related in part to a particularly important section of the book, one in which the Kelly criterion is discussed in relation to three wheels of fortune— “The Trouble with Markowitz” section in Part Three, “Arbitrage”. There's one wheel for each of three penny stocks, each with its own possible outcomes. A spin of a wheel is taken to simulate the outcome of a $1 investment in a penny stock over a holding period of a year.

The wheels are shown on the first page of this article and it may be helpful if you open that page in another tab or window of your browser for access as you read this page. The numbers on each wheel are the possible dollar values of your initial dollar investment in the penny stock at the end of the year. In the book the idea is to see which wheel is the best one, from the point of view of a Kelly investor versus one who does not heed the effects of compounding the returns of risky investments.

Those wheels of fortune fairly cry out for the JavaScript-powered widgets that I have provided. The widgets allow you to spin a wheel of your choice yourself, very rapidly and many times in succession— take that, Vanna White. Of course JavaScript must be at least temporarily enabled on your browser for any of it to work. All of the calculations are done on your own computer.

While preparing the JavaScript I became puzzled by one key paragraph in that section of the book. In order to clarify its meaning effectively I have taken the liberty of re-using the very same wheels that Mr. Poundstone used (actually I have his kind permission). My understanding comes about in part from having actually applied the Kelly mathematics to the given wheels. Here's the paragraph in question:

The worst wheel by the Kelly philosophy is the second. That's because it has a zero as one of its outcomes. With each spin, you risk losing everything. Any long-term “investor” who keeps letting money ride on the second wheel must eventually go bust. The second wheel's geometric mean is zero.

Whether the player repeatedly bets an optimal amount as determined by the Kelly criterion or simply uses the let-it-ride approach, the geometric mean after n spins of the wheel is the positive real number which, when multiplied by itself n times, produces the ratio of the player's final wealth to his starting wealth. So a geometric mean of zero would mean that the player lost everything; the bigger the geometric mean the better; should a geometric mean of 1.00 ever happen that would mean that in the final analysis there was no change in the player's wealth notwithstanding the ups and downs along the way. If we seek to determine the geometric mean that is characteristic of a particular wheel by experiments on it, rather than by reading the numbers that can come up off of the face and making certain simple theoretical assumptions, then n has to be a very large number in order for the experimentally-determined geometric mean to nearly equal the theoretical value. If furthermore a let-it-ride policy is assumed then the experimentally-determined geometric mean will converge to the value for each wheel that is shown in the book.
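As a check on that definition, here is a minimal computational sketch; the four-spin let-it-ride history is a made-up example of mine, not one of Poundstone's wheels.

```python
import math

def geometric_mean(return_ratios):
    """The n-th root of the product of the per-spin wealth ratios X_i / X_{i-1}."""
    n = len(return_ratios)
    log_sum = sum(math.log(r) for r in return_ratios)   # raises an error if any ratio is 0, i.e. ruin
    return math.exp(log_sum / n)

# A made-up four-spin let-it-ride history: each year's wealth ratio is the payout itself.
history = [2.0, 0.5, 3.0, 1.0]
print(geometric_mean(history))   # (2.0 * 0.5 * 3.0 * 1.0) ** (1/4) ≈ 1.316
```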

The quoted paragraph of the book is basically true. It seems that the author had in mind the usual practice of stock market investors, which is in effect to let it ride, and is simply saying that if that is to be the policy then the general theory behind the Kelly principle immediately leads to the understanding that wheel #2, with its let-it-ride geometric mean of zero, should be utterly avoided.

However, when the Kelly criterion is actually employed so as to adopt an optimal bet size, the second wheel performs for the Kelly investor about as well as the third, with the return ratios having a decidedly non-zero geometric mean thanks to his having bet only a fraction of his wealth each time. Certainly the second is not an utterly bad wheel, notwithstanding zero being one of the outcomes, and we could easily make it better than the third by tweaking the non-zero returns upward while its let-it-ride geometric mean remained zero. It's all because the Kelly criterion compels the investor not to let it ride but to instead hold back some cash each year. In that way the Kelly criterion naturally avoids utter ruin even if sometimes the amount that is bet is entirely lost.


We Spin the Poundstone Wheels

How do we see all of that about the second wheel? When the first page of this article loaded, all three graphs on the right above the Poundstone wheels (or at the bottom of that page if your screen is not of sizable width) were initiated using the possible payouts of wheel #2; otherwise you can simply click on the image of any wheel to initiate the graphs with the distribution of that particular wheel. The first graph shows a single possible history of trading using the distribution of the chosen wheel. The “Spin the Same Wheel Again n Times” button does what it says, and you should press it numerous times and whenever you please, as that will allow you to see how wildly the outcomes can vary from one history to another. The option of a ridiculously long trading period of 300 years is offered so that we might get a glimpse of the long-term trend, which is otherwise almost indiscernible within the 30-year view due to the volatility of the outcomes and the fact that the frequency of the trials is only once per year. Click-dragging within any of the graphs so as to zoom in is sometimes very helpful; just double-click to zoom back out.

The key thing to understand is the value, in real-world circumstances, of the basic Kelly idea of committing only a fixed fraction of the wealth each time, holding the rest as cash or as a cash-equivalent. However, wheel #1 is rigged as a non-real-world can't-ever-lose wheel and so the best policy for it would be to instead simply borrow all of the money that you could and let it all ride. For it I've simply accepted the fact that the Kelly criterion does not establish a preferred fraction of wealth to commit and I only plot the let-it-ride option, with 100% committed each time.

We'll instead focus on wheels #2 and #3, on which we see numbers less than 1 that represent losses. The Kelly approach comes into play only when losses are possible. For such wheels the conjoined second and third charts inform us that we should consider adopting a “Betting Fraction” from the horizontal axes of those charts, “f” in the common notation, that's greater than zero but less than or equal to the f that has the highest “Annual Geometric Mean” as shown on the second chart. Why confine ourselves to that range of f values? Because outside of that range the geometric mean of the return is less while the risk as represented by the standard deviation is greater. The optimal betting fraction that maximizes the geometric mean is usually denoted in the Kelly literature by f*.
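Both of the plotted quantities can be computed directly from a wheel's list of payouts. Here is a minimal sketch; the six payouts are a hypothetical wheel of my own, not one of Poundstone's, and the names are mine.

```python
import math

def kelly_curves(payouts, f):
    """For betting fraction f: the annual geometric mean and the chart's risk measure.

    payouts are the equally likely per-dollar payouts printed on the wheel's face.
    The risk measure is e raised to the standard deviation of the log return ratio.
    """
    logs = [math.log(1 - f + f * r) for r in payouts]        # log return ratio for each outcome
    mean_log = sum(logs) / len(logs)                          # the expectation value of the log
    var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))

hypothetical_wheel = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]           # not Poundstone's numbers
for f in (0.1, 0.3, 0.5, 0.7, 0.9):
    gm, risk = kelly_curves(hypothetical_wheel, f)
    print(f"f = {f:.1f}   geometric mean = {gm:.3f}   risk = {risk:.3f}")
```

Sweeping f over a fine grid traces out the same sort of lopsided-horseshoe curve described below, with a single interior maximum at f*.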

If you're not following the f business, if f is 0.5 then we keep half of our money as cash and bet the rest. Per se, fixed-fractional betting, always using the same f, was no invention of Kelly; it's old hat. However the basic Kelly idea does employ fixed fractions and there are theorems that support the use of fixed fractions in conjunction with awareness of the Kelly criterion.

Back to our wheels #2 and #3: with other wheels, or with stocks, fractions below or above the zero-to-one range of f might be feasible, and they would respectively represent selling the stock short or borrowing money to buy an excess of it.

Given the “Average Payouts” of the Poundstone wheels, all of which exceed 1.00, none of them would show a long-term profit with short selling. And for wheels #2 and #3 it turns out that boosting your bet with borrowed money would be either ill-advised or catastrophic, but the story could be different with some other wheel such as #1 or even with a wheel that would occasionally present a loss.

Let's look at the second chart in detail, with wheel #2 selected. We see a sort of inverted, lopsided horseshoe curve having a maximum at a betting fraction f of about f*=0.63, which yields a maximum geometric mean of 1.24— it helps to zoom in, even twice if you wish, in order to pick off the utter maximum. So to get the fastest rate of growth of our wealth we would bet 63% of our wealth on wheel #2 each time.

Had you previously understood that there are investments that pay off when investing only a fraction of the funds that you have available but are sure losers if you simply commit nearly 100%? Read on!

Wheel #2 is like that. If we go off to the right, settling on a higher betting fraction f > f*, not only does our geometric mean deteriorate— at about f=0.96 it goes below 1.00, which means that beyond that we would be losing— but the risk would also be increasing, as is represented by the “Standard Deviation” on the conjoined chart (which is, more exactly, Euler's number raised to the power of the standard deviation of the logarithm of the return ratio). And we are absolutely barred from adopting a betting fraction of f=1.0 or higher (higher would mean investing with borrowed money), because if we ever get so much as fully invested with f=1.00 then when the number zero on the wheel comes up it would cause us utter ruin.

Now if we go off to the left of the maximum geometric mean, with f < f*, then things are qualitatively different. True, we also have to settle for a reduced geometric mean, but the risk decreases. So we can pick any risk-return combination that we like, any f between zero and the point of maximum geometric mean f*=0.63 inclusive, and we should have nothing much to regret. But we might well prefer to get closer to f* than to zero because the mean does not roll off to the left of its maximum as rapidly as the deviation drops. Accepting a sub-optimal betting fraction is called “fractional Kelly”. It may be advisable to have a general policy of always betting only a fixed fraction of the optimal Kelly fraction f*, to systematically quell the risk.

Still on wheel #2, let's examine the top chart, which is based on a single history of a succession of wheel outcomes over either 30 years or 300 years, your choice. On that chart are plotted two wealth histories that share that single wheel-outcome history— one for let-it-ride trading and one for Kelly-optimal-betting-fraction trading. If you spin the wheel several times you'll see that, oddly and rather inappropriately, the red plot for let-it-ride often stops abruptly at some year short of 30. About one out of every six times there's only a dot at the beginning. The cause of that is the fact that with wheel #2 the wealth of the let-it-ride investor often goes to zero but the logarithm of zero is minus infinity which can't be plotted on that chart because the vertical scale is logarithmic. Hence the chart usually fails to show a complete let-it-ride history.

I much prefer to plot the wealth histories on a logarithmic scale to better show that they look somewhat like straight lines, which they should, at least over the 300-year span notwithstanding the volatility. Furthermore, changes of a given percentage are represented by the same vertical distance on a logarithmic chart, anywhere on the chart; not so on a linear scale. But we still need a fix.

So, to get the fix you simply double-click on the label “A History for Poundstone Wheel #2”. The chart will then change because instead of that wheel there is substituted a very similar wheel that only differs from #2 in that where #2 has a payoff of zero the modified wheel has a payoff of 0.01— meaning that if that number comes up you will lose only 99% of what you put up. It's a have-our-cake-and-eat-it-too thing: we get to keep the logarithmic scale but still get to see all of the dismal results of let-it-ride.

And so now the label will say “A History for Modified Poundstone Wheel #2” and the fun of it is to click the “Spin the Same Wheel Again n Times” button numerous times and particularly with the 300-years election. You'll see the dramatic riches-instead-of-rags difference that the Kelly principle can make. Note that it's not that the tiny change from 0 to 0.01 improved the performance using the optimum Kelly fraction; it didn't, not noticeably.

And before we leave wheel #2, we can ask what amount should accumulate from the geometric mean of 1.24 with f at the Kelly optimal value f* that we previously found. The answer should be \(1.24^{300}\) if we're on the 300-year scale. That comes to roughly \(10^{28}\) (type “=1.24^300”, sans the quotation marks, in Google's search engine). The other way of writing that would be 1.0e+28. And sure enough, if you hit the spin-again button a number of times on the 300-years scale there are substantial fluctuations but the final value averages roughly that.

We can now quickly go over wheel #3 as it produces qualitatively similar results when used with the optimal Kelly betting fraction, which for it is f*=0.75— only a bit bigger than the optimal fraction for wheel #2. But this time, since there is no chance of losing utterly everything that is put up on a single spin it would be at least possible to use borrowed money— all the way up to about f=1.5, at which point the geometric mean has declined to about 1.00, beyond which there would be losses. But as with wheel #2, fractional Kelly or full Kelly with 0 < f <= f* is the preferred range of bet sizes with nothing beyond f* ever being advisable. And especially note that the top chart confirms that use of the Kelly optimum generally beats let-it-ride and at less risk, with let-it-ride this time showing a profit. That you can easily see with repeated spins of wheel #3 on the 300-years scale.

And finally, if we consult our second chart to see what the geometric means are for f=1.00, the let-it-ride case, for each of the Poundstone wheels, then we see that they all agree with the values that are given in the book.

The book compares the Kelly emphasis on the geometric mean with the reliance of “mean-variance” analysis upon the arithmetic mean, with regard to assessing the relative attractiveness of the wheels. “Modern Portfolio Theory” (MPT) and specifically the “Capital Asset Pricing Model” (CAPM) are theories that are based upon mean-variance analysis. Inasmuch as they involve schemes that use diversification to maximize returns at given levels of risk they are intended to be applied to portfolios and not to single issues, and in a way that is very dependent upon correlations among the price performance histories of the individual issues. But no such correlations exist among the three wheels so that an uncompromised application of mean-variance analysis to them is not possible. Since MPT/CAPM practitioners manage portfolios none would ever plan to put all of the assets into a single security. Hence if there were a single security in one of their portfolios that had the possibility of becoming worthless that would not lead to the ruination of the portfolio. And if a security has a multi-period “average payout” substantially greater than 1, as with wheel #2, then it might actually be reasonable to include such a security in an MPT- or CAPM-managed portfolio in spite of it presenting the possibility of a total loss.

The reasonability would follow from the fact that, whether the security were the likes of wheel #2 or not, surely only a certain small fraction of the assets would be assigned to it— portfolios are generally policy-limited to a small fixed range of permissible position sizes to guard against the risk of any one issue going belly-up. So the circumstances of any one issue in such a portfolio differ little from what we have called fixed-fractional betting with the use of a very small fraction. The Annual Geometric Mean plot, the second chart on page 1, equals 1.00 at f = 0 and, if you work it out, the calculus shows that its slope there is the arithmetic average payout (the “mean” of mean-variance) minus 1— not influenced at all by the geometric mean being zero at f = 1. Thus any such minimal successive exposures to the risks and rewards of securities that performed like wheel #2, whose average payout exceeds 1, would ultimately be profitable notwithstanding the zero geometric mean at f = 1.
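That slope claim can be checked with a one-line differentiation. Write \(g(\text{f})\) for the expectation value of the logarithm of the return ratio (the bracketed sum derived on page 3, here with general probabilities \(p_j\) for the outcomes \(R_j\)); the annual geometric mean is then \(e^{g(\text{f})}\):

\begin{aligned} g(\text{f}) &= \sum_j p_j\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_j\right)\\ g'(\text{f}) &= \sum_j p_j\cdot\frac{R_j - 1}{1-\text{f} + \text{f}\cdot R_j}\\ g'(0) &= \sum_j p_j\cdot\left(R_j - 1\right) = (\text{arithmetic average payout}) - 1 \end{aligned}

Since \(e^{g(0)}=1\), the slope of the geometric mean itself at f = 0 is that same quantity, the arithmetic average payout minus 1.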

The chief distinction, then, is that none of the mean-variance models makes any allowance whatsoever for the mathematics of the subsequent and inevitable compounding. It's a dimension that they do not incorporate. But doesn't the theory of the Kelly criterion then suffer in comparison with mean-variance analysis for its neglect of correlations within portfolios? Well, no, not really. For example, if there are p non-risk-free issues in a portfolio we could assign “betting fractions” \(\scriptstyle\text{f}_1,\, \text{f}_2\ldots\,\text{f}_p\), one to each security, where \(\scriptstyle\text{f}_1+ \text{f}_2\ldots\,+\,\text{f}_p =\,\)f and with the fraction 1 - f being committed to a risk-free security or cash. And then we could vary the \(\scriptstyle\text{f}_k\) so as to find the values that maximize the expectation value of the logarithm of the final wealth, just as we do for single issues with just one f; that expectation value would be taken over the joint distribution of the securities' returns, which is where the correlations enter.
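Here is a minimal sketch of that multi-security extension, with two hypothetical four-outcome “wheels” of my own and, purely to keep the illustration simple, independent outcomes; with real, correlated securities the joint probabilities of the combined outcomes would replace the products formed below.

```python
import math
from itertools import product

def expected_log_growth(fracs, wheels):
    """Expectation value of log(1 - sum(f_k) + sum(f_k * R_k)) over the joint outcomes.

    Independence is assumed here purely for illustration: each combination's probability
    is the product of the individual outcome probabilities. With correlated securities
    the actual joint probabilities would be used instead.
    """
    per_combo_prob = 1.0
    for w in wheels:
        per_combo_prob *= 1.0 / len(w)                   # equally likely outcomes on each wheel
    total = 0.0
    for combo in product(*wheels):                       # every combination of per-wheel payouts
        ratio = 1 - sum(fracs) + sum(f * r for f, r in zip(fracs, combo))
        total += per_combo_prob * math.log(ratio)
    return total

# Two hypothetical wheels (not Poundstone's) and a crude grid search over (f1, f2).
wheels = ([0.0, 1.0, 2.0, 3.0], [0.5, 0.8, 1.5, 2.0])
candidates = [(f1 / 20, f2 / 20) for f1 in range(20) for f2 in range(20)
              if (f1 + f2) / 20 < 1.0]                   # keep 1 - f1 - f2, the cash, strictly positive
best = max(candidates, key=lambda fr: expected_log_growth(fr, wheels))
print("approximate optimal fractions:", best)
```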

Note that if we were talking about investing in a single security then the let-it-ride mode that we have discussed would actually be the same as “buy and hold” with 100% invested. Does that sound more familiar? Let's now go on to understand how we calculate the dependence of our final wealth upon the betting fraction f.

(Continued...)


A Bit of the Mathematics

If you didn't immediately comprehend the expectation-value-of-the-logarithm business on page 1 of this article... you could be normal. I didn't get it at first either, because I found nothing brought together in the Kelly literature that was really properly instructive as to how the logarithm actually comes about. Various authors insist on bringing up utility functions and fail to clearly state that you're not entitled to a choice of utility functions, not if you want to maximize your most likely final wealth; it's the logarithm, nothing else. Here I'll try to fully explain the mathematics behind the Kelly criterion because it is, at base, rather simple. And it helps that the wheel-of-fortune setup with which we started is really rather generally applicable, such as to stocks or stock options or even to funds that hold them. Where Mr. Poundstone talked about penny stocks that had six equally-likely outcomes, he also pointed out that for realism we could simply add more outcomes and repeat, as he did, the more likely outcomes. I may assert in other publications that the forward distribution of possible outcomes of such securities or funds should be considered to be variable from trial to trial rather than constant as with the wheels. But although the mathematics for that is only a bit more complicated we forgo it here. It may not be a good idea anyway.


Compounding

By “\(\equiv\)” in the equation immediately below is meant “is defined to be”; \(X_i\) is the wealth of the investor after i spins of the wheel; \(X_0\) is the starting wealth; \(X_n\) is the final wealth if there are n spins in all. The numerator of each fraction is canceled by the denominator of the next fraction, but no numerator can be zero else we must terminate the sequence right then and there with the investor utterly broke.

\begin{aligned} (\text{geometric sample mean})^n&\equiv\frac{X_n}{X_0}\\ &=\frac{X_1}{X_0}\cdot\frac{X_2}{X_1}\ldots\,\cdot\, \frac{X_n}{X_{n-1}} \end{aligned}

To explain the Kelly criterion we won't have to immediately focus on the geometric mean; we're mainly concerned with the composition of the ratio \(\frac{X_n}{X_0}\). We'll get back to it a bit later as it's fairly often mentioned in Kelly literature, such as in the Poundstone book.


Letting it Ride

Given the sequence of payout numbers \(r_1, r_2\ldots , r_n\), the result of n sequential spins of the wheel and representing random choices of the numbers \(R_1, R_2\ldots , R_6\) that are printed on the wheel, then in the equation above with let-it-ride betting we must set \(\frac{X_i}{X_{i-1}}=r_i\,\). If any of the \(r_i\)'s turns up zero then the sequence ends and the investor is broke.

Although the discussion here continues to refer to the wheels-of-fortune examples, the simple mathematics of this page has much broader applicability. We could just as well take those various \(r_i\)'s to be, say, the annual return ratios of some huge fund that contains various kinds of securities— with the task at hand being to decide what fraction of our wealth we should commit to the fund.

Fixed-Fractional Betting

Given the same sequence of payout numbers \(r_1, r_2\ldots , r_n\) from the face of the spun wheel then with fixed-fractional betting we would not compute the same \(\frac{X_i}{X_{i-1}}\) ratios. Instead, if f is the betting fraction then \(\frac{X_i}{X_{i-1}}=1-\text{f} + \text{f}\cdot r_i\).

We see immediately that if f = 1 then the ratios for fractional betting reduce, as they should, to the ratios for let-it-ride betting. But if f < 1 then if any \(r_i\) is zero \(\frac{X_i}{X_{i-1}}\) will simply be 1 - f, which will be greater than zero. In that way the investor can be prevented from ever going entirely broke.
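A minimal simulation of that update rule, with placeholder payouts and a betting fraction of my own choosing:

```python
import random

def simulate_fixed_fraction(payouts, f, n_spins, x0=1.0, seed=0):
    """Compound wealth over n_spins using X_i = X_{i-1} * (1 - f + f * r_i)."""
    rng = random.Random(seed)
    wealth = x0
    for _ in range(n_spins):
        r = rng.choice(payouts)          # one spin: each printed payout is equally likely
        wealth *= 1 - f + f * r          # f = 1 reproduces let-it-ride; f < 1 holds 1 - f in cash
    return wealth

placeholder_wheel = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]       # hypothetical payouts, not Poundstone's
print(simulate_fixed_fraction(placeholder_wheel, f=0.5, n_spins=30))   # a zero payout only halves the wealth
print(simulate_fixed_fraction(placeholder_wheel, f=1.0, n_spins=30))   # with f = 1, any zero payout drawn wipes the wealth out for good
```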

Of course we are not dealing here with any real-world annoyances such as transaction costs, taxes or dividends, much less policies affecting the use of margin that are in effect at brokerages. But we can see that if f is negative then, but for those complications, the expression for \(\frac{X_i}{X_{i-1}}\) would also represent short selling correctly: the first term 1 represents 100% put up to avoid “going on margin”, and it returns the starting wealth for the \(i^{\text{th}}\) trial because if f were zero then \(X_i\) would be the starting wealth \(X_{i-1}\); the second term -f is positive and it's what you would get for short-selling the stock (per dollar of starting wealth); and the third term \(\text{f}\cdot r_i\) is negative and it's what you would pay to buy the stock back at the end of the trial.

Once again, fixed-fractional betting is not Kelly betting per se. Of course everyone always knew, before Kelly came along, that you could bet only a fraction of your wealth if you wished and avoid sudden utter ruin that way.


The Kelly Optimum Betting Fraction

Here we are actually going to avoid integral and differential calculus and just use some rules involving exponentiation and natural logarithms. So if you have some mathematical inclinations you should be able to follow even if you don't know calculus— we'll just apply the rules.

If \(y\) is a positive number then \(\text{log}(y)\) increases as \(y\) increases, but not as fast. In fact it has a downwardly concave appearance when plotted as the vertical coordinate with \(y\) the horizontal coordinate, and that concave aspect is crucial for the fulfillment of the Kelly criterion. The logarithm is defined only for positive \(y\), and its value plunges towards negative infinity as \(y\) approaches zero from above; the logarithm of one is zero; \(\text{log}(y)\) is the power to which Euler's number \(e=2.718\ldots\,\) must be raised in order to yield \(y\). So \(y = e^{\text{log}(y)}\).

Now if we have two positive numbers \(y_1\) and \(y_2\) and multiply them together then the logarithm of the product must be the sum of the logarithms of each because after we multiply \(e\) by itself \(\text{log}(y_1)\) times to get \(y_1\) we must multiply the result by \(e\) multiplied by itself an additional \(\text{log}(y_2)\) times in order to form the product. Hence we find that \( \text{log}(y_1\cdot y_2)=\text{log}(y_1) + \text{log}(y_2) \). So the logarithm of a product is just the sum of the logarithms, and that's true for however many terms that form the product.

With that definition and the rule about products we go to work on our first equation above, the one for the all-important ratio of final wealth to starting wealth. We find the following:

\begin{align} \frac{X_n}{X_0}&= e^{ \text{log}\left(\frac{X_n}{X_0}\right) }\\ \text{log}\left(\frac{X_n}{X_0}\right) &= \text{log}\left(\frac{X_1}{X_0}\right)+\text{log}\left(\frac{X_2}{X_1}\right)\ldots +\text{log}\left(\frac{X_n}{X_{n-1}}\right) \end{align}

We now focus on that expansion, on the sum of logarithms. Each term can take on only one of six values, each based on a random choice of the six \(R_j\)'s from the face of the wheel:

$$\text{log}\left(\frac{X_i}{X_{i-1}}\right)= \text{log}\left(1-\text{f} + \text{f}\cdot R_j\right)$$

And now comes the easy but profound step... how many terms are there in the expansion representing each of the \(R_j\) values? We know. Oh, we don't really know, because any and all sequences are possible. But we have a very good idea concerning the likely number of appearances of each \(R_j\) value. That would be \(\frac{n}{6}\) of course, or the integer closest to that number. Yes, since each segment of the wheel is equally likely to be selected and since there are six segments we expect the following approximation to hold:

\begin{align} \text{log}\left(\frac{Xp_n}{X_0}\right) &\approx \text{log}\left(\frac{X_n}{X_0}\right),\quad\text{where}\\ \text{log}\left(\frac{Xp_n}{X_0}\right) &\equiv \text{n}\cdot \left[ \frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_1\right) + \frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_2\right)\ldots + \frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_6\right) \right] \end{align}

The notation \(\frac{Xp_n}{X_0}\) with the \(p\) added to the \(X\) has been used to indicate the most-probable or highest-likelihood aspect of our estimate and the n has been factored out on the right-hand side. And we recognize the quantity in the square brackets []. It's the expectation value of the log terms, taken over the distribution of the face of the wheel, the “theoretically-expected” distribution.

And if we take the logarithm of both sides of the very first equation at the top of this page then the rule about the logarithm of a product being the sum of the logarithms of each term yields the following:

\begin{aligned} \text{n}\cdot\text{log}(\text{geometric sample mean})=\text{log}\textstyle\left(\frac{X_n}{X_0}\right)\displaystyle\\ \text{n}\cdot\text{log}(\text{geometric mean})=\text{log}\textstyle\left(\frac{Xp_n}{X_0}\right)\displaystyle \end{aligned}

Then, comparing that with the equation above it, we see that the quantity in square brackets, the expectation value of the logarithm of the return ratio, is also our best estimate of the logarithm of the geometric mean of the distribution of \(\frac{X_n}{X_0}\). We're supposing here that as n goes to infinity the value of \(\frac{1}{\text{n}}\,\text{log}\left(\frac{X_n}{X_0}\right)\) approaches the expectation value of those six log terms.

We have to stop right here to celebrate the fact that we're essentially done. We have the answer. We only need to compute and sum the terms inside the square brackets and find the f that yields the maximum value for that sum. That's the Kelly fraction f*.
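Here is a minimal sketch of that last step, a simple grid search over f; the six payouts are again a hypothetical wheel of my own and not one of Poundstone's.

```python
import math

def expected_log_return(payouts, f):
    """The bracketed sum: the expectation value of log(1 - f + f*R) over the wheel's face."""
    return sum(math.log(1 - f + f * r) for r in payouts) / len(payouts)

def kelly_fraction(payouts, steps=1000, f_max=3.0):
    """Grid search for the f >= 0 that maximizes the expected log return.

    The search stops at any f for which some possible outcome would take the
    wealth to zero or below (for example f = 1 on a wheel that has a 0 payout).
    """
    best_f, best_g = 0.0, 0.0
    for i in range(1, int(f_max * steps) + 1):
        f = i / steps
        if min(1 - f + f * r for r in payouts) <= 0:
            break
        g = expected_log_return(payouts, f)
        if g > best_g:
            best_f, best_g = f, g
    return best_f, math.exp(best_g)          # f* and the corresponding geometric mean

hypothetical_wheel = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
print(kelly_fraction(hypothetical_wheel))
```

For a wheel that can never produce a loss, like Poundstone's wheel #1, the bracketed sum keeps rising with f and no interior maximum exists, which is why the criterion establishes no preferred fraction in that case.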


The Central Limit Theorem

We now need to discuss a most important theorem. We are only using the simple, classical version of it. It provides proper mathematical justification for our substitution of n times the expectation value of the theoretically-expected distribution for the actual sequence. With a finite number of choices to deal with it doesn't matter how the log terms of the theoretically-expected distribution are distributed as numbers. They could be skewed to one side or the other of their average. The theorem says, in part, that the expectation value that we have computed— the sum inside the square brackets which is the logarithm of the geometric mean— when multiplied by n as above is the best estimator of the mean (average), the mode (most probable) and the median (mid-percentile) of the distribution that consists of all of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values that might actually happen. And the bigger the n, the better the estimate.



We should be clear here that while our results and the theorem only pertain to large-n circumstances and so there is a subtext concerning the greater precision that happens as n is further increased, “the distribution that consists of all of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values that might actually happen” does not refer to a distribution derived from a series of increasing n values. No. Think of n as being utterly fixed. We spin the wheel n times and record the resultant \(\text{log}\left(\frac{X_n}{X_0}\right)\). Then we spin it again n more times and record another outcome, and again n more times, and again... repeating the n spins many times. It's that distribution of outcomes that we want to know about. We want to know the most likely value for \(\text{log}\left(\frac{X_n}{X_0}\right)\), the value that would turn up most often (and we've already easily succeeded in calculating it).

The equivalence of the mean, mode and median is guaranteed because the theorem also states that in the large-n limit the distribution of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values becomes the famous “bell curve”— when the probability density, the likelihood of particular outcomes, is plotted against \(\text{log}\left(\frac{X_n}{X_0}\right)\) the shape is that of a bell that is utterly symmetric and centered on the \(\text{log}\left(\frac{Xp_n}{X_0}\right)\) value that we have just computed which is, because of the symmetry, at once the mean, the mode and the median.

But we are not at base interested in the logarithm of our money, are we? No, we want simply the likely outcome regarding our money, not its logarithm. But if we plot the probability density versus \(\text{log}\left(\frac{X_n}{X_0}\right)\) and it peaks at some particular value \(\text{log}\left(\frac{Xp_n}{X_0}\right)\), then if the probability density is instead plotted against \(\frac{X_n}{X_0}\) it peaks at the following value:

\begin{align} \textstyle\frac{Xp_n}{X_0}\displaystyle&=e^{\text{log}\left(\frac{Xp_n}{X_0}\right)}\\ &=e^{\text{n}\cdot\text{log(geometric mean)}}\\ &=\scriptstyle(\text{geometric mean})^\text{n}\displaystyle \end{align}

In other words, the intervening logarithm function does not affect the underlying value that has the peak probability. Whether we find that value, the mode, by plotting the probabilities versus \(\text{log}\left(\frac{X_n}{X_0}\right)\) and inverting the log function to find \(\frac{Xp_n}{X_0}\) or by plotting the probabilities versus \(\frac{X_n}{X_0}\) and finding \(\frac{Xp_n}{X_0}\) directly that way, the result is the same. To recapitulate, in the limit of large n the expectation value (mean) of the distribution of \(\text{log}\left(\frac{X_n}{X_0}\right)\) is also its mode \(\text{log}\left(\frac{Xp_n}{X_0}\right)\). And \(\frac{Xp_n}{X_0}\) is the mode of the distribution of \(\frac{X_n}{X_0}\) but is not its expectation value (mean).
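A quick numerical check of that distinction, with a hypothetical wheel and a betting fraction of my own choosing: simulate many independent n-spin histories and compare the mean and the median of the final \(\frac{X_n}{X_0}\) values with \((\text{geometric mean})^\text{n}\).

```python
import math
import random
import statistics

rng = random.Random(1)
payouts = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]      # hypothetical wheel, not one of Poundstone's
f, n_spins, n_histories = 0.4, 50, 20000

def one_history():
    wealth = 1.0
    for _ in range(n_spins):
        wealth *= 1 - f + f * rng.choice(payouts)
    return wealth

finals = [one_history() for _ in range(n_histories)]
log_gm = sum(math.log(1 - f + f * r) for r in payouts) / len(payouts)   # log of the geometric mean
print("(geometric mean)^n :", math.exp(n_spins * log_gm))
print("median of X_n/X_0  :", statistics.median(finals))    # lands close to the line above
print("mean of X_n/X_0    :", statistics.mean(finals))      # far larger, and not representative
```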


A Helpful Online Tutorial

As you might imagine, the Nobel laureates who created the first and most notable of the still-popular mean-variance models for asset allocation— they pertain only to one investment period and not to n of them— are not a bunch of morons who never thought about compounding. Rather they generally quite get the point. And that is clearly shown by the tutorial on compounding by William F. Sharpe.

There is no reference in that particular tutorial to the Kelly criterion, but note especially that there is support for the idea that the mode of the distribution of the \(\frac{X_n}{X_0}\) values, the statistic to which Kelly resorted, the most likely outcome, is more representative than the mean. That is said in the context of Sharpe having shown that the mean of the \(\frac{X_n}{X_0}\) values is substantially greater than the mode— not true with regard to the distribution of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values, for which, as we have seen, the mean, mode and median are one and the same. So in electing to maximize the modal value and not the mean we are hitching our wagon to a not particularly high but most likely outcome.

The lack of representativeness of the mean of the \(\frac{X_n}{X_0}\) values resides in the fact that while we imagined a distribution of those values to be the hypothetical result of taking n spins of Fortuna's wheel over and over again, in real life we have no possibility of experiencing more than one sequence of spins. We can “buy the average” of the \(\text{log}\left(\frac{X_i}{X_{i-1}}\right)\) values because they accumulate to the mode of the distribution of final values of the logarithm, the most likely value.


Kelly on Kelly

It's simple to say what Kelly did but you may prefer to read his article. Unless you're already profoundly committed to horse racing you'll find the section “The Gambler With a Private Wire” to be of first importance because it does not involve the more complicated case of parimutuel betting. (Where he wrote “Gmax = 1 + ...” he meant “Gmax = log(2) + ...”.)
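For reference, the even-money case treated in that section works out as follows: with win probability p and q = 1 - p, maximizing \(p\cdot\text{log}(1+\text{f}) + q\cdot\text{log}(1-\text{f})\) gives f* = p - q, and the maximum growth rate per bet is Gmax = log(2) + p·log(p) + q·log(q), the expression referred to in the parenthetical note above. A minimal numerical sketch (the value of p is mine, just an example):

```python
import math

def private_wire(p):
    """Even-money bet won with probability p > 1/2: Kelly fraction and max growth rate (nats per bet)."""
    q = 1.0 - p
    f_star = p - q                                    # bet the "edge": f* = 2p - 1
    g_max = math.log(2.0) + p * math.log(p) + q * math.log(q)
    # Sanity check against a direct evaluation of G(f) = p*log(1+f) + q*log(1-f) at f = f*.
    assert abs(g_max - (p * math.log(1 + f_star) + q * math.log(1 - f_star))) < 1e-12
    return f_star, g_max

print(private_wire(0.55))   # f* = 0.10; growth ≈ 0.0050 nats per bet, i.e. about 0.5% per bet
```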


So Where is My Options Bet Size Calculator?

And now we come to the disappointing reality concerning the Kelly criterion and the resultant Kelly formulae. It's this: in order to implement the criterion with good effect you have to know in advance the odds of the various outcomes— fairly accurately. And of course you basically never know the odds accurately. In theory, call options should have favorable odds over the long term, because the stock market goes up over the long term; puts are generally losers. But during bear markets the circumstances should be reversed. And in order to properly make use of the Kelly criterion it would seem that you would need to accurately know the odds for each trial, not just over the long run, and make many, many trades in order to realize the promised maximum rate of growth because it is, as we have seen, only to be had in the very long run.

It could be argued that it is specious to suggest that it's necessary to assume different odds for each set of market conditions, that it would all come out in the wash over the long term if only the overall long-term tendencies are assumed irrespective of current market conditions. But that approach would yield particularly steep drawdowns during adverse markets that should be avoided if at all possible.

But allocating assets to options alone would not be anyone's idea of a forgiving investment plan because of their extreme volatility. And furthermore Thorp himself has listed pitfalls of Kelly investing and one of them is the aforementioned [cf. page 1 of this article] potentially extreme volatility in the early going, especially with the use of the full-Kelly fraction f*— even with securities that can only decline in value rather than go to zero as with options. That happens because of the Kelly focus on long-term results. You can confirm that this unfortunate aspect of Kelly exists if you click the “Spin the Same Wheel Again...” button on page 1 of this article, say, ten times using the 300-years option with wheel #2 or #3. You will see some horrible instances of collapses in value in the early years.

But there is that splendid side of the Kelly criterion which is that it shows that you can overbet and face ruin even though the game or investment actually offers favorable odds and payoffs. And, better yet, it also tells you where to draw the line. We have seen that clearly with Poundstone wheels #2 and #3, for which a too-high betting fraction produces certain eventual ruin even if no outcome produces a total loss of the amount put up. Surely that could help options traders mind their wallets.

Speaking of securities that can at worst only decline in value and the prospects for using the Kelly criterion when investing in them, one of the problems is the fact that once in a great while they do practically go to zero— if it's a stock then it can be hit by a financial scandal or a major lawsuit or regulatory action. And the problem there is that any such fat-tail-of-the-distribution type of event has a probability that we can only take a wild guess at. But options? Ahh... they are different. They go to zero frequently— they are nibbled to death by minor adversities— at price points that are smack dab in the well-traveled middle of the price probability distribution of the underlying security, and so we have a pretty good idea of the likelihood of an option going to zero. Also, the famous Black-Scholes formula for calculating options premiums was originally derived in two different ways, one of which involved showing consistency with the prevailing mean-variance model. Since that approach neglects the effects of compounding whereas the Kelly criterion is all about that it should be potentially helpful to combine Kelly sizing, fractional Kelly, with a scheme for trading options that determines which option to buy and when.

There may eventually appear on this site a program involving equity options— probably one involving options on the most liquid ETFs.

Flash: There is now yet more on the Kelly criterion, on this page of another article on this website under the subtitle “Applying the Kelly Criterion”.

— Mike O'Connor

Comments or Questions: write to Mike. Your comment will not be made public unless you give permission. Corrections are appreciated.

Update Frequency: Infrequent, as this article is about the principle of the Kelly criterion and not about the current state of the market.