Datagolf

Introduction

The idea for this project blossomed from a tweet by @DallasAptGP regarding golf betting.

Like an itch that just must be scratched, I attempt to recreate the system in R below.

The system is simple but effective:

  • Find an Edge: Use the datagolf model to find odds discrepancies for each golfer to win, finish top_5, top_10, top_20, or make the cut.
  • Set Your Starting Bankroll: Determine how much you are willing to lose.
  • Calculate the Kelly Percentage Per Bet: Taking each edge into account, apply the Kelly percentage to each prospective bet.
  • Apply the Kelly Fraction to Determine the Amount to Bet: This is defined in the tweet as

Amount to Bet = (((A/100) * B - 1) / ((A/100) - 1)) * C * D

  • A - Sportsbook Odds (American)
  • B - Datagolf Probability %
  • C - Your Bankroll
  • D - Kelly Fraction

Goals

The goal for this project is to create the following functions:

  • Historical Analysis: Using the datagolf.com historical odds API and the historical model predictions API, evaluate the profitability of the system.
  • On-Demand: Identify profitable opportunities for an upcoming or ongoing tournament, and create an automated report that can be run daily for future tournaments and opportunities.

Note: to have access to the datagolf API you must be a Scratch Plus member. While I have a membership, I am not affiliated with datagolf.com, nor do I receive compensation from the links in this project. Also, nothing contained herein should be considered gambling or financial advice. Use your own judgment and don’t bet what you can’t afford to lose.

Overview of the Kelly Criterion

At its core, the Kelly criterion is a method to determine how much to bet based on your perceived edge, the odds offered by the sportsbook and your bankroll. “Kelly criterion is a mathematical formula for bet sizing, which is frequently used by investors and gamblers to decide how much money they should allocate to each investment or bet through a predetermined fraction of assets.” Link.

Kelly defines the “optimal size of a series of bets in order to maximize wealth. It is often described as optimizing the logarithm of wealth, and will do better than any other strategy in the long run.” Link

The Kelly formula can be defined multiple ways depending on the context.

Kelly Formula - General Terms

Kelly % = p - (q / r)

  • p = Probability of winning
  • q = Probability of losing = 1 - p
  • r = Profit on win, as a percentage of the placed wager or investment

Kelly Formula - Sports Betting Terms

Kelly % = (bp - q)/b

  • K % = The fraction of the bankroll to bet
  • b = decimal odds - 1
  • p = The calculated probability of winning bet
  • q = The probability of losing, which is 1 – p

For example, assume a bankroll of $1000. Datagolf predicts Golfer A has a 21.7% implied probability of making the top 20 while a popular book has an implied probability of 20% (+400 American Odds).

  • b = decimal odds - 1 = 5 - 1 = 4 (a 20% implied probability corresponds to decimal odds of 5)
  • p = .217
  • q = 1 - .217 = .783

((4 * .217) - .783) / 4 = .0212

The Kelly Fraction dictates you should bet 2.12% of $1,000 or $21.20.
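
For reference, here is the same calculation as a small R helper. This is just a sketch; the function names are my own.

```r
# Convert American odds to decimal odds, then compute the Kelly fraction
american_to_decimal <- function(a) ifelse(a > 0, a / 100 + 1, 100 / abs(a) + 1)

kelly_fraction <- function(decimal_odds, p_win) {
  b <- decimal_odds - 1   # net profit per 1 unit staked
  q <- 1 - p_win          # probability of losing
  (b * p_win - q) / b     # fraction of the bankroll to bet
}

kelly_fraction(american_to_decimal(400), p_win = 0.217)
#> [1] 0.02125   # roughly 2.1% of a $1,000 bankroll, or about $21
```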

Kelly Formula - Edge or Expected Value Terms

K % = (Edge - 1) / (Odds - 1)

  • Edge = Using decimal odds, the odds offered by the bookmaker divided by the model’s fair odds (equivalently, the model’s predicted probability divided by the bookmaker’s implied probability)
  • Odds = Using decimal odds, the odds offered by the bookmaker

Referring to the example above

  • Edge (in decimal odds) = 5 (bookmaker odds) / 4.61 (datagolf model predicted odds)
  • Odds (in decimal odds) = 5 (bookmaker)
  • K% = (5/4.61 - 1) / (5 - 1) = 2.11%

Be aware that edge or expected value is usually calculated using implied probabilities, while Kelly is calculated using decimal odds in this format. For example, the model may predict an implied probability of 4.45% while the odds offered imply 3.84%. Edge would generally be calculated as Prediction (Percentage) / Odds (Percentage) - 1. Be aware of the format and be consistent throughout. See Revisiting the Kelly Criterion Part 1: A risk assessment.

Identical Sequential Bets

The Kelly criterion optimizes the expected return on a series of identical, sequential bets. Link. Stated otherwise, Kelly optimizes your bankroll when placing a single bet an infinite number of times. Built into the equation is the assumption that you will use the outcome of bet no. 1 to calculate your bankroll and Kelly Fraction prior to placing bet no. 2.

Using Kelly for an unknown number of future multiple, simultaneous and independent bets presents a different challenge.

For example, making a series of 7 simultaneous bets with a positive edge produces a very different Kelly Fraction than 7 sequential bets. The table below is taken from the excellent website VegaPit, which offers Rust code to numerically solve for simultaneous Kelly fractions versus sequential Kelly fractions. As shown below, when making 7 simultaneous bets, each with a win probability of 90% and a profit-loss ratio of 0.2, you should only bet 5.58% per bet, versus the 40% stake indicated by sequential Kelly.

Credit to VegaPit.com

In the context of betting golf tournaments with datagolf.com model data, the total number of bets, your calculated edge, the profit-to-loss ratio and the fact that not all bets are independent of each other are all factors that technically affect the optimal wager size. For example, whether Golfers A, B, C and D all finish in the top-10 is not independent of whether Golfers A-D also finish in the top-20.

The VegaPit article references a paper titled Algorithms for optimal allocation of bets on many simultaneous events, which concluded

For a large number of bets, the optimal wagers tend to be proportional to the “probabilistic edge” of each bet

Probabilistic edge in this context can be defined as follows:

Probabilistic Edge = p - (1/(b + 1))

  • p = probability of winning
  • b = net fractional odds or profit-to-loss ratio

Without getting too into the weeds, this formula will get you close enough to the correct Kelly fraction. To further avoid risk of ruin, an even better approach is to use a half- or quarter-Kelly approach, meaning that you bet 25-50% of the full Kelly fraction.
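
To make that concrete, here is the probabilistic edge for the earlier top-20 example (p = 0.217, decimal odds of 5) next to a quarter-Kelly stake; this is a sketch using only the formulas above.

```r
# Probabilistic edge for a single bet: p - 1 / (b + 1),
# where b is the profit-to-loss ratio (decimal odds - 1)
prob_edge <- function(p_win, b) p_win - 1 / (b + 1)

p <- 0.217
b <- 4
full_kelly <- (b * p - (1 - p)) / b      # 0.02125

c(prob_edge     = prob_edge(p, b),       # 0.017
  quarter_kelly = full_kelly / 4)        # ~0.0053, i.e. bet about 0.5% of the bankroll
```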

Historical Analysis

The datagolf API offers several different endpoints to obtain historical data. As pointed out above, API access is only available to Scratch Plus members.

Download Data

API Endpoints Overview

The process for API access of historical odds and prediction data is as follows:

Datagolf descriptions of API endpoints of interest:

  • Historical Raw Data Event IDs - Returns the list of tournaments (and corresponding IDs) that are available through the historical raw data API endpoint. Use this endpoint to fill the event_id and year query parameters in the Round Scoring & Strokes Gained endpoint.
  • Historical Odds Data Event IDs - Returns the list of tournaments (and corresponding IDs) that are available through the historical odds/predictions endpoints. Use this endpoint to fill the event_id and year query parameters in the Archived Predictions, Historical Outrights, and Historical Matchups & 3-Balls endpoints.
  • Historical Outright - Returns opening and closing lines in various markets (win, top 5, make cut, etc.) at 11 sportsbooks. Bet outcomes also included.
  • Pre-Tournament Predictions Archive - Historical archive of our PGA Tour pre-tournament predictions.

Download event_id from API

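The console output from this chunk has been omitted. A minimal sketch of what the event-id download might look like is below; the endpoint path and query parameter names are illustrative assumptions (check the datagolf API documentation for the exact URL), and the API key is assumed to be stored in an environment variable.

```r
library(httr)
library(jsonlite)

dg_key <- Sys.getenv("DATAGOLF_KEY")   # assumed: API key stored as an environment variable

# Illustrative endpoint path for the historical odds event-id list
resp <- GET("https://feeds.datagolf.com/historical-odds/event-list",
            query = list(file_format = "json", key = dg_key))

event_ids <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
head(event_ids)
```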

Download Odds Data from Historical Outright API

Datagolf does a lot of the heavy lifting for you, including providing historical odds coverage for multiple sportsbooks for outright and matchup odds.

Setting the event_id to “all” returns all events for a given year, broken out by

  • market/finish position (i.e., win, top 10 finish), and
  • sportsbook.

The code below downloads the outright data for all available books and market/finish positions and saves it in csv format in the /data/outright folder.
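
A sketch of that download loop follows; the endpoint path and query parameter names are illustrative assumptions rather than the documented API.

```r
library(httr)
library(purrr)

dg_key  <- Sys.getenv("DATAGOLF_KEY")
markets <- c("win", "top_5", "top_10", "top_20", "make_cut", "mc")
dir.create("data/outright", recursive = TRUE, showWarnings = FALSE)

walk(markets, function(m) {
  # Illustrative endpoint and parameters for the Historical Outright API
  resp <- GET("https://feeds.datagolf.com/historical-odds/outrights",
              query = list(tour = "pga", event_id = "all", year = 2022,
                           market = m, book = "all", odds_format = "decimal",
                           file_format = "csv", key = dg_key))
  writeBin(content(resp, as = "raw"),
           file.path("data/outright", paste0(m, "_2022.csv")))
})
```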

Download Prediction Data from Pre-Tournament Predictions Archive API

The next step is to download historical pre-tournament prediction data from the Pre-Tournament Predictions Archive API. This corresponds to the Pre-Tournament Predictions page offered for upcoming PGA tournaments.

This is where the event_id is required.
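
A sketch of the per-event download, again with illustrative endpoint and parameter names; event_ids is assumed to be the table returned by the event-id endpoint above, with event_id and calendar_year columns.

```r
library(httr)
library(purrr)

dg_key <- Sys.getenv("DATAGOLF_KEY")
dir.create("data/predictions", recursive = TRUE, showWarnings = FALSE)

# Loop over the event ids downloaded earlier (column names are assumptions)
pwalk(list(event_ids$event_id, event_ids$calendar_year), function(id, yr) {
  resp <- GET("https://feeds.datagolf.com/preds/pre-tournament-archive",
              query = list(event_id = id, year = yr, odds_format = "decimal",
                           file_format = "csv", key = dg_key))
  writeBin(content(resp, as = "raw"),
           file.path("data/predictions", paste0(yr, "_", id, ".csv")))
})
```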

Import Data and Merge

The next step is to read in the csv files that were downloaded in the previous step and then merge them together. The merge joins the odds offered (across all sportsbooks) to the predictions for each specific market. So in the end, we will have the odds and pre-tournament predictions for each of the following:

  • top 10 finishes (“top_10”)
  • top 20 finishes (“top_20”)
  • top 5 finishes (“top_5”)
  • win the tournament (“win”)
  • make the cut (“make_cut”)
  • miss the cut (“mc”)
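
A sketch of the import and merge, assuming the downloaded CSVs share player, event and market identifier columns (the join keys shown are illustrative):

```r
library(dplyr)
library(purrr)
library(readr)

odds  <- map_dfr(list.files("data/outright",    full.names = TRUE), read_csv)
preds <- map_dfr(list.files("data/predictions", full.names = TRUE), read_csv)

# Join the sportsbook odds to the model predictions for the same
# player / event / market combination
odds_preds <- odds %>%
  inner_join(preds, by = c("event_id", "year", "player_name", "market"))
```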

Basic Analysis of the Predictive Models and Outcomes

The final step is to analyze and visualize the historical data. Recall that we have data from hundreds of tournaments across multiple sportsbooks. Datagolf does a great job of showing actual betting results for 2021, 2020 and 2019. They shut the page down in 2022, along with the DG Betting Blog “Three Off the Tee”. I don’t know why they closed shop on these programs, but it was most likely the time and effort involved in creating them, combined with the fact that golf offers so many ways to bet that the results pages did not appeal to a large enough audience to justify the effort.

The FAQs do a good job of explaining their [bet selection process](https://datagolf.com/frequently-asked-questions#betting-results):

All bets are placed through Bet365, so the first criteria is that the bet is offered there. For each bet type (matchups, 3-balls, Top 20s, etc.) there is an expected value threshold that must be met to place the bet. The specific value of these thresholds have tended to evolve over time. The longer are the odds, the higher the threshold is. Currently (updated Dec 1, 2021), at least a 5% edge (that is, an expected value of 0.05 on a 1-unit bet) is required to take a matchup bet, 7% for a 3-ball bet, and somewhere in the range of 8-20% for Top 20s, Top 5s, and Outrights. The purpose of imposing a threshold is to ensure that you are in fact placing positive expected value bets; our model is not perfect, so when the model says expected value is 5%, the ‘true’ value is probably closer to 0. We also do not place 3-ball or matchup bets if we have very little data on any of the players involved (cutoff is around 50 rounds). We do this because our predictions for low-data players have much more uncertainty around them.

They also mention Kelly when determining the number of units to bet:

We use a scaled-down version of the Kelly Criterion. The Kelly staking strategy tells you how much of your bankroll to wager, and is an increasing function of your percieved edge (i.e. how much greater your estimated win probability is than the implied odds) and a decreasing function of the odds (i.e. longer odds translates to smaller bet sizes, all else equal). Importantly, the Kelly is designed for sequential bets; i.e. your first bet is resolved before you placed your second, and so on. However, in golf betting we will often have many simultaneously active bets. We don’t have a fully worked solution to this, but sometimes we will lower the Kelly fraction if there are already a lot of units in play. This is one reason you won’t be able to find a consistent Kelly fraction when analyzing our wagers; the second reason is that we vary the Kelly fraction by bet type and have also varied it over time as our (poorly-formed) betting strategy has evolved.

Parameters to Consider

Evaluating the performance of the datagolf.com predictions - even prior to applying the Kelly fraction - is a challenge in and of itself because of the large number of parameters that you can filter and sort by. These include:

  • Edge Percentage: the probability of a given outcome (top_5, make the cut, etc.) predicted by datagolf.com divided by the sportsbook’s implied probability, minus 1.
  • Sportsbook: as of this writing, datagolf has historical odds offered by 12 different sportsbooks. Each sportsbook offers different opening and closing odds for each player, but not all sportsbooks offer odds on every betting market - top_5, top_20, etc.
  • Year: the data only goes back to 2019, but the year can also be used to group and filter the data.
  • Datagolf Model: the historical data also contains two different models: (1) the newer baseline_history_fit predictive model that includes course-specific adjustments like a golfer’s course history, course fit, and also course-specific residual variance, and (2) the original baseline model that does not incorporate historical adjustments but instead “baseline skill estimates are obtained by equally weighting golfers’ historical performance across all courses (but the weighting is not equal over time – recent results are weighted more)”. More information on the baseline_history_fit model can be found here and the baseline model can be found here. As noted in the FAQs, a good indicator of model accuracy to consider is when both models have similar predictions on the same bet. This last part will be measured below.

Outside of the betting market itself and the four main parameters listed above, you can also group the predictions and subsequent outcomes by tournament, player and even the change between opening and closing odds. For example, how did Rory do in the Valero Texas Open? What about Fred Funk when the opening-to-closing odds on a top_20 finish moved against him by more than 20%?

Measuring Model Performance

The other important question to consider - after deciding which parameters to group the predictive model outcomes by - is how to measure the performance of each Datagolf model. Looking to the published betting results page yields one method.

  • Number of Bets: the total number of bets placed during a given time period.
  • Units Bet: a common betting term used to denote a fixed value or fixed percentage of your starting bankroll. For example, a unit could represent a $10 bet or a $1,000 bet. The key in doing historical analysis is to keep the unit size the same across all analyses. Using decimal odds helps solve this problem. Decimal odds represent the potential return on the bet including the original stake amount. This translates very well to units; a bet with decimal odds of 1.8 means that a win will return the original 1 unit plus a profit of 0.8. A $10 bet would return the original $10 stake plus $8 in profit, while a $200 bet would return $200 plus $160 in profit. Either way, the unit amount stays the same.
  • Expected Value: the amount the bettor can hope to win over a large number of bets based on the odds offered by the bookmaker and the edge indicated in the predictive model. As noted in the FAQs: “We show the expected value from making a 1-unit bet (a unit can be anything you want: $1, $50, etc). If expected value is 0.12, this means you can expect to profit 0.12 units on that specific bet. Of course you will either win or lose that bet, but if you make many bets with an expected value of 0.12, then on average your profit should be 0.12 units per bet.”
  • Dead-Heat Rules: note that both the expected value calculation and the historical odds and predictive model data from the API consider dead-heat outcomes. A dead-heat rule specifies the payout in the event of a tie between golfers.
  • Profit: the amount won or lost in units for a given bet, tournament or year.
  • Return on Investment (ROI): the profit in units divided by the amount bet in units.

Analysis #1: Baseline Model - All Bets Regardless of Edge

Evaluating a predictive model usually involves determining whether the model is accurate. Accuracy is generally defined as whether or not the model minimizes the difference between the observed values and the predicted values; a smaller difference generally means a more accurate model. This generalized definition is not really what we are after with the datagolf.com predictions. Instead, you want to know whether the model can consistently identify opportunities where the odds offered by the bookmaker differ in the bettor’s favor based on the predictive model. This is edge; in mathematical terms, it’s the percentage by which the prediction favors the bettor over the sportsbook.

For example, consider a sportsbook that offers +500 odds, but the datagolf.com model predicts fair odds of +400. American odds of +500 convert to an implied probability of 16.67% (100/600), meaning decimal odds of 6.0 (500/100 + 1, or simply 1/implied probability). The datagolf.com model implies a probability of 20%, or decimal odds of 5. Think of it this way - the sportsbook is going to “pay” you as if something only happens 1 out of 6 times when in “reality” that thing actually happens 1 in 5 times. It’s like betting on a 3 on a roll of a die, and realizing the six-sided die actually has two “3s” on it without the casino knowing. Over a large number of rolls, you would clean up.

But the challenge is determining if the model has an edge. The first step is to analyze the data for the baseline model and determine the profit and loss regardless of whether the model predicted an edge.

The table below examines the baseline model and groups the data by market (top_5 finish, win etc.) and then looks at the mean, standard deviation and total for the win/loss amount at the opening and closing odds. In other words, assuming you bet 1-unit based on all baseline predictions across all 12 books and tournaments since 2019, then what is the average (mean), sd and total amount you would have won or lost at both the opening odds and the closing odds?

It’s important to note that the numbers in this section assume that you take ALL bets regardless of whether the model(s) predict you have an edge or not.
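
The summary behind that table can be expressed as a short dplyr pipeline. This is a sketch using the merged odds_preds table from the import step; the model, win_loss_open and win_loss_close column names are illustrative.

```r
library(dplyr)

odds_preds %>%
  filter(model == "baseline") %>%          # illustrative model indicator column
  group_by(market) %>%
  summarise(
    n_bets      = n(),
    mean_open   = mean(win_loss_open),
    sd_open     = sd(win_loss_open),
    total_open  = sum(win_loss_open),
    mean_close  = mean(win_loss_close),
    sd_close    = sd(win_loss_close),
    total_close = sum(win_loss_close),
    .groups = "drop"
  )
```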

Assuming all of the above is true, taking all bets regardless of edge when the baseline model offered a prediction, you would have been a net loser on the opening odds for top_5, top_10, top_20 and win bets, and a winner on make_cut and mc (miss_cut) bets.

Analysis #2: Baseline Historical Fit Model - All Bets Regardless of Edge

The table below examines the baseline_hist_fit model and groups the data by market (top_5 finish, win, etc.) and then looks at the mean, standard deviation and total for the win/loss amount at the opening and closing odds. In other words, wherever the model offered a prediction, what was the outcome regardless of the predicted edge?

Similar to the baseline model using opening odds, you would have been a net loser on the opening odds using the baseline_hist_fit model for bets in the top_5, top_10, top_20 and win bets, and a winner on make_cut and mc bets.

Without drilling down any further, taking every bet where the baseline or the baseline_hist_fit model offered a prediction, regardless of whether there was an edge, would not be a profitable endeavor.

Analysis #3: Baseline Model - Predicted Edge Greater than Zero

The next question to consider is whether the baseline model is profitable when there is a predicted edge. We can then compare this with the baseline model in Analysis #1 (all bets regardless of edge) to determine if there is an improvement.

The table below examines the performance of the baseline model where the edge is greater than zero for the opening line.

Compare this to Analysis #1, which took all bets regardless of edge. The table and graph below show the difference in mean profit from taking only bets where the edge was greater than zero for the opening line.

The baseline model performance showed a significant improvement, with a mean improvement of 0.24 units per bet across all markets. Bottom line, when the baseline model predicted an edge, the mean profit improved across all markets.

Analysis #4: Baseline Historical Fit Model - Predicted Edge Greater than Zero

I will repeat the process in Analysis #3 with the baseline_hist_fit model to determine whether it is profitable when the predicted edge is greater than zero. We can then compare this with Analysis #2 (all bets regardless of edge) to determine if there is an improvement.

The table below examines the performance of the baseline_hist_fit model where the edge is greater than zero for the opening line.

Compare this to Analysis #2, which took all bets regardless of edge. The table and graph below show the difference in mean profit from taking only bets for the baseline_hist_fit model where the edge was greater than zero for the opening line.

The baseline_hist_fit model performance showed a significant improvement, with a mean improvement of 0.27 units across all markets.

Analysis #5: Comparison Between Models

One final note before we move on to more complex analysis is to compare the performance between the baseline and baseline_hist_fit models.

You can get very in-depth comparing models and the accuracy of one versus another. See Introduction to model comparisons for a good introduction to the concept. However, I won’t spend much time on it.

It is clear that the baseline_hist_fit slightly outperforms the baseline model on average, although it is important to note that the baseline model has a longer history of usage. The performance by the baseline_hist_fit model could just be attributed to a smaller sample size bias. Regardless, moving forward I will focus on strategies that only use the baseline_hist_fit model.

Advanced Analysis

The next step is to add advanced analysis and performance techniques to help identify profitable strategies to apply to the predictions. I will also include Kelly fractions (quarter, half and full) and implement Monte Carlo methods to introduce randomness and test the robustness of the system.

The “advanced” analysis below will attempt to add to the performance measures outlined above, along with the following:

  • Maximum Drawdown is “an indicator of the risk of a portfolio chosen based on a certain strategy. It measures the largest single drop from peak to bottom in the value of a portfolio before a new peak is achieved.” See Max Drawdown Definition - Link.
  • Risk of Ruin “is a concept in gambling, insurance, and finance relating to the likelihood of losing all one’s investment capital or extinguishing one’s bankroll below the minimum for further play.” See Risk of Ruin - Wikipedia - Link.
  • Sharpe Ratio is commonly used in finance; “in sports betting the Sharpe Ratio is a measure of how much a trader is rewarded for any particular system they devise. The ratio is given by the excess returns divided by standard deviation (aka volatility).” Sharpe Ratio for Betting. In basic terms, it “measures how well the return of an asset compensates the investor for the risk taken.”

Two important notes about the rest of the analysis:

  • As noted above, I will only be using the baseline_hist_fit model, and
  • I will only be using the opening line when evaluating the performance (unless otherwise noted).

Analysis #6: Does a Higher Predicted Edge Mean Better Results?

The next logical question is whether taking bets that have a higher predicted edge percentage results in a higher average profit.

The short answer is not really.

The table below shows an in-depth performance summary grouped by market comparing the performance of taking all bets that have a positive edge.

Note that the Sharpe Ratios for all markets are below 1 on average - which is not encouraging. “A Sharpe Ratio of 1 means that the return is equal to the risk being taken and less than 1 means that more risk is being taken than the reward in return. The higher the Sharpe Ratio then the lower the deviation of returns compared with the average.” Sharpe Ratio for Betting - Link.

Digging a little deeper, what if you grouped the positive edge bets into edge-percentage percentiles and then examined the performance? Intuitively, a higher edge percentage (where edge percentage quantifies how much more likely the datagolf model considers the outcome than the book does) should result in a higher win percentage and a higher Sharpe Ratio. For example, a bet with an edge percentage of 50% should perform better than a bet with an edge percentage of 10%. The table below shows the edge percentages divided into 10 buckets by market; a percentile of 6 means that the percent edge for a given bet is higher than 60% of all other bets in the same market.
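
One way to build those buckets is with dplyr::ntile(). Again a sketch - the edge_pct, bet_won and win_loss_open columns are illustrative names for the merged data.

```r
library(dplyr)

odds_preds %>%
  filter(model == "baseline_hist_fit", edge_pct > 0) %>%
  group_by(market) %>%
  mutate(edge_percentile = ntile(edge_pct, 10)) %>%   # 1 = lowest edge, 10 = highest
  group_by(market, edge_percentile) %>%
  summarise(
    n_bets  = n(),
    win_pct = mean(bet_won),                          # bet_won assumed to be a 0/1 flag
    mean_pl = mean(win_loss_open),
    sharpe  = mean(win_loss_open) / sd(win_loss_open),
    .groups = "drop"
  )
```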

You can see that the winning percentage by edge percentile is relatively flat across all percentiles, meaning there doesn’t seem to be any improvement in outcomes when the edge is higher compared with other bets in the same market.

The Sharpe ratio does improve slightly as the edge percentage increases. This is encouraging, as the edge seems to account for higher-reward plays.

A cursory examination of the graphs and tables in Analysis #6 shows that there was no significant improvement in winning percentage across most of the markets when ranking edge percentage by percentiles. There was, however, a small improvement in Sharpe Ratio: a higher predicted edge percentage resulted in a slightly higher risk-adjusted return.

Analysis #7: Does an Odds Change Improve Performance?

“Odds movement in sports betting refers to the shifts in the lines offered by bookmakers for specific teams or players.” How Does Betting Odds Movement Work? - Link. For example, a player may open at an implied probability of 0.4 and close at 0.44. What does this mean? Generally, it is because more people are betting on that player to win, and the oddsmaker needs to adjust the line to keep the book balanced (equal action on both sides of the bet). Injuries to another player (i.e., the tournament becomes less competitive), weather and other reasons may also move the line.

The majority of odds movements in the historical database show that the implied probability of the event increased. When implied probability increases, decimal odds decrease, indicating a smaller payout at the close compared to the opening line. This is a little confusing, so an example is in order. Assume you bet an opening line with an implied probability of 0.5, or decimal odds of 1/0.5 = 2.0. The line closed at 0.55, or decimal odds of 1/0.55 = 1.82. You make less on a bet placed at the close than on the same bet placed at the open.

One thing that is apparent is that when there was a change in odds by the bookmaker, it usually resulted in a favorable outcome using the datagolf predictions and the opening line. Stated otherwise, if datagolf predicts an edge at the open and the bookmaker later moves the line against you, it is an indicator that the bet is a good one.

The table above is misleading; you couldn’t implement the “strategy” if you wanted to. It comes down to a timing issue: you can’t bet the opening line knowing that the closing line will be less favorable. The only things you can do if you see a line move after you enter at the open are to either bet more at the original opening odds, if they are still available at another book, or sit tight with the knowledge that you are (likely) in a better position than anyone entering after the line movement.

Applying the Kelly Fraction and Other Bankroll Management

The last step is to analyze the historical datagolf predictions by applying the Kelly fraction and other bankroll management techniques to determine model performance.

Implicit in the analysis up to this point is that we are only risking 1 unit at a time regardless of the datagolf model prediction. For example, if my starting bankroll was $1,000 and I divided it into 100 individual bets, each unit would equal $10. If the edge percentage was 10% or 1%, I would make the same $10 bet. Additionally, the analysis was focused on the model as a whole regardless of when the tournament occurred. That means that the analysis to this point considered multiple bets from multiple books on the same event.

Recall that Kelly maximizes the growth of your bankroll for a series of identical sequential bets. A simple example of this principle is someone who is a card counter at blackjack. The player bets a single hand at a time, evaluating the predicted edge and Kelly fraction independent of past hands. The bankroll after the outcome of Hand #1 is the amount the player uses to calculate the predicted edge and Kelly fraction for Hand #2 and so on.

Golf betting is different. You generally have to evaluate multiple simultaneous bets across multiple markets. As mentioned earlier, this should theoretically affect the Kelly fraction since golf betting involves multiple bets all at once for a given tournament. For example, if the datagolf model for an upcoming tournament predicts multiple advantageous bets for several players across multiple markets (make_cut, top_20, etc.), you have to decide how much of your bankroll to bet on each player and market.

There are other real-world complications that will affect your betting and Kelly fraction calculation.

  • Model and Odds Release Timing - Most PGA tournaments occur over a four-day period from Thursday through Sunday. Odds are posted by the bookmakers prior to the tournament - generally late Sunday or early Monday morning the week of the tournament. When Are PGA Tour Odds Released Each Week? - Link. For Datagolf predictions, “the initial release times of paywalled content for PGA and DP World Tour events will be listed at the top of the homepage on Monday. The typical release time is 130pm ET.” Datagolf.com FAQs - Link. This means that all bets where an edge exists should be placed as soon as possible after the datagolf predictions are published.
  • Bookmakers - Not all bookmakers offer odds on every event and every market. Further, you may not have the ability to legally open an account at all of the bookmakers that datagolf covers. An edge may also exist at multiple bookmakers for the same player and market, so a decision must be made whether you will take a positive edge bet at multiple bookmakers for the same player and market.

Bet Ranking by Tournament

A few bet-ranking approaches to these challenges (each analyzed below):

  • Take X Number of Best Bets - the basic approach is to only take a certain number of positive edge bets based on the model. To identify the “best bets”, this approach ranks the edge and Kelly percentage across all markets and players. For example, for each tournament, you might divide your bankroll among the top 20 bets regardless of the book or market by the indicated Kelly fraction.
  • Take X Number of Best Bets - Single Market - only take a set number of bets based on positive edge ranking for a single individual market. For example, only the top 20 bets in the top_10 market and no other bets in any other market.
  • Take X Number of Best Bets - By Markets Ranked - only take a set number of bets based on positive edge ranking in order of markets ranked. For example, you would take the top 20 bets based on pre-determined order of markets. Historically, make_cut and mc bets have outperformed top_10 bets. If the historical ranked profitability order of the markets was mc, make_cut, top_20, top_10, win, you would analyze positive edge bets in the mc, and if there were less than 20 total bets, you would look at make_cut bets and so on until you had a total of 20 bets.
  • Take X Number of Best Bets - Single Bookmaker - only take a set number of bets based on positive edge ranking for a single book. For example, only the top 20 bets offered by DraftKings. This might come into play if you have limited access to individual books.

Bankroll Allocation

Once the approach to bet ranking is established, the final consideration is how to allocate the bankroll. Fixed-unit betting helps to ensure that you limit risk and do not bet more than your starting bankroll. For example, assume you start with a 100-unit bankroll and employ a bet ranking system of taking the top 20 bets across all markets and books. Initially, you would bet 20 units. For the next tournament, you would bet the same 20 units. Note that under a strict definition of “fixed unit”, the dollar amount would not change for each subsequent tournament; if a unit is $10 it should remain $10 in future bets.

Employing the Kelly fraction can cause complications. One issue that arises is when the sum of the Kelly fraction for all positive edge bets is greater than the total bankroll.

Consider the Top 20 bets by Kelly Fraction across all markets and books for the 2023 PGA Championship.

market player.name edge_pct kelly
top_20 Scheffler, Scottie 68.96 47.56
top_10 Scheffler, Scottie 96.98 34.64
top_20 Rahm, Jon 50.08 34.54
top_20 Schauffele, Xander 57.88 30.46
top_20 Cantlay, Patrick 51.89 25.94
top_20 Finau, Tony 52.28 23.76
top_10 Rahm, Jon 66.17 23.63
top_5 Scheffler, Scottie 99.8 22.18
make_cut Scheffler, Scottie 2.65 22.07
make_cut McCarthy, Denny 13.83 20.74
make_cut Scheffler, Scottie 2.43 20.68
make_cut Finau, Tony 4.98 19.93
make_cut Cantlay, Patrick 4.35 19.56
top_10 Schauffele, Xander 72.14 19.24
top_20 McIlroy, Rory 31.34 18.44
make_cut Schauffele, Xander 3.66 18.31
make_cut Fowler, Rickie 7.9 17.77
make_cut Henley, Russell 9.44 17.74
make_cut Scheffler, Scottie 1.83 16.5
make_cut Scheffler, Scottie 1.83 16.5
TOTAL 470.19

Based on the datagolf prediction, these are the best bets for the tournament. The “best” bet (biggest edge of offered odds vs predicted odds) is Scottie Scheffler to make the Top 20. Kelly dictates a bet of 47.56% of your bankroll. If you took all 20 bets, this would sum to 470.19% of your bankroll. Almost 5 times your bankroll.

Here are a few options to deal with this issue:

  • Limit to Kelly Fractions That Don’t Exceed Bankroll - allocate the bankroll based on the Kelly fraction up to the point that the total amount bet is not more than the total bankroll. In the 2023 PGA Championship example above, you would take the first 2 bets totaling 82.2% of your bankroll. The downside is that you are concentrating your bets among fewer players, which increases your risk as compared to spreading the bets out.
  • Half or Quarter Kelly Fractions - a popular approach is to bet half or quarter Kelly fractions. The next approach is a variation of that theme.
  • Aggregate and Allocate Kelly Fractions Equitably - I will refer to this as Equitable Kelly - allocate the bankroll based on each Kelly fraction as a percentage of the sum of the Kelly fractions for your chosen bet ranking, as sketched in the code below. For example, assume that you have identified the top 20 bets across multiple markets and you have a bankroll of $1,000. You would divide the Kelly fraction for a single bet by the sum of all of the Kelly fractions. That way, the sum of all bets would not exceed your bankroll.
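
Here is a minimal sketch of the equitable allocation, using a small illustrative subset of the 2023 PGA Championship table; the full table simply has more rows.

```r
library(dplyr)
library(tibble)

# Illustrative subset of the ranked bets (kelly is the full Kelly % of bankroll)
top_bets <- tribble(
  ~market,  ~player,              ~kelly,
  "top_20", "Scheffler, Scottie",  47.56,
  "top_10", "Scheffler, Scottie",  34.64,
  "top_20", "Rahm, Jon",           34.54
)

bankroll <- 1000
top_bets %>%
  mutate(
    kelly_equitable_pct = kelly / sum(kelly) * 100,   # rescale so the stakes sum to 100%
    stake               = kelly_equitable_pct / 100 * bankroll
  )
```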

For the 2023 PGA Championship example above, the kelly_equitable_amt (%) column shows the amount of the bankroll to bet. You can see the total does not exceed 100%.

market player.name edge_pct (%) kelly (%) kelly_equitable_amt (%)
top_20 Scheffler, Scottie 68.96 47.56 10.12
top_10 Scheffler, Scottie 96.98 34.64 7.37
top_20 Rahm, Jon 50.08 34.54 7.35
top_20 Schauffele, Xander 57.88 30.46 6.48
top_20 Cantlay, Patrick 51.89 25.94 5.52
top_20 Finau, Tony 52.28 23.76 5.05
top_10 Rahm, Jon 66.17 23.63 5.03
top_5 Scheffler, Scottie 99.8 22.18 4.72
make_cut Scheffler, Scottie 2.65 22.07 4.69
make_cut McCarthy, Denny 13.83 20.74 4.41
make_cut Scheffler, Scottie 2.43 20.68 4.40
make_cut Finau, Tony 4.98 19.93 4.24
make_cut Cantlay, Patrick 4.35 19.56 4.16
top_10 Schauffele, Xander 72.14 19.24 4.09
top_20 McIlroy, Rory 31.34 18.44 3.92
make_cut Schauffele, Xander 3.66 18.31 3.89
make_cut Fowler, Rickie 7.9 17.77 3.78
make_cut Henley, Russell 9.44 17.74 3.77
make_cut Scheffler, Scottie 1.83 16.5 3.51
make_cut Scheffler, Scottie 1.83 16.5 3.51
TOTAL 470.19 100

Staking Methods by Tournament - Fixed-Unit, Fixed-Percentage, Kelly and Equitable Kelly

Most bettors will take a per tournament approach to betting. The information below adopts this per-tournament approach and analyzes different systems for bet ranking and bet allocation based on the calculated Kelly fractions. The main difference between the below analyses and those up to this point is the addition of Kelly fraction. However, there is no carryover of bankrolls from one tournament to the next in these analyses; the same bankroll for each approach is assumed at the start of each tournament. Additionally, a fixed-unit and fixed-percentage approach is shown for comparisons, along with the equitable allocation based on the calculated Kelly fractions. Recall that a fixed-percentage approach divides the bankroll as a proportion of the starting bankroll meaning that you cannot bet more than you have at the start of the tournament. This is not true for the fixed-unit or the kelly approach; both assume that you can bet more than 100% of the bankroll.

Analysis #8: All Positive Edge Bets by Tournament

The table below shows all positive edge bets by tournament regardless of market or book. The system takes all bets assuming a positive predicted edge. The table compares the win_loss total from making fixed-unit and fixed percentage bets versus betting the applicable Kelly fraction and the equitable Kelly amount.

The graph below shows the cumulative sum of the profit/loss for each staking plan for all positive edge bets by tournament.

Analysis #9: Top 20 Positive Edge Bets by Tournament

The table below shows the top 20 positive edge bets (including ties) by tournament regardless of market or book. The table compares the win_loss total from making fixed-unit and fixed percentage bets versus betting the applicable Kelly fraction and the equitable Kelly amount.

The graph below shows the cumulative sum of the profit/loss for each staking plan for the top 20 positive edge bets (plus ties) by tournament. It’s interesting to note how the equitable Kelly approach outperformed the Kelly - most of this effect was because of the small amount that Kelly allocates of your bankroll when you limit the range of bets to 20 per tournament.

Analysis #10: Monte Carlo Analysis of Fixed-Unit and Kelly Staking

The final analysis will be to compare fixed-unit and Kelly staking using Monte Carlo analysis.

“The basics of a Monte Carlo simulation are simply to model your problem, and then randomly simulate it until you get an answer.” Monte Carlo Simulations in R - Link.

In the context of sports betting, you create or acquire a model like the one presented in datagolf, and then apply the model to the data to determine the outcome (win or loss) of the bet. Next, you randomly sample these outcomes thousands of times combining them into a single run. In general, a run can be thought of as a series of bets placed at random when a positive edge is identified. Finally, you aggregate all of the runs and analyze the performance as a whole.

The biggest change up to this point is that the bankroll will carry forward over multiple bets. This means that if you bet 40% of a $1,000 bankroll on a single bet with 2.0 odds (a payout of 1 unit for every 1 unit bet) and you win, then for the next bet the bankroll will be $1,400.

Because of the nature of Monte Carlo analysis, I won’t limit the bets to a single tournament. The benefit is that no run of luck in a single tournament can provide a boost to the returns. The downside is that the analysis is not realistic - because the bets are all sampled at random, the sequence is not reproducible. However, comparisons among the different staking plans will be accurate.

A few terms to define:

  • Runs - a series of (1) non-consecutive, (2) random bets selected from the entire bet data.frame where the datagolf baseline_hist_fit (bhf) model predicts an edge based on the opening line. Bets are sampled without replacement, so no bet is repeated within a given run. Ideally, this minimizes the effect of a single win and provides a robust outcome.
  • Sample Size - the total number of bets in a run. For example, a sample may consist of 100 bets. The bets combined are considered a single run.
  • Stake - the starting bankroll in a given run. Initially set at $1,000.
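
A sketch of a single fixed-unit run under these definitions. The bet pool here is synthetic (a win probability and payout chosen purely for illustration); in the actual analysis it would be the table of positive edge bets, with win_loss holding the 1-unit profit or loss.

```r
set.seed(1)

# Synthetic pool of positive-edge bets: win_loss is the 1-unit profit
# (decimal odds - 1 on a win, -1 on a loss)
bet_pool <- data.frame(
  win_loss = sample(c(4, -1), 5000, replace = TRUE, prob = c(0.22, 0.78))
)

# One fixed-unit run: 100 bets sampled without replacement, $10 per unit
run_fixed_unit <- function(pool, sample_size = 100, stake = 1000, unit = 10) {
  outcomes <- sample(pool$win_loss, sample_size, replace = FALSE)
  stake + sum(outcomes) * unit          # ending bankroll for the run
}

ending <- replicate(10000, run_fixed_unit(bet_pool))
mean(ending > 1000)    # share of profitable runs
mean(ending) - 1000    # average profit per run
```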

For fixed-unit betting, I ran a simulation of 10,000 runs. Each run had a sample size of 100 (100 randomly sampled bets) and a starting stake of $1,000. The results were as follows:

A win in this case meant that the ending bankroll after 100 bets was greater than the starting bankroll. Using this measurement, only 45.5% of all runs were profitable. However, the average profit across all 10,000 runs (winners and losers) was $101.

I ran the same scenario using the Kelly fractions.

The results were much better (with a BIG caveat). More than 72% of all runs were profitable, with the average profit for all runs equal to almost $1,200.

Drilling down on the data a little more reveals that most of the profits were from the top 500 wins.

The top 500 wins (or .05% of the total bets in the simulation), concentrated in 34 of the 10,000 runs, accounted for more than 100% of the total profit. This means that a few large consecutive winning bets toward the end of a small portion of the 100-bet runs accounted for the majority of the total winnings - not something that you want to rely upon long term.

On-Demand Analysis

The next part will create a function to get the current odds data from the datagolf.com API and create a basic report with the most recent model predictions, odds and edge data.

Download and Save Pre-Tournament Prediction Data

The report will use data from the following APIs:

  • Model Predictions - Pre-Tournament Predictions API endpoint. This API “Returns full-field probabilistic forecasts for the upcoming tournament on PGA, European, and Korn Ferry Tours from both our baseline and baseline + course history & fit models. Probabilities provided for various finish positions (make cut, top 20, top 5, win, etc.).”
  • Outright (Finish Position) Odds - “Returns the most recent win, top 5, top 10, top 20, make/miss cut, and first round leader odds offered at 11 sportsbooks alongside the corresponding predictions from our model.”
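
A sketch of pulling the live odds alongside the model predictions and computing the edge column for the report. The endpoint path, the returned element and the column names are assumptions; inspect the parsed object and adjust.

```r
library(httr)
library(jsonlite)
library(dplyr)
library(tibble)

dg_key <- Sys.getenv("DATAGOLF_KEY")

# Illustrative endpoint for current outright odds plus model predictions
resp <- GET("https://feeds.datagolf.com/betting-tools/outrights",
            query = list(tour = "pga", market = "top_20", odds_format = "decimal",
                         file_format = "json", key = dg_key))
live <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# Edge = model probability / implied probability - 1, which with decimal odds
# equals book odds / model odds - 1 (datagolf_odds and book_odds are assumed names)
as_tibble(live$odds) %>%
  mutate(edge_pct = book_odds / datagolf_odds - 1) %>%
  filter(edge_pct > 0) %>%
  arrange(desc(edge_pct))
```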

For an example of the report, you can see a copy here.

Conclusion

This project turned into a bigger endeavor than I expected when I first started (but don’t they all?). The datagolf predictions proved to be accurate as a whole. A profitable system exists, most likely by using the equitable Kelly approach and limiting it to the make_cut and top_20 markets. More research and exploration of bet sizing, staking and money management is in order.

Notes & Research

Datagolf.com

  • DataGolf API Access - Link - DB
  • DataGolf Predictions Archive - Link - DB
  • DataGolf API Odds Coverage - Link - DB
  • DataGolf 2021 Betting Results - Link - DB
  • DataGolf 2020 Betting Results - Link - DB
  • DataGolf 2019 Betting Results - Link - DB
  • FAQ - Link - DB
  • Predictive model methodology - Link - DB
  • Three Off the Tee - THE ROLE OF MARKET ODDS IN THE MODELLING PROCESS - Link - DB
  • Model Talk | Pressure revisited (again) - Link - DB

General Sports Betting and Odds

  • What Are Units in Sports Betting? - Link - DB
  • How to calculate EV | Expected Value in sports betting - Link - DB
  • How Decimal Odds Work in Sports Betting - Link - DB
  • If I have the implied probability of something how do I convert that into decimal odds? - Cross Validated - Link - DB
  • How Does Betting Odds Movement Work? A Comprehensive Guide - Link - DB
  • American vs. Decimal Odds | Betting odds formats explained - Link - DB
  • How To Read The New TradeStation 2000i Performance Report - Link - DB
  • Betfair Pro Trader: Sharpe Ratio for Betting - [Link](http://www.betfairprotrader.co.uk/2012/06/sharpe-ratio-for-betting.html) - [DB](https://www.dropbox.com/s/japuccv3an5omdk/2023-Apr-28-%20%20%20Betfair%20Pro%20Trader%3A%20Sharpe%20Ratio%20for%20Betting.pdf?dl=0)
  • Dead Heat Rules in Golf Betting, Explained: What Happens When Players Tie? - Link - DB
  • How to calculate Cumulative portfolio returns in R :: Coding Finance - Link - DB
  • Null hypothesis - Wikipedia - Link - DB
  • How to find cumulative variance or standard deviation in R - Stack Overflow - Link - DB
  • dplyr: Cumulativate versions of any, all, and mean - Link
  • Wrong sample size bias - Catalog of Bias - Link - DB
  • Discrete gambles: Theoretical study of optimal bet allocations for the expo-power utility gambler - Link

Bankroll Management, Risk of Ruin and Sharpe Ratio

  • Bet sizing using risk-of-ruin indicator - Vegapit - Link - DB
  • Risk of ruin - Wikipedia - Link - DB
  • R code for the Gambler’s Ruin Simulation - Link
  • Risk versus Return: Betting optimally on the Soccer World Cup using Robust Optimization - Link - DB
  • Optimal sports betting strategies in practice: an experimental review - Link - DB
  • Betting Bankroll Management | The Relationships Between Odds, Edge and Variance - Link - DB
  • Monte Carlo Simulations in R - Count Bayesie - Link - DB
  • FAQ: How do I interpret odds ratios in logistic regression? - Link - DB
  • r - maxdrawdown function - Stack Overflow - Link - DB
  • Use statistical bootstrapping to validate an algorithmic trading strategy - Vegapit Link - DB

Kelly Criterion - Basic Information

  • The Kelly Criterion - Quantitative Trading - Link - DB
  • Pyckio How to apply de Kelly criteria in your betting with the "Unit Impact" method | Pyckio - Link - DB
  • Kelly Criterion risk assessment | How risky is the Kelly Criterion? - Link - DB
  • Kelly criterion staking strategy - Use it for value betting - Link - DB

Kelly Criterion - Multiple Bets

  • Numerically solve Kelly criterion for multiple simultaneous bets - Link
  • Multiple Kelly Criterion Calculator - Link
  • Algorithms for Large Sequential Incomplete-Information Games - Link
  • ‘Making big bucks’ with a data-driven sports betting strategy - Link