mberk / shin

Python implementation of Shin's method for calculating implied probabilities from bookmaker odds

License: MIT License

Python 89.55% Rust 10.45%

shin-method gambling betting betting-odds

shin

A Python implementation of Shin's method [1, 2] for calculating implied probabilities from bookmaker odds.

Probabilities calculated in this way have been shown to be more accurate than those obtained by the standard approach of dividing the inverse odds by the booksum [3].
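The standard approach can be sketched as follows. `basic_implied_probabilities` is a hypothetical helper name and is not part of the shin API. It inverts each price and divides by the booksum, which is the sum of the inverse odds and exceeds 1 by the bookmaker's margin.

```python
def basic_implied_probabilities(odds: list[float]) -> list[float]:
    """Normalise inverse odds by the booksum (the 'standard approach')."""
    inverse_odds = [1.0 / o for o in odds]
    booksum = sum(inverse_odds)  # > 1 when the bookmaker builds in a margin
    return [io / booksum for io in inverse_odds]


print(basic_implied_probabilities([2.6, 2.4, 4.3]))
```

For the odds in the Usage example, this gives roughly 0.372, 0.403 and 0.225. It therefore puts more probability on the 4.3 outsider than Shin's method does (about 0.222).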

Installation

Requires Python 3.9 or above.

pip install shin

Usage

import shin

shin.calculate_implied_probabilities([2.6, 2.4, 4.3])

[0.37299406033208965, 0.4047794109200184, 0.2222265287474275]

Shin's method assumes that some unknown proportion z of bettors are insiders. This proportion, along with the implied probabilities, can be estimated using the iterative procedure described in [4].
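Below is a minimal pure-Python sketch of such a procedure. It assumes the fixed-point update for z and the probability formula written in the docstring, which is our reading of the Jullien and Salanié approach. The loop structure mirrors the `_optimise` function visible in the diff further down this page. `shin_sketch` is hypothetical and is not the package's Rust optimiser.

```python
from math import sqrt


def shin_sketch(odds, max_iterations=1000, convergence_threshold=1e-12):
    """Estimate z by fixed-point iteration, then apply Shin's formula.

    Assumed update rule, with io_i = 1 / odds_i and booksum S:
        z <- (sum_i sqrt(z^2 + 4 (1 - z) io_i^2 / S) - 2) / (n - 2)
    Assumed probability formula:
        p_i = (sqrt(z^2 + 4 (1 - z) io_i^2 / S) - z) / (2 (1 - z))
    """
    inverse_odds = [1.0 / o for o in odds]
    booksum = sum(inverse_odds)
    n = len(inverse_odds)
    if n < 3:
        raise ValueError("the update divides by n - 2; use the closed form for n = 2")

    def root_term(z, io):
        return sqrt(z * z + 4 * (1 - z) * io * io / booksum)

    z, delta, iterations = 0.0, float("inf"), 0
    while delta > convergence_threshold and iterations < max_iterations:
        z_previous = z
        z = (sum(root_term(z, io) for io in inverse_odds) - 2) / (n - 2)
        delta = abs(z - z_previous)
        iterations += 1

    probabilities = [(root_term(z, io) - z) / (2 * (1 - z)) for io in inverse_odds]
    return probabilities, z, delta, iterations


probabilities, z, delta, iterations = shin_sketch([2.6, 2.4, 4.3])
print(probabilities, z)
```

At the fixed point, the root terms sum to 2 + (n - 2)z, so the probabilities sum to 1. With the Usage odds, this sketch should land close to the z of about 0.01694 reported in the full output.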

Diagnostic information from the iterative procedure can be obtained by setting the full_output argument to True:

import shin

shin.calculate_implied_probabilities([2.6, 2.4, 4.3], full_output=True)

ShinOptimisationDetails(
    implied_probabilities=[0.37299406033208965, 0.4047794109200184, 0.2222265287474275],
    iterations=426,
    delta=9.667822098435863e-13,
    z=0.01694251276407055
)

The returned object contains the following fields:

implied_probabilities - the implied probabilities (a list, or a dict keyed by selection when a mapping of odds is passed)
iterations - compare this value to the max_iterations argument (default = 1000) to check for failed convergence
delta - the change in z on the final iteration. Compare it with the convergence_threshold argument (default = 1e-12) to assess convergence
z - the estimated proportion of theoretical betting volume coming from insider traders
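A check along these lines could be applied to the object returned with full_output=True. `has_converged` is a hypothetical helper. It assumes the loop stops as soon as delta reaches the threshold, as in the Python `_optimise` loop shown in the diff further down this page. A SimpleNamespace stands in for the real object so that the snippet runs without shin installed.

```python
from types import SimpleNamespace


def has_converged(details, convergence_threshold: float = 1e-12) -> bool:
    """Return True if the optimiser stopped because delta met the threshold.

    Assumes the loop ends either when delta <= convergence_threshold or when
    max_iterations is reached; only the first case means convergence.
    """
    return details.delta <= convergence_threshold


# Stand-in carrying the field values from the full-output example above.
details = SimpleNamespace(
    implied_probabilities=[0.37299406033208965, 0.4047794109200184, 0.2222265287474275],
    iterations=426,
    delta=9.667822098435863e-13,
    z=0.01694251276407055,
)
print(has_converged(details))  # True
```

Checking delta rather than iterations also covers the edge case where the threshold is met on exactly the last allowed iteration. It also covers the two-outcome case, where delta is 0.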

When there are only two outcomes, z can be calculated analytically [3]. In this case, the iterations and delta fields of the returned object are 0 to reflect this:

import shin

shin.calculate_implied_probabilities([1.5, 2.74], full_output=True)

ShinOptimisationDetails(
    implied_probabilities=[0.6508515815085157, 0.3491484184914841],
    iterations=0.0,
    delta=0.0,
    z=0.03172728540646625
)
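Here is a minimal sketch of the two-outcome case. It assumes the closed-form expression for z written in the code comment, which is our reading of [3]. The probabilities then come from the same Shin formula used for more outcomes. `shin_two_outcomes` is a hypothetical helper, not part of the shin API.

```python
from math import sqrt


def shin_two_outcomes(odds):
    """Closed-form Shin estimate for exactly two outcomes (assumed formula)."""
    io_1, io_2 = (1.0 / o for o in odds)
    booksum = io_1 + io_2
    diff_sq = (io_1 - io_2) ** 2
    # z = (S - 1)(d^2 - S) / (S (d^2 - 1)), with S the booksum and d = io_1 - io_2
    z = (booksum - 1) * (diff_sq - booksum) / (booksum * (diff_sq - 1))
    probabilities = [
        (sqrt(z * z + 4 * (1 - z) * io * io / booksum) - z) / (2 * (1 - z))
        for io in (io_1, io_2)
    ]
    return probabilities, z


print(shin_two_outcomes([1.5, 2.74]))
```

With the odds above, this should reproduce the z of about 0.0317 and the probabilities shown in the full output.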

Note that with two outcomes, Shin's method is equivalent to the Additive Method of [5].
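The Additive Method subtracts an equal share of the margin from each inverse odd. The hypothetical helper below sketches it. For [1.5, 2.74], it gives the same probabilities as the two-outcome Shin output above.

```python
def additive_implied_probabilities(odds):
    """Additive Method: remove an equal share of the margin from every outcome."""
    inverse_odds = [1.0 / o for o in odds]
    margin = sum(inverse_odds) - 1
    return [io - margin / len(inverse_odds) for io in inverse_odds]


print(additive_implied_probabilities([1.5, 2.74]))
# approximately [0.65085158, 0.34914842]
```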

What's New in Version 0.2.0?

The latest version improves support for static typing and includes a breaking change.

Breaking Change To `calculate_implied_probabilities()` Signature

All arguments to calculate_implied_probabilities() other than odds are now keyword-only. This change simplifies the declaration of the overloads that type the function's return value and allows more flexibility in the API going forward.

from shin import calculate_implied_probabilities

# still works
calculate_implied_probabilities([2.0, 2.0])
calculate_implied_probabilities(odds=[2.0, 2.0])
calculate_implied_probabilities([2.0, 2.0], full_output=True)
# any other combination of passing arguments by keyword also still works

# passing any argument other than `odds` positionally now raises a TypeError
calculate_implied_probabilities([2.0, 2.0], 1000)  # Error
calculate_implied_probabilities([2.0, 2.0], max_iterations=1000)  # OK

calculate_implied_probabilities([2.0, 2.0], 1000, 1e-12, True)  # Error
calculate_implied_probabilities([2.0, 2.0], max_iterations=1000, convergence_threshold=1e-12, full_output=True)  # OK

See this commit for more details.
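The sketch below shows why keyword-only arguments shrink the overload set. Everything after the bare `*` must be passed by keyword, so one overload per input/output combination is enough and no positional variants are needed. `implied_sketch` is a hypothetical stand-in that keeps only the `full_output` flag. Its body uses plain booksum normalisation rather than Shin's method, so only the signatures are meant to resemble the library's.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, Literal, Union, overload


@overload
def implied_sketch(odds: Sequence[float], *, full_output: Literal[False] = False) -> list[float]: ...
@overload
def implied_sketch(odds: Mapping[Any, float], *, full_output: Literal[False] = False) -> dict[Any, float]: ...
@overload
def implied_sketch(
    odds: Union[Sequence[float], Mapping[Any, float]], *, full_output: Literal[True]
) -> dict[str, Any]: ...


def implied_sketch(
    odds: Union[Sequence[float], Mapping[Any, float]], *, full_output: bool = False
) -> Union[list[float], dict[Any, float], dict[str, Any]]:
    # Stand-in body: booksum normalisation, NOT Shin's method.
    if isinstance(odds, Mapping):
        keys = list(odds)
        values = [odds[k] for k in keys]
    else:
        keys = None
        values = list(odds)
    inverse_odds = [1.0 / o for o in values]
    booksum = sum(inverse_odds)
    probabilities = [io / booksum for io in inverse_odds]
    result: Union[list[float], dict[Any, float]] = (
        dict(zip(keys, probabilities)) if keys is not None else probabilities
    )
    if full_output:
        return {"implied_probabilities": result}
    return result
```

Calling `implied_sketch([2.0, 2.0], True)` raises a TypeError, just like the `# Error` lines above.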

Full Output Type

The full_output argument now returns a ShinOptimisationDetails object instead of a dict. This object is a dataclass with the same fields as the dict that was previously returned.

For the read-only case, the ShinOptimisationDetails object can be used as a drop-in replacement for the dict that was previously returned as it supports __getitem__().

This change was introduced to support generic typing of implied_probabilities, which TypedDict does not support in Python versions earlier than 3.11.

See this and this for more details.
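Here is a minimal sketch of how a generic dataclass can keep dict-style read access. It assumes `__getitem__` simply looks up the attribute of the same name. `DetailsSketch` is hypothetical, and the real ShinOptimisationDetails may be implemented differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Generic, TypeVar

# T is e.g. list[float] or dict[selection, float]
T = TypeVar("T")


@dataclass
class DetailsSketch(Generic[T]):
    implied_probabilities: T
    iterations: int
    delta: float
    z: float

    def __getitem__(self, key: str) -> Any:
        # dict-style reads, so code written against the old dict keeps working
        if key not in {f.name for f in fields(self)}:
            raise KeyError(key)
        return getattr(self, key)


details = DetailsSketch(implied_probabilities=[0.5, 0.5], iterations=0, delta=0.0, z=0.0)
print(details["iterations"], details.z)
```

Raising KeyError for unknown names keeps the failure mode the same as the old dict's. Writes such as `details["z"] = 1.0` are not supported, which matches the read-only caveat above.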

What's New in Version 0.1.0?

Version 0.1.0 introduced some substantial changes, including breaking API changes.

Default Return Value Behaviour

Previously, shin.calculate_implied_probabilities returned a dict that contained convergence details of the iterative fitting procedure along with the implied probabilities:

import shin

shin.calculate_implied_probabilities([2.6, 2.4, 4.3])

{'implied_probabilities': [0.37299406033208965,
  0.4047794109200184,
  0.2222265287474275],
 'iterations': 425,
 'delta': 9.667822098435863e-13,
 'z': 0.01694251276407055}

The default behaviour is now for the function to return only the implied probabilities:

import shin

shin.calculate_implied_probabilities([2.6, 2.4, 4.3])

[0.37299406033208965, 0.4047794109200184, 0.2222265287474275]

The full output can still be obtained by setting the full_output argument to True:

import shin

shin.calculate_implied_probabilities([2.6, 2.4, 4.3], full_output=True)

{'implied_probabilities': [0.37299406033208965,
  0.4047794109200184,
  0.2222265287474275],
 'iterations': 425,
 'delta': 9.667822098435863e-13,
 'z': 0.01694251276407055}

Passing Mappings

A common scenario is to have a mapping from selection identifiers to their odds. You can now pass such a mapping to shin.calculate_implied_probabilities and get back a new dict mapping the same identifiers to their probabilities:

import shin

shin.calculate_implied_probabilities({"HOME": 2.6, "AWAY": 2.4, "DRAW": 4.3})

{'HOME': 0.37299406033208965,
 'AWAY': 0.4047794109200184,
 'DRAW': 0.2222265287474275}

This also works when asking for the full output to be returned:

import shin

shin.calculate_implied_probabilities({"HOME": 2.6, "AWAY": 2.4, "DRAW": 4.3}, full_output=True)

{'implied_probabilities': {'HOME': 0.37299406033208965,
  'AWAY': 0.4047794109200184,
  'DRAW': 0.2222265287474275},
 'iterations': 426,
 'delta': 9.667822098435863e-13,
 'z': 0.01694251276407055}
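One way such mapping support could be layered over a list-based calculation is sketched below. `over_mapping` and `normalise` are hypothetical names. `normalise` is a booksum stand-in rather than Shin's method; in the package, the list calculation would be Shin's.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Mapping, Sequence


def over_mapping(
    odds: Mapping[Hashable, float],
    calculate: Callable[[Sequence[float]], list[float]],
) -> dict[Hashable, float]:
    """Apply a list-based calculation to a mapping, keeping keys aligned."""
    keys = list(odds)  # fix one key order so results zip back to the right keys
    probabilities = calculate([odds[key] for key in keys])
    return dict(zip(keys, probabilities))


def normalise(odds: Sequence[float]) -> list[float]:
    # stand-in calculation (booksum normalisation, not Shin's method)
    inverse_odds = [1.0 / o for o in odds]
    booksum = sum(inverse_odds)
    return [io / booksum for io in inverse_odds]


print(over_mapping({"HOME": 2.6, "AWAY": 2.4, "DRAW": 4.3}, normalise))
```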

Controlling the Optimiser

Starting in version 0.1.0, the iterative procedure is implemented in Rust, which provides a considerable performance boost. If you would like to use the old Python-based optimiser, set the force_python_optimiser argument to True:

import timeit
timeit.timeit(
    "shin.calculate_implied_probabilities([2.6, 2.4, 4.3], force_python_optimiser=True)",
    setup="import shin",
    number=10000
)

3.9101167659973726

import timeit
timeit.timeit(
    "shin.calculate_implied_probabilities([2.6, 2.4, 4.3])",
    setup="import shin",
    number=10000
)

0.14442387002054602

References

[1] H. S. Shin, "Prices of State Contingent Claims with Insider Traders, and the Favorite-Longshot Bias". The Economic Journal, 1992, 102, pp. 426-435.

[2] H. S. Shin, "Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims". The Economic Journal, 1993, 103(420), pp. 1141-1153.

[3] E. Štrumbelj, "On determining probability forecasts from betting odds". International Journal of Forecasting, 2014, Volume 30, Issue 4, pp. 934-943.

[4] B. Jullien and B. Salanié, "Measuring the Incidence of Insider Trading: A Comment on Shin". The Economic Journal, 1994, 104(427), pp. 1418-1419.

[5] S. Clarke, S. Kovalchik, M. Ingram, "Adjusting bookmaker's odds to allow for overround". American Journal of Sports Science, 2017, Volume 5, Issue 6, pp. 45-49.

shin's Issues

feature: implied odds calculator

A natural counterpart to probability inference would be odds inference: given a set of probabilities (summing to 1) and a target overround/margin, output a set of prices whose inverses sum to 1 + the target overround/margin.

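import pytest

import shin


def test_calculate_implied_odds() -> None:
    odds = [2.6, 2.4, 4.3]
    # margin (overround) implied by the original prices
    margin = sum(1 / o for o in odds) - 1
    implied_probabilities = shin.calculate_implied_probabilities(odds)
    # proposed function, not part of shin at the time of this issue
    res = shin.calculate_implied_odds(implied_probabilities, margin=margin)
    assert res == pytest.approx(odds)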

Would you be interested in including something like this in the library?

R's implied package does this, and I've put together a very raw Python implementation based on it, which you can see diffed against the typing branch here: https://github.com/peterschutt/shin/pull/1/files.
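Here is one possible approach, sketched under stated assumptions. Inverting Shin's probability formula for a fixed z gives inverse odds 1/o_i = sqrt(S((1 - z)p_i^2 + z p_i)), with S = 1 + margin. The only unknown is then the z that makes those inverse odds sum to S, which is equivalent to sum_i sqrt((1 - z)p_i^2 + z p_i) = sqrt(S). The left-hand side grows with z, so bisection on [0, 1] finds it. `implied_odds_sketch` is hypothetical. It is not the proposed `calculate_implied_odds`, nor necessarily what the R package does.

```python
from math import sqrt


def implied_odds_sketch(probabilities, margin, tolerance=1e-15, max_steps=200):
    """Odds with Shin-style margin: inverse odds sum to 1 + margin."""
    booksum = 1.0 + margin
    target = sqrt(booksum)

    def g(z):
        # increasing in z because p >= p^2 for 0 <= p <= 1
        return sum(sqrt((1 - z) * p * p + z * p) for p in probabilities)

    lo, hi = 0.0, 1.0
    if not (g(lo) <= target <= g(hi)):
        raise ValueError("margin not reachable for these probabilities")
    for _ in range(max_steps):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tolerance:
            break
    z = (lo + hi) / 2
    return [1.0 / sqrt(booksum * ((1 - z) * p * p + z * p)) for p in probabilities]


odds = [2.6, 2.4, 4.3]
margin = sum(1 / o for o in odds) - 1
shin_probabilities = [0.37299406033208965, 0.4047794109200184, 0.2222265287474275]
print(implied_odds_sketch(shin_probabilities, margin))
```

Feeding it the Shin probabilities from the Usage example together with the original margin should approximately recover [2.6, 2.4, 4.3]. This is the same round trip as in the test above.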

Missing py.typed file

Hi,

Just started using shin - thanks for making the package available!

In testing I've noticed that, although the source is correctly typed, mypy won't recognize it as typed because the py.typed marker file is missing.

Here is mypy output:

src/domain/utils.py:5: error: Skipping analyzing "shin": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src/domain/utils.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
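PEP 561 asks packages that ship inline annotations to include a py.typed marker file. The hypothetical helper below checks whether an installed package ships one. For shin, the fix would presumably be an empty py.typed file next to python/shin/__init__.py, with the build configured to include it. That build configuration is not shown here, so treat the location as an assumption.

```python
from importlib.util import find_spec
from pathlib import Path


def ships_py_typed(package: str) -> bool:
    """Return True if an installed top-level package contains a py.typed marker."""
    spec = find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # not installed, or a plain module rather than a package
        return False
    return any(
        Path(location, "py.typed").is_file()
        for location in spec.submodule_search_locations
    )


print(ships_py_typed("shin"))
```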

Rewrite README for version 0.2.0

e.g. some of the comments on return values are no longer valid with the changes in #5

Improve typed interface of `calculate_implied_probabilities()`

The function has a boolean flag argument that alters its output, so we can use @overload to specify the relationship between argument and output types.

For example,

reveal_type(shin.calculate_implied_probabilities((1.5, 2.5, 3.5)))
reveal_type(shin.calculate_implied_probabilities({"a": 1.5, "b": 2.5, "c": 3.5}))
reveal_type(shin.calculate_implied_probabilities((1.5, 2.5, 3.5), full_output=True))
reveal_type(shin.calculate_implied_probabilities({"a": 1.5, "b": 2.5, "c": 3.5}, full_output=True))

with 0.1.1 branch:

src/domain/utils.py:22: note: Revealed type is "Union[builtins.dict[builtins.str, Any], builtins.list[builtins.float], builtins.dict[Any, builtins.float]]"
src/domain/utils.py:23: note: Revealed type is "Union[builtins.dict[builtins.str, Any], builtins.list[builtins.float], builtins.dict[Any, builtins.float]]"
src/domain/utils.py:24: note: Revealed type is "Union[builtins.dict[builtins.str, Any], builtins.list[builtins.float], builtins.dict[Any, builtins.float]]"
src/domain/utils.py:25: note: Revealed type is "Union[builtins.dict[builtins.str, Any], builtins.list[builtins.float], builtins.dict[Any, builtins.float]]"

Using overloads, that can be:

src/domain/utils.py:22: note: Revealed type is "builtins.list[builtins.float]"
src/domain/utils.py:23: note: Revealed type is "builtins.dict[Any, builtins.float]"
src/domain/utils.py:24: note: Revealed type is "builtins.dict[builtins.str, Any]"
src/domain/utils.py:25: note: Revealed type is "builtins.dict[builtins.str, Any]"

... which is a much nicer experience downstream as we don't need to narrow the return type somehow before we go on to use it.

LMK if this is something you'd be interested in a PR for. Potentially a lot of overloads are required, as all of the arguments can be specified either positionally or by keyword, and mypy requires overloads that cover every scenario. E.g., here is the diff that I've got to produce the above:

(.venv) peter@pop-os:~/PycharmProjects/shin$ git diff
diff --git a/python/shin/__init__.py b/python/shin/__init__.py
index 2ba4c93..0d58daa 100644
--- a/python/shin/__init__.py
+++ b/python/shin/__init__.py
@@ -1,7 +1,7 @@
-from collections.abc import Collection
+from collections.abc import Sequence
 from collections.abc import Mapping
 from math import sqrt
-from typing import Any, Union
+from typing import Any, Literal, TypeVar, Union, overload
 
 
 from .shin import optimise as _optimise_rust
@@ -15,7 +15,7 @@ def _optimise(
     convergence_threshold: float = 1e-12,
 ) -> tuple[float, float, float]:
     delta = float("Inf")
-    z = 0
+    z = 0.0
     iterations = 0
     while delta > convergence_threshold and iterations < max_iterations:
         z0 = z
@@ -31,8 +31,85 @@ def _optimise(
     return z, delta, iterations
 
 
+# full output False as positional argument
+# sequence input
+@overload
 def calculate_implied_probabilities(
-    odds: Union[Collection[float], Mapping[Any, float]],
+    odds: Sequence[float],
+    max_iterations: int,
+    convergence_threshold: float,
+    full_output: Literal[False],
+    force_python_optimiser: bool = ...,
+) -> list[float]:
+    ...
+
+
+# mapping input
+@overload
+def calculate_implied_probabilities(
+    odds: Mapping[Any, float],
+    max_iterations: int,
+    convergence_threshold: float,
+    full_output: Literal[False],
+    force_python_optimiser: bool = ...,
+) -> dict[Any, float]:
+    ...
+
+
+# full output False as keyword argument, or default False
+# sequence input
+@overload
+def calculate_implied_probabilities(
+    odds: Sequence[float],
+    *,
+    max_iterations: int = 1000,
+    convergence_threshold: float = 1e-12,
+    full_output: Literal[False] = False,
+    force_python_optimiser: bool = False,
+) -> list[float]:
+    ...
+
+
+# mapping input
+@overload
+def calculate_implied_probabilities(
+    odds: Mapping[Any, float],
+    *,
+    max_iterations: int = 1000,
+    convergence_threshold: float = 1e-12,
+    full_output: Literal[False] = False,
+    force_python_optimiser: bool = False,
+) -> dict[Any, float]:
+    ...
+
+
+# full output True as positional argument
+@overload
+def calculate_implied_probabilities(
+    odds: Union[Sequence[float], Mapping[Any, float]],
+    max_iterations: int,
+    convergence_threshold: float,
+    full_output: Literal[True],
+    force_python_optimiser: bool = ...,
+) -> dict[str, Any]:
+    ...
+
+
+# full output True as keyword argument
+@overload
+def calculate_implied_probabilities(
+    odds: Union[Sequence[float], Mapping[Any, float]],
+    *,
+    max_iterations: int = 1000,
+    convergence_threshold: float = 1e-12,
+    full_output: Literal[True],
+    force_python_optimiser: bool = False,
+) -> dict[str, Any]:
+    ...
+
+
+def calculate_implied_probabilities(
+    odds: Union[Sequence[float], Mapping[Any, float]],
     max_iterations: int = 1000,
     convergence_threshold: float = 1e-12,
     full_output: bool = False,
diff --git a/python/shin/shin.pyi b/python/shin/shin.pyi
index e69de29..1ee3b6f 100644
--- a/python/shin/shin.pyi
+++ b/python/shin/shin.pyi
@@ -0,0 +1,8 @@
+def optimise(
+    inverse_odds: list[float],
+    sum_inverse_odds: float,
+    n: int,
+    max_iterations: int = 1000,
+    convergence_threshold: float = 1e-12,
+) -> tuple[float, float, float]:
+    ...

The number of overloads could be reduced by making the arguments other than odds keyword-only, which would be an OK fit IMO:

def calculate_implied_probabilities(
    odds: Union[Sequence[float], Mapping[Any, float]],
    *,
    max_iterations: int = 1000,
    convergence_threshold: float = 1e-12,
    full_output: bool = False,
    force_python_optimiser: bool = False,
) -> Union[dict[str, Any], list[float], dict[Any, float]]: