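A simple Q-learning simulation to determine the optimal strategy in Quasar, a net-positive gambling game from Mass Effect.

Final Q-table after 1,000,000 training games. Each row is the current count; the columns are the learned Q-values for adding a random 4-7, adding a random 1-8, or cashing out:
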
count | Q(4-7) | Q(1-8) | Q(cashout) |
---|---|---|---|
1 | 44.406 | 35.538 | -931.36 |
2 | 38.263 | 33.345 | -922.77 |
3 | 29.432 | 35.911 | -941.9 |
4 | 24.981 | 38.138 | -931.91 |
5 | 28.001 | 43.993 | -942.42 |
6 | 48.211 | 39.415 | -935.08 |
7 | 52.367 | 30.722 | -939.34 |
8 | 35.848 | 15.634 | -945.51 |
9 | 8.4491 | 21.178 | -962.81 |
10 | -14.227 | 23.475 | -951.06 |
11 | -10.764 | 33.581 | -982.21 |
12 | 20.678 | 55.159 | -930.84 |
13 | 90.32 | -3.4135 | -912.44 |
14 | 37.824 | -29.884 | -983.24 |
15 | -31.62 | -55.084 | -150 |
16 | -135.04 | -50.067 | -100 |
17 | -200 | -117.25 | 1.48e-323 |
18 | -200 | -122.5 | 50 |
19 | -200 | -154.43 | 100 |
20 | -200 | -200 | 200 |
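
The strategy can be read off the table by taking the action with the largest Q-value in each row. The snippet below is a minimal Python sketch of that lookup, assuming the greedy action is the intended strategy. The names `Q_TABLE`, `ACTIONS`, and `greedy_policy` are illustrative and do not come from the original simulation code.

```python
# Q-values copied from the table above:
# count -> (Q(4-7), Q(1-8), Q(cashout))
Q_TABLE = {
    1: (44.406, 35.538, -931.36),
    2: (38.263, 33.345, -922.77),
    3: (29.432, 35.911, -941.9),
    4: (24.981, 38.138, -931.91),
    5: (28.001, 43.993, -942.42),
    6: (48.211, 39.415, -935.08),
    7: (52.367, 30.722, -939.34),
    8: (35.848, 15.634, -945.51),
    9: (8.4491, 21.178, -962.81),
    10: (-14.227, 23.475, -951.06),
    11: (-10.764, 33.581, -982.21),
    12: (20.678, 55.159, -930.84),
    13: (90.32, -3.4135, -912.44),
    14: (37.824, -29.884, -983.24),
    15: (-31.62, -55.084, -150),
    16: (-135.04, -50.067, -100),
    17: (-200, -117.25, 1.48e-323),
    18: (-200, -122.5, 50),
    19: (-200, -154.43, 100),
    20: (-200, -200, 200),
}

# Column order of the table (hypothetical labels for this sketch).
ACTIONS = ("4-7", "1-8", "cashout")


def greedy_policy(q_table):
    """Map each count to the action whose Q-value is largest in that row."""
    policy = {}
    for count, q_values in q_table.items():
        best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
        policy[count] = ACTIONS[best_index]
    return policy


POLICY = greedy_policy(Q_TABLE)

for count in sorted(POLICY):
    print(f"{count:>2}: {POLICY[count]}")
```

Under that assumption, the table favors cashing out at 17 and above, and drawing again at 15 and 16, where the cashout values are negative but still higher than the expected value of stopping would suggest for a loss-averse player.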