There have been discussions about why KataGo is not so strong at high handicap (5 stones

How to improve KataGo at high handicap? (lightvector/katago)

Comments (30)

lightvector commented on June 28, 2024 5

@Ishinoshita @kaorahi - yes, there are a LOT of different things one could try.

Self-play changes are very expensive to test, so changes to search alone would be worth trying first. Also, I personally have nowhere near the capacity to test all the ideas myself, so I would certainly welcome any experiments by you or other people!

You might be interested in the parameter dynamicScoreCenterZeroWeight, newly pushed to master today. It can be experimented with in conjunction with dynamicScoreUtilityFactor. See here: 52017e8

Roughly, KataGo attempts to maximize:
winrate + dynamicScoreUtilityFactor * sigmoid(score difference - CENTER) + staticScoreUtilityFactor * sigmoid(score difference - 0)
Normally, CENTER is set at the current estimated score, but now dynamicScoreCenterZeroWeight can move CENTER part of the way towards zero. Due to the sigmoid shape of the utility, in theory this should make KataGo more risk-seeking and value potential score gains more than being afraid of possible score losses. Would that help? Maybe!
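
To make the effect of moving CENTER concrete, here is a minimal Python sketch of the rough formula above. It is not KataGo's actual code: the tanh-shaped `squash`, its `scale`, the linear interpolation of CENTER toward zero, and the placeholder factor values are all assumptions made for illustration.

```python
import math

def squash(x, scale=20.0):
    """Sigmoid-shaped map of a score difference into (-1, 1). Shape and scale are assumed."""
    return math.tanh(x / scale)

def utility(winrate, score, expected_score,
            dynamic_factor=0.2, static_factor=0.1, center_zero_weight=0.0):
    """Rough utility from the comment above, with CENTER moved part of the way to zero."""
    # Assumption: CENTER is interpolated linearly between the expected score and 0.
    center = (1.0 - center_zero_weight) * expected_score
    return (winrate
            + dynamic_factor * squash(score - center)
            + static_factor * squash(score))

def gamble_gain(expected_score, swing, center_zero_weight):
    """Mean dynamic-term utility of a +/-swing gamble minus that of the sure score."""
    def dynamic_term(score):
        return utility(0.0, score, expected_score,
                       static_factor=0.0, center_zero_weight=center_zero_weight)
    risky = 0.5 * (dynamic_term(expected_score + swing)
                   + dynamic_term(expected_score - swing))
    return risky - dynamic_term(expected_score)

# White is 30 points behind (as in a handicap game) and considers a +/-10 point gamble.
print(gamble_gain(-30.0, 10.0, center_zero_weight=0.0))  # no preference for the gamble
print(gamble_gain(-30.0, 10.0, center_zero_weight=0.5))  # positive: the gamble looks better
```

Under these assumptions, a player who is behind sits on the convex side of the curve once CENTER moves toward zero, so an even gamble on the score has higher average utility than the sure result. That is the risk-seeking effect described above.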

Additionally, you might want to experiment with the magnitude of dynamicScoreUtilityFactor. Does handicap play improve if you increase/decrease it? For that matter, does even-game match strength improve if you increase/decrease it? I have no idea! Haven't had time to test myself!

from katago.

Splee99 commented on June 28, 2024 5

Preliminary tests showed changing dynamicScoreUtilityFactor to 0.4 (from 0.2) produced good results in 4 stone handicap games.

kaorahi commented on June 28, 2024 2

I observed that an adjusted komi prevents 3-3 in this situation:

(;AB[dp][pd][pp][dd]HA[4];W[qf];B[nc];W[rd];B[qc])

(ref. leela-zero/leela-zero#2431 (comment))

komi = 7.5

$ (echo "komi 7.5\ntime_settings 0 60 1\nplay b d4\nplay b d16\nplay b q4\nplay b q16\nplay w r14\nplay b o17\nplay w s16\nplay b r17\ngenmove w\nshowboard") | ./katago gtp -model net/g104-b6c96-s97778688-d23397744/model.txt.gz -config src/cpp/configs/gtp_example.cfg 2> /dev/null | grep -v '^=? *$'
= C17
= MoveNum: 6 HASH: AFD46D37280B84C1E041908C4EDDD226
   A B C D E F G H J K L M N O P Q R S T
19 . . . . . . . . . . . . . . . . . . .
18 . . . . . . . . . . . . . . . . . . .
17 . . O3. . . . . . . . . . X . . X2. .
16 . . . X . . . . . . . . . . . X . O1.
15 . . . . . . . . . . . . . . . . . . .
14 . . . . . . . . . . . . . . . . O . .
13 . . . . . . . . . . . . . . . . . . .
12 . . . . . . . . . . . . . . . . . . .
11 . . . . . . . . . . . . . . . . . . .
10 . . . . . . . . . . . . . . . . . . .
 9 . . . . . . . . . . . . . . . . . . .
 8 . . . . . . . . . . . . . . . . . . .
 7 . . . . . . . . . . . . . . . . . . .
 6 . . . . . . . . . . . . . . . . . . .
 5 . . . . . . . . . . . . . . . . . . .
 4 . . . X . . . . . . . . . . . X . . .
 3 . . . . . . . . . . . . . . . . . . .
 2 . . . . . . . . . . . . . . . . . . .
 1 . . . . . . . . . . . . . . . . . . .

komi = 60

$ (echo "komi 60\ntime_settings 0 60 1\nplay b d4\nplay b d16\nplay b q4\nplay b q16\nplay w r14\nplay b o17\nplay w s16\nplay b r17\ngenmove w\nshowboard") | ./katago gtp -model net/g104-b6c96-s97778688-d23397744/model.txt.gz -config src/cpp/configs/gtp_example.cfg 2> /dev/null | grep -v '^=? *$'
= R11
= MoveNum: 6 HASH: 65EC6B38F3CA6FB2D3FD92847A9A48D6
   A B C D E F G H J K L M N O P Q R S T
19 . . . . . . . . . . . . . . . . . . .
18 . . . . . . . . . . . . . . . . . . .
17 . . . . . . . . . . . . . X . . X2. .
16 . . . X . . . . . . . . . . . X . O1.
15 . . . . . . . . . . . . . . . . . . .
14 . . . . . . . . . . . . . . . . O . .
13 . . . . . . . . . . . . . . . . . . .
12 . . . . . . . . . . . . . . . . . . .
11 . . . . . . . . . . . . . . . . O3. .
10 . . . . . . . . . . . . . . . . . . .
 9 . . . . . . . . . . . . . . . . . . .
 8 . . . . . . . . . . . . . . . . . . .
 7 . . . . . . . . . . . . . . . . . . .
 6 . . . . . . . . . . . . . . . . . . .
 5 . . . . . . . . . . . . . . . . . . .
 4 . . . X . . . . . . . . . . . X . . .
 3 . . . . . . . . . . . . . . . . . . .
 2 . . . . . . . . . . . . . . . . . . .
 1 . . . . . . . . . . . . . . . . . . .
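
The two runs above differ only in the komi line, so the GTP input can be generated instead of typed by hand. The following is a small sketch; the helper name `gtp_commands` and the script name in the usage note are hypothetical. It also avoids depending on whether the shell's `echo` expands `\n`.

```python
def gtp_commands(komi, moves):
    """Build the GTP input used above: komi, time settings, the moves, then genmove."""
    lines = [f"komi {komi}", "time_settings 0 60 1"]
    lines += [f"play {color} {vertex}" for color, vertex in moves]
    lines += ["genmove w", "showboard"]
    return "\n".join(lines) + "\n"

# Four handicap stones followed by the corner sequence from the SGF above.
MOVES = [
    ("b", "d4"), ("b", "d16"), ("b", "q4"), ("b", "q16"),
    ("w", "r14"), ("b", "o17"), ("w", "s16"), ("b", "r17"),
]

if __name__ == "__main__":
    print(gtp_commands(60, MOVES), end="")
```

Saved as, say, `handicap_gtp.py`, the output can be piped into the same `./katago gtp -model ... -config ...` command shown above.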

Ishinoshita commented on June 28, 2024 2

@lightvector Thanks for these comments and tips! Nice to have these search parameters accessible as arguments. Will give it a try.

petgo3 commented on June 28, 2024 1

@Splee99 👍 Thx for the hint. This helped a lot in 4-stone games against dcnn1d's on KGS, which my kata config (with dynamicScoreUtilityFactor 0.2) never managed so far. I'll do some further testing with h5 and h6.
Additionally, watching the games (all won so far), I got the impression that kata quickly reaches a balanced game and after that plays more like it is holding the result. Is this only a feeling, or could there be a reason for this behavior?
Anyway good work!

petgo3 commented on June 28, 2024 1

I use dynamicScoreUtilityFactor 0.6 now. Nothing negative detected about it, but quite a creative fuseki.

AncalagonX commented on June 28, 2024 1

It would be helpful to be able to apply different utility factors to Black and White.

For example, when KataGo is playing as White against high handicap, I would like to see it calculate utility for Black and White differently. Right now, attempting to make KataGo value "riskier" moves forces the MCTS to do so for both players, skewing the search results in some (probably undesirable) way.

alreadydone commented on June 28, 2024 1

Let me suggest an approach which has been gradually fleshed out over months but which I haven't got a chance to implement. It's based on the idea of simulating a weaker opponent in MCTS. Similar ideas, some involving two networks (@Ttl), have been tried, but I think this is the correct way to implement the idea.
Suppose we want to simulate an opponent that accrues playouts slower (with ratio r < 1) optionally using a weaker network (but using only one network could save computation via NNCache).
In the following, white is the stronger player, and black is the weaker player with handicap stones to start with.

The tentative implementation of the idea requires four kinds of nodes: white-to-move node in white's mind "wtm(w)", white-to-move node in black's mind "wtm(b)", black-to-move node in black's mind "btm(b)", and black-to-move node in white's mind "btm(w)".
Note that a wtm node in LZ stores the move black just made (if any) and its policy; I have not checked KataGo.

A wtm node has btm nodes as its children, and vice versa.
A wtm(w) node has both types of btm nodes as its children, and it keeps separate visit count and accumulated winrate for each of them (the (w) winrate is unnecessary for play but useful for analysis). When a wtm(w) node is reached during MCTS, if the (b) visit count is less than r times the (w) visit count, use UCT to select a child from the children of btm(b) type only ("branch into a simulation in black's mind"). Otherwise, select from the btm(w) children (or expand the node itself to create both types of children, if it hasn't been).
A btm(w) node only has wtm(w) nodes as its children. When selecting a wtm(w) child at a btm(w) node, use the wtm(w) nodes' (b) winrates and visit counts.
A (b) node only has (b) nodes as its children.
The root node is always (w).
When using two networks, only btm(w) nodes store policy from the stronger network; all other three types of nodes store policy from the weaker network.

Back up: For a newly expanded (w) node, all ancestors are (w) nodes; we update the (w) visit count and winrate of each of these ancestors (for the node itself, we may as well update both the (w) and (b) visit counts and winrates if it's wtm(w)).
For a newly expanded (b) node, update all (b) ancestors' and the last (w) ancestor's (b) visit counts and winrates (with winrate from the weaker network).

A problem is that the number of nodes would be roughly proportional to the sum of depths of all (w) nodes, leading to excessive memory usage when the tree is large and deep; so maybe one would merge nodes that correspond to the same variation, but keep for each node a list of visit counts and winrates indexed by how deep they're into a simulation in black's mind. Time complexity would be proportional to sum of the squares of the depths of (w) nodes but hopefully still manageable with the help of NNCache, even in the two networks case. (Corrected 7/22)
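
The branching rule at a wtm(w) node can be sketched as follows. This is only an illustration of the rule "descend into black's mind while the (b) visit count is below r times the (w) visit count". The class and function names are hypothetical, and child selection by UCT, expansion, and the full backup scheme are all left out.

```python
from dataclasses import dataclass

@dataclass
class WtmWNode:
    """White-to-move node in white's mind, with separate (w) and (b) visit counts."""
    w_visits: int = 0
    b_visits: int = 0

def choose_branch(node, r):
    """Return 'b' to branch into a simulation in black's mind, else 'w'."""
    return "b" if node.b_visits < r * node.w_visits else "w"

def simulate(num_playouts, r):
    """Repeatedly apply the branching rule, counting one visit per playout."""
    node = WtmWNode()
    for _ in range(num_playouts):
        if choose_branch(node, r) == "b":
            node.b_visits += 1
        else:
            node.w_visits += 1
    return node

node = simulate(1000, r=0.25)
print(node.w_visits, node.b_visits)
```

With this rule the (b) count tracks r times the (w) count, which is what "accrues playouts slower (with ratio r < 1)" asks for in this simplified setting.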

Ishinoshita commented on June 28, 2024

@kaorahi Great! I was anticipating quite the contrary, in fact: with high komi, I would have expected the white strategy 'take corners then reduce moyo' to be even more likely to be good enough for KG.
I have not given KG a try (on Windows) yet.

Splee99 commented on June 28, 2024

What about the drawEquivalentWinsForWhite? By default it is 0.5. When I set it to 0.4 it seems to behave better in handicap games, but I'm not sure.

lightvector commented on June 28, 2024

@Splee99 - it does nothing but change how much KataGo values getting an exact draw. If komi is a half-integer, then it does exactly nothing, so you are probably only imagining any effect here.
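
A small sketch of why the parameter cannot matter with a half-integer komi. The function names are hypothetical, and the model is simplified: every integer board difference from -361 to 361 is treated as a possible result.

```python
def white_result_value(white_margin, draw_equivalent_wins_for_white=0.5):
    """Value of a finished game for White: 1 for a win, 0 for a loss, weighted draw."""
    if white_margin > 0:
        return 1.0
    if white_margin < 0:
        return 0.0
    return draw_equivalent_wins_for_white

def possible_white_margins(komi, board_points=361):
    """White's final margin for every integer on-board difference (Black minus White)."""
    return [komi - diff for diff in range(-board_points, board_points + 1)]

def parameter_matters(komi, a=0.4, b=0.5):
    """True if two settings of the draw value ever score some final result differently."""
    return any(white_result_value(m, a) != white_result_value(m, b)
               for m in possible_white_margins(komi))

print(parameter_matters(7.5))  # False: a margin of exactly 0 never occurs
print(parameter_matters(7))    # True: a draw is possible with integer komi
```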

Splee99 commented on June 28, 2024

@petgo3 With this setting, KataGo often plays crazy tenuki, which sometimes confuses the opponent. However, it can get punished in some capturing-race situations.

Splee99 commented on June 28, 2024

@lightvector, What should I expect if I change numVirtualLossesPerThread to 1?

lightvector commented on June 28, 2024

Good question! Presumably threads should be slightly less reluctant to search down overlapping tree branches. It would need testing (ideally with at least a few thousand visits) to determine the impact on both performance and overall strength for varying numbers of threads. I haven't done such testing yet; I just put down 2 as a hopefully-reasonable choice pulled out of thin air.

If LZ has done tests of this, any such results would also be fairly suggestive for Kata, although of course the bots' implementations are not the same. Or else, in the event that you'd be interested in doing more systematic testing of that, please feel free.
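
For intuition only, here is a generic virtual-loss sketch. It is not KataGo's implementation, and the function name and the win-fraction bookkeeping are assumptions. While a thread is still searching below a child, other threads see that child as if it had extra visits that were all losses, which steers them elsewhere.

```python
def value_with_virtual_losses(wins, visits, threads_in_flight, virtual_losses_per_thread):
    """Average value of a child as seen by other threads during a parallel search."""
    # Each in-flight thread adds this many pretend visits, all counted as losses.
    virtual = threads_in_flight * virtual_losses_per_thread
    total = visits + virtual
    return wins / total if total else 0.0

# A child with 60 wins in 100 visits while 3 threads are searching below it.
print(value_with_virtual_losses(60, 100, 3, virtual_losses_per_thread=2))
print(value_with_virtual_losses(60, 100, 3, virtual_losses_per_thread=1))
```

In this simplified model, lowering the per-thread count from 2 to 1 shrinks the temporary penalty, so a busy branch still looks attractive to other threads a little more often.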

Ishinoshita commented on June 28, 2024

On the other hand, FB's latest updated paper on ELF OpenGo has a paragraph on the impact of the virtual loss value itself (they conclude 1.0 is best, as in AG).

AncalagonX commented on June 28, 2024

I have also been doing some non-systematic testing of different values for dynamicScoreCenterZeroWeight during high-handicap games against dan-level AI (a dozen anecdotal game results before this utility factor was added, plus a dozen after), and for some reason KataGo loses more often with this utility factor, seemingly generating fewer situations where it can force a mistake from its Black dan-level AI opponent.

Presumably, KataGo is following policy too closely rather than playing higher-risk moves with more possibilities for opponent mistakes and higher-reward outcomes. But this is just based on my observations and feelings from a couple dozen high-handicap games, combined with not-very-good methodology—so don't put too much "weight" on my dynamicScoreCenterZeroWeight findings in this comment. :)

Ishinoshita commented on June 28, 2024

@Ancalagon Yes, assuming equal strength/behavior for both sides during the search, at high handicap, sounds like a major flaw for all engines, KG included.

Mimicking different behaviour might help, provided there is a parameter to 'mimic' such a thing as strength. Seems worth a try anyway, and worth adding to the already long wish list for @lightvector :-)

kaorahi commented on June 28, 2024

Borrowing the idea from leela-zero/leela-zero#2331, I am trying a cheat to keep black uneasy in handicapped games. It tends to avoid early 3-3 invasion at least.

https://github.com/kaorahi/KataGo/tree/keep_uneasy

  // Let P = \sum_k blackOwner_k and Q = \sum_k abs(blackOwner_k)
  // for k = 1, 2, ..., 361.
  // To keep the game complicated, white hopes to minimize Q
  // in addition to P.
  // So the following code replaces scoreMean with
  // (P+Q)/2 shamelessly.
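
The quantity described in that comment can be sketched in a few lines. This is an illustration rather than the code in the linked branch; the function name and the sign convention (+1 means surely Black's point, -1 surely White's) are assumptions.

```python
import math

def uneasy_score(black_owner):
    """Return (P, Q, (P + Q) / 2) for per-point ownership values in [-1, 1]."""
    p = sum(black_owner)                  # P: ordinary expected score for Black
    q = sum(abs(x) for x in black_owner)  # Q: how settled the board is overall
    return p, q, (p + q) / 2.0

# Two boards with the same expected score P = 39 for Black.
settled = [1.0] * 200 + [-1.0] * 161      # every point is firmly owned
unsettled = [39.0 / 361.0] * 361          # every point is still contested

print(uneasy_score(settled))
print(uneasy_score(unsettled))
assert math.isclose(uneasy_score(settled)[0], uneasy_score(unsettled)[0])
```

Since (x + |x|) / 2 equals max(x, 0), the replaced score is simply the sum of Black's positive ownership. Of the two boards above, White, who minimizes it, therefore prefers the unsettled one even though P is the same.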

kaorahi commented on June 28, 2024

Example of complicated fights in an H5 game

Black: Leela 0.11.0 (1sec)

White: Katago with this cheat (maxVisits = 3200, model = g104-b10c128-s110887936-d54937276)

(;GM[1]FF[4]KM[7.5]SZ[19]HA[5]AB[dp][pd][pp][dd][jj]PB[Leela_0.11.0_1sec]PW[Katago_tmp];W[qf];B[nd];W[cf];B[ch];W[gc];B[df];W[dc];B[cc];W[dg];B[cg];W[de];B[ef];W[cd];B[ed];W[ee];B[fe];W[ce];B[fd];W[ff];B[eg];W[bc];B[cb];W[bb];B[db];W[gf];B[bf];W[be];B[ae];W[bd];B[bh];W[dh];B[eh];W[di];B[ei];W[ci];B[bi];W[ej];B[fj];W[bj];B[cj];W[dj];B[ck];W[ek];B[gi];W[bk];B[cl];W[ai];B[ag];W[bl];B[cm];W[aj];B[ah];W[bm];B[bn];W[fh];B[fi];W[cn];B[dn];W[co];B[bo];W[ad];B[am];W[af];B[qe];W[dm];B[ae];W[dl];B[cp];W[dk];B[ba];W[bq];B[do];W[dr];B[er];W[jc];B[hd];W[qc];B[re];W[ob];B[nb];W[gd];B[ge];W[he];B[fb];W[eq];B[cr];W[oc];B[nc];W[ie];B[hc];W[od];B[oe];W[pf];B[pe];W[fr];B[br];W[ne];B[me];W[nf];B[of];W[og];B[ng];W[mf];B[le];W[mg];B[nh];W[oh];B[oi];W[rf];B[qh];W[pg];B[sf];W[sg];B[se];W[rg];B[rc];W[pc];B[rd];W[rb];B[sb];W[sc];B[qb];W[sa];B[qa];W[qd];B[kf];W[hb];B[gb];W[jg];B[kg];W[lc];B[ld];W[id];B[fc];W[ih];B[ji];W[kh];B[jh];W[hi];B[hj];W[lh];B[jf];W[ig];B[if];W[hf];B[hg];W[gh];B[ii];W[hh];B[kc];W[kb];B[kd];W[mb];B[oa];W[sd];B[jb];W[ib];B[ic];W[jd];B[fg];W[gg];B[je];W[ja];B[lf];W[ha];B[ab];W[af];B[lb];W[la];B[ae];W[fk];B[gj];W[af];B[mc];W[lb];B[ae];W[cm];B[ac];W[iq];B[qn];W[lq];B[oq];W[qj];B[ql];W[qk];B[pl];W[lo];B[mr];W[fo];B[lr];W[bp];B[cq];W[ao];B[fn];W[go];B[gn];W[hn];B[hm];W[in];B[gl];W[en];B[kq];W[kp];B[jq];W[ip];B[ir];W[hr];B[jr];W[pi];B[oj];W[im];B[il];W[fm];B[gm];W[lm];B[mn];W[ln];B[mm];W[np];B[no];W[mp];B[op];W[ll];B[ml];W[rl];B[rm];W[jl];B[jk];W[gk];B[hk];W[fl];B[hl];W[rk];B[kl];W[lk];B[jm];W[jn];B[jp];W[jo];B[hs];W[gr];B[mk];W[ni];B[nj];W[mi];B[pj];W[qi];B[km];W[kk];B[mq];W[lp];B[sl];W[kn];B[jl];W[nq];B[nr];W[sk];B[sm];W[mj];B[pk];W[kj];B[ki];W[li];B[ga];W[eo];B[gs];W[mo];B[fs];W[es];B[is])

petgo3 commented on June 28, 2024

@kaorahi: For me, using b20, it does not seem to give good results. Did I miss something?
I just copied your code to gtp.cpp and nneval.cpp.

kaorahi commented on June 28, 2024

What are your "bad results"? Does it play in the same way as normal KataGo? Or, does it play unreasonable moves?

Anyway, I have no idea if the following points are ok.

0ec7275 only affects kata-analyze. You have to use cf25642 for genmove.
We need large visits (e.g. 3200) to avoid 3-3 invasions.
I never tried b20, that is too slow on my PC. Would you try b10 first?

petgo3 commented on June 28, 2024

I did see a significant effect using 0ec7275. Using only cf25642 was about the same as the original katago. But perhaps I have to have a closer look with only this line of code.
Having a 2080 Ti, I get more than 50k visits, so that should be no problem. Perhaps this explains why I can't see much of a difference ...
Anyway, if only the line in cf25642 makes the difference, then I would guess that the effect at high visits is not easy to measure.
Thx for quick advice :-)

kaorahi commented on June 28, 2024

My previous comment may be misleading. Both 0ec7275 and cf25642 are necessary for genmove.

petgo3 commented on June 28, 2024

Example of an h4 game, using both changes...

petgo3 commented on June 28, 2024

Does pondering perhaps affect the cheat? I got "Cuda backend: Model version 5"; is this the wrong model for the cheat?

kaorahi commented on June 28, 2024

Oh...

Would you try maxVisits = 3200 and model = g104-b10c128-s110887936-d54937276? It plays interestingly for me with these settings. It may play badly with very high visits and/or a large network.

petgo3 commented on June 28, 2024

Tried maxVisits = 3200 and model = g104-b10c128-s110887936-d54937276. Quite the same style as with the 20b net and more visits ...

kaorahi commented on June 28, 2024

Thx for testing. I have no idea then...

kaorahi commented on June 28, 2024

I tried the same shape as the top-right corner in the h4 example by @petgo3. In my environment (Intel Graphics Compute Runtime for OpenCL / Sabaki), white plays R14 instead of R16, P17 instead of S15, and P17 instead of S16. Why?

lightvector commented on June 28, 2024

Just browsing and cleaning up old issues.

v1.3+ changes have notably improved KataGo at handicap - due to the simple change of training some games with unbalanced playouts between the two sides (so the net learns how to handle the situation of strong vs weak players), and due to offset score center that was described above.

The asymmetric playout training also appears to have the result of completely naturally having the neural net on its own decide to prefer knight approach over 3-3 - it completely automatically learns in selfplay that although 3-3 is best for even games, it does not challenge a weaker player as much.

There's further improvement possible still, and not all ideas have been exhausted, but I'm going ahead and closing this for now as a big part of this issue is handled. Yay! If there is more to discuss regarding handicap play, feel free to open a new issue. :)
