Facebook Open Sources ELF OpenGo

Oddly, I found that expert iteration can work pretty well on imperfect information games due to feedback dynamics between the policy network and the MCTS probabilities. Basically, the policy network tries to approximate any future knowledge the MCTS ends up exposing, and the structure of where that approximation succeeds or fails ends up biasing the MCTS in ways that capture some degree of active inference. The downside is that while this can work, you do seem to lose the guarantees that it will work. That is to say, there's a region of the parameter space where that feedback dynamic seems to converge to the correct active inference policies, and a region in which it diverges (generally in the form of driving the action probabilities to arbitrary delta function distributions). I don't know how the relative volume of those regions scales with more complex games than the ones I tried (which were extremely simple guessing games and information retrieval games). You could also model it as non-deterministic but perfect information from the point of view of a single player, a bit like quantum mechanics so to speak, but that's probably not a very good idea since the other players do, in fact, know the "hidden variables" at play. (I guess you could think of each player's actions as being an indirect observation or something like that, but I'm not seeing much of a point beyond the mental exercise.)
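For anyone who wants to see the feedback loop concretely, here is a minimal sketch of the target the policy network is usually trained against in expert iteration, assuming the AlphaGo Zero-style convention of turning MCTS visit counts into a probability target. The function names, the temperature parameter, and the use of entropy as a collapse check are my own illustrative assumptions, not the setup used in the experiments described above.

```python
import math

def visit_counts_to_target(counts, temperature=1.0):
    """Turn MCTS visit counts N(a) into a target pi(a) proportional to N(a)^(1/T).

    Assumes temperature > 0 and at least one nonzero count.
    """
    scaled = [c ** (1.0 / temperature) for c in counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def cross_entropy(target, policy, eps=1e-12):
    """Loss that pulls the policy network's output toward the MCTS target."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, policy))

def entropy(dist):
    """Shannon entropy in nats; values near 0 mean the distribution
    has collapsed toward a delta function on one action."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical visit counts for three actions at one node.
target = visit_counts_to_target([10, 30, 60])
loss = cross_entropy(target, [1 / 3, 1 / 3, 1 / 3])
```

Tracking the entropy of the targets over training is one cheap way to notice the divergent regime, since the targets drift toward zero entropy when the action probabilities collapse.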

AlphaGo Zero

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. The AlphaGo team published an article in the journal Nature on 19 October 2017 introducing AlphaGo Zero, a version created without using data from human games and stronger than any previous version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days, winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.

It's valid. I did acknowledge in my post, though, that the two weren't directly comparable, just that they could give a general ballpark estimate. I wasn't sure if my wording was just that unclear or if you didn't bother reading the post you replied to. Regardless, we do both agree they aren't directly comparable. Whether or not I feel personally offended, I can't help but think the general discourse of a sub is lower when you see people calling others out in what looks like a personal way, simply because it discourages people from posting anything, including "innocent bystanders" who are just reading the sub.

It was a 79 block residual network as I recall. The layers weren't particularly wide. It's just a question of how much Monte Carlo tree search you want to do per move, and how much time you have to wait while it does it. But I think I recall reading in the AlphaZero paper that their trained net was about the level of Fan Hui even without any MCTS at all during play, just selecting the move based on a single forward pass of the net. If that's right, that means you could probably get gameplay at the level of the European champion running on a graphing calculator.
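To make the "no MCTS at all" idea concrete, here is a minimal sketch of picking a move from a single forward pass: take the policy head's scores and play the highest-scoring legal move. The network itself isn't modeled; `logits` and `legal` are hypothetical stand-ins for the policy output and the legal-move mask.

```python
def pick_move(logits, legal):
    """Greedy move selection without search.

    logits: one score per move from a single forward pass (hypothetical input).
    legal:  booleans of the same length, True where the move is legal.
    Returns the index of the highest-scoring legal move.
    """
    candidates = [i for i, ok in enumerate(legal) if ok]
    if not candidates:
        raise ValueError("no legal moves")
    return max(candidates, key=lambda i: logits[i])

# Toy example: move 1 has the best score but is illegal, so move 0 is played.
move = pick_move([2.0, 5.0, 1.0], [True, False, True])
```

The per-move cost here is one network evaluation plus a linear scan, which is why playing without search is so much cheaper than running thousands of MCTS simulations per move.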

