Posted by: pgss on 2016-03-18, 05:30:55:
Using a second AI technology called reinforcement learning, they set up countless matches in which (slightly) different versions of AlphaGo played each other.
....
Then the team took yet another step. They collected moves from these machine-versus-machine matches and fed them into a second neural network. This neural net trained the system to examine the potential results of each move, to look ahead into the future of the game.
So AlphaGo learns from human moves, and then it learns from moves made when it plays itself. It understands how humans play, but it can also look beyond how humans play to an entirely different level of the game. This is what happened with Move 37. As Silver told me, AlphaGo had calculated that there was a one-in-ten-thousand chance that a human would make that move. But when it drew on all the knowledge it had accumulated by playing itself so many times—and looked ahead in the future of the game—it decided to make the move anyway. And the move was genius.
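To make the Move 37 reasoning concrete, here is a toy sketch (not AlphaGo's actual code) of how a search can end up preferring a move that a human-imitation prior rates at about one in ten thousand. Each candidate is scored by its average estimated value plus an exploration bonus weighted by the prior, loosely in the spirit of the selection rule in the published AlphaGo paper. The move names, the value numbers, the constant `c_puct`, and the rule that every move is tried once are all assumptions made up for illustration; only the roughly 0.0001 prior comes from the text above.

```python
import math

def search(priors, values, simulations=2000, c_puct=1.0):
    """Toy one-level search; returns (most visited move, visit counts)."""
    visits = {move: 0 for move in priors}
    value_sum = {move: 0.0 for move in priors}

    for _ in range(simulations):
        total = sum(visits.values())

        def score(move):
            if visits[move] == 0:
                # Assumed simplification: try every move once before comparing.
                return float("inf")
            q = value_sum[move] / visits[move]  # average value seen so far
            # Exploration bonus: large for moves the prior likes, shrinks with visits.
            u = c_puct * priors[move] * math.sqrt(total) / (1 + visits[move])
            return q + u

        chosen = max(priors, key=score)
        visits[chosen] += 1
        # Stand-in for a learned evaluation; here just a fixed, made-up number.
        value_sum[chosen] += values[chosen]

    return max(visits, key=visits.get), visits

# Hypothetical prior over moves, as if learned from human games.
human_prior = {"conventional_a": 0.60, "conventional_b": 0.3999, "move_37": 0.0001}
# Hypothetical win-probability estimates, as if learned from self-play.
lookahead_value = {"conventional_a": 0.48, "conventional_b": 0.47, "move_37": 0.55}

best, visits = search(human_prior, lookahead_value)
print("most human-like move:", max(human_prior, key=human_prior.get))
print("move chosen by search:", best)
print("visit counts:", visits)
```

In this sketch the prior only steers where the search looks first. Once the unlikely move has been evaluated and its estimated value is higher, it collects most of the visits, which is the behavior the paragraph above describes in words.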