{"id":6459,"date":"2018-01-30T00:59:39","date_gmt":"2018-01-30T05:59:39","guid":{"rendered":"http:\/\/www.decisionsciencenews.com\/?p=6459"},"modified":"2018-01-30T14:15:43","modified_gmt":"2018-01-30T19:15:43","slug":"chess-computer-learned-scratch-4-hours","status":"publish","type":"post","link":"https:\/\/www.decisionsciencenews.com\/?p=6459","title":{"rendered":"A chess computer learned from scratch and surpassed human knowledge in 4 hours"},"content":{"rendered":"<p>HOW MANY GAMES WAS THAT?<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/www.decisionsciencenews.com\/wp-content\/uploads\/2017\/12\/4043364183_a3f0de073b_z.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6460\" src=\"http:\/\/www.decisionsciencenews.com\/wp-content\/uploads\/2017\/12\/4043364183_a3f0de073b_z.jpg\" alt=\"\" width=\"485\" height=\"387\" srcset=\"https:\/\/www.decisionsciencenews.com\/wp-content\/uploads\/2017\/12\/4043364183_a3f0de073b_z.jpg 485w, https:\/\/www.decisionsciencenews.com\/wp-content\/uploads\/2017\/12\/4043364183_a3f0de073b_z-300x239.jpg 300w\" sizes=\"auto, (max-width: 485px) 100vw, 485px\" \/><\/a><\/p>\n<p>AlphaZero is a reinforcement learning (RL) program that, given only the rules of a game like chess, can play games against itself and learn how to win.<\/p>\n<p>According to several articles, it learned from scratch and <a href=\"http:\/\/www.telegraph.co.uk\/science\/2017\/12\/06\/entire-human-chess-knowledge-learned-surpassed-deepminds-alphazero\/\">surpassed human knowledge of chess<\/a> in four hours. 
Specifically, it beat the leading chess computer in that time.<\/p>\n<p>A friend of ours asked whether it trained on more or less experience, in terms of games played, than a young human grandmaster has.<\/p>\n<p>To look into this question, <a href=\"https:\/\/arxiv.org\/pdf\/1712.01815.pdf\">we read the paper<\/a>.<\/p>\n<p>About 30 people have become <a href=\"https:\/\/en.wikipedia.org\/wiki\/Chess_prodigy\">grandmasters before age 15<\/a>. Let&#8217;s overestimate and say they played for 10 years (3,650 days) at 100 games a day: that&#8217;s 365,000 games. From what I can tell, AlphaZero played about 20 million games at the point it beat a top-rated chess engine called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stockfish_(chess)\">Stockfish<\/a> (<a href=\"https:\/\/arxiv.org\/pdf\/1712.01815.pdf\">article<\/a>, Table S3, noting it beat Stockfish at around 4 hours).<\/p>\n<p>So it seems AlphaZero needs more games to learn than a human grandmaster does. However, AlphaZero starts with only the rules and figures everything out from there. In contrast, people are coached and handed strategies that have been refined over millions of games, so it makes sense that humans can learn from fewer games. Also, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\">RL<\/a> systems explore patently ridiculous moves on the way to becoming good players, and people can likely prune the search space better. On the other hand, the assumptions humans bring to this pruning might be what keeps us from being as good at chess as AlphaZero.<\/p>\n<p>Note that some say the real story here is that it taught itself, not the four-hour number, because of the serious difference in hardware between AlphaZero and Stockfish. 
<a href=\"https:\/\/en.chessbase.com\/post\/alpha-zero-comparing-orang-utans-and-apples\">Viswanathan Anand says on chessbase<\/a>:<\/p>\n<blockquote><p>Obviously this four hour thing is not too relevant \u2014 though it&#8217;s a nice punchline \u2014 but it&#8217;s obviously very powerful hardware, so it&#8217;s equal to my laptop sitting for a couple of decades. I think the more relevant thing is that it figured everything out from scratch and that is scary and promising if you look at it&#8230;I would like to think that it should be a little bit harder. It feels annoying that you can work things out with just the rules of chess that quickly.<\/p><\/blockquote>\n<p><span style=\"font-size: xx-small;\">Photo credit: https:\/\/www.flickr.com\/photos\/mukumbura\/4043364183\/<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>HOW MANY GAMES WAS THAT? AlphaZero is a reinforcement learning (RL) program that, given only the rules of a game like chess, can play games against itself and learn how to win. According to several articles, it learned from scratch and surpassed human knowledge of chess in four hours. 
Specifically, it beat the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[16,13,2],"tags":[807,1660,1272,1659,685,1661,1663,406,907,311,1245,1662,1665,1664],"class_list":["post-6459","post","type-post","status-publish","format-standard","hentry","category-ideas","category-programs","category-research-news","tag-ai","tag-alphazero","tag-artificial","tag-chess","tag-computer","tag-deep","tag-deepmind","tag-games","tag-intelligence","tag-learning","tag-machine","tag-mind","tag-reinforcement","tag-rl"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4LKj-1Gb","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/v2\/posts\/6459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6459"}],"version-history":[{"count":9,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/v2\/posts\/6459\/revisions"}],"predecessor-version":[{"id":6522,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=\/wp\/
v2\/posts\/6459\/revisions\/6522"}],"wp:attachment":[{"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.decisionsciencenews.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}