{"id":477698,"date":"2023-08-09T09:19:05","date_gmt":"2023-08-09T09:19:05","guid":{"rendered":""},"modified":"2023-09-05T11:15:15","modified_gmt":"2023-09-05T11:15:15","slug":"inverse-reinforcement-learning","status":"publish","type":"wiki","link":"https:\/\/oneproxy.pro\/jp\/wiki\/inverse-reinforcement-learning\/","title":{"rendered":"Inverse Reinforcement Learning"},"content":{"rendered":"<p>Inverse reinforcement learning (IRL) is a subfield of machine learning and artificial intelligence that focuses on understanding an agent's underlying rewards or objectives by observing its behavior in a given environment. In traditional reinforcement learning, an agent learns to maximize reward according to a predefined reward function. In contrast, IRL aims to infer the reward function from observed behavior, which makes it a valuable tool for understanding the decision-making of humans and other experts.<\/p>\n<h2>Origin and first mention of inverse reinforcement learning<\/h2>\n<p>The concept of inverse reinforcement learning was first introduced by Andrew Ng and Stuart Russell in their 2000 
paper \"Algorithms for Inverse Reinforcement Learning.\" This seminal paper laid the foundation for research on IRL and its applications in a range of fields. Since then, researchers and practitioners have made substantial progress in understanding and refining IRL algorithms, and IRL has become an essential technique in modern artificial intelligence research.<\/p>\n<h2>Inverse reinforcement learning in detail<\/h2>\n<p>Inverse reinforcement learning tries to answer a fundamental question: \"What rewards or objectives is an agent optimizing when it makes decisions in a given environment?\" The question matters because understanding the underlying rewards makes it possible to improve decision-making processes, build more robust AI 
systems, and model human behavior more accurately.<\/p>\n<p>The main steps in IRL are as follows:<\/p>\n<ol>\n<li>\n<p><strong>Observation<\/strong>: The first step in IRL is to observe an agent's behavior in a given environment. The observations can take the form of expert demonstrations or recorded data.<\/p>\n<\/li>\n<li>\n<p><strong>Reward function recovery<\/strong>: Using the observed behavior, the IRL algorithm attempts to recover the reward function that best explains the agent's actions. The inferred reward function should be consistent with the observed behavior.<\/p>\n<\/li>\n<li>\n<p><strong>Policy optimization<\/strong>: 
Once the reward function has been inferred, it can be used to optimize the agent's policy with conventional reinforcement learning techniques, which improves the agent's decision-making.<\/p>\n<\/li>\n<li>\n<p><strong>Application<\/strong>: IRL has been applied in robotics, autonomous vehicles, recommendation systems, human-robot interaction, and other fields. It makes it possible to model and understand expert behavior and to use that knowledge to train other agents more effectively.<\/p>\n<\/li>\n<\/ol>\n<h2>Internal structure: how inverse reinforcement learning works<\/h2>\n<p>Inverse reinforcement learning typically involves the following components:<\/p>\n<ol>\n<li>\n<p><strong>Environment<\/strong>: 
The environment is the context or setting in which the agent operates. It provides states, actions, and rewards in response to the agent's behavior.<\/p>\n<\/li>\n<li>\n<p><strong>Agent<\/strong>: The agent is the entity whose behavior we want to understand or improve. It takes actions in the environment to achieve particular goals.<\/p>\n<\/li>\n<li>\n<p><strong>Expert demonstrations<\/strong>: These are demonstrations of an expert's behavior in a given environment. IRL algorithms use them to infer the underlying reward function.<\/p>\n<\/li>\n<li>\n<p><strong>Reward function<\/strong>: 
The reward function maps states and actions in the environment to numerical values that express how desirable those states and actions are. It is a central concept in reinforcement learning, and in IRL it is the quantity that must be inferred.<\/p>\n<\/li>\n<li>\n<p><strong>Inverse reinforcement learning algorithms<\/strong>: These algorithms take the expert demonstrations and the environment as input and attempt to recover the reward function. Various approaches have been proposed over the years, including Maximum Entropy IRL and Bayesian IRL.<\/p>\n<\/li>\n<li>\n<p><strong>Policy optimization<\/strong>: Once the reward function has been recovered, it can be used to optimize the agent's policy with reinforcement learning techniques such as Q-learning 
or policy gradient methods.<\/p>\n<\/li>\n<\/ol>\n<h2>Analysis of the key features of inverse reinforcement learning<\/h2>\n<p>Inverse reinforcement learning offers several key features and advantages over traditional reinforcement learning:<\/p>\n<ol>\n<li>\n<p><strong>Human-like decision-making<\/strong>: By inferring the reward function from human expert demonstrations, IRL lets agents make decisions that align more closely with human preferences and behavior.<\/p>\n<\/li>\n<li>\n<p><strong>Modeling unobserved rewards<\/strong>: In many real-world scenarios the reward function is not given explicitly, which makes traditional reinforcement learning difficult to apply. IRL can uncover the underlying rewards without explicit supervision.<\/p>\n<\/li>\n<li>\n<p><strong>Transparency and interpretability<\/strong>: IRL 
produces an explicit reward function that can be inspected, giving deeper insight into the agent's decision-making.<\/p>\n<\/li>\n<li>\n<p><strong>Sample efficiency<\/strong>: IRL can often learn from a small number of expert demonstrations, compared with the large amounts of data that reinforcement learning typically requires.<\/p>\n<\/li>\n<li>\n<p><strong>Transfer learning<\/strong>: A reward function inferred in one environment can be transferred to a similar but slightly different environment, reducing the need to relearn from scratch.<\/p>\n<\/li>\n<li>\n<p><strong>Handling sparse rewards<\/strong>: IRL can address sparse-reward problems, in which traditional reinforcement learning struggles to learn because feedback is scarce.<\/p>\n<\/li>\n<\/ol>\n<h2>Types of inverse reinforcement learning<\/h2>\n<table>\n<thead>\n<tr>\n<th>Type<\/th>\n<th>Description<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Maximum Entropy 
IRL<\/td>\n<td>An IRL approach that maximizes the entropy of the agent's policy given the inferred rewards.<\/td>\n<\/tr>\n<tr>\n<td>Bayesian IRL<\/td>\n<td>Uses a probabilistic framework to infer a distribution over possible reward functions.<\/td>\n<\/tr>\n<tr>\n<td>Adversarial IRL<\/td>\n<td>Uses a game-theoretic approach with a discriminator and a generator to infer the reward function.<\/td>\n<\/tr>\n<tr>\n<td>Apprenticeship Learning<\/td>\n<td>Combines IRL with reinforcement learning to learn from expert demonstrations.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Uses of inverse reinforcement learning, common problems, and their solutions<\/h2>\n<p>Inverse reinforcement learning has a range of applications and can address specific challenges:<\/p>\n<ol>\n<li>\n<p><strong>Robotics<\/strong>: In robotics, IRL 
helps in understanding expert behavior and designing robots that are more efficient and human-friendly.<\/p>\n<\/li>\n<li>\n<p><strong>Autonomous vehicles<\/strong>: IRL helps infer the behavior of human drivers, so that autonomous vehicles can navigate safely and predictably in mixed-traffic scenarios.<\/p>\n<\/li>\n<li>\n<p><strong>Recommendation systems<\/strong>: IRL can be used to model user preferences in recommendation systems, supporting more accurate and personalized recommendations.<\/p>\n<\/li>\n<li>\n<p><strong>Human-robot interaction<\/strong>: IRL lets robots understand and adapt to human preferences, making human-robot interaction more intuitive.<\/p>\n<\/li>\n<li>\n<p><strong>Challenges<\/strong>: IRL 
can have difficulty recovering the reward function accurately, particularly when expert demonstrations are limited or noisy.<\/p>\n<\/li>\n<li>\n<p><strong>Solutions<\/strong>: These challenges can be addressed by incorporating domain knowledge, using probabilistic frameworks, and combining IRL with reinforcement learning.<\/p>\n<\/li>\n<\/ol>\n<h2>Main characteristics and comparison with similar terms<\/h2>\n<p>Inverse reinforcement learning (IRL) vs. reinforcement learning (RL)<\/p>\n<table>\n<thead>\n<tr>\n<th>IRL<\/th>\n<th>RL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Infers the reward<\/td>\n<td>Assumes a known reward<\/td>\n<\/tr>\n<tr>\n<td>Human-like behavior<\/td>\n<td>Learns from explicit rewards<\/td>\n<\/tr>\n<tr>\n<td>Interpretable<\/td>\n<td>Less transparent<\/td>\n<\/tr>\n<tr>\n<td>Sample efficient<\/td>\n<td>
Requires large amounts of data<\/td>\n<\/tr>\n<tr>\n<td>Addresses sparse rewards<\/td>\n<td>Struggles with sparse rewards<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Future perspectives and technologies for inverse reinforcement learning<\/h2>\n<p>Promising developments are expected for inverse reinforcement learning:<\/p>\n<ol>\n<li>\n<p><strong>Advanced algorithms<\/strong>: Ongoing research may produce more efficient and accurate IRL algorithms that apply to a broader range of problems.<\/p>\n<\/li>\n<li>\n<p><strong>Integration with deep learning<\/strong>: Combining IRL with deep learning models can yield more powerful and data-efficient learning systems.<\/p>\n<\/li>\n<li>\n<p><strong>Real-world applications<\/strong>: IRL is expected to have a significant impact on real-world applications in areas such as healthcare, finance, and education.<\/p>\n<\/li>\n<li>\n<p><strong>Ethical AI<\/strong>: Understanding human preferences through IRL 
can contribute to the development of ethical AI systems that align with human values.<\/p>\n<\/li>\n<\/ol>\n<h2>How proxy servers can be used with or related to inverse reinforcement learning<\/h2>\n<p>Inverse reinforcement learning can be applied in the context of proxy servers to optimize their behavior and decision-making. A proxy server acts as an intermediary between clients and the internet, routing requests and responses and providing anonymity. By observing expert behavior, IRL algorithms can be used to understand the preferences and objectives of the clients that use a proxy server. This information can then be used to optimize the proxy server's policies and decisions, resulting in more efficient and effective proxy operation. In addition, IRL 
can help identify and handle malicious activity, improving security and reliability for proxy users.<\/p>\n<h2>Related links<\/h2>\n<p>For more information about inverse reinforcement learning, see the following resources:<\/p>\n<ol>\n<li>\n<p>\"Algorithms for Inverse Reinforcement Learning\" by Andrew Ng and Stuart Russell (2000).<br \/>\nLink: <a href=\"https:\/\/ai.stanford.edu\/~ang\/papers\/icml00-irl.pdf\" target=\"_new\" rel=\"noopener nofollow\">https:\/\/ai.stanford.edu\/~ang\/papers\/icml00-irl.pdf<\/a><\/p>\n<\/li>\n<li>\n<p>\"Inverse Reinforcement Learning\", an overview article by Pieter Abbeel and John Schulman.<br \/>\nLink: <a href=\"https:\/\/ai.stanford.edu\/~ang\/papers\/icml00-irl.pdf\" target=\"_new\" rel=\"noopener nofollow\">https:\/\/ai.stanford.edu\/~ang\/papers\/icml00-irl.pdf<\/a><\/p>\n<\/li>\n<li>\n<p>OpenAI blog post on \"Inverse Reinforcement Learning from Human Preferences\" by Jonathan Ho and Stefano Ermon.<br \/>\nLink: <a href=\"https:\/\/openai.com\/blog\/learning-from-human-preferences\/\" target=\"_new\" rel=\"noopener nofollow\">https:\/\/openai.com\/blog\/learning-from-human-preferences\/<\/a><\/p>\n<\/li>\n<li>\n<p>\"Inverse Reinforcement Learning: A Survey\", a comprehensive survey of IRL algorithms and applications.<br 
\/>\nLink: <a href=\"https:\/\/arxiv.org\/abs\/1812.05852\" target=\"_new\" rel=\"noopener nofollow\">https:\/\/arxiv.org\/abs\/1812.05852<\/a><\/p>\n<\/li>\n<\/ol>","protected":false,"featured_media":468689,"menu_order":0,"template":"","meta":{"_acf_changed":false,"content-type":"","inline_featured_image":false,"footnotes":""},"class_list":["post-477698","wiki","type-wiki","status-publish","has-post-thumbnail","hentry"],"acf":{"faq_title":"Frequently Asked Questions about <mark>Inverse Reinforcement Learning: Unraveling the Hidden Rewards<\/mark>","faq_items":[{"question":"What is Inverse Reinforcement Learning (IRL)?","answer":"<p>Inverse Reinforcement Learning (IRL) is a branch of artificial intelligence that aims to understand an agent's underlying objectives by observing its behavior in a given environment. Unlike traditional reinforcement learning, where agents maximize predefined rewards, IRL infers the reward function from expert demonstrations, leading to more human-like decision-making.<\/p>"},{"question":"How did Inverse Reinforcement Learning originate?","answer":"<p>IRL was first introduced by Andrew Ng and Stuart Russell in their 2000 paper titled \"Algorithms for Inverse Reinforcement Learning.\" This seminal work laid the foundation for studying IRL and its applications in various domains.<\/p>"},{"question":"How does Inverse Reinforcement Learning work?","answer":"<p>The process of IRL involves observing an agent's behavior, recovering the reward function that best explains the behavior, and then optimizing the agent's policy based on the inferred rewards. 
IRL algorithms leverage expert demonstrations to uncover the underlying rewards, which can be used to improve decision-making processes.<\/p>"},{"question":"What are the key features of Inverse Reinforcement Learning?","answer":"<p>IRL offers several advantages, including a deeper understanding of human-like decision-making, transparency in reward functions, sample efficiency, and the ability to handle sparse rewards. It can also be used for transfer learning, where knowledge from one environment can be applied to a similar setting.<\/p>"},{"question":"What types of Inverse Reinforcement Learning exist?","answer":"<p>There are various types of IRL approaches, such as Maximum Entropy IRL, Bayesian IRL, Adversarial IRL, and Apprenticeship Learning. Each approach has its unique way of inferring the reward function from expert demonstrations.<\/p>"},{"question":"What are the applications of Inverse Reinforcement Learning?","answer":"<p>Inverse Reinforcement Learning finds applications in robotics, autonomous vehicles, recommendation systems, and human-robot interaction. It allows us to model and understand expert behavior, leading to better decision-making for AI systems.<\/p>"},{"question":"What are the challenges in using Inverse Reinforcement Learning?","answer":"<p>IRL may face challenges when recovering the reward function accurately, especially when expert demonstrations are limited or noisy. 
Addressing these challenges may require incorporating domain knowledge and using probabilistic frameworks.<\/p>"},{"question":"What does the future hold for Inverse Reinforcement Learning?","answer":"<p>The future of IRL is promising, with advancements in algorithms, integration with deep learning, and potential impacts on various real-world applications, including healthcare, finance, and education.<\/p>"},{"question":"How can Inverse Reinforcement Learning be associated with proxy servers?","answer":"<p>Inverse Reinforcement Learning can optimize the behavior and decision-making process of proxy servers by understanding user preferences and objectives. This understanding leads to better policies, improved security, and increased efficiency in the operation of proxy servers.<\/p>"}]},"_links":{"self":[{"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/wiki\/477698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/wiki"}],"about":[{"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/types\/wiki"}],"version-history":[{"count":0,"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/wiki\/477698\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/media\/468689"}],"wp:attachment":[{"href":"https:\/\/oneproxy.pro\/jp\/wp-json\/wp\/v2\/media?parent=477698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}