Reinforcement Learning / Neural Information Processing Systems (NIPS)

Papers on reinforcement learning presented at the Conference on Advances in Neural Information Processing Systems (NIPS).~
(Entries are added starting with the most recent; this is not a complete list.)

''Acceptance rates''
-NIPS 2009: ?
-NIPS 2008: 250/1022=24.5%
-NIPS 2007: 217/975=22.3%
-NIPS 2006: 204/833=24.5%



*Robotics [#cc3c22b6]

-[[''Policy Search for Motor Primitives in Robotics'':http://nips.cc/Conferences/2008/Program/event.php?ID=1056]]~
Jens Kober, Jan Peters~
NIPS 2008, pp. 849-856 (2009).
-[[''An Application of Reinforcement Learning to Aerobatic Helicopter Flight'':http://nips.cc/Conferences/2006/Program/event.php?ID=510]]~
Pieter Abbeel, Adam Coates, Andrew Ng, Morgan Quigley~
NIPS 2006, pp. 1-8 (2007).


*Traffic control [#v33be840]
-[[''Natural Actor-Critic for Road Traffic Optimisation'':http://nips.cc/Conferences/2006/Program/event.php?ID=436]]~
Silvia Richter, Douglas Aberdeen, Jin Yu~
NIPS 2006, pp. 1169-1176 (2007).



*Power management [#a7985bf3]

-[[''Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning'':http://nips.cc/Conferences/2007/Program/event.php?ID=746]]~
Gerry Tesauro, Rajarshi Das, Hoi Chan, Jeffrey Kephart, David Levine, Freeman Rawson, Charles Lefurgy~
NIPS 2007, pp. 1497-1504 (2008).


*Apprenticeship learning [#ec21e061]

-[[''Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion'':http://nips.cc/Conferences/2007/Program/event.php?ID=756]]~
J. Zico Kolter, Pieter Abbeel, Andrew Ng~
NIPS 2007, pp. 769-776 (2008).
-[[''A Game-Theoretic Approach to Apprenticeship Learning'':http://nips.cc/Conferences/2007/Program/event.php?ID=736]]~
Umar Syed, Robert Schapire~
NIPS 2007, pp. 1449-1456 (2008).



*Meta-learning [#dc360314]

-[[''Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning'':http://nips.cc/Conferences/2008/Program/event.php?ID=1058]]~
Gediminas Lukšys, Carmen Sandi, Wulfram Gerstner~
NIPS 2008, pp. 1001-1008 (2009).
-''Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning''~
Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner~
NIPS 2006, pp. 937-944 (2007).



*Continuous action spaces [#m17e8758]

-[[''Fitted Q-iteration by Advantage Weighted Regression'':http://nips.cc/Conferences/2008/Program/event.php?ID=1142]]~
Gerhard Neumann, Jan Peters~
NIPS 2008, pp. 1177-1184 (2009).
-[[''Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods'':http://nips.cc/Conferences/2007/Program/event.php?ID=740]]~
Alessandro Lazaric, Marcello Restelli, Andrea Bonarini~
NIPS 2007, pp. 833-840 (2008).
-[[''Fitted Q-iteration in continuous action-space MDPs'':http://nips.cc/Conferences/2007/Program/event.php?ID=943]]~
András Antos, Remi Munos, Csaba Szepesvari~
NIPS 2007, pp. 9-16 (2008).


*Exploration-exploitation dilemma [#c75af9c5]

-[[''Learning to Explore and Exploit in POMDPs'':http://nips.cc/Conferences/2009/Program/event.php?ID=1789]]~
Chenghui Cai, Xuejun Liao, Lawrence Carin~
NIPS 2009.


*Exploration [#o1739031]

-[[''Multi-resolution Exploration in Continuous Spaces'':http://nips.cc/Conferences/2008/Program/event.php?ID=1150]]~
Ali Nouri, Michael Littman~
NIPS 2008, pp. 1209-1216 (2009).



*Analysis of learning [#d02b9143]

-[[''Temporal Difference Updating without a Learning Rate'':http://nips.cc/Conferences/2007/Program/event.php?ID=902]]~
Marcus Hutter, Shane Legg~
NIPS 2007, pp. 705-712 (2008).


*Gradient methods [#s665fbf9]

-[[''Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms'':http://nips.cc/Conferences/2008/Program/event.php?ID=1146]]~
John Roberts, Russ Tedrake~
NIPS 2008, pp. 1361-1368 (2009).
-[[''Bayesian Policy Gradient Algorithms'':http://nips.cc/Conferences/2006/Program/event.php?ID=511]]~
Mohammad Ghavamzadeh, Yaakov Engel~
NIPS 2006, pp. 457-464 (2007).



*TD learning [#mc6d8702]

-[[''A Convergent '''O'''('''n''') Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation'':http://nips.cc/Conferences/2008/Program/event.php?ID=1346]]~
Rich Sutton, Csaba Szepesvari, Hamid Maei~
NIPS 2008, pp. 1609-1616 (2009).



*Actor-critic [#h88d2bf4]

-[[''Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation'':http://nips.cc/Conferences/2008/Program/event.php?ID=1347]]~
Dotan Di Castro, Dima Volkinshtein, Ron Meir~
NIPS 2008, pp. 385-392 (2009).
-[[''Incremental Natural Actor-Critic Algorithms'':http://nips.cc/Conferences/2007/Program/event.php?ID=747]]~
Shalabh Bhatnagar, Rich Sutton, Mohammad Ghavamzadeh, Mark Lee~
NIPS 2007, pp. 105-112 (2008).



*Model-based [#s3f15c68]

-[[''Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability'':http://nips.cc/Conferences/2009/Program/event.php?ID=1580]]~
Keith Bush, Joelle Pineau~
NIPS 2009.
-[[''Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement'':http://nips.cc/Conferences/2008/Program/event.php?ID=1057]]~
Michael Todd, Yael Niv, Jonathan Cohen~
NIPS 2008, pp. 1689-1696 (2009).



*Miscellaneous and uncategorized [#a92fb145]

-[[''Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference'':http://nips.cc/Conferences/2009/Program/event.php?ID=1739]]~
Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum~
NIPS 2009.
-[[''Optimization on a Budget: A Reinforcement Learning Approach'':http://nips.cc/Conferences/2008/Program/event.php?ID=1123]]~
Paul Ruvolo, Ian Fasel, Javier Movellan~
NIPS 2008, pp. 1385-1392 (2009).
-[[''Near-optimal Regret Bounds for Reinforcement Learning'':http://nips.cc/Conferences/2008/Program/event.php?ID=1100]]~
Peter Auer, Thomas Jaksch, Ronald Ortner~
NIPS 2008, pp. 89-96 (2009).
-[[''Psychiatry: Insights into depression through normative decision-making models'':http://nips.cc/Conferences/2008/Program/event.php?ID=1406]]~
Quentin Huys, Joshua Vogelstein, Peter Dayan~
NIPS 2008, pp. 729-736 (2009).
-''Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning''~
Peter Auer, Ronald Ortner~
NIPS 2006, pp. 49-56 (2007).