Reinforcement learning papers presented at the Conference on Advances in Neural Information Processing Systems.
(Papers are added starting from the most recent; this is not a complete list.)
Acceptance rates
- NIPS 2009: ?
- NIPS 2008: 250/1022=24.5%
- NIPS 2007: 217/975=22.3%
Robotics †
- Policy Search for Motor Primitives in Robotics
Jens Kober, Jan Peters
NIPS 2008, pp. 849-856 (2009).
- An Application of Reinforcement Learning to Aerobatic Helicopter Flight
Pieter Abbeel, Adam Coates, Andrew Ng, Morgan Quigley
NIPS 2006, pp. 1-8 (2007).
Traffic Control †
- Natural Actor-Critic for Road Traffic Optimisation
Silvia Richter, Douglas Aberdeen, Jin Yu
NIPS 2006, pp. 1169-1176 (2007).
Power Management †
- Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Gerry Tesauro, Rajarshi Das, Hoi Chan, Jeffrey Kephart, David Levine, Freeman Rawson, Charles Lefurgy
NIPS 2007, pp. 1497-1504 (2008).
Apprenticeship Learning †
- Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter, Pieter Abbeel, Andrew Ng
NIPS 2007, pp. 769-776 (2008).
- A Game-Theoretic Approach to Apprenticeship Learning
Umar Syed, Robert Schapire
NIPS 2007, pp. 1449-1456 (2008).
Meta-learning †
- Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
Gediminas Lukšys, Carmen Sandi, Wulfram Gerstner
NIPS 2008, pp. 1001-1008 (2009).
- Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning
Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner
NIPS 2006, pp. 937-944 (2007).
Continuous Action Spaces †
- Fitted Q-iteration by Advantage Weighted Regression
Gerhard Neumann, Jan Peters
NIPS 2008, pp. 1177-1184 (2009).
- Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Alessandro Lazaric, Marcello Restelli, Andrea Bonarini
NIPS 2007, pp. 833-840 (2008).
- Fitted Q-iteration in continuous action-space MDPs
András Antos, Rémi Munos, Csaba Szepesvári
NIPS 2007, pp. 9-16 (2008).
Exploration-Exploitation Dilemma †
- Learning to Explore and Exploit in POMDPs
Chenghui Cai, Xuejun Liao, Lawrence Carin
NIPS 2009.
Exploration †
- Multi-resolution Exploration in Continuous Spaces
Ali Nouri, Michael Littman
NIPS 2008, pp. 1209-1216 (2009).
Analysis of Learning †
- Temporal Difference Updating without a Learning Rate
Marcus Hutter, Shane Legg
NIPS 2007, pp. 705-712 (2008).
Gradient Methods †
- Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms
John Roberts, Russ Tedrake
NIPS 2008, pp. 1361-1368 (2009).
- Bayesian Policy Gradient Algorithms
Mohammad Ghavamzadeh, Yaakov Engel
NIPS 2006, pp. 457-464 (2007).
TD学習 †
- A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation
Rich Sutton, Csaba Szepesvári, Hamid Maei
NIPS 2008, pp. 1609-1616 (2009).
Actor-Critic †
- Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
Dotan Di Castro, Dima Volkinshtein, Ron Meir
NIPS 2008, pp. 385-392 (2009).
- Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar, Rich Sutton, Mohammad Ghavamzadeh, Mark Lee
NIPS 2007, pp. 105-112 (2008).
Model-Based †
- Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
Keith Bush, Joelle Pineau
NIPS 2009.
- Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
Michael Todd, Yael Niv, Jonathan Cohen
NIPS 2008, pp. 1689-1696 (2009).
Others / Uncategorized †
- Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum
NIPS 2009.
- Optimization on a Budget: A Reinforcement Learning Approach
Paul Ruvolo, Ian Fasel, Javier Movellan
NIPS 2008, pp. 1385-1392 (2009).
- Near-optimal Regret Bounds for Reinforcement Learning
Peter Auer, Thomas Jaksch, Ronald Ortner
NIPS 2008, pp. 89-96 (2009).
- Psychiatry: Insights into depression through normative decision-making models
Quentin Huys, Joshua Vogelstein, Peter Dayan
NIPS 2008, pp. 729-736 (2009).
- Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning
Peter Auer, Ronald Ortner
NIPS 2006, pp. 49-56 (2007).