Papers on reinforcement learning published in the Journal of Machine Learning Research.
(Entries are added over time; this is not a complete list.)
Inverse Reinforcement Learning
- Inverse Reinforcement Learning in Partially Observable Environments
Jaedeug Choi, Kee-Eung Kim
JMLR 12:691-730 (2011).
Keywords: inverse reinforcement learning, partially observable Markov decision process, inverse optimization, linear programming, quadratically constrained programming
POMDP
- A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Stéphane Ross, Joelle Pineau, Brahim Chaib-draa, Pierre Kreitmann
JMLR 12:1729-1770 (2011).
Keywords: reinforcement learning, Bayesian inference, partially observable Markov decision processes
- Multi-task Reinforcement Learning in Partially Observable Stochastic Environments
Hui Li, Xuejun Liao, Lawrence Carin (Duke University)
JMLR 10:1131-1186 (2009).
Keywords: reinforcement learning, partially observable Markov decision processes, multi-task learning, Dirichlet processes, regionalized policy representation
Transfer Learning
- Transfer Learning for Reinforcement Learning Domains: A Survey
Matthew E. Taylor (The University of Southern California), Peter Stone (The University of Texas at Austin)
JMLR 10:1633-1685 (2009).
Keywords: transfer learning, reinforcement learning, multi-task learning
- Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, Yaxin Liu
JMLR 8:2125-2167 (2007).
Keywords: transfer learning, reinforcement learning, temporal difference methods, value function approximation, inter-task mapping
Changing and Dynamic Environments
- Value Function Based Reinforcement Learning in Changing Markovian Environments
Balázs Csanád Csáji, László Monostori
JMLR 9:1679-1709 (2008).
Keywords: Markov decision processes, reinforcement learning, changing environments, (ε,δ)-MDPs, value function bounds, stochastic iterative algorithms
- ε-MDPs: Learning in Varying Environments
István Szita, Bálint Takács, András Lörincz
JMLR 3:145-174 (2002).
Keywords: reinforcement learning, convergence, event-learning, SARSA, MDP, generalized MDP, ε-MDP, SDS controller
Multi-Agent
- Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study
Avraham Bab, Ronen I. Brafman
JMLR 9:2635-2675 (2008).
Keywords: reinforcement learning, multi-agent reinforcement learning, stochastic games
- Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Jelle R. Kok, Nikos Vlassis
JMLR 7:1789-1828 (2006).
Keywords: collaborative multiagent system, coordination graph, reinforcement learning, Q-learning, belief propagation
Hierarchical Reinforcement Learning
- Hierarchical Average Reward Reinforcement Learning
Mohammad Ghavamzadeh, Sridhar Mahadevan
JMLR 8:2629-2669 (2007).
Keywords: semi-Markov decision processes, hierarchical reinforcement learning, average reward reinforcement learning, hierarchical and recursive optimality
Batch Learning
- Tree-Based Batch Mode Reinforcement Learning
Damien Ernst, Pierre Geurts, Louis Wehenkel
JMLR 6:503-556 (2005).
Keywords: batch mode reinforcement learning, regression trees, ensemble methods, supervised learning, fitted value iteration, optimal control
Multi-Objective Reinforcement Learning
- A Geometric Approach to Multi-Criterion Reinforcement Learning
Shie Mannor, Nahum Shimkin
JMLR 5:325-360 (2004).
Exploration-Exploitation Dilemma
- Using Confidence Bounds for Exploitation-Exploration Trade-offs
Peter Auer
JMLR 3:397-422 (2002).
Keywords: Online Learning, Exploitation-Exploration, Bandit Problem, Reinforcement Learning, Linear Value Function
Learning Analysis
- Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, Michael L. Littman
JMLR 10:2413-2444 (2009).
- Provably Efficient Learning with Typed Parametric Models
Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
JMLR 10:1955-1988 (2009).
Keywords: reinforcement learning, provably efficient learning
- Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar, Shie Mannor, Yishay Mansour
JMLR 7:1079-1105 (2006).
- Lyapunov Design for Safe Reinforcement Learning
Theodore J. Perkins, Andrew G. Barto
JMLR 3:803-832 (2002).
Keywords: Reinforcement Learning, Lyapunov Functions, Safety, Stability
TD Learning
- Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson, Peter Stone
JMLR 7:877-917 (2006).
Keywords: reinforcement learning, temporal difference methods, evolutionary computation, neuroevolution, on-line learning
- Reinforcement Learning with Factored States and Actions
Brian Sallans, Geoffrey E. Hinton
JMLR 5:1063-1088 (2004).
Keywords: product of experts, Boltzmann machine, reinforcement learning, factored actions
- Least-Squares Policy Iteration
Michail G. Lagoudakis, Ronald Parr
JMLR 4:1107-1149 (2003).
Keywords: Reinforcement Learning, Markov Decision Processes, Approximate Policy Iteration, Value-Function Approximation, Least-Squares Methods
Actor-Critic
- A Convergent Online Single Time Scale Actor Critic Algorithm
Dotan Di Castro, Ron Meir
JMLR 11:367-410 (2010).
- Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter
JMLR 5:1471-1530 (2004).
Keywords: reinforcement learning, policy gradient, baseline, actor-critic, GPOMDP
Model-Based
- R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
Ronen I. Brafman, Moshe Tennenholtz
JMLR 3:213-231 (2002).
Keywords: Reinforcement Learning, Learning in Games, Stochastic Games, Markov Decision Processes, Provably Efficient Learning
Exploration
- Policy Gradient in Continuous Time
Rémi Munos
JMLR 7:771-791 (2006).
Keywords: optimal control, reinforcement learning, policy search, sensitivity analysis, parametric optimization, gradient estimate, likelihood ratio method, pathwise derivation
- Policy Search using Paired Comparisons
Malcolm J. A. Strens, Andrew W. Moore
JMLR 3:921-950 (2002).
Keywords: Reinforcement Learning, Policy Search, Experiment Design
Tools
- RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments
Brian Tanner, Adam White (University of Alberta)
JMLR 10:2133-2136 (2009).
Keywords: reinforcement learning, empirical evaluation, standardization, open source