Reinforcement Learning / Journal of Machine Learning Research (JMLR)

Reinforcement learning papers published in the Journal of Machine Learning Research. ~
(Papers are added progressively; this is not a complete list.)


*Inverse Reinforcement Learning [#pf244e68]

-[[''Inverse Reinforcement Learning in Partially Observable Environments'':http://jmlr.csail.mit.edu/papers/v12/choi11a.html]]~
Jaedeug Choi, Kee-Eung Kim~
JMLR 12:691-730 (2011).~
'''Keywords:''' inverse reinforcement learning, partially observable Markov decision process, inverse optimization, linear programming, quadratically constrained programming


*POMDP [#i24c2b85]

-[[''A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes'':http://jmlr.csail.mit.edu/papers/v12/ross11a.html]]~
Stéphane Ross, Joelle Pineau, Brahim Chaib-draa, Pierre Kreitmann~
JMLR 12:1729-1770 (2011).~
'''Keywords:''' reinforcement learning, Bayesian inference, partially observable Markov decision processes
-[[''Multi-task Reinforcement Learning in Partially Observable Stochastic Environments'':http://jmlr.csail.mit.edu/papers/v10/li09b.html]]~
Hui Li, Xuejun Liao, Lawrence Carin (Duke University)~
JMLR 10:1131-1186 (2009).~
'''Keywords:''' reinforcement learning, partially observable Markov decision processes, multi-task learning, Dirichlet processes, regionalized policy representation


*Transfer Learning [#eba4fe72]

-[[''Transfer Learning for Reinforcement Learning Domains: A Survey'':http://jmlr.csail.mit.edu/papers/v10/taylor09a.html]]~
Matthew E. Taylor (The University of Southern California), Peter Stone (The University of Texas at Austin)~
JMLR 10:1633-1685 (2009).~
'''Keywords:''' transfer learning, reinforcement learning, multi-task learning
-[[''Transfer Learning via Inter-Task Mappings for Temporal Difference Learning'':http://jmlr.csail.mit.edu/papers/v8/taylor07a.html]]~
Matthew E. Taylor, Peter Stone, Yaxin Liu~
JMLR 8:2125-2167 (2007).~
'''Keywords:''' transfer learning, reinforcement learning, temporal difference methods, value function approximation, inter-task mapping


*Changing and Dynamic Environments [#l4a70c46]

-[[''Value Function Based Reinforcement Learning in Changing Markovian Environments'':http://jmlr.csail.mit.edu/papers/v9/csaji08a.html]]~
Balázs Csanád Csáji, László Monostori~
JMLR 9:1679-1709 (2008).~
'''Keywords:''' Markov decision processes, reinforcement learning, changing environments, (ε,δ)-MDPs, value function bounds, stochastic iterative algorithms
-[[''ε-MDPs: Learning in Varying Environments'':http://jmlr.csail.mit.edu/papers/v3/szita02a.html]]~
István Szita, Bálint Takács, András Lörincz~
JMLR 3:145-174 (2002).~
'''Keywords:''' reinforcement learning, convergence, event-learning, SARSA, MDP, generalized MDP, ε-MDP, SDS controller

*Multi-Agent [#xbea664c]

-[[''Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study'':http://jmlr.csail.mit.edu/papers/v9/bab08a.html]]~
Avraham Bab, Ronen I. Brafman~
JMLR 9:2635-2675 (2008).~
'''Keywords:''' reinforcement learning, multi-agent reinforcement learning, stochastic games
-[[''Collaborative Multiagent Reinforcement Learning by Payoff Propagation'':http://jmlr.csail.mit.edu/papers/v7/kok06a.html]]~
Jelle R. Kok, Nikos Vlassis~
JMLR 7:1789-1828 (2006).~
'''Keywords:''' collaborative multiagent system, coordination graph, reinforcement learning, Q-learning, belief propagation


*Hierarchical Reinforcement Learning [#p19d8166]

-[[''Hierarchical Average Reward Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v8/ghavamzadeh07a.html]]~
Mohammad Ghavamzadeh, Sridhar Mahadevan~
JMLR 8:2629-2669 (2007).~
'''Keywords:''' semi-Markov decision processes, hierarchical reinforcement learning, average reward reinforcement learning, hierarchical and recursive optimality


*Batch Learning [#ad2f5f99]

-[[''Tree-Based Batch Mode Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v6/ernst05a.html]]~
Damien Ernst, Pierre Geurts, Louis Wehenkel~
JMLR 6:503-556 (2005).~
'''Keywords:''' batch mode reinforcement learning, regression trees, ensemble methods, supervised learning, fitted value iteration, optimal control


*Multi-Objective Reinforcement Learning [#bc0870fa]

-[[''A Geometric Approach to Multi-Criterion Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v5/mannor04a.html]]~
Shie Mannor, Nahum Shimkin~
JMLR 5:325-360 (2004).


*Exploration-Exploitation Dilemma [#k5f6e822]

-[[''Using Confidence Bounds for Exploitation-Exploration Trade-offs'':http://jmlr.csail.mit.edu/papers/v3/auer02a.html]]~
Peter Auer~
JMLR 3:397-422 (2002).~
'''Keywords:''' Online Learning, Exploitation-Exploration, Bandit Problem, Reinforcement Learning, Linear Value Function



*Learning Analysis [#w8d839f2]

-[[''Reinforcement Learning in Finite MDPs: PAC Analysis'':http://jmlr.csail.mit.edu/papers/v10/strehl09a.html]]~
Alexander L. Strehl, Lihong Li, Michael L. Littman~
JMLR 10:2413-2444 (2009).
-[[''Provably Efficient Learning with Typed Parametric Models'':http://jmlr.csail.mit.edu/papers/v10/brunskill09a.html]]~
Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy~
JMLR 10:1955-1988 (2009).~
'''Keywords:''' reinforcement learning, provably efficient learning
-[[''Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems'':http://jmlr.csail.mit.edu/papers/v7/evendar06a.html]]~
Eyal Even-Dar, Shie Mannor, Yishay Mansour~
JMLR 7:1079-1105 (2006).
-[[''Lyapunov Design for Safe Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v3/perkins02a.html]]~
Theodore J. Perkins, Andrew G. Barto~
JMLR 3:803-832 (2002).~
'''Keywords:''' Reinforcement Learning, Lyapunov Functions, Safety, Stability


*TD Learning [#e6e25c4d]

-[[''Evolutionary Function Approximation for Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v7/whiteson06a.html]]~
Shimon Whiteson, Peter Stone~
JMLR 7:877-917 (2006).~
'''Keywords:''' reinforcement learning, temporal difference methods, evolutionary computation, neuroevolution, on-line learning
-[[''Reinforcement Learning with Factored States and Actions'':http://jmlr.csail.mit.edu/papers/v5/sallans04a.html]]~
Brian Sallans, Geoffrey E. Hinton~
JMLR 5:1063-1088 (2004).~
'''Keywords:''' product of experts, Boltzmann machine, reinforcement learning, factored actions
-[[''Least-Squares Policy Iteration'':http://jmlr.csail.mit.edu/papers/v4/lagoudakis03a.html]]~
Michail G. Lagoudakis, Ronald Parr~
JMLR 4:1107-1149 (2003).~
'''Keywords:''' Reinforcement Learning, Markov Decision Processes, Approximate Policy Iteration, Value-Function Approximation, Least-Squares Methods


*Actor-Critic [#mf88a299]

-[[''A Convergent Online Single Time Scale Actor Critic Algorithm'':http://jmlr.csail.mit.edu/papers/v11/dicastro10a.html]]~
Dotan Di Castro, Ron Meir~
JMLR 11:367-410 (2010).
-[[''Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v5/greensmith04a.html]]~
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter~
JMLR 5:1471-1530 (2004).~
'''Keywords:''' reinforcement learning, policy gradient, baseline, actor-critic, GPOMDP


*Model-Based [#j7d124d1]

-[[''R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v3/brafman02a.html]]~
Ronen I. Brafman, Moshe Tennenholtz~
JMLR 3:213-231 (2002).~
'''Keywords:''' Reinforcement Learning, Learning in Games, Stochastic Games, Markov Decision Processes, Provably Efficient Learning


*Exploration [#t3da9fdf]

-[[''Policy Gradient in Continuous Time'':http://jmlr.csail.mit.edu/papers/v7/munos06b.html]]~
Rémi Munos~
JMLR 7:771-791 (2006).~
'''Keywords:''' optimal control, reinforcement learning, policy search, sensitivity analysis, parametric optimization, gradient estimate, likelihood ratio method, pathwise derivation
-[[''Policy Search using Paired Comparisons'':http://jmlr.csail.mit.edu/papers/v3/strens02a.html]]~
Malcolm J. A. Strens, Andrew W. Moore~
JMLR 3:921-950 (2002).~
'''Keywords:''' Reinforcement Learning, Policy Search, Experiment Design


*Tools [#z719584b]

-[[''RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments'':http://jmlr.csail.mit.edu/papers/v10/tanner09a.html]]~
Brian Tanner, Adam White (University of Alberta)~
JMLR 10:2133-2136 (2009).~
'''Keywords:''' reinforcement learning, empirical evaluation, standardization, open source