Overview †
- International Conference on Machine Learning (ICML)
- June 14-18, 2009
- Montreal
- http://www.cs.mcgill.ca/~icml2009/
This is the international conference most closely related to reinforcement learning.
This year there are three sessions on reinforcement learning. (Last year there were as many as six.)
1C - Exploration in Reinforcement Learning †
- The Adaptive k-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning. Carlos Diuk, Lihong Li and Bethany Leffler.
- Near-Bayesian Exploration in Polynomial Time. J. Zico Kolter and Andrew Ng.
- Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs. Istvan Szita and Andras Lorincz.
- Dynamic Analysis of Multiagent Q-learning with ε-greedy Exploration. Eduardo Rodrigues Gomes and Ryszard Kowalczyk.
- Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search. Verena Heidrich-Meisner and Christian Igel.
3C - Reinforcement Learning with Temporal Differences †
- Proto-Predictive Representation of States with Simple Recurrent Temporal-Difference Networks. Takaki Makino.
- Regularization and Feature Selection in Least Squares Temporal-Difference Learning. J. Zico Kolter and Andrew Ng.
- Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvari and Eric Wiewiora.
- Kernelized Value Function Approximation for Reinforcement Learning. Gavin Taylor and Ronald Parr.
- Constraint Relaxation in Approximate Linear Programs. Marek Petrik and Shlomo Zilberstein.
5C - Reinforcement Learning in High Order Environments †
- Binary Action Search for Learning Continuous-Action Control Policies. Jason Pazis and Michail Lagoudakis.
- Predictive Representations for Policy Gradient in POMDPs.
- Stochastic Search using the Natural Gradient. Sun Yi, Daan Wierstra, Tom Schaul and Juergen Schmidhuber.
- Approximate Inference for Planning in Stochastic Relational Worlds. Tobias Lang and Marc Toussaint.
- Discovering Options from Example Trajectories. Peng Zang, Peng Zhou, David Minnen and Charles Isbell.