Reinforcement Learning / International Conference on Machine Learning (ICML) 2010

Overview

Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems
Christophe Thiery (Loria); Bruno Scherrer (Loria)
Finite-Sample Analysis of LSTD
Alessandro Lazaric (Inria); Mohammad Ghavamzadeh (Inria); Remi Munos (Inria)
Convergence of Least Squares Temporal Difference Methods Under General Conditions
Huizhen Yu (Univ. of Helsinki)
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
Bruno Scherrer (Loria)

Approximate Predictive Representations of Partially Observable Systems
Doina Precup (McGill University); Monica Dinculescu (McGill University)
Constructing States for Reinforcement Learning
M. M. Mahmud (Australian National University)
Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Carlton Downey (Victoria University of Wellington); Scott Sanner (NICTA)
Bayesian Multi-Task Reinforcement Learning
Alessandro Lazaric (Inria); Mohammad Ghavamzadeh (Inria)

Generalizing Apprenticeship Learning across Hypothesis Classes
Thomas Walsh (Rutgers University); Kaushik Subramanian (Rutgers University); Michael Littman (Rutgers University); Carlos Diuk (Princeton University)
Toward Off-Policy Learning Control with Function Approximation
Hamid Maei (University of Alberta); Csaba Szepesvari (University of Alberta); Shalabh Bhatnagar (Indian Institute of Science); Richard Sutton (University of Alberta)
Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis
Daniel Lizotte (University of Michigan); Michael Bowling (University of Alberta); Susan Murphy (University of Michigan)
Internal Rewards Mitigate Agent Boundedness
Jonathan Sorg (University of Michigan); Satinder Singh (University of Michigan); Richard Lewis (University of Michigan)

Analysis of a Classification-based Policy Iteration Algorithm
Alessandro Lazaric (Inria); Mohammad Ghavamzadeh (Inria); Remi Munos (Inria)
Nonparametric Return Distribution Approximation for Reinforcement Learning
Tetsuro Morimura (IBM Research - Tokyo); Masashi Sugiyama (Tokyo Institute of Technology); Hisashi Kashima (University of Tokyo); Hirotaka Hachiya; Toshiyuki Tanaka
Inverse Optimal Control with Linearly Solvable MDPs
Krishnamurthy Dvijotham (University of Washington); Emanuel Todorov (University of Washington)
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
Marek Petrik (University of Massachusetts); Gavin Taylor (Duke); Ron Parr (Duke); Shlomo Zilberstein (University of Massachusetts Amherst)