Papers on reinforcement learning presented at the International Conference on Machine Learning (ICML).
(Entries are added starting from the most recent; this is not a complete list.)
Finance †
- Reinforcement learning for optimized trade execution
Yuriy Nevmyvaka, Yi Feng, Michael Kearns
ICML 2006, pp. 673-680.
Games †
- Sample-based learning and search with permanent and transient memories
David Silver, Richard Sutton, and Martin Mueller
ICML 2008, pp. 968-975.
- Learning algorithms for online principal-agent problems (and selling goods online)
Vincent Conitzer, Nikesh Garera
ICML 2006, pp. 209-216.
- Learning to compete, compromise, and cooperate in repeated general-sum games
Jacob W. Crandall, Michael A. Goodrich
ICML 2005, pp. 161-168.
Robotics †
- Learning complex motions by sequencing simpler motion templates
Gerhard Neumann, Wolfgang Maass and Jan Peters
ICML 2009, pp. 753-760.
- Reinforcement learning by reward-weighted regression for operational space control
Jan Peters, Stefan Schaal
ICML 2007, pp. 745-750.
Multi-agent †
- Dynamic analysis of multiagent Q-learning with ε-greedy exploration
Eduardo Rodrigues Gomes and Ryszard Kowalczyk
ICML 2009, pp. 369-376.
- Privacy-preserving reinforcement learning
Jun Sakuma, Shigenobu Kobayashi, and Rebecca Wright
ICML 2008, pp. 864-871.
- Conditional random fields for multi-agent reinforcement learning
Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanathan
ICML 2007, pp. 1143-1150.
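Several entries in this list build on tabular Q-learning with ε-greedy exploration, the setting analyzed by Gomes and Kowalczyk above. A minimal single-agent sketch of those two primitives, assuming a tabular dict-based Q-function (all names here are illustrative, not from any listed paper):

```python
import random

def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning backup: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

The dict representation keeps unvisited state-action pairs implicitly at zero, which matches the default initialization assumed in much of the tabular literature.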
Hierarchical reinforcement learning †
- Hierarchical model-based reinforcement learning: R-max + MAXQ
Nicholas Jong and Peter Stone
ICML 2008, pp. 432-439.
Multi-objective reinforcement learning †
- Learning All Optimal Policies with Multiple Criteria
Leon Barrett and Srinivas Narayanan
ICML 2008, pp. 41-47.
- Multi-task reinforcement learning: a hierarchical Bayesian approach
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepalli
ICML 2007, pp. 1015-1022.
- Dynamic preferences in multi-criteria reinforcement learning
Sriraam Natarajan, Prasad Tadepalli
ICML 2005, pp. 601-608.
Transfer learning †
- Transfer of samples in batch reinforcement learning
Alessandro Lazaric, Marcello Restelli, and Andrea Bonarini
ICML 2008, pp. 544-551.
- Automatic discovery and transfer of MAXQ hierarchies
Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich
ICML 2008, pp. 648-655.
- Automatic shaping and decomposition of reward functions
Bhaskara Marthi
ICML 2007, pp. 601-608.
- Cross-domain transfer for reinforcement learning
Matthew E. Taylor, Peter Stone
ICML 2007, pp. 879-886.
- Autonomous shaping: knowledge transfer in reinforcement learning
George Konidaris, Andrew Barto
ICML 2006, pp. 489-496.
- Identifying useful subgoals in reinforcement learning by local graph partitioning
Özgür Şimşek, Alicia P. Wolfe, Andrew G. Barto
ICML 2005, pp. 816-823.
Relational reinforcement learning †
- Relational temporal difference learning
Nima Asgharbeygi, David Stracuzzi, Pat Langley
ICML 2006, pp. 49-56.
- Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
Thomas Degris, Olivier Sigaud, Pierre-Henri Wuillemin
ICML 2006, pp. 257-264.
Active learning †
- Active reinforcement learning
Arkady Epshteyn, Adam Vogel, and Gerald DeJong
ICML 2008, pp. 296-303.
- Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
Finale Doshi, Joelle Pineau, and Nicholas Roy
ICML 2008, pp. 256-263.
POMDP †
- Predictive representations for policy gradient in POMDPs
Abdeslam Boularias and Brahim Chaib-draa
ICML 2009, pp. 65-72.
- Region-based value iteration for partially observable Markov decision processes
Hui Li, Xuejun Liao, Lawrence Carin
ICML 2006, pp. 561-568.
PSR †
- Efficiently learning linear-linear exponential family predictive representations of state
David Wingate and Satinder Singh
ICML 2008, pp. 1176-1183.
- Learning predictive state representations using non-blind policies
Michael Bowling, Peter McCracken, Michael James, James Neufeld, Dana Wilkinson
ICML 2006, pp. 129-136.
- Predictive state representations with options
Britton Wolfe, Satinder Singh
ICML 2006, pp. 1025-1032.
- Learning predictive state representations in dynamical systems without reset
Britton Wolfe, Michael R. James, Satinder Singh
ICML 2005, pp. 980-987.
- Learning predictive representations from a history
Eric Wiewiora
ICML 2005, pp. 964-971.
Dynamic environments †
- Dealing with non-stationary environments using context detection
Bruno C. da Silva, Eduardo W. Basso, Ana L. C. Bazzan, Paulo M. Engel
ICML 2006, pp. 217-224.
Value function approximation †
- Constructing basis functions from directed graphs for value function approximation
Jeff Johns, Sridhar Mahadevan
ICML 2007, pp. 385-392.
- Learning state-action basis functions for hierarchical MDPs
Sarah Osentoski, Sridhar Mahadevan
ICML 2007, pp. 705-712.
- Analyzing feature generation for value-function approximation
Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael Littman
ICML 2007, pp. 737-744.
- Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation
Chee Wee Phua, Robert Fitch
ICML 2007, pp. 751-758.
- Automatic basis function construction for approximate dynamic programming and reinforcement learning
Philipp W. Keller, Shie Mannor, Doina Precup
ICML 2006, pp. 449-456.
Continuous action spaces †
- Binary action search for learning continuous-action control policies
Jason Pazis and Michail Lagoudakis
ICML 2009, pp. 793-800.
Large state spaces †
- Bayesian sparse sampling for on-line reward optimization
Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans
ICML 2005, pp. 956-963.
- Proto-value functions: developmental reinforcement learning
Sridhar Mahadevan
ICML 2005, pp. 553-560.
Exploration †
- The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning
Carlos Diuk, Lihong Li and Bethany Leffler
ICML 2009, pp. 249-256.
- Near-Bayesian exploration in polynomial time
J. Zico Kolter and Andrew Ng
ICML 2009, pp. 513-520.
- Optimistic initialization and greediness lead to polynomial time learning in factored MDPs
Istvan Szita and Andras Lorincz
ICML 2009, pp. 1001-1008.
- Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
Verena Heidrich-Meisner and Christian Igel
ICML 2009, pp. 401-408.
- The many faces of optimism: a unifying approach
Istvan Szita and Andras Lorincz
ICML 2008, pp. 1048-1055.
- Reinforcement learning in the presence of rare events
Jordan Frank, Shie Mannor, and Doina Precup
ICML 2008, pp. 336-343.
- Percentile optimization in uncertain Markov decision processes with application to efficient exploration
Erick Delage, Shie Mannor
ICML 2007, pp. 225-232.
- An intrinsic reward mechanism for efficient exploration
Özgür Şimşek, Andrew G. Barto
ICML 2006, pp. 833-840.
- Qualitative reinforcement learning
Arkady Epshteyn, Gerald DeJong
ICML 2006, pp. 305-312.
- Experience-efficient learning in associative bandit problems
Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, Haym Hirsh
ICML 2006, pp. 889-896.
- Exploration and apprenticeship learning in reinforcement learning
Pieter Abbeel, Andrew Y. Ng
ICML 2005, pp. 1-8.
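Optimistic initialization, the exploration device named in the Szita and Lorincz title above, admits a very short generic sketch: start every Q-value at an upper bound on achievable return, so a greedy learner is drawn toward unvisited state-action pairs. A tabular illustration under stated assumptions (bounded rewards in [0, r_max], discount gamma < 1; this is the general technique, not the paper's factored-MDP algorithm):

```python
def optimistic_q(states, actions, r_max, gamma=0.9):
    """Initialize every Q(s,a) to r_max/(1-gamma), an upper bound on
    the discounted return when rewards lie in [0, r_max]."""
    bound = r_max / (1.0 - gamma)
    return {(s, a): bound for s in states for a in actions}
```

Since no true value can exceed the bound, values only shrink as experience accumulates, and greedy action selection keeps preferring pairs whose estimates have not yet been driven down.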
Policy evaluation †
- A semiparametric statistical approach to model-free policy evaluation
Tsuyoshi Ueno, Motoaki Kawanabe, Takeshi Mori, Shin-Ichi Maeda, and Shin Ishii
ICML 2008, pp. 1072-1079.
- Exploration scavenging
John Langford, Alexander Strehl, and Jennifer Wortman
ICML 2008, pp. 528-535.
- Fast direct policy evaluation using multiscale analysis of Markov diffusion processes
Mauro Maggioni, Sridhar Mahadevan
ICML 2006, pp. 601-608.
Analysis of learning †
- A worst-case comparison between temporal difference and residual gradient with linear function approximation
Lihong Li
ICML 2008, pp. 560-567.
- An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield
ICML 2008, pp. 752-759.
- An analysis of reinforcement learning with function approximation
Francisco Melo, Sean Meyn, and Isabel Ribeiro
ICML 2008, pp. 664-671.
- A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl, Michael L. Littman
ICML 2005, pp. 856-863.
- Relating reinforcement learning performance to classification performance
John Langford, Bianca Zadrozny
ICML 2005, pp. 473-480.
Gradient methods †
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvari, Eric Wiewiora
ICML 2009, pp. 993-1000.
- Non-parametric policy gradients: a unified treatment of propositional and relational domains
Kristian Kersting and Kurt Driessens
ICML 2008, pp. 456-463.
TD learning †
- Proto-predictive representation of states with simple recurrent temporal-difference networks
Takaki Makino
ICML 2009, pp. 697-704.
- Regularization and feature selection in least-squares temporal difference learning
J. Zico Kolter and Andrew Ng
ICML 2009, pp. 521-528.
- Kernelized value function approximation for reinforcement learning
Gavin Taylor and Ronald Parr
ICML 2009, pp. 1017-1024.
- Constraint relaxation in approximate linear programs
Marek Petrik and Shlomo Zilberstein
ICML 2009, pp. 809-816.
- Preconditioned temporal difference learning
Hengshuai Yao and Zhi-Qiang Liu
ICML 2008, pp. 1208-1215.
- PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman
ICML 2006, pp. 881-888.
- Reinforcement learning with Gaussian processes
Yaakov Engel, Shie Mannor, Ron Meir
ICML 2005, pp. 201-208.
- TD(λ) networks: temporal-difference networks with eligibility traces
Brian Tanner, Richard S. Sutton
ICML 2005, pp. 888-895.
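The papers in this section extend the basic temporal-difference update with regularization, kernels, preconditioning, or eligibility traces. For orientation, the plain tabular TD(0) rule they all start from can be sketched as follows (a minimal sketch assuming a dict-based value table; the variable names are illustrative):

```python
def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): V(s) += alpha * (r + gamma*V(s') - V(s)); returns the TD error."""
    delta = r + gamma * v.get(s_next, 0.0) - v.get(s, 0.0)
    v[s] = v.get(s, 0.0) + alpha * delta
    return delta
```

Each variant above can be read as replacing the table `v` with a parameterized approximator and reusing the same TD error `delta` to drive the parameter update.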
Actor-critic †
- Bayesian actor-critic algorithms
Mohammad Ghavamzadeh, Yaakov Engel
ICML 2007, pp. 297-304.
Model-based †
- An analytic solution to discrete Bayesian reinforcement learning
Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan
ICML 2006, pp. 697-704.
- Using inaccurate models in reinforcement learning
Pieter Abbeel, Morgan Quigley, Andrew Y. Ng
ICML 2006, pp. 1-8.
Other / uncategorized †
- Discovering options from example trajectories
Peng Zang, Peng Zhou, David Minnen and Charles Isbell
ICML 2009, pp. 1217-1224.
- An object-oriented representation for efficient reinforcement learning
Carlos Diuk, Andre Cohen, and Michael Littman
ICML 2008, pp. 240-247.
- Online kernel selection for Bayesian reinforcement learning
Joseph Reisinger, Peter Stone, and Risto Miikkulainen
ICML 2008, pp. 816-823.