ICML (International Conference on Machine Learning)


Reinforcement-learning papers presented at the International Conference on Machine Learning (ICML).~
(Entries are being added starting from the most recent conferences, so this is not a complete list.)


*Finance [#b2eaf14d]

-[[''Reinforcement learning for optimized trade execution'':http://doi.acm.org/10.1145/1143844.1143929]]~
Yuriy Nevmyvaka, Yi Feng, Michael Kearns~
ICML 2006, pp. 673-680.



*Games [#za08746b]

-[[''Sample-based learning and search with permanent and transient memories'':http://doi.acm.org/10.1145/1390156.1390278]]~
David Silver, Richard Sutton, and Martin Mueller~
ICML 2008, pp. 968-975.
-[[''Learning algorithms for online principal-agent problems (and selling goods online)'':http://doi.acm.org/10.1145/1143844.1143871]]~
Vincent Conitzer, Nikesh Garera~
ICML 2006, pp. 209-216.
-[[''Learning to compete, compromise, and cooperate in repeated general-sum games'':http://doi.acm.org/10.1145/1102351.1102372]]~
Jacob W. Crandall, Michael A. Goodrich~
ICML 2005, pp. 161-168.



*Robotics [#t5c9846c]

-[[''Learning complex motions by sequencing simpler motion templates'':http://doi.acm.org/10.1145/1553374.1553471]]~
Gerhard Neumann, Wolfgang Maass and Jan Peters~
ICML 2009, pp. 753-760.
-[[''Reinforcement learning by reward-weighted regression for operational space control'':http://doi.acm.org/10.1145/1273496.1273590]]~
Jan Peters, Stefan Schaal~
ICML 2007, pp. 745-750.


*Multi-Agent [#qbc49045]

-[[''Dynamic analysis of multiagent Q-learning with ε-greedy exploration'':http://doi.acm.org/10.1145/1553374.1553422]]~
Eduardo Rodrigues Gomes and Ryszard Kowalczyk~
ICML 2009, pp. 369-376.
-[[''Privacy-preserving reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390265]]~
Jun Sakuma, Shigenobu Kobayashi, and Rebecca Wright~
ICML 2008, pp. 864-871.
-[[''Conditional random fields for multi-agent reinforcement learning'':http://doi.acm.org/10.1145/1273496.1273640]]~
Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanathan~
ICML 2007, pp. 1143-1150.



*Hierarchical Reinforcement Learning [#n3d65970]

-[[''Hierarchical model-based reinforcement learning: R-max + MAXQ'':http://doi.acm.org/10.1145/1390156.1390211]]~
Nicholas Jong and Peter Stone~
ICML 2008, pp. 432-439.


*Multi-Objective Reinforcement Learning [#dae8ad5c]

-[[''Learning All Optimal Policies with Multiple Criteria'':http://doi.acm.org/10.1145/1390156.1390162]]~
Leon Barrett and Srinivas Narayanan~
ICML 2008, pp. 41-47.
-[[''Multi-task reinforcement learning: a hierarchical Bayesian approach'':http://doi.acm.org/10.1145/1273496.1273624]]~
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepalli~
ICML 2007, pp. 1015-1022.
-[[''Dynamic preferences in multi-criteria reinforcement learning'':http://doi.acm.org/10.1145/1102351.1102427]]~
Sriraam Natarajan, Prasad Tadepalli~
ICML 2005, pp. 601-608.



*Transfer Learning [#u5e36ccb]

-[[''Transfer of samples in batch reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390225]]~
Alessandro Lazaric, Marcello Restelli, and Andrea Bonarini~
ICML 2008, pp. 544-551.
-[[''Automatic discovery and transfer of MAXQ hierarchies'':http://doi.acm.org/10.1145/1390156.1390238]]~
Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich~
ICML 2008, pp. 648-655.
-[[''Automatic shaping and decomposition of reward functions'':http://doi.acm.org/10.1145/1273496.1273572]]~
Bhaskara Marthi~
ICML 2007, pp. 601-608.
-[[''Cross-domain transfer for reinforcement learning'':http://doi.acm.org/10.1145/1273496.1273607]]~
Matthew E. Taylor, Peter Stone~
ICML 2007, pp. 879-886.
-[[''Autonomous shaping: knowledge transfer in reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143906]]~
George Konidaris, Andrew Barto~
ICML 2006, pp. 489-496.
-[[''Identifying useful subgoals in reinforcement learning by local graph partitioning'':http://doi.acm.org/10.1145/1102351.1102454]]~
Özgür Şimşek, Alicia P. Wolfe, Andrew G. Barto~
ICML 2005, pp. 816-823.



*Relational Reinforcement Learning [#jccd787b]

-[[''Relational temporal difference learning'':http://doi.acm.org/10.1145/1143844.1143851]]~
Nima Asgharbeygi, David Stracuzzi, Pat Langley~
ICML 2006, pp. 49-56.
-[[''Learning the structure of Factored Markov Decision Processes in reinforcement learning problems'':http://doi.acm.org/10.1145/1143844.1143877]]~
Thomas Degris, Olivier Sigaud, Pierre-Henri Wuillemin~
ICML 2006, pp. 257-264.



*Active Learning [#dc6ee45a]

-[[''Active reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390194]]~
Arkady Epshteyn, Adam Vogel, and Gerald DeJong~
ICML 2008, pp. 296-303.
-[[''Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs'':http://doi.acm.org/10.1145/1390156.1390189]]~
Finale Doshi, Joelle Pineau, and Nicholas Roy~
ICML 2008, pp. 256-263.



*POMDP [#p3d4e7ac]

-[[''Predictive representations for policy gradient in POMDPs'':http://doi.acm.org/10.1145/1553374.1553383]]~
Abdeslam Boularias and Brahim Chaib-draa~
ICML 2009, pp. 65-72.
-[[''Region-based value iteration for partially observable Markov decision processes'':http://doi.acm.org/10.1145/1143844.1143915]]~
Hui Li, Xuejun Liao, Lawrence Carin~
ICML 2006, pp. 561-568.



*PSR [#xb4b8a08]

-[[''Efficiently learning linear-linear exponential family predictive representations of state'':http://doi.acm.org/10.1145/1390156.1390304]]~
David Wingate and Satinder Singh~
ICML 2008, pp. 1176-1183.
-[[''Learning predictive state representations using non-blind policies'':http://doi.acm.org/10.1145/1143844.1143861]]~
Michael Bowling, Peter McCracken, Michael James, James Neufeld, Dana Wilkinson~
ICML 2006, pp. 129-136.
-[[''Predictive state representations with options'':http://doi.acm.org/10.1145/1143844.1143973]]~
Britton Wolfe, Satinder Singh~
ICML 2006, pp. 1025-1032.
-[[''Learning predictive state representations in dynamical systems without reset'':http://doi.acm.org/10.1145/1102351.1102475]]~
Britton Wolfe, Michael R. James, Satinder Singh~
ICML 2005, pp. 980-987.
-[[''Learning predictive representations from a history'':http://doi.acm.org/10.1145/1102351.1102473]]~
Eric Wiewiora~
ICML 2005, pp. 964-971.



*Dynamic Environments [#k92e74c9]

-[[''Dealing with non-stationary environments using context detection'':http://doi.acm.org/10.1145/1143844.1143872]]~
Bruno C. da Silva, Eduardo W. Basso, Ana L. C. Bazzan, Paulo M. Engel~
ICML 2006, pp. 217-224.



*Value Function Approximation [#k0519134]

-[[''Constructing basis functions from directed graphs for value function approximation'':http://doi.acm.org/10.1145/1273496.1273545]]~
Jeff Johns, Sridhar Mahadevan~
ICML 2007, pp. 385-392.
-[[''Learning state-action basis functions for hierarchical MDPs'':http://doi.acm.org/10.1145/1273496.1273585]]~
Sarah Osentoski, Sridhar Mahadevan~
ICML 2007, pp. 705-712.
-[[''Analyzing feature generation for value-function approximation'':http://doi.acm.org/10.1145/1273496.1273589]]~
Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael Littman~
ICML 2007, pp. 737-744.
-[[''Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation'':http://doi.acm.org/10.1145/1273496.1273591]]~
Chee Wee Phua, Robert Fitch~
ICML 2007, pp. 751-758.
-[[''Automatic basis function construction for approximate dynamic programming and reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143901]]~
Philipp W. Keller, Shie Mannor, Doina Precup~
ICML 2006, pp. 449-456.



*Continuous Action Spaces [#i80e2ed1]

-[[''Binary action search for learning continuous-action control policies'':http://doi.acm.org/10.1145/1553374.1553476]]~
Jason Pazis and Michail Lagoudakis~
ICML 2009, pp. 793-800.



*Large State Spaces [#g740c173]

-[[''Bayesian sparse sampling for on-line reward optimization'':http://doi.acm.org/10.1145/1102351.1102472]]~
Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans~
ICML 2005, pp. 956-963.
-[[''Proto-value functions: developmental reinforcement learning'':http://doi.acm.org/10.1145/1102351.1102421]]~
Sridhar Mahadevan~
ICML 2005, pp. 553-560.



*Exploration [#h48019f5]

-[[''The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning'':http://doi.acm.org/10.1145/1553374.1553406]]~
Carlos Diuk, Lihong Li and Bethany Leffler~
ICML 2009, pp. 249-256.
-[[''Near-Bayesian exploration in polynomial time'':http://doi.acm.org/10.1145/1553374.1553441]]~
J. Zico Kolter and Andrew Ng~
ICML 2009, pp. 513-520.
-[[''Optimistic initialization and greediness lead to polynomial time learning in factored MDPs'':http://doi.acm.org/10.1145/1553374.1553502]]~
Istvan Szita and Andras Lorincz~
ICML 2009, pp. 1001-1008.
-[[''Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search'':http://doi.acm.org/10.1145/1553374.1553426]]~
Verena Heidrich-Meisner and Christian Igel~
ICML 2009, pp. 401-408.
-[[''The many faces of optimism: a unifying approach'':http://doi.acm.org/10.1145/1390156.1390288]]~
Istvan Szita and Andras Lorincz~
ICML 2008, pp. 1048-1055.
-[[''Reinforcement learning in the presence of rare events'':http://doi.acm.org/10.1145/1390156.1390199]]~
Jordan Frank, Shie Mannor, and Doina Precup~
ICML 2008, pp. 336-343.
-[[''Percentile optimization in uncertain Markov decision processes with application to efficient exploration'':http://doi.acm.org/10.1145/1273496.1273525]]~
Erick Delage, Shie Mannor~
ICML 2007, pp. 225-232.
-[[''An intrinsic reward mechanism for efficient exploration'':http://doi.acm.org/10.1145/1143844.1143949]]~
Özgür Şimşek, Andrew G. Barto~
ICML 2006, pp. 833-840.
-[[''Qualitative reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143883]]~
Arkady Epshteyn, Gerald DeJong~
ICML 2006, pp. 305-312.
-[[''Experience-efficient learning in associative bandit problems'':http://doi.acm.org/10.1145/1143844.1143956]]~
Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, Haym Hirsh~
ICML 2006, pp. 889-896.
-[[''Exploration and apprenticeship learning in reinforcement learning'':http://doi.acm.org/10.1145/1102351.1102352]]~
Pieter Abbeel, Andrew Y. Ng~
ICML 2005, pp. 1-8.



*Policy Evaluation [#w8d3ad24]

-[[''A semiparametric statistical approach to model-free policy evaluation'':http://doi.acm.org/10.1145/1390156.1390291]]~
Tsuyoshi Ueno, Motoaki Kawanabe, Takeshi Mori, Shin-Ichi Maeda, and Shin Ishii~
ICML 2008, pp. 1072-1079.
-[[''Exploration scavenging'':http://doi.acm.org/10.1145/1390156.1390223]]~
John Langford, Alexander Strehl, and Jennifer Wortman~
ICML 2008, pp. 528-535.
-[[''Fast direct policy evaluation using multiscale analysis of Markov diffusion processes'':http://doi.acm.org/10.1145/1143844.1143920]]~
Mauro Maggioni, Sridhar Mahadevan~
ICML 2006, pp. 601-608.



*Analysis of Learning [#ie0cad04]

-[[''A worst-case comparison between temporal difference and residual gradient with linear function approximation'':http://doi.acm.org/10.1145/1390156.1390227]]~
Lihong Li~
ICML 2008, pp. 560-567.
-[[''An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390251]]~
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield~
ICML 2008, pp. 752-759.
-[[''An analysis of reinforcement learning with function approximation'':http://doi.acm.org/10.1145/1390156.1390240]]~
Francisco Melo, Sean Meyn, and Isabel Ribeiro~
ICML 2008, pp. 664-671.
-[[''A theoretical analysis of Model-Based Interval Estimation'':http://doi.acm.org/10.1145/1102351.1102459]]~
Alexander L. Strehl, Michael L. Littman~
ICML 2005, pp. 856-863.
-[[''Relating reinforcement learning performance to classification performance'':http://doi.acm.org/10.1145/1102351.1102411]]~
John Langford, Bianca Zadrozny~
ICML 2005, pp. 473-480.


*Gradient Methods [#kb3fd704]

-[[''Fast gradient-descent methods for temporal-difference learning with linear function approximation'':http://doi.acm.org/10.1145/1553374.1553501]]~
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvari, Eric Wiewiora~
ICML 2009, pp. 993-1000.
-[[''Non-parametric policy gradients: a unified treatment of propositional and relational domains'':http://doi.acm.org/10.1145/1390156.1390214]]~
Kristian Kersting and Kurt Driessens~
ICML 2008, pp. 456-463.


*TD Learning [#c5efc134]

-[[''Proto-predictive representation of states with simple recurrent temporal-difference networks'':http://doi.acm.org/10.1145/1553374.1553464]]~
Takaki Makino~
ICML 2009, pp. 697-704.
-[[''Regularization and feature selection in least-squares temporal difference learning'':http://doi.acm.org/10.1145/1553374.1553442]]~
J. Zico Kolter and Andrew Ng~
ICML 2009, pp. 521-528.
-[[''Kernelized value function approximation for reinforcement learning'':http://doi.acm.org/10.1145/1553374.1553504]]~
Gavin Taylor and Ronald Parr~
ICML 2009, pp. 1017-1024.
-[[''Constraint relaxation in approximate linear programs'':http://doi.acm.org/10.1145/1553374.1553478]]~
Marek Petrik and Shlomo Zilberstein~
ICML 2009, pp. 809-816.
-[[''Preconditioned temporal difference learning'':http://doi.acm.org/10.1145/1390156.1390308]]~
Hengshuai Yao and Zhi-Qiang Liu~
ICML 2008, pp. 1208-1215.
-[[''PAC model-free reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143955]]~
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman~
ICML 2006, pp. 881-888.
-[[''Reinforcement learning with Gaussian processes'':http://doi.acm.org/10.1145/1102351.1102377]]~
Yaakov Engel, Shie Mannor, Ron Meir~
ICML 2005, pp. 201-208.
-[[''TD(λ) networks: temporal-difference networks with eligibility traces'':http://doi.acm.org/10.1145/1102351.1102463]]~
Brian Tanner, Richard S. Sutton~
ICML 2005, pp. 888-895.




*Actor-Critic [#l44a6bfc]

-[[''Bayesian actor-critic algorithms'':http://doi.acm.org/10.1145/1273496.1273534]]~
Mohammad Ghavamzadeh, Yaakov Engel~
ICML 2007, pp. 297-304.


*Model-Based [#xee6c308]

-[[''An analytic solution to discrete Bayesian reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143932]]~
Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan~
ICML 2006, pp. 697-704.
-[[''Using inaccurate models in reinforcement learning'':http://doi.acm.org/10.1145/1143844.1143845]]~
Pieter Abbeel, Morgan Quigley, Andrew Y. Ng~
ICML 2006, pp. 1-8.



*Other / Uncategorized [#ne78a287]

-[[''Discovering options from example trajectories'':http://doi.acm.org/10.1145/1553374.1553529]]~
Peng Zang, Peng Zhou, David Minnen and Charles Isbell~
ICML 2009, pp. 1217-1224.
-[[''An object-oriented representation for efficient reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390187]]~
Carlos Diuk, Andre Cohen, and Michael Littman~
ICML 2008, pp. 240-247.
-[[''Online kernel selection for Bayesian reinforcement learning'':http://doi.acm.org/10.1145/1390156.1390259]]~
Joseph Reisinger, Peter Stone, and Risto Miikkulainen~
ICML 2008, pp. 816-823.