Average Reward Reinforcement Learning

*Overview [#lbacbc1b]

A type of reinforcement learning that maximizes the average of the rewards obtained, without discounting them.
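
For reference, the undiscounted average-reward objective is commonly written as below. The notation (policy \pi, reward r_t at step t, gain \rho^{\pi}, discount factor \gamma) is an assumption for illustration and does not come from this page; the second line shows the discounted return for contrast.

```latex
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{N} r_t \right]
\qquad \text{versus} \qquad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \right], \quad 0 \le \gamma < 1
```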


*Methods [#mf01e27a]

**R-Learning, Modified R-Learning [#r6f8a9ec]
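R-Learning, proposed by Schwartz, was the first average reward reinforcement learning method.
It is also covered in Sutton and Barto's textbook.

Singh modified the Bellman equation used in R-learning.
The average reward estimate is updated at every step.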
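
A minimal sketch of the basic tabular R-learning update, in the form usually shown in textbooks, may help make the idea concrete. It is not Singh's modified variant: here the average reward estimate `rho` changes only after greedy actions. The function names, the `step(state, action)` environment interface, the toy two-state MDP, and all step sizes are hypothetical choices made for illustration.

```python
import random


def r_learning(step, n_states, n_actions, n_steps=50000,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning sketch.

    `step(state, action) -> (reward, next_state)` is a hypothetical
    environment function; alpha/beta/epsilon are illustrative values.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # relative action values
    rho = 0.0                                         # average reward estimate
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy behavior policy.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = q[state].index(max(q[state]))
        reward, next_state = step(state, action)
        # Undiscounted TD error: the reward is measured relative to rho.
        td = reward - rho + max(q[next_state]) - q[state][action]
        q[state][action] += alpha * td
        # Update rho only when the action taken is greedy after the update.
        best_here = max(q[state])
        if q[state][action] == best_here:
            rho += beta * (reward - rho + max(q[next_state]) - best_here)
        state = next_state
    return q, rho


def two_state_step(state, action):
    """Hypothetical toy MDP: action 1 leads to state 1, action 0 to state 0.

    Only staying in state 1 (state 1, action 1) pays a reward of 1.
    """
    if action == 1:
        return (1.0 if state == 1 else 0.0), 1
    return 0.0, 0


if __name__ == "__main__":
    q_table, avg_reward = r_learning(two_state_step, n_states=2, n_actions=2)
    print(avg_reward, q_table)
```

On this toy problem the best achievable average reward is 1 per step (stay in state 1), so the `rho` estimate would be expected to approach 1.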

-''A reinforcement learning method for maximizing undiscounted rewards''~
A. Schwartz~
ICML 1993, pp. 298-305 (1993)
-''Reinforcement learning algorithms for average-payoff Markovian decision processes''~
S.P. Singh~
AAAI 1994, pp. 700-705 (1994)


**A Model-based Algorithm for Bias-optimal [#w5c468a4]
-''Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results''~
S. Mahadevan~
Machine Learning, Vol. 22, No. 1-3, pp. 159-195 (1996)


**H-Learning [#zdd24677]
-''Model-based Average Reward Reinforcement Learning''~
P. Tadepalli, D. Ok~
Artificial Intelligence, Vol. 100, pp. 177-224 (1998)
-''H-learning: A Reinforcement Learning Method to Optimize Undiscounted Average Reward''~
P. Tadepalli, D. Ok~
Technical Report 94-30-1, Oregon State University, Department of Computer Science (1994)


**SMART, Relaxed SMART [#w9abd3f5]
-[[''Reinforcement learning for long-run average cost'':http://dx.doi.org/10.1016/S0377-2217(02)00874-3]]~
A. Gosavi~
European Journal of Operational Research, Vol. 155, No. 3, pp. 654-674 (2004)
-[[''Solving semi-Markov decision problems using average reward reinforcement learning'':http://pubsonline.informs.org/mansci/abstract14708]]~
T. Das, A. Gosavi, S. Mahadevan, N. Marchalleck~
Management Science, Vol. 45, No. 4, pp. 560-574 (1999)


**Q-P-Learning [#lfa776cc]
-[[''A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis'':http://www.springerlink.com/content/qk6678w01008703w/]]~
A. Gosavi~
Machine Learning, Vol. 55, No. 1, pp. 5-29 (2004)


**HAR Algorithm [#o7bbc23a]
Hierarchical average reward reinforcement learning.
-[[''Hierarchical Average Reward Reinforcement Learning'':http://jmlr.csail.mit.edu/papers/v8/ghavamzadeh07a.html]]~
M. Ghavamzadeh, S. Mahadevan~
JMLR, Vol. 8, pp. 2629-2669 (2007)