Reinforcement Learning/Average-Reward Reinforcement Learning
*概要 [#lbacbc1b]
A type that maximizes the average of the rewards obtained (without discounting rewards)...
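For reference, the average-reward criterion is commonly written as below. This is a sketch in assumed notation that does not appear on this page: \rho^{\pi} is the average reward (gain) of a policy \pi, r_t is the reward at step t, and the limit is assumed to exist (for example, in a unichain MDP).

```latex
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{N} r_t \right]
```

In contrast to the discounted return, no discount factor appears; every step's reward is weighted equally.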
*手法 [#mf01e27a]
**R-Learning, Modified R-Learning [#r6f8a9ec]
R-Learning, proposed by Schwartz, is the first average-reward...
It also appears in Sutton's textbook.
Singh modified the Bellman equation of R-learning.
The average reward is updated at every step.
-''A reinforcement learning method for maximizing undisco...
A. Schwartz~
ICML 1993, pp. 298-305 (1993)
-''Reinforcement learning algorithms for average-payoff M...
S.P. Singh~
AAAI 1994, pp. 700-705
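As a rough illustration of R-learning, here is a minimal tabular sketch of one update step, following the form usually given in textbook presentations. The function name, the list-of-lists table `Q`, and the step sizes `alpha` and `beta` are assumptions made for this example and are not taken from the papers above. It updates the average-reward estimate only when the chosen action is greedy; the modified variant mentioned above, which updates the average reward at every step, is not shown.

```python
def r_learning_step(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning update in place; return the new rho.

    Q      : list of lists, Q[state][action] (relative action values)
    rho    : current estimate of the average reward per step
    alpha  : step size for Q (assumed value)
    beta   : step size for rho (assumed value)
    """
    # Relative-value update: the reward is measured against rho
    # instead of being discounted.
    Q[s][a] += alpha * (r - rho + max(Q[s_next]) - Q[s][a])

    # Update the average-reward estimate only if the action taken
    # is greedy with respect to the current table.
    if Q[s][a] == max(Q[s]):
        rho += beta * (r - rho + max(Q[s_next]) - max(Q[s]))
    return rho


# Tiny usage example: two states, two actions, all values start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
rho = r_learning_step(Q, 0.0, s=0, a=1, r=1.0, s_next=1, alpha=0.5, beta=0.5)
```

After this single call, `Q[0][1]` is 0.5 and `rho` is 0.25. In practice the step would be called inside an interaction loop with an exploratory (for example, epsilon-greedy) policy.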
**A Model-based Algorithm for Bias-optimal [#w5c468a4]
-''Average Reward Reinforcement Learning: Foundations, Al...
S. Mahadevan~
Mach Learn, Vol. 22, No. 1-3, pp. 159-195 (1996)
**H-Learning [#zdd24677]
-''Model-based Average Reward Reinforcement Learning''~
P. Tadepalli, D. Ok~
Artificial Intelligence, Vol. 100, pp. 177-224 (1998)
-''H-learning: A Reinforcement Learning Method to Optimiz...
P. Tadepalli, D. Ok~
Technical Report 94-30-1, Oregon State University, Depart...
**SMART, Relaxed SMART [#w9abd3f5]
-[[''Reinforcement learning for long-run average cost'':h...
A. Gosavi~
European Journal of Operational Research, Vol. 155, No. 3...
-[[''Solving semi-Markov decision problems using average ...
Das, T., Gosavi, A., Mahadevan, S., and Marchalleck, N.~
Management Science, Vol. 45, No. 4, pp. 560–574 (1999)
**Q-P-Learning [#lfa776cc]
-[[''A Reinforcement Learning Algorithm Based on Policy I...
A. Gosavi~
Mach Learn, Vol. 55, No. 1, pp. 5-29 (2004)
**HAR Algorithm [#o7bbc23a]
Hierarchical average-reward reinforcement learning.
-[[''Hierarchical Average Reward Reinforcement Learning''...
M. Ghavamzadeh and S. Mahadevan~
JMLR, Vol. 8, pp. 2629-2669 (2007)