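Overview
Average-reward reinforcement learning: a class of reinforcement learning methods that maximize the average reward obtained per step, without discounting future rewards.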
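For reference, here is one standard way to write this objective. The notation is assumed here and not taken from the cited papers. The gain ρ^π of a policy π is its long-run reward per step, assuming the limit exists (e.g. in a unichain MDP). The optimal relative values h* satisfy the average-reward optimality equation. There is no discount factor γ; the gain ρ* is subtracted instead.

```latex
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{N-1} r_{t+1} \right]

h^{*}(s) + \rho^{*} = \max_{a} \left[ r(s,a) + \sum_{s'} P(s' \mid s, a) \, h^{*}(s') \right]
```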
Methods
R-Learning, Modified R-Learning
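R-learning, proposed by Schwartz, was the first average-reward reinforcement learning algorithm. It is also covered in Sutton and Barto's textbook.
Singh modified the Bellman equation used by R-learning. In his version, the average-reward estimate is updated at every step rather than only after greedy actions.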
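Below is a minimal sketch of tabular R-learning in the spirit of the rule presented in Sutton and Barto's textbook. The TD error uses r − ρ in place of discounting, and ρ is adjusted only after greedy actions. The two-state MDP (`STEP`), the parameter values, and the flag `update_rho_every_step` are hypothetical illustrations. The flag swaps in a simplified every-step rule ρ ← ρ + β(r − ρ). That rule is an assumption added for comparison and is not necessarily Singh's exact formulation.

```python
import random

# Hypothetical two-state MDP used only for illustration (not from the papers above).
# STEP[(state, action)] = (next_state, reward)
STEP = {
    (0, 0): (0, 1.0),  # stay in state 0, reward 1
    (0, 1): (1, 0.0),  # move to state 1, reward 0
    (1, 0): (1, 2.0),  # stay in state 1, reward 2
    (1, 1): (0, 0.0),  # move to state 0, reward 0
}
N_STATES, N_ACTIONS = 2, 2


def greedy(q, s):
    """Index of the highest-valued action in state s (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q[s][a])


def r_learning(steps=50_000, alpha=0.1, beta=0.01, epsilon=0.1,
               update_rho_every_step=False, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # relative action values
    rho = 0.0  # running estimate of the average reward per step
    s = 0
    for _ in range(steps):
        # epsilon-greedy behaviour policy
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)
        else:
            a = greedy(q, s)
        took_greedy = q[s][a] == max(q[s])  # checked before q is updated
        s2, r = STEP[(s, a)]
        max_next = max(q[s2])
        max_here = max(q[s])
        # Undiscounted TD update: subtracting rho replaces the discount factor.
        q[s][a] += alpha * (r - rho + max_next - q[s][a])
        if update_rho_every_step:
            # Assumed simplified variant: running average of every observed reward.
            rho += beta * (r - rho)
        elif took_greedy:
            # R-learning style: adjust rho only after greedy actions.
            rho += beta * (r - rho + max_next - max_here)
        s = s2
    return q, rho


if __name__ == "__main__":
    q, rho = r_learning()
    print("greedy-only rho:", round(rho, 3), "policy:", [greedy(q, s) for s in range(N_STATES)])
    q2, rho2 = r_learning(update_rho_every_step=True)
    print("every-step rho:", round(rho2, 3), "policy:", [greedy(q2, s) for s in range(N_STATES)])
```

In this toy MDP the optimal policy moves to state 1 and stays there, earning 2 per step. The greedy-only update should settle near ρ ≈ 2. The every-step running average also counts rewards from exploratory actions, so with ε = 0.1 it should end somewhat lower (about 1.8).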
- A reinforcement learning method for maximizing undiscounted rewards
A. Schwartz
ICML 1993, pp. 298-305
- Reinforcement learning algorithms for average-payoff Markovian decision processes
S.P. Singh
AAAI 1994, pp. 700-705
A Model-based Algorithm for Bias-optimal
- Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
S. Mahadevan
Mach Learn, Vol. 22, No. 1-3, pp. 159-195 (1996)
H-Learning
- Model-based Average Reward Reinforcement Learning
P. Tadepalli, D. Ok
Artificial Intelligence, Vol. 100, pp. 177-224 (1998)
- H-learning: A Reinforcement Learning Method to Optimize Undiscounted Average Reward
P. Tadepalli, D. Ok
Technical Report 94-30-1, Oregon State University, Department of Computer Science (1994)
SMART, Relaxed SMART
- Reinforcement learning for long-run average cost
A. Gosavi
European Journal of Operational Research, Vol. 155, No. 3, pp. 654-674 (2004)
- Solving semi-Markov decision problems using average reward reinforcement learning
T. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck
Management Science, Vol. 45, No. 4, pp. 560-574 (1999)
Q-P-Learning
- A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis
A. Gosavi
Mach Learn, Vol. 55, No. 1, pp. 5-29 (2004)
HAR Algorithm
A hierarchical average-reward reinforcement learning algorithm.
- Hierarchical Average Reward Reinforcement Learning
M. Ghavamzadeh and S. Mahadevan
JMLR, Vol. 8, pp. 2629-2669 (2007)