Reinforcement Learning (強化学習)

Reading: きょうかがくしゅう (kyōka gakushū)
English name: Reinforcement Learning

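Reinforcement learning is one method for obtaining optimal decision rules. A decision maker in an unknown environment repeatedly observes the system and makes decisions in sequence, and from this experience estimates a rule for choosing actions that maximizes the "reward" [1].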
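One common way to state this goal precisely is the discounted Markov decision process formulation; this is an assumption for illustration, since the entry does not fix a particular formulation. Here the decision rule is a policy $\pi$ that maps each observed state to an action, $r_{t+1}$ is the reward received after the decision at step $t$, and the discount factor $\gamma$ sets how much future rewards count relative to immediate ones:

$$
\pi^{*} = \operatorname*{arg\,max}_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_{t+1} \right], \qquad 0 \le \gamma < 1
$$

With $\gamma$ close to 1, rewards far in the future matter almost as much as immediate ones; with $\gamma = 0$, only the immediate reward counts.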

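Reinforcement learning is set up with an "agent" and an "environment." The environment has a "state," and the agent chooses an "action" based on that state. The agent's action updates the state of the environment, and the agent receives a reward (the immediate reward). Reinforcement learning estimates the rule by which the agent chooses actions so that reward is maximized over the long term, not only at the current step.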
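The agent-environment loop described above can be sketched with tabular Q-learning, one standard reinforcement learning algorithm; the entry itself does not name a specific algorithm. Everything in the sketch is hypothetical and chosen only for illustration: the environment `ChainEnv` has states 0 to 4 on a line, the agent can move left (action 0) or right (action 1), and it earns a reward of 1 only on reaching the rightmost state. The function `q_learning` and its parameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate) are likewise assumptions. The agent is never told the reward rule; it learns from observed transitions which action yields the largest long-term reward.

```python
import random


class ChainEnv:
    """Hypothetical toy environment (illustration only): states 0..n-1 on a line.

    Action 0 moves left, action 1 moves right. Reaching the rightmost
    state yields reward 1 and ends the episode; every other step yields 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # The environment's state is updated by the agent's action.
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # immediate reward
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    """Hypothetical helper: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n_actions = 2
    # q[s][a]: estimated discounted long-term reward of taking action a in state s.
    q = [[0.0] * n_actions for _ in range(env.n_states)]

    def greedy(s):
        # Break ties at random so an all-zero table still explores.
        best = max(q[s])
        return rng.choice([a for a in range(n_actions) if q[s][a] == best])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            s_next, r, done = env.step(a)
            # Immediate reward plus discounted best estimate of what follows,
            # unless the episode has ended.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


env = ChainEnv(n_states=5)
q = q_learning(env)
# Greedy decision rule read off the learned table for each non-terminal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(env.n_states - 1)]
print(policy)
```

With the fixed seed, the learned rule is expected to choose "move right" (action 1) in every non-terminal state: that path reaches the reward in the fewest steps, so its reward is discounted least. Q-learning is used here because it updates its estimates from single observed transitions, without a model of the environment, which matches the setting of an unknown environment.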

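In recent years, reinforcement learning has advanced substantially through its combination with deep learning, and it is applied in many fields, beginning with games [2]. Applications to optimization include the traveling salesman problem [3] and combinatorial optimization problems on graphs [4]. Within optimization solvers, research is under way on applying it to branching variable selection [5] and cutting plane selection [6].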

References

[1]

組合せ最適化ソリューション（強化学習AI×シミュレーション） [Combinatorial Optimization Solution (Reinforcement Learning AI × Simulation)]. URL: https://www.msi.co.jp/solution/s4/ReinforcementLearning.html.

[2]

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, Dec. 2013. URL: http://arxiv.org/abs/1312.5602, arXiv:1312.5602.

[3]

Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural Combinatorial Optimization with Reinforcement Learning. International Conference on Learning Representations, Nov. 2016. URL: http://arxiv.org/abs/1611.09940, arXiv:1611.09940.

[4]

Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/d9896106ca98d3d05b8cbdf4fd8b13a1-Paper.pdf.

[5]

Elias Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to Branch in Mixed Integer Programming. Proceedings of the AAAI Conference on Artificial Intelligence, Feb. 2016. URL: https://ojs.aaai.org/index.php/AAAI/article/view/10080, doi:10.1609/aaai.v30i1.10080.

[6]

Yunhao Tang, Shipra Agrawal, and Yuri Faenza. Reinforcement Learning for Integer Programming: Learning to Cut. Proceedings of Machine Learning Research, Jun. 2019. URL: http://arxiv.org/abs/1906.04859, arXiv:1906.04859.