The full text of Dimitri P. Bertsekas's book A Course in Reinforcement Learning is available online for free. It's also available for purchase in print form. It runs about 450 pages and is the textbook for his course at Arizona State University, "Reinforcement Learning and Optimal Control".
I've gone through more than half of Richard Sutton and Andrew Barto's book Reinforcement Learning: An Introduction (though I confess to having 'cheated' by not doing all the exercises). It might be worth reading this book, too, to see the same material from an alternate point of view.
"Reinforcement learning can be viewed as the art and science of sequential decision making for large and difficult problems, often in the presence of imprecisely known and changing environment conditions. Dynamic programming is a broad and well-established algorithmic methodology for making optimal sequential decisions, and is the theoretical foundation upon which reinforcement learning rests. This is unlikely to change in the future, despite the rapid pace of technological innovation. In fact, there are strong connections between sequential decision making and the new wave of technological change, generative technology, transformers, GPT applications, and natural language processing ideas, as we will aim to show in this book."
"In dynamic programming there are two principal objects to compute: the optimal value function that provides the optimal cost that can be attained starting from any given initial state, and the optimal policy that provides the optimal decision to apply at any given state and time. Unfortunately, the exact application of dynamic programming runs into formidable computational difficulties, commonly referred to as the curse of dimensionality. To address these, reinforcement learning aims to approximate the optimal value function and policy, by using manageable off-line and/or on-line computation, which often involves neural networks (hence the alternative name Neuro-Dynamic Programming)."
"Thus there are two major methodological approaches in reinforcement learning: approximation in value space, where we approximate in some way the optimal value function, and approximation in policy space, whereby we construct a suboptimal policy by using some form of optimization over a suitably restricted class of policies."
"The book focuses primarily on approximation in value space, with limited coverage of approximation in policy space. However, it is structured so that it can be easily supplemented by an instructor who wishes to go into approximation in policy space in greater detail, using any of a number of available sources."
"An important part of our line of development is a new conceptual framework, which aims to bridge the gaps between the artificial intelligence, control theory, and operations research views of our subject. This framework, the focus of the author's recent monograph 'Lessons from AlphaZero ...',, centers on approximate forms of dynamic programming that are inspired by some of the major successes of reinforcement learning involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon)."