Stochastic Learning and Optimization: A Sensitivity-Based Approach

"Performance optimization is vital in the design and operation of modern engineering systems, including communications, manufacturing, robotics, and logistics. Most engineering systems are too complicated to model, or the system parameters cannot be easily identified, so learning techniques hav...

Bibliographic Details
Main Author: Cao, Xi-Ren
Format: eBook; Book
Language: English
Published: New York: Springer-Verlag, 2007
Edition: 1st ed.
Subjects
Online Access: Get full text

Table of Contents:
  • Intro -- Preface -- Contents -- Introduction -- An Overview of Learning and Optimization -- Problem Description -- Optimal Policies -- Fundamental Limitations of Learning and Optimization -- A Sensitivity-Based View of Learning and Optimization -- Problem Formulations in Different Disciplines -- Perturbation Analysis (PA) -- Markov Decision Processes (MDPs) -- Reinforcement Learning (RL) -- Identification and Adaptive Control (I&AC) -- Event-Based Optimization and Potential Aggregation -- A Map of the Learning and Optimization World -- Terminology and Notation -- Problems -- Part I Four Disciplines in Learning and Optimization -- Perturbation Analysis -- Perturbation Analysis of Markov Chains -- Constructing a Perturbed Sample Path -- Perturbation Realization Factors and Performance Potentials -- Performance Derivative Formulas -- Gradients with Discounted Reward Criteria -- Higher-Order Derivatives and the MacLaurin Series -- Performance Sensitivities of Markov Processes -- Performance Sensitivities of Semi-Markov Processes* -- Fundamentals for Semi-Markov Processes* -- Performance Sensitivity Formulas* -- Perturbation Analysis of Queueing Systems -- Constructing a Perturbed Sample Path -- Perturbation Realization -- Performance Derivatives -- Remarks on Theoretical Issues* -- Other Methods* -- Problems -- Learning and Optimization with Perturbation Analysis -- The Potentials -- Numerical Methods -- Learning Potentials from Sample Paths -- Coupling* -- Performance Derivatives -- Estimating through Potentials -- Learning Directly -- Optimization with PA -- Gradient Methods and Stochastic Approximation -- Optimization with Long Sample Paths -- Applications -- Problems -- Markov Decision Processes -- Ergodic Chains -- Policy Iteration -- Bias Optimality -- MDPs with Discounted Rewards -- Multi-Chains -- Policy Iteration -- Bias Optimality
  • MDPs with Discounted Rewards -- The nth-Bias Optimization* -- nth-Bias Difference Formulas* -- Optimality Equations* -- Policy Iteration* -- nth-Bias Optimal Policy Spaces* -- Problems -- Sample-Path-Based Policy Iteration -- Motivation -- Convergence Properties -- Convergence of Potential Estimates -- Sample Paths with a Fixed Number of Regenerative Periods -- Sample Paths with Increasing Lengths -- "Fast" Algorithms* -- The Algorithm That Stops in a Finite Number of Periods* -- With Stochastic Approximation* -- Problems -- Reinforcement Learning -- Stochastic Approximation -- Finding the Zeros of a Function Recursively -- Estimating Mean Values -- Temporal Difference Methods -- TD Methods for Potentials -- Q-Factors and Other Extensions -- TD Methods for Performance Derivatives -- TD Methods and Performance Optimization -- PA-Based Optimization -- Q-Learning -- Optimistic On-Line Policy Iteration -- Value Iteration -- Summary of the Learning and Optimization Methods -- Problems -- Adaptive Control Problems as MDPs -- Control Problems and MDPs -- Control Systems Modelled as MDPs -- A Comparison of the Two Approaches -- MDPs with Continuous State Spaces -- Operators on Continuous Spaces -- Potentials and Policy Iteration -- Linear Control Systems and the Riccati Equation -- The LQ Problem -- The JLQ Problem* -- On-Line Optimization and Adaptive Control -- Discretization and Estimation -- Discussion -- Problems -- Part II The Event-Based Optimization - A New Approach -- Event-Based Optimization of Markov Systems -- An Overview -- Summary of Previous Chapters -- An Overview of the Event-Based Approach -- Events Associated with Markov Chains -- The Event and Event Space -- The Probabilities of Events -- The Basic Ideas Illustrated by Examples -- Classification of Three Types of Events -- Event-Based Optimization -- The Problem Formulation
  • Performance Difference Formulas -- Performance Derivative Formulas -- Optimization -- Learning: Estimating Aggregated Potentials -- Aggregated Potentials -- Aggregated Potentials in the Event-Based Optimization -- Applications and Examples -- Manufacturing -- Service Rate Control -- General Applications -- Problems -- Constructing Sensitivity Formulas -- Motivation -- Markov Chains on the Same State Space -- Event-Based Systems -- Sample-Path Construction* -- Parameterized Systems: An Example -- Markov Chains with Different State Spaces* -- One Is a Subspace of the Other* -- A More General Case -- Summary -- Problems -- Part III Appendices: Mathematical Background -- Probability and Markov Processes -- Probability -- Markov Processes -- Problems -- Stochastic Matrices -- Canonical Form -- Eigenvalues -- The Limiting Matrix -- Problems -- Queueing Theory -- Single-Server Queues -- Queueing Networks -- Some Useful Techniques -- Problems -- Notation and Abbreviations -- References -- Index