Max Weight Learning Algorithms for Scheduling in Unknown Environments

Bibliographic Details
Published in	IEEE Transactions on Automatic Control, Vol. 57, no. 5, pp. 1179-1191
Main Authors	Neely, M. J.; Rager, S. T.; La Porta, T. F.
Format	Journal Article
Language	English
Published	New York, NY: Institute of Electrical and Electronics Engineers (IEEE), 01.05.2012
Subjects	Algorithms | Applied sciences | Artificial intelligence | Channel estimation | Channels | Computer science; control theory; systems | Computer systems and distributed systems. User interface | Decision theory. Utility theory | Dynamic scheduling | Exact sciences and technology | Joints | Mathematical analysis | Operational research and scientific management | Operational research. Management science | Opportunistic routing | Optimization | Optimized production technology | Overhead and feedback | Queueing analysis | Queues | Queuing theory. Traffic theory | Randomness | Routing | Scheduling | Software | Vectors | Vectors (mathematics) | Wireless networks | Averaging method | Utility function | Feedback regulation | Probability distribution | Network management | Multiple decision | Feedback | Dynamic programming | Learning algorithm | Queue | Mathematical programming | Multiple access | Decision making | Discrete time systems | Distributed system | Random distribution | Stochastic programming | Utility theory | Time average | Wireless network
Summary:	We consider a discrete time queueing system where a controller makes a 2-stage decision every slot. The decision at the first stage reveals a hidden source of randomness with a control-dependent (but unknown) probability distribution. The decision at the second stage generates an attribute vector that depends on this revealed randomness. The goal is to stabilize all queues and optimize a utility function of time average attributes, subject to an additional set of time average constraints. This setting fits a wide class of stochastic optimization problems, including multi-user wireless scheduling with dynamic channel measurement decisions, and wireless multi-hop routing with multi-receiver diversity and opportunistic routing decisions. We develop a simple max-weight algorithm that learns efficient behavior by averaging functionals of previous outcomes.
ISSN:	0018-9286, 1558-2523
DOI:	10.1109/TAC.2012.2191874
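
The summary describes a max-weight rule that learns by averaging functionals of previous outcomes. The Python sketch below illustrates one way such a rule could look; it is a toy under stated assumptions, not the authors' algorithm. The two-queue model, the `service` function, the sliding-window `history`, and every function name are hypothetical. Stage-1 costs, the utility function, and the time-average constraints mentioned in the summary are left out.

```python
from collections import deque

# Hypothetical toy model: omega is a tuple of per-queue channel rates,
# and action a serves queue a at rate omega[a].
def service(omega, a):
    return [omega[i] if i == a else 0 for i in range(len(omega))]

def weight(queues, omega, a):
    # Queue-weighted service: the usual max-weight score of one action.
    return sum(q * s for q, s in zip(queues, service(omega, a)))

def choose_stage2(queues, omega, actions):
    # Stage 2: omega is now revealed, so pick the max-weight action directly.
    return max(actions, key=lambda a: weight(queues, omega, a))

def choose_stage1(queues, history, actions):
    # Stage 1: the distribution of omega under each option k is unknown, so
    # score k by averaging the max-weight value over past samples seen under
    # k, re-evaluated with the CURRENT queue backlogs.
    # Assumes every history[k] already holds at least one sample.
    def estimate(k):
        vals = [max(weight(queues, w, a) for a in actions) for w in history[k]]
        return sum(vals) / len(vals)
    return max(history, key=estimate)

def run_slot(queues, arrivals, history, sample_omega, actions=(0, 1)):
    # One slot: stage-1 choice, observe omega, stage-2 choice, queue update.
    k = choose_stage1(queues, history, actions)
    omega = sample_omega(k)      # hidden randomness revealed by option k
    history[k].append(omega)     # remember the outcome for later averaging
    a = choose_stage2(queues, omega, actions)
    served = service(omega, a)
    return [max(q - s, 0) + x for q, s, x in zip(queues, served, arrivals)]
```

In this sketch `history` would map each stage-1 option to a `deque(maxlen=W)` seeded with at least one observed outcome, so that old samples age out of the average. The window length `W` and the seeding step are design choices of the sketch rather than details taken from the record above.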