Data-Efficient Reinforcement Learning for Energy Optimization of Power-Assisted Wheelchairs

Bibliographic Details
Published in	IEEE Transactions on Industrial Electronics (1982), Vol. 66, no. 12, pp. 9734–9744
Main Authors	Feng, Guoxi; Busoniu, Lucian; Guerra, Thierry-Marie; Mohammad, Sami
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2019
Subjects	Adaptation models; Algorithms; Assistive control; Computer simulation; disabled persons; Energy consumption; Fatigue; Force; Heuristic algorithms; Iterative methods; Machine learning; Markov chains; Numerical models; Optimal control; Optimization; Parameterization; power-assisted wheelchairs; reinforcement learning; Wheelchairs

More Information
Summary:	The objective of this paper is to develop a method for assisting users in pushing power-assisted wheelchairs (PAWs) such that the electrical energy consumed over a predefined distance-to-go is optimal, while at the same time bringing users to a desired fatigue level. This assistive task is formulated as an optimal control problem, which was solved by Feng et al. using the model-free approach known as gradient of partially observable Markov decision processes (GPOMDP). To increase the data efficiency of this model-free framework, we here propose to use policy learning by weighting exploration with the returns (PoWER) with a new parameterization of 25 control parameters. Moreover, we provide a new near-optimality analysis of finite-horizon fuzzy Q-iteration, which yields a model-based baseline solution used to numerically verify the near-optimality of the presented model-free approaches. Simulation results show that the PoWER algorithm with the new parameterization converges to a near-optimal solution within 200 trials and can adapt to changes in the human fatigue dynamics. Finally, 24 experimental trials are carried out on the PAW system, with fatigue feedback provided by the user via a joystick. The performance tends to improve gradually after learning. The results demonstrate the effectiveness and feasibility of PoWER in this application.
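The kind of update PoWER performs can be illustrated with a minimal sketch. It assumes a simplified episodic variant in the spirit of Kober and Peters' PoWER: each trial perturbs all parameters once with Gaussian noise, only the highest-return trials seen so far are reused (importance sampling), and the parameters move to the return-weighted average of those trials. The return function `toy_return`, the target vector, and the settings (`sigma`, `n_best`, 25 parameters, 200 iterations) are hypothetical stand-ins chosen only to echo the summary; the paper's wheelchair, energy, and fatigue models and its actual parameterization are not reproduced here.

```python
import math
import random


def toy_return(theta, target):
    """Hypothetical positive return: exp(-squared distance to a target).
    PoWER weights explorations by their returns, so returns must be >= 0."""
    return math.exp(-sum((t - g) ** 2 for t, g in zip(theta, target)))


def power(return_fn, theta0, sigma=0.1, n_iters=200, n_best=10, seed=0):
    """Simplified episodic PoWER sketch (not the paper's implementation):
    one perturbed trial per iteration, perturbation held constant over the
    episode, importance sampling over the n_best best trials seen so far."""
    rng = random.Random(seed)
    theta = list(theta0)
    memory = []  # (return, trial parameters) for every trial so far
    for _ in range(n_iters):
        # Explore: Gaussian perturbation of the current parameters.
        sample = [t + rng.gauss(0.0, sigma) for t in theta]
        memory.append((return_fn(sample), sample))
        # Importance sampling: reuse only the highest-return trials.
        best = sorted(memory, key=lambda m: m[0], reverse=True)[:n_best]
        total = sum(r for r, _ in best)
        # Move theta by the return-weighted average exploration offset.
        theta = [
            t + sum(r * (s[j] - t) for r, s in best) / total
            for j, t in enumerate(theta)
        ]
    return theta


# Usage on the hypothetical toy problem (25 parameters, 200 trials).
target = [0.5] * 25
theta0 = [0.0] * 25
theta = power(lambda th: toy_return(th, target), theta0)
print(toy_return(theta0, target), toy_return(theta, target))
```

Because the new parameters are a return-weighted average of the best past trials, old rollouts keep contributing instead of being discarded after one gradient step; this reuse is one plausible reason such reward-weighted updates tend to need fewer trials than a plain likelihood-ratio gradient method.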
ISSN:	0278-0046, 1557-9948
DOI:	10.1109/TIE.2019.2903751