Data-Efficient Reinforcement Learning for Energy Optimization of Power-Assisted Wheelchairs

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics (1982), Vol. 66, No. 12, pp. 9734-9744
Main Authors: Feng, Guoxi; Busoniu, Lucian; Guerra, Thierry-Marie; Mohammad, Sami
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2019

Summary: The objective of this paper is to develop a method for assisting users to push power-assisted wheelchairs (PAWs) in such a way that the electrical energy consumption over a predefined distance-to-go is optimal, while at the same time bringing users to a desired fatigue level. This assistive task is formulated as an optimal control problem and solved by Feng et al. using the model-free gradient of partially observable Markov decision processes (GPOMDP) approach. To increase the data efficiency of the model-free framework, we here propose to use policy learning by weighting exploration with the returns (PoWER) with 25 control parameters. Moreover, we provide a new near-optimality analysis of finite-horizon fuzzy Q-iteration, which yields a model-based baseline solution used to verify numerically the near-optimality of the presented model-free approaches. Simulation results show that the PoWER algorithm with the new parameterization converges to a near-optimal solution within 200 trials and adapts to changes in the human fatigue dynamics. Finally, 24 experimental trials are carried out on the PAW system, with fatigue feedback provided by the user via a joystick. Performance tends to improve gradually as learning proceeds. The results obtained demonstrate the effectiveness and feasibility of PoWER in our application.
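
As a rough illustration of the return-weighted update that PoWER performs (a sketch, not code from the paper), the Python fragment below applies a simplified, importance-sampled PoWER step to a generic parameter vector. The names rollout_fn, sigma, n_trials, and n_best are hypothetical placeholders, and the sketch assumes non-negative episodic returns, as required by the EM derivation of PoWER.

import numpy as np

def power_search(theta, rollout_fn, n_trials=200, sigma=0.1, n_best=10):
    # Simplified PoWER-style policy search (illustrative sketch only).
    # theta      : current policy parameter vector (e.g. 25 control parameters)
    # rollout_fn : runs one trial with the given parameters and returns a
    #              non-negative episodic return
    history = []  # (return, sampled parameter vector) for every trial so far
    for _ in range(n_trials):
        eps = np.random.normal(0.0, sigma, size=theta.shape)  # exploration noise
        sample = theta + eps
        history.append((rollout_fn(sample), sample))
        # reuse only the highest-return rollouts observed so far
        best = sorted(history, key=lambda h: h[0], reverse=True)[:n_best]
        returns = np.array([r for r, _ in best])
        params = np.array([p for _, p in best])
        # return-weighted average of offsets from the current mean parameters
        theta = theta + (returns[:, None] * (params - theta)).sum(axis=0) \
            / (returns.sum() + 1e-12)
    return theta

In a setting like the one summarized above, theta would be the 25-element control parameterization and rollout_fn would execute one assisted push over the distance-to-go and report its return; both are stated here only as assumptions to make the sketch self-contained.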
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2019.2903751