Reinforcement-Learning-Based Tracking Control of Waste Water Treatment Process Under Realistic System Conditions and Control Performance Requirements

Bibliographic Details
Published in	IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 52, no. 8, pp. 5284–5294
Main Authors	Yang, Qinmin; Cao, Weiwei; Meng, Wenchao; Si, Jennie
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 August 2022
Subjects	Action strategy approximation; Aerodynamics; Control methods; Control systems; Cost function estimation; Coupling; Couplings; Direct heuristic dynamic programming (direct HDP or dHDP); Dissolved oxygen; Distance learning; Disturbances; Dynamic programming; Flow velocity; Microorganisms; Multivariable control; Online learning; Optimal control; Oxygen transfer; Process control; Tracking control; Tracking errors; Wastewater; Wastewater treatment; Wastewater treatment process (WWTP); Water treatment
Summary:	The tracking control of a wastewater treatment process (WWTP) is considered. The process is highly nonlinear, strongly coupled, and difficult to model mathematically, and its operation is subject to unknown disturbances. We address this multivariable tracking control problem by applying reinforcement learning control based on direct heuristic dynamic programming (dHDP). The control goal is to make the dissolved oxygen (DO) concentration of the 5th aerobic zone ($S_{O5}$) and the nitrate concentration of the 2nd anoxic zone ($S_{NO2}$) track desired references by manipulating the oxygen transfer coefficient of the 5th aerobic zone ($K_{L}a_{5}$) and the internal recycle flow rate ($Q_{a}$). The dHDP controller aims to minimize the accumulated WWTP tracking error while handling the strong coupling between $S_{O5}$ and $S_{NO2}$ and eliminating the effects of unknown disturbances on the process. As an online learning control method, the proposed dHDP approach devises an optimal control strategy driven entirely by WWTP process data. We conducted extensive, systematic simulations of the dHDP-controlled WWTP on the well-known Benchmark Simulation Model No. 1 (BSM1) platform and compared its performance with that of other methods.
ISSN:	2168-2216, 2168-2232
DOI:	10.1109/TSMC.2021.3122802
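The summary describes dHDP control only at a high level: an action network produces the control signals and a critic network estimates the accumulated tracking cost, both learned online from process data. The Python sketch below illustrates that actor-critic structure on a hypothetical 2x2 cross-coupled linear plant whose outputs stand in for $S_{O5}$ and $S_{NO2}$ and whose inputs stand in for normalized $K_{L}a_{5}$ and $Q_{a}$ commands. Everything concrete in it is an assumption for illustration: the plant matrices, disturbance, setpoints, squared-error stage cost, network sizes, learning rates, and gradient clipping. It is not the authors' implementation and not the BSM1 model. The critic error follows the usual dHDP form e_c(t) = αJ(t) − [J(t−1) − r(t)], and the actor is trained to drive J(t) toward zero.

```python
import numpy as np


def dhdp_tracking_sketch(steps=300, alpha=0.95, lr_a=0.02, lr_c=0.05,
                         hidden=6, u_max=2.0, seed=0):
    """Toy dHDP tracking loop on a HYPOTHETICAL 2x2 coupled linear plant.

    y stands in for (S_O5, S_NO2) and u for normalized (K_La5, Q_a) commands.
    Plant, cost, network sizes and gains are illustrative assumptions only,
    not the BSM1 model or the paper's settings.
    """
    rng = np.random.default_rng(seed)
    A = np.array([[0.80, 0.05], [0.03, 0.85]])    # assumed stable, cross-coupled dynamics
    B = np.array([[0.30, -0.10], [-0.08, 0.25]])  # assumed: each input affects both outputs
    d = np.array([0.02, -0.01])                   # assumed constant disturbance, unknown to controller
    y_ref = np.array([2.0, 1.0])                  # hypothetical setpoints

    Wa1 = rng.normal(0.0, 0.3, (hidden, 3))  # actor: [e1, e2, bias] -> hidden
    Wa2 = rng.normal(0.0, 0.3, (2, hidden))  # actor: hidden -> 2 controls
    Wc1 = rng.normal(0.0, 0.3, (hidden, 5))  # critic: [e1, e2, bias, u1, u2] -> hidden
    wc2 = rng.normal(0.0, 0.3, hidden)       # critic: hidden -> scalar J

    y = np.zeros(2)
    J_prev = 0.0
    errors = np.zeros((steps, 2))
    for t in range(steps):
        e = y_ref - y
        errors[t] = e
        x = np.append(e, 1.0)  # tracking error plus a bias input

        # Action network: bounded control u(t) in [-u_max, u_max]
        h = np.tanh(Wa1 @ x)
        v = np.tanh(Wa2 @ h)
        u = u_max * v

        # Critic network: J(t) approximates the discounted cost-to-go
        z = np.concatenate([x, u])
        g = np.tanh(Wc1 @ z)
        J = wc2 @ g

        r = float(e @ e)  # stage cost r(t): squared tracking error (assumed)
        if t > 0:
            # Critic error e_c(t) = alpha*J(t) - [J(t-1) - r(t)], J(t-1) held fixed
            e_c = alpha * J - (J_prev - r)
            grad_wc2 = alpha * e_c * g
            grad_Wc1 = alpha * e_c * np.outer(wc2 * (1 - g**2), z)
            wc2 -= lr_c * np.clip(grad_wc2, -1.0, 1.0)
            Wc1 -= lr_c * np.clip(grad_Wc1, -1.0, 1.0)
            g = np.tanh(Wc1 @ z)  # re-evaluate J with the updated critic
            J = wc2 @ g

        # Actor: minimize E_a = J(t)^2 / 2, i.e. drive J(t) toward U_c = 0
        dJ_du = (Wc1.T @ (wc2 * (1 - g**2)))[3:]  # dJ/du (last two critic inputs)
        delta_o = J * dJ_du * u_max * (1 - v**2)
        delta_h = (Wa2.T @ delta_o) * (1 - h**2)
        Wa2 -= lr_a * np.clip(np.outer(delta_o, h), -1.0, 1.0)
        Wa1 -= lr_a * np.clip(np.outer(delta_h, x), -1.0, 1.0)

        y = A @ y + B @ u + d  # apply u(t) to the hypothetical plant
        J_prev = J
    return errors
```

Calling `dhdp_tracking_sketch(steps=200)` returns the tracking-error history as a `(200, 2)` array. Clipping each gradient element keeps every weight update bounded, so the sketch stays numerically finite even with poorly tuned gains; it makes no claim that this toy loop converges or reproduces the reported results.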