Multi-objective fuzzy Q-learning to solve continuous state-action problems

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 516, pp. 115–132
Main Authors: Asgharnia, Amirhossein; Schwartz, Howard; Atia, Mohamed
Format: Journal Article
Language: English
Published: Elsevier B.V., 07.01.2023
Subjects: Differential games; Multi-objective reinforcement learning; Q-learning; Reinforcement learning
ISSN: 0925-2312
EISSN: 1872-8286
DOI: 10.1016/j.neucom.2022.10.035

Abstract: Many real-world problems are multi-objective, which makes multi-objective learning and optimization algorithms indispensable. Although multi-objective optimization algorithms are well studied, multi-objective learning algorithms have attracted less attention. This paper proposes a fuzzy multi-objective reinforcement learning algorithm, referred to as multi-objective fuzzy Q-learning (MOFQL), and applies it to a bi-objective reach-avoid game. Most multi-objective reinforcement learning algorithms proposed to date address problems in discrete state-action domains; MOFQL also handles continuous state-action domains. A fuzzy inference system (FIS) estimates the value function for the bi-objective problem, and a temporal difference (TD) approach updates the fuzzy rules. The proposed method is a multi-policy multi-objective algorithm and can find non-convex regions of the Pareto front.
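
The abstract describes the mechanism only at a high level: an FIS interpolates a vector-valued Q-function over a continuous state, and a TD rule updates the consequents of whichever fuzzy rules fired. The Python sketch below is a minimal, hypothetical illustration of that idea, not the paper's MOFQL algorithm; the one-dimensional state, triangular memberships, discrete action set, and the weight vector used only to pick the bootstrap action are all assumptions made here for brevity.

# Minimal sketch of vector-valued (bi-objective) fuzzy Q-learning.
# NOT the paper's MOFQL: memberships, action set, and the scalarized
# bootstrap are illustrative assumptions; state is assumed in [0, 1].
import numpy as np

N_RULES, N_ACTIONS, N_OBJ = 5, 3, 2
ALPHA, GAMMA = 0.1, 0.95

centers = np.linspace(0.0, 1.0, N_RULES)      # rule centers on a 1-D state
width = centers[1] - centers[0]
q = np.zeros((N_RULES, N_ACTIONS, N_OBJ))     # one Q-vector per (rule, action)

def firing_strengths(s):
    """Normalized triangular membership degree of state s in each rule."""
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    return mu / mu.sum()

def q_values(s):
    """FIS-interpolated Q-vectors for all actions: shape (N_ACTIONS, N_OBJ)."""
    phi = firing_strengths(s)
    return np.tensordot(phi, q, axes=1), phi

def td_update(s, a, reward_vec, s_next, w):
    """One TD step with a vector TD error; w scalarizes only the argmax."""
    Q_s, phi = q_values(s)
    Q_next, _ = q_values(s_next)
    a_next = int(np.argmax(Q_next @ w))        # greedy w.r.t. scalarized value
    delta = reward_vec + GAMMA * Q_next[a_next] - Q_s[a]
    q[:, a, :] += ALPHA * np.outer(phi, delta) # credit each fired rule

A single transition update would then look like td_update(0.30, 1, np.array([1.0, -0.5]), 0.35, w=np.array([0.5, 0.5])), where the two reward components correspond to the two objectives.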
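
The abstract's final claim, that a multi-policy method can recover non-convex regions of the Pareto front, is worth unpacking: any fixed linear scalarization of the objectives can only ever select points on the convex hull of the front. The toy check below (again written for this record, not code from the paper) shows a non-dominated point in a concave dent that Pareto filtering keeps but that no weight vector can select.

# Toy contrast: Pareto (non-dominated) filtering vs. linear scalarization,
# both objectives maximized. Data are made up for illustration.
import numpy as np

def pareto_front(points):
    """Keep points not dominated by any other point."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(o >= p) and np.any(o > p)
                        for j, o in enumerate(points) if j != i)
        if not dominated:
            keep.append(p)
    return np.array(keep)

# Three policy returns; the middle one sits in a concave dent of the front.
returns = np.array([[1.0, 0.0], [0.45, 0.45], [0.0, 1.0]])
print(pareto_front(returns))  # all three points are non-dominated

for w in np.linspace(0.0, 1.0, 11):
    best = returns[np.argmax(returns @ np.array([w, 1.0 - w]))]
    # max(w, 1-w) >= 0.5 > 0.45, so [0.45, 0.45] is never selected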