Multi-objective fuzzy Q-learning to solve continuous state-action problems

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 516, pp. 115–132
Main Authors: Asgharnia, Amirhossein; Schwartz, Howard; Atia, Mohamed
Format: Journal Article
Language: English
Published: Elsevier B.V., 07.01.2023
Subjects: Differential games; Multi-objective reinforcement learning; Q-learning; Reinforcement learning
ISSN: 0925-2312
EISSN: 1872-8286
DOI: 10.1016/j.neucom.2022.10.035

Abstract: Many real-world problems are multi-objective, which makes multi-objective learning and optimization algorithms indispensable. Although multi-objective optimization algorithms are well studied, multi-objective learning algorithms have attracted less attention. This paper proposes a fuzzy multi-objective reinforcement learning algorithm, referred to as multi-objective fuzzy Q-learning (MOFQL), and applies it to a bi-objective reach-avoid game. Most multi-objective reinforcement learning algorithms proposed to date address problems in discrete state-action domains; MOFQL also handles continuous state-action domains. A fuzzy inference system (FIS) estimates the value function for the bi-objective problem, and a temporal difference (TD) approach updates the fuzzy rules. The proposed method is a multi-policy multi-objective algorithm and can find non-convex regions of the Pareto front.
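
The abstract describes the mechanism only at a high level: an FIS interpolates a vector-valued Q-function over a continuous state, and a TD rule updates the consequents of whichever fuzzy rules fired. The Python sketch below is a minimal, hypothetical illustration of that idea, not the paper's MOFQL algorithm; the one-dimensional state, triangular memberships, discrete action set, and the weight vector used only to pick the bootstrap action are all assumptions made here for brevity.

# Minimal sketch of vector-valued (bi-objective) fuzzy Q-learning.
# NOT the paper's MOFQL: memberships, action set, and the scalarized
# bootstrap are illustrative assumptions; state is assumed in [0, 1].
import numpy as np

N_RULES, N_ACTIONS, N_OBJ = 5, 3, 2
ALPHA, GAMMA = 0.1, 0.95

centers = np.linspace(0.0, 1.0, N_RULES)      # rule centers on a 1-D state
width = centers[1] - centers[0]
q = np.zeros((N_RULES, N_ACTIONS, N_OBJ))     # one Q-vector per (rule, action)

def firing_strengths(s):
    """Normalized triangular membership degree of state s in each rule."""
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    return mu / mu.sum()

def q_values(s):
    """FIS-interpolated Q-vectors for all actions: shape (N_ACTIONS, N_OBJ)."""
    phi = firing_strengths(s)
    return np.tensordot(phi, q, axes=1), phi

def td_update(s, a, reward_vec, s_next, w):
    """One TD step with a vector TD error; w scalarizes only the argmax."""
    Q_s, phi = q_values(s)
    Q_next, _ = q_values(s_next)
    a_next = int(np.argmax(Q_next @ w))        # greedy w.r.t. scalarized value
    delta = reward_vec + GAMMA * Q_next[a_next] - Q_s[a]
    q[:, a, :] += ALPHA * np.outer(phi, delta) # credit each fired rule

A single transition update would then look like td_update(0.30, 1, np.array([1.0, -0.5]), 0.35, w=np.array([0.5, 0.5])), where the two reward components correspond to the two objectives.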
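
The abstract's final claim, that a multi-policy method can recover non-convex regions of the Pareto front, is worth unpacking: any fixed linear scalarization of the objectives can only ever select points on the convex hull of the front. The toy check below (again written for this record, not code from the paper) shows a non-dominated point in a concave dent that Pareto filtering keeps but that no weight vector can select.

# Toy contrast: Pareto (non-dominated) filtering vs. linear scalarization,
# both objectives maximized. Data are made up for illustration.
import numpy as np

def pareto_front(points):
    """Keep points not dominated by any other point."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(o >= p) and np.any(o > p)
                        for j, o in enumerate(points) if j != i)
        if not dominated:
            keep.append(p)
    return np.array(keep)

# Three policy returns; the middle one sits in a concave dent of the front.
returns = np.array([[1.0, 0.0], [0.45, 0.45], [0.0, 1.0]])
print(pareto_front(returns))  # all three points are non-dominated

for w in np.linspace(0.0, 1.0, 11):
    best = returns[np.argmax(returns @ np.array([w, 1.0 - w]))]
    # max(w, 1-w) >= 0.5 > 0.45, so [0.45, 0.45] is never selected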