Research on a spacecraft pursuit-evasion game strategy based on adaptive-augmented random search

Bibliographic Details
Published in	Journal of Northwestern Polytechnical University (西北工业大学学报) Vol. 42; no. 1; pp. 117-128
Main Authors	JIAO Jie (焦杰), GOU Yongjie (苟永杰), WU Wenbo (吴文博), PAN Binfeng (泮斌峰)
Format	Journal Article
Language	Chinese
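Published	School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108; 01.02.2024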
Subjects	non-cooperative target; pursuit game; differential game theory; reinforcement learning; sparse reward
ISSN	1000-2758
DOI	10.1051/jnwpu/20244210117

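Abstract	To address the survival-type differential-game interception problem in the pursuit-evasion game between a spacecraft and a non-cooperative target, pursuit-evasion strategies are studied using reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. For the sparse-reward difficulty of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which speeds up policy convergence. To counter premature convergence to a local optimum, a novelty function is designed to guide policy updates, which can improve data utilization efficiency. Numerical simulations, including comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG), verify the effectiveness and advancement of the method.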
Abstract_FL	To solve the problem of the survival differential policy interception between a spacecraft and a non-cooperative target pursuit game, the pursuit game policy is studied based on reinforcement learning, and the adaptive-augmented random search algorithm is proposed. Firstly, to solve the sparse reward problem of sequential decision making, an exploration method based on the spatial perturbation of parameters of the policy is designed, thus accelerating its convergence speed. Secondly, to avoid the possibility of falling into local optimum prematurely, a novelty degree function is designed to guide the policy update, enhancing the efficiency of data utilization. Finally, the effectiveness and advancement of the exploration method are verified with numerical simulations and compared with those of the augmented random search algorithm, the proximal policy optimization algorithm and the deep deterministic policy gradient algorithm.
Author	JIAO Jie (焦杰); PAN Binfeng (泮斌峰); GOU Yongjie (苟永杰); WU Wenbo (吴文博)
AuthorAffiliation	School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108
Author_FL	JIAO Jie; GOU Yongjie; WU Wenbo; PAN Binfeng
ClassificationCodes	V448.2
ContentType	Journal Article
Copyright	Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DatabaseName	Wanfang Data Journals - Hong Kong; WANFANG Data Centre; Wanfang Data Journals; China Online Journals (COJ)
DocumentTitle_FL	Research on game strategy of spacecraft chase and escape based on adaptive augmented random search
EndPage	128
ExternalDocumentID	xbgydxxb202401015
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	non-cooperative target; pursuit game; differential game theory; reinforcement learning; sparse reward
Language	Chinese
PageCount	12
PublicationCentury	2000
PublicationDate	2024-02-01
PublicationDecade	2020
PublicationTitle	西北工业大学学报
PublicationTitle_FL	Journal of Northwestern Polytechnical University
PublicationYear	2024
Publisher	School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108
SourceID	wanfang
SourceType	Aggregation Database
StartPage	117
Title	Research on a spacecraft pursuit-evasion game strategy based on adaptive-augmented random search
URI	https://d.wanfangdata.com.cn/periodical/xbgydxxb202401015
Volume	42