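Research on game strategy of spacecraft chase and escape based on adaptive augmented random search (基于自适应增强随机搜索的航天器追逃博弈策略研究)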


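Published in Journal of Northwestern Polytechnical University (西北工业大学学报), Vol. 42, no. 1, pp. 117-128
Main Authors JIAO Jie (焦杰), GOU Yongjie (苟永杰), WU Wenbo (吴文博), PAN Binfeng (泮斌峰)
Format Journal Article
Language Chinese
Published School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108, 01.02.2024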
ISSN 1000-2758
DOI 10.1051/jnwpu/20244210117


Abstract This paper addresses the interception problem, formulated as a survival-type differential game, that arises in the pursuit-evasion game between a spacecraft and a non-cooperative target. The pursuit-evasion strategy is studied with reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. First, to cope with the sparse-reward difficulty of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which speeds up policy convergence. Second, to counter premature convergence to a local optimum, a novelty function is designed to guide the policy update, which improves data efficiency. Finally, numerical simulations and comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) verify the effectiveness and advantages of the method.
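The two ideas named in the abstract, exploration by perturbing policy parameters and a novelty term that guides the update, can be illustrated with a minimal sketch. This is not the paper's A-ARS algorithm, whose details are not given in this record. It is the basic ARS update applied to a novelty-augmented score, under loudly stated assumptions: the 1-D `rollout` environment, the k-nearest-neighbour `novelty` measure, the weight `w_novelty`, and all other names and defaults are hypothetical choices made for illustration.

```python
import random
import statistics


def rollout(theta, steps=50, dt=0.1):
    """Toy 1-D pursuit (hypothetical stand-in for the orbital game).

    State is (relative position, relative velocity); a linear policy
    outputs a bounded acceleration. Returns (episode return, final position).
    """
    pos, vel, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = max(-1.0, min(1.0, theta[0] * pos + theta[1] * vel))
        vel += u * dt
        pos += vel * dt
        total -= abs(pos) * dt  # penalise the remaining miss distance
    return total, pos


def novelty(behavior, archive, k=3):
    """Assumed novelty measure: mean distance to the k nearest archive entries."""
    if not archive:
        return 0.0
    nearest = sorted(abs(behavior - b) for b in archive)[:k]
    return sum(nearest) / len(nearest)


def ars_with_novelty(theta, iters=30, n_dirs=8, top_b=4,
                     step=0.05, noise=0.1, w_novelty=0.1, seed=0):
    """Basic ARS update on a novelty-augmented score; keeps the best policy seen."""
    rng = random.Random(seed)
    theta = list(theta)
    best_theta, best_ret = list(theta), rollout(theta)[0]
    archive = []  # final positions reached by earlier candidates
    for _ in range(iters):
        trials, new_behaviors = [], []
        for _ in range(n_dirs):
            # Explore in parameter space: evaluate theta +/- noise * delta.
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            scores = []
            for sign in (1.0, -1.0):
                cand = [t + sign * noise * d for t, d in zip(theta, delta)]
                ret, final_pos = rollout(cand)
                scores.append(ret + w_novelty * novelty(final_pos, archive))
                new_behaviors.append(final_pos)
            trials.append((max(scores), scores[0], scores[1], delta))
        # Keep the top-b directions and normalise by the spread of their scores.
        trials.sort(key=lambda t: t[0], reverse=True)
        top = trials[:top_b]
        used = [s for _, sp, sm, _ in top for s in (sp, sm)]
        sigma = statistics.pstdev(used) or 1.0  # guard against zero spread
        for i in range(len(theta)):
            grad_i = sum((sp - sm) * d[i] for _, sp, sm, d in top)
            theta[i] += step / (len(top) * sigma) * grad_i
        archive.extend(new_behaviors)
        ret = rollout(theta)[0]  # judge the policy by its true return only
        if ret > best_ret:
            best_theta, best_ret = list(theta), ret
    return best_theta, best_ret


if __name__ == "__main__":
    theta, ret = ars_with_novelty([0.0, 0.0])
    print("best parameters:", theta, "return:", ret)
```

In this sketch the novelty bonus only influences which perturbation directions are selected and how they are weighted. The policy that is finally returned is chosen by its unaugmented return, so the bonus cannot make the reported result worse than the starting policy.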
Author JIAO Jie (焦杰)
GOU Yongjie (苟永杰)
WU Wenbo (吴文博)
PAN Binfeng (泮斌峰)
AuthorAffiliation School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108
ClassificationCodes V448.2
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DocumentTitle_FL Research on game strategy of spacecraft chase and escape based on adaptive augmented random search
EndPage 128
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords non-cooperative target (非合作目标)
sparse reward (稀疏奖励)
pursuit game (追逃博弈)
differential game theory (微分对策)
reinforcement learning (强化学习)
Language Chinese
PageCount 12
PublicationDate 2024-02-01
PublicationTitle Journal of Northwestern Polytechnical University (西北工业大学学报)
PublicationYear 2024
Publisher School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an, Shaanxi 710072; Shanghai Institute of Aerospace System Engineering, Shanghai 201108
StartPage 117
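Title Research on game strategy of spacecraft chase and escape based on adaptive augmented random search (基于自适应增强随机搜索的航天器追逃博弈策略研究)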
URI https://d.wanfangdata.com.cn/periodical/xbgydxxb202401015
Volume 42