[CICC Original] Distributed Learning for Multi-Agent Games: Theory and Algorithms

Author: Chinese Institute of Command and Control (CICC); published: 2024-10-13

(Selected from the Journal of Command and Control)

Citation: TAN S L, GU H B, LIU K X. Distributed learning for multi-agent games: theory and algorithms[J]. Journal of Command and Control, 2024, 10（2）: 127-136. （in Chinese）

Abstract

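Autonomous intelligent decision-making is a core technology for future unmanned systems, and game-theoretic learning is one of the key methods for achieving it. Focusing on distributed learning in multi-agent games, this paper systematically introduces the basic problems, research background, and significance of the field; surveys progress on the construction and convergence analysis of distributed learning algorithms for two typical classes of games, those with continuous action spaces and those with discrete action spaces; and outlines the challenging open problems that remain in game-theoretic learning.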

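Driven by rapid advances in computing, communication, electronics, sensing, and control, the tools of modern production and daily life have progressed from mechanization, electrification, and informatization toward intelligence[1-2]. The intelligent era implies not only individual intelligence, in which a single machine is capable of perception, decision-making, and control, but also collective intelligence, in which many interconnected machines and pieces of equipment exhibit organized, system-level intelligence such as coordination and emergence. Distributed intelligent systems are a typical embodiment of collective intelligence: machines (equipment) that can collect information, make decisions, and generate information are connected through a distributed communication network into a whole that accomplishes system-level tasks cooperatively. Representative examples include distributed sensor networks, swarm robotic systems, and intelligent transportation systems.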

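Distributed intelligent systems have two defining features. 1) Local computation: each unit in the system must embed some form of computer to meet its information-processing and decision-making needs. 2) Network communication: the units are interconnected through a static or dynamic communication network in order to exchange information, avoid conflicts, and so on. Compared with centralized solutions, distributed intelligent systems avoid the need for very-large-scale centralized computation and offer significant advantages in data privacy and security, structural flexibility, functional robustness, and cost-effectiveness.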

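The development of distributed intelligent systems also faces many challenges. At the functional level, one core question is how units that make decisions independently can achieve coordinated operation by interacting with their neighboring units[3-5]. For example, in the sensor coverage problem, each sensor must decide which region to monitor, and the ultimate goal is to maximize the probability of detecting a given event over the whole area. How these intelligent sensors make independent decisions and coordinate through local communication is the core functional-level question.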
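The sensor coverage example can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions, not a method from the paper: the region probabilities, the per-sensor detection probability, and the names `detection_probability`, `improve`, and `coordinate` are all hypothetical. It also assumes that every sensor can see the other sensors' current choices, which a genuinely distributed algorithm would replace with local communication.

```python
# Toy sensor-coverage instance. All numbers and names are illustrative assumptions.
EVENT_PROB = [0.5, 0.3, 0.2]  # assumed probability that the event occurs in each region
DETECT = 0.8                  # assumed chance a sensor detects an event in its own region


def detection_probability(choices):
    """Overall probability that at least one sensor detects the event."""
    total = 0.0
    for region, prob in enumerate(EVENT_PROB):
        n = choices.count(region)  # sensors currently watching this region
        total += prob * (1.0 - (1.0 - DETECT) ** n)
    return total


def improve(choices, i):
    """Return the best region for sensor i with all other sensors held fixed."""
    best_region, best_value = choices[i], detection_probability(choices)
    for region in range(len(EVENT_PROB)):
        trial = choices[:i] + [region] + choices[i + 1:]
        value = detection_probability(trial)
        if value > best_value + 1e-12:  # move only on strict improvement
            best_region, best_value = region, value
    return best_region


def coordinate(choices, max_rounds=50):
    """Let sensors revise their choices one at a time until nobody wants to move."""
    choices = list(choices)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(choices)):
            region = improve(choices, i)
            if region != choices[i]:
                choices[i] = region
                changed = True
        if not changed:
            break
    return choices


if __name__ == "__main__":
    final = coordinate([0, 0, 0])
    print(final, round(detection_probability(final), 3))
```

Starting with all three sensors in the most likely region, the one-at-a-time revisions spread them out to one sensor per region, at which point no single sensor can raise the detection probability by moving alone.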

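Research methods for distributed intelligent cooperative decision-making generally fall into two classes. The first is the distributed optimization approach, which models the cooperative decision problem as an optimization problem: each unit has its own action set, and the units must coordinate their actions to maximize an overall performance index. In this approach, the units of the distributed intelligent system are regarded as executors with no interests of their own, whose goal is to adjust their own actions to optimize system performance. The second is the multi-agent game approach, which models the cooperative decision problem as a noncooperative game: each unit has its own action set and its own objective function, and the units must coordinate their actions to reach an equilibrium. In this approach, the units are stakeholders, whose goal is to adjust their own actions to optimize their own payoffs.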
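The contrast between the two formulations can be written down compactly. The notation here is an assumption introduced for illustration and is not taken from the paper: $N$ units, action sets $A_i$, a system performance index $F$, individual objective functions $u_i$, and $a_{-i}$ for the actions of all units other than $i$.

```latex
% Distributed optimization: all units share one objective F.
\max_{a_1 \in A_1,\, \dots,\, a_N \in A_N} \; F(a_1, \dots, a_N)

% Noncooperative game: unit i maximizes its own objective u_i,
% taking the other units' actions a_{-i} as given.
\max_{a_i \in A_i} \; u_i(a_i, a_{-i}), \qquad i = 1, \dots, N

% Nash equilibrium a^*: no unit can gain by deviating unilaterally.
u_i(a_i^*, a_{-i}^*) \;\ge\; u_i(a_i, a_{-i}^*)
\qquad \forall\, a_i \in A_i, \;\; i = 1, \dots, N
```

Under this notation, the optimization view is recovered as the special case in which every $u_i$ coincides with $F$.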

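Research on distributed learning in multi-agent games has developed rapidly[6-12]. For typical game models such as continuous-action-space games and discrete-action-space games, a variety of distributed learning algorithms have been developed and applied successfully to many kinds of distributed cooperative decision problems. Reference [13] starts from game playing and extends to operational command, discussing the intelligent decision-making problem in detail. Reference [14] examines methods for generating agent game strategies in land-warfare confrontation. Reference [15] systematically establishes the basic concepts, principles, and methods of orbital games. This paper aims to give a stage-by-stage survey of the typical results obtained in multi-agent game learning, to clarify how each algorithm is constructed and what convergence properties it has, and to look ahead to the challenging problems that remain open in the field.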

References

[1] SHOHAM Y, LEYTON-BROWN K. Multiagent systems: algorithmic, game-theoretic, and logical foundations[M]. New York: Cambridge University Press, 2008.

[2] YU W W, WEN G H, CHEN G R, et al. Distributed cooperative control of multi-agent system[M]. Beijing: Higher Education Press, 2016. （in Chinese）

[3]　OH K K, PARK M C, AHN H S. A survey of multi-agent formation control[J]. Automatica, 2015, 53: 424-440.

[7]　TAN S, WANG Y. A payoff-based learning approach for Nash equilibrium seeking in continuous potential games[J]. Neurocomputing, 2022, 468: 431-440.

[8] SHAMMA J S, ARSLAN G. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria[J]. IEEE Transactions on Automatic Control, 2005, 50（3）: 312-327.

[9]　CORTES A, MARTINEZ S. Self-triggered best-response dynamics for continuous games[J]. IEEE Transactions on Automatic Control, 2015, 60（4）: 1115-1120.

[12] DENG Z, NIAN X. Distributed generalized Nash equilibrium seeking algorithm design for aggregative games over weight-balanced digraphs[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30（3）: 695-706.

[13] HU X F, QI D W. On problems of intelligent decision-making: how far is it from game-playing to operational command[J]. Journal of Command and Control, 2020, 6（4）: 356-363. （in Chinese）

[14] WANG Y B, SUN Y F, WU J, et al. An agent game strategy generation method for land warfare[J]. Journal of Command and Control, 2022, 8（4）: 441-450. （in Chinese）

[15] ZHAO L R, DANG Z H, ZHANG Y L. Orbital game: concepts, principles and methods[J]. Journal of Command and Control, 2021, 7（3）: 215-224. （in Chinese）

[16] NISAN N, ROUGHGARDEN T, TARDOS E, et al. Algorithmic game theory[M]. USA: Cambridge University Press, 2007.

[17] PAPADIMITRIOU C H. On the complexity of the parity argument and other inefficient proofs of existence[J]. Journal of Computer and System Sciences, 1994, 48（3）: 498-532.

[18] BARREIRO-GOMEZ J, OBANDO G, QUIJANO N. Distributed population dynamics: optimization and control applications[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47（2）: 304-314.

[19] MARDEN J R, YOUNG H P, ARSLAN G. Payoff-based dynamics for multiplayer weakly acyclic games[J]. SIAM Journal on Control and Optimization, 2009, 48（1）: 373-396.

[20] YOUNG H P. Learning by trial and error[J]. Games and Economic Behavior, 2009, 65（2）: 626-643.

[21] TAN S, FANG Z, WANG Y, et al. Consensus-based multi-population game dynamics for distributed Nash equilibria seeking and optimization[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53（2）: 813-823.

[24] BASAR T, OLSDER G. Dynamic noncooperative game theory[M]. Philadelphia, PA, USA: SIAM, 1999.

[25] MERTIKOPOULOS P, ZHOU Z Y. Learning in games with continuous action sets and unknown payoff functions[J]. Mathematical Programming, 2019, 173（1-2）: 465-507.

[26] MAZUMDAR E, RATLIFF L J, SASTRY S S. On gradient-based learning in continuous games[J]. SIAM Journal on Mathematics of Data Science, 2020, 2（1）: 103-131.

[27] SCUTARI G, PALOMAR D P, FACCHINEI F. Convex optimization, game theory, and variational inequality theory[J]. IEEE Signal Processing Magazine, 2010, 27（3）: 35-49.

[30] GADJOV D, PAVEL L. A passivity-based approach to Nash equilibrium seeking over networks[J]. IEEE Transactions on Automatic Control, 2019, 64（3）: 1077-1092.

[33] BIANCHI M, GRAMMATICO S. Fully distributed Nash equilibrium seeking over time-varying communication networks with linear convergence rate[J]. IEEE Control Systems Letters, 2021, 5（2）: 499-504.

[35] SALEHISADAGHIANI F, PAVEL L. Distributed Nash equilibrium seeking: a gossip-based algorithm[J]. Automatica, 2016, 72: 209-216.

[36] TAN S, WANG Y. Graphical Nash equilibria and replicator dynamics on complex networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31（6）: 1831-1842.

[38] GADJOV D, PAVEL L. Distributed Nash equilibrium seeking resilient to adversaries[C]// Proceedings of the 60th IEEE Conference on Decision and Control, Austin, TX, USA, 2021: 191-196.

[40] FRIHAUF P, KRSTIC M, BASAR T. Nash equilibrium seeking in noncooperative games[J]. IEEE Transactions on Automatic Control, 2012, 57（5）: 1192-1207.

[43] MENG Q, NIAN X, CHEN Y, et al. Attack-resilient distributed Nash equilibrium seeking of uncertain multiagent systems over unreliable communication networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-15.

[46] YE M, LI D, HAN Q, et al. Distributed Nash equilibrium seeking for general networked games with bounded disturbances[J]. IEEE/CAA Journal of Automatica Sinica, 2023, 10（2）: 376-387.

[47] MARDEN J R, ARSLAN G, SHAMMA J S. Cooperative control and potential games[J]. IEEE Transactions on Systems, Man, and Cybernetics-B, 2009, 39（6）: 1393-1407.

[48] MARDEN J R, ARSLAN G, SHAMMA J S. Joint strategy fictitious play with inertia for potential games[J]. IEEE Transactions on Automatic Control, 2009, 54（2）: 208-220.

[49] MARDEN J R, SHAMMA J S. Revisiting log-linear learning: asynchrony, completeness and payoff-based implementation[J]. Games and Economic Behavior, 2012, 75（2）: 788-808.

[52] GU H B, LIU K X, LYU J H. Cooperative control of swarm systems: opportunities and challenges[J]. Journal of Command and Control, 2021, 7（1）: 1-10. （in Chinese）

[53] LIU X D, HE M, YU M G, et al. Collaborative application of UAV swarm based on evolutionary game[J]. Journal of Command and Control, 2021, 7（2）: 167-173. （in Chinese）
