Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach

doi:10.1109/TSC.2025.3562362

Research Output Details

Title	Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach
Authors	Yao, Zhi 1,2; Tang, Zhiqing 2; Yang, Wenmian 2; Jia, Weijia 2,3
Publication date	2025
Journal	IEEE Transactions on Services Computing
Volume	18, Issue 3, Pages 1412-1427
Abstract	Large Language Models (LLMs) are widely used across various domains, but deploying them in cloud data centers often leads to significant response delays and high costs, undermining Quality of Service (QoS) at the network edge. Although caching LLM request results at the edge using vector databases can greatly reduce response times and costs for similar requests, this approach has been overlooked in prior research. To address this, we propose a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework that caches LLM request results at the edge using vector databases, thereby reducing response times for subsequent similar requests. Unlike methods that modify LLMs directly, VELO leaves the LLM's internal structure intact and is applicable to various LLMs. Building on VELO, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and design an algorithm based on Multi-Agent Reinforcement Learning (MARL). Our algorithm employs a diffusion-based policy network to extract the LLM request features, determining whether to request the LLM in the cloud or retrieve results from the edge's vector database. Implemented in a real edge system, our experimental results demonstrate that VELO significantly enhances user satisfaction by simultaneously reducing delays and resource consumption for edge users of LLMs. Our DLRS algorithm improves performance by 15.0% on average for similar requests and by 14.6% for new requests compared to the baselines.
Keywords	diffusion model; edge computing; multi-agent reinforcement learning; request scheduling; vector database
DOI	10.1109/TSC.2025.3562362
Language	English
Scopus accession number	2-s2.0-105007982160
Document type	Journal article
Item identifier	https://repository.uic.edu.cn/handle/39GCC9TT/13733
Collection	Beijing Normal-Hong Kong Baptist University
Corresponding authors	Tang, Zhiqing; Jia, Weijia
Author affiliations	1. Beijing Normal University, School of Artificial Intelligence, Beijing, 100875, China; 2. Beijing Normal University, Institute of Artificial Intelligence and Future Networks, Zhuhai, 519087, China; 3. Beijing Normal-Hong Kong Baptist University, Guangdong Key Lab of AI and Multi-Modal Data Processing, Zhuhai, 519087, China
Corresponding author affiliation	Beijing Normal-Hong Kong Baptist University
Recommended citation (GB/T 7714)	Yao, Zhi, Tang, Zhiqing, Yang, Wenmian, et al. Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach[J]. IEEE Transactions on Services Computing, 2025, 18(3): 1412-1427.
APA	Yao, Zhi, Tang, Zhiqing, Yang, Wenmian, & Jia, Weijia. (2025). Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach. IEEE Transactions on Services Computing, 18(3), 1412-1427.
MLA	Yao, Zhi, et al. "Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach." IEEE Transactions on Services Computing 18.3 (2025): 1412-1427.
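
Illustrative sketch (not from the paper). The abstract describes caching LLM request results at the edge in a vector database and deciding, per request, whether to answer from that cache or to query the LLM in the cloud. The Python below is a minimal sketch of that routing idea under loud assumptions: the bag-of-words `embed` function, the `EdgeCache` class, the `handle_request` helper, and the fixed similarity threshold are all hypothetical stand-ins. In particular, the fixed threshold replaces the learned diffusion-based MARL policy that the paper proposes, so this should not be read as the authors' DLRS algorithm.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (hypothetical stand-in
    for a real sentence-embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EdgeCache:
    """Hypothetical edge-side store of (request embedding, LLM response) pairs."""

    def __init__(self, threshold=0.8):
        # Assumed fixed cutoff; the paper learns this decision with a policy network.
        self.threshold = threshold
        self.entries = []

    def lookup(self, text):
        """Return (response, similarity) for the closest cached request,
        or (None, 0.0) if nothing cached overlaps with the query."""
        query = embed(text)
        best_response, best_sim = None, 0.0
        for vec, response in self.entries:
            sim = cosine(query, vec)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response, best_sim

    def store(self, text, response):
        """Cache a request together with the response the cloud LLM produced."""
        self.entries.append((embed(text), response))


def handle_request(text, cache, cloud_llm):
    """Serve from the edge cache when a sufficiently similar request exists;
    otherwise call the cloud LLM and cache its answer for later requests."""
    response, sim = cache.lookup(text)
    if response is not None and sim >= cache.threshold:
        return response, "edge"
    response = cloud_llm(text)
    cache.store(text, response)
    return response, "cloud"
```

Here `cloud_llm` is any callable that takes the request text and returns a response string. A repeated or near-identical request is then served from the edge without a second cloud call, which is the source of the delay and cost savings the abstract refers to.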