Title | Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach |
Authors | Yao, Zhi; Tang, Zhiqing; Yang, Wenmian; Jia, Weijia |
Publication date | 2025 |
Journal | IEEE Transactions on Services Computing |
Volume | 18, Issue: 3, Pages: 1412-1427 |
Abstract | Large Language Models (LLMs) are widely used across various domains, but deploying them in cloud data centers often leads to significant response delays and high costs, undermining Quality of Service (QoS) at the network edge. Although caching LLM request results at the edge using vector databases can greatly reduce response times and costs for similar requests, this approach has been overlooked in prior research. To address this, we propose a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework that caches LLM request results at the edge using vector databases, thereby reducing response times for subsequent similar requests. Unlike methods that modify LLMs directly, VELO leaves the LLM's internal structure intact and is applicable to various LLMs. Building on VELO, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and design an algorithm based on Multi-Agent Reinforcement Learning (MARL). Our algorithm employs a diffusion-based policy network to extract the LLM request features, determining whether to request the LLM in the cloud or retrieve results from the edge's vector database. Implemented in a real edge system, our experimental results demonstrate that VELO significantly enhances user satisfaction by simultaneously reducing delays and resource consumption for edge users of LLMs. Our DLRS algorithm improves performance by 15.0% on average for similar requests and by 14.6% for new requests compared to the baselines. |
Keywords | diffusion model; edge computing; multi-agent reinforcement learning; request scheduling; vector database |
DOI | 10.1109/TSC.2025.3562362 |
URL | View source |
Language | English |
Scopus accession number | 2-s2.0-105007982160 |
Document type | Journal article |
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/13733 |
Collection | Beijing Normal-Hong Kong Baptist University |
Corresponding authors | Tang, Zhiqing; Jia, Weijia |
Author affiliations | 1. Beijing Normal University, School of Artificial Intelligence, Beijing, 100875, China; 2. Beijing Normal University, Institute of Artificial Intelligence and Future Networks, Zhuhai, 519087, China; 3. Beijing Normal-Hong Kong Baptist University, Guangdong Key Lab of AI and Multi-Modal Data Processing, Zhuhai, 519087, China |
Corresponding author affiliation | Beijing Normal-Hong Kong Baptist University |
Recommended citation (GB/T 7714) | Yao, Zhi, Tang, Zhiqing, Yang, Wenmian, et al. Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach[J]. IEEE Transactions on Services Computing, 2025, 18(3): 1412-1427. |
APA | Yao, Zhi, Tang, Zhiqing, Yang, Wenmian, & Jia, Weijia. (2025). Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach. IEEE Transactions on Services Computing, 18(3), 1412-1427. |
MLA | Yao, Zhi, et al. "Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach." IEEE Transactions on Services Computing 18.3 (2025): 1412-1427. |
Files in this item | There are no files associated with this item. |
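
The edge-caching idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's DLRS algorithm: it swaps the diffusion-based MARL policy for a fixed cosine-similarity threshold, and it uses an in-memory list in place of a real vector database. The names `EdgeVectorCache`, `handle_request`, and `cloud_llm` are hypothetical and do not come from the record.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class EdgeVectorCache:
    """Toy stand-in for an edge vector database (assumption: linear scan)."""
    threshold: float = 0.9  # assumed fixed cutoff, not a learned policy
    entries: list = field(default_factory=list)  # (embedding, response) pairs

    def lookup(self, embedding):
        """Return the cached response of the most similar request, or None."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, embedding, response):
        self.entries.append((list(embedding), response))


def handle_request(embedding, prompt, cache, cloud_llm):
    """Serve from the edge cache on a close match, else call the cloud LLM."""
    cached = cache.lookup(embedding)
    if cached is not None:
        return cached, "edge"
    response = cloud_llm(prompt)  # slow, costly path
    cache.store(embedding, response)  # make it reusable for similar requests
    return response, "cloud"
```

In this sketch a request whose embedding is close to a cached one is answered at the edge without contacting the cloud; the paper instead learns when to take each path, which a fixed threshold cannot capture.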