Publication status | Published
Title | MLFormer: a high performance MPC linear inference framework for transformers
Authors | Liu, Siqi; Liu, Zhusen; Chen, Donglong; et al.
Publication date | 2025-04-01
Journal | Journal of Cryptographic Engineering
ISSN/eISSN | 2190-8508
Volume | 15
Issue | 1
Abstract | Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision. When deep learning services are deployed on cloud platforms, data security becomes a crucial concern. To address it, multi-party computation (MPC) is employed to prevent data and model leakage during inference. However, the Transformer model introduces several challenges for MPC computation: the time overhead of the softmax (normalized exponential) function, the accuracy issues caused by the "dynamic range" of the approximated division and exponential, and the high memory overhead when processing long sequences. To overcome these challenges, we propose MLFormer, an MPC-based inference framework for Transformer models in the semi-honest adversary model, built on CrypTen (Knott et al., Adv Neural Inf Process Syst 34: 4961–4973, 2021), a secure machine learning framework from the Facebook AI Research group. In this framework, we replace softmax attention with linear attention, whose time and memory complexity are linear in the input length. This modification eliminates the softmax function entirely, resulting in lower time and memory overhead. To preserve the accuracy of linear attention, we propose scaled linear attention, which addresses the dynamic-range issue caused by the MPC division, together with a new approximate division function that reduces the computational time of the attention block. Furthermore, to improve the efficiency and accuracy of the MPC exponential and reciprocal operations commonly used in Transformer models, we propose a novel MPC exponential protocol and are the first to integrate the efficient reciprocal protocol of Bar-Ilan and Beaver (in Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209, 1989) into our framework. Additionally, we optimize the computation of causal linear attention, which is used in the private inference of auto-regressive tasks, with novel CUDA kernel functions. All of the preceding optimizations contribute to a more accurate and efficient framework. Experimental results demonstrate that our framework achieves comparable accuracy with reduced inference time and GPU memory overhead compared to the original Transformer model; the speedup reaches 78.79% compared to the traditional private Transformer with an input length of 1024 patches.
Keywords | GPU; Linear transformer; Multi-party computation; Parallel processing; Private inference
DOI | 10.1007/s13389-024-00365-1 |
URL | View source
Indexed in | SCIE
Language | English
WOS research area | Computer Science
WOS category | Computer Science, Theory & Methods
WOS accession number | WOS:001358922700001
Scopus accession number | 2-s2.0-85209581974
Citation statistics |
Document type | Journal article
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/12802
Collection | Faculty of Science and Technology
Corresponding author | Chen, Donglong
Author affiliations | 1. Guangdong Provincial Key Laboratory of IRADS, BNU-HKBU United International College, Zhuhai 519000, China; 2. Hangzhou Innovation Institute of Beihang University, Hangzhou 311121, China; 3. Zhejiang Lab, Hangzhou 310000, China; 4. Sun Yat-sen University, Shenzhen 518107, China; 5. Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China; 6. City University of Hong Kong, Hong Kong, China; 7. Iğdır University, Turkey, and University of California Santa Barbara, Santa Barbara, United States
First author affiliation | BNU-HKBU United International College
Corresponding author affiliation | BNU-HKBU United International College
Recommended citation (GB/T 7714) | Liu, Siqi, Liu, Zhusen, Chen, Donglong, et al. MLFormer: a high performance MPC linear inference framework for transformers[J]. Journal of Cryptographic Engineering, 2025, 15(1).
APA | Liu, Siqi, Liu, Zhusen, Chen, Donglong, Dai, Wangchen, Zhou, Lu, … & Koç, Çetin Kaya. (2025). MLFormer: a high performance MPC linear inference framework for transformers. Journal of Cryptographic Engineering, 15(1).
MLA | Liu, Siqi, et al. "MLFormer: a high performance MPC linear inference framework for transformers." Journal of Cryptographic Engineering 15.1 (2025).
Files in this item | No files associated with this item.
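The abstract above describes replacing softmax attention with linear attention so that time and memory grow linearly with sequence length. The sketch below is only a minimal plaintext illustration of that general idea: it uses NumPy, involves no MPC, and assumes the feature map phi(x) = elu(x) + 1 from the linear-attention literature; it is not the MLFormer implementation described in the paper.

```python
# Illustrative comparison of softmax attention (O(n^2)) and kernelized
# linear attention (O(n)). Plaintext NumPy only; the feature map phi is
# an assumption, not taken from the paper.
import numpy as np

def phi(x):
    # A simple positive feature map, elu(x) + 1; MLFormer's actual choice may differ.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: the (d, d_v) state phi(K)^T V is accumulated once
    # and reused for every query, so cost is linear in the sequence length.
    Qp, Kp = phi(Q), phi(K)          # (n, d) feature-mapped queries and keys
    kv = Kp.T @ V                    # (d, d_v) summary of keys and values
    z = Kp.sum(axis=0)               # (d,) normalizer
    return (Qp @ kv) / (Qp @ z[:, None] + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)  # (8, 4)
    print(linear_attention(Q, K, V).shape)   # (8, 4)
```

The softmax variant needs the full n×n attention matrix, whereas the linear variant only keeps a d×d state, which is the source of the linear time and memory complexity the abstract refers to; the paper's contribution lies in making this computation accurate and efficient under MPC, which this plaintext sketch does not attempt.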