Publication status | Published
Title | MLFormer: a high performance MPC linear inference framework for transformers
Authors | Liu, Siqi; Liu, Zhusen; Chen, Donglong; et al.
Publication date | 2025-04-01
Journal | Journal of Cryptographic Engineering
ISSN/eISSN | 2190-8508
Volume | 15
Issue | 1
Abstract | Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision. When deep learning services are deployed on cloud platforms, data security becomes a crucial concern. To address it, multi-party computation (MPC) is employed to prevent data and model leakage during inference. However, the Transformer model introduces several challenges for MPC computation: the time overhead of the softmax (normalized exponential) function, the accuracy issues caused by the "dynamic range" of the approximated division and exponential, and the high memory overhead when processing long sequences. To overcome these challenges, we propose MLFormer, an MPC-based inference framework for Transformer models in the semi-honest adversary model, built on CrypTen (Knott et al., Adv Neural Inf Process Syst 34: 4961–4973, 2021), a secure machine learning framework from the Facebook AI Research group. In this framework, we replace softmax attention with linear attention, whose time and memory complexity are linear in the input length. This modification eliminates the softmax function entirely, resulting in lower time and memory overhead. To preserve the accuracy of linear attention, we propose scaled linear attention, which addresses the dynamic-range issue caused by the MPC division, together with a new approximate division function that reduces the computational time of the attention block. Furthermore, to improve the efficiency and accuracy of the MPC exponential and reciprocal operations commonly used in Transformer models, we propose a novel MPC exponential protocol and are the first to integrate the efficient reciprocal protocol of Bar-Ilan and Beaver (in Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209, 1989) into our framework. Additionally, we optimize the computation of causal linear attention, which is used in the private inference of auto-regressive tasks, with novel CUDA kernel functions. All of the preceding optimizations contribute to a more accurate and efficient framework. Experimental results demonstrate that our framework achieves comparable accuracy with reduced inference time and GPU memory overhead compared to the original Transformer model; the speedup reaches 78.79% compared to the traditional private Transformer with an input length of 1024 patches.
Keywords | GPU; Linear transformer; Multi-party computation; Parallel processing; Private inference
DOI | 10.1007/s13389-024-00365-1 |
URL | View source
Indexed in | SCIE
Language | English
WOS research area | Computer Science
WOS category | Computer Science, Theory & Methods
WOS accession number | WOS:001358922700001
Scopus accession number | 2-s2.0-85209581974
Citation statistics |
Document type | Journal article
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/12802
Collection | Faculty of Science and Technology
Corresponding author | Chen, Donglong
Author affiliations | 1. Guangdong Provincial Key Laboratory of IRADS, BNU-HKBU United International College, Zhuhai 519000, China; 2. Hangzhou Innovation Institute of Beihang University, Hangzhou 311121, China; 3. Zhejiang Lab, Hangzhou 310000, China; 4. Sun Yat-sen University, Shenzhen 518107, China; 5. Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China; 6. City University of Hong Kong, Hong Kong, China; 7. Iğdır University, Turkey, and University of California Santa Barbara, Santa Barbara, United States
First author affiliation | BNU-HKBU United International College
Corresponding author affiliation | BNU-HKBU United International College
Recommended citation (GB/T 7714) | Liu, Siqi, Liu, Zhusen, Chen, Donglong, et al. MLFormer: a high performance MPC linear inference framework for transformers[J]. Journal of Cryptographic Engineering, 2025, 15(1).
APA | Liu, Siqi, Liu, Zhusen, Chen, Donglong, Dai, Wangchen, Zhou, Lu, … & Koç, Çetin Kaya. (2025). MLFormer: a high performance MPC linear inference framework for transformers. Journal of Cryptographic Engineering, 15(1).
MLA | Liu, Siqi, et al. "MLFormer: a high performance MPC linear inference framework for transformers." Journal of Cryptographic Engineering 15.1 (2025).
Files in this item | No files associated with this item.
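The abstract above describes replacing softmax attention with linear attention so that time and memory grow linearly with sequence length. The sketch below is only a minimal plaintext illustration of that general idea: it uses NumPy, involves no MPC, and assumes the feature map phi(x) = elu(x) + 1 from the linear-attention literature; it is not the MLFormer implementation described in the paper.

```python
# Illustrative comparison of softmax attention (O(n^2)) and kernelized
# linear attention (O(n)). Plaintext NumPy only; the feature map phi is
# an assumption, not taken from the paper.
import numpy as np

def phi(x):
    # A simple positive feature map, elu(x) + 1; MLFormer's actual choice may differ.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: the (d, d_v) state phi(K)^T V is accumulated once
    # and reused for every query, so cost is linear in the sequence length.
    Qp, Kp = phi(Q), phi(K)          # (n, d) feature-mapped queries and keys
    kv = Kp.T @ V                    # (d, d_v) summary of keys and values
    z = Kp.sum(axis=0)               # (d,) normalizer
    return (Qp @ kv) / (Qp @ z[:, None] + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)  # (8, 4)
    print(linear_attention(Q, K, V).shape)   # (8, 4)
```

The softmax variant needs the full n×n attention matrix, whereas the linear variant only keeps a d×d state, which is the source of the linear time and memory complexity the abstract refers to; the paper's contribution lies in making this computation accurate and efficient under MPC, which this plaintext sketch does not attempt.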