Accelerating RSA with fine-grained parallelism using GPU

doi:10.1007/978-3-319-17533-1_31

Research Output Details

Title	Accelerating RSA with fine-grained parallelism using GPU
Authors	Yang, Yang 1,2,3; Guan, Zhi 1,2,3; Sun, Huiping 2,3,4; Chen, Zhong 1,2,3
Publication date	2015
Conference	11th International Conference on Information Security Practice and Experience, ISPEC 2015
Proceedings title	Information Security Practice and Experience: 11th International Conference, ISPEC 2015, Beijing, China, May 5-8, 2015, Proceedings
Proceedings editors	Javier Lopez, Yongdong Wu
ISBN	9783319175331; 9783319175324
ISSN	0302-9743
Volume	Lecture Notes in Computer Science (LNCS, volume 9065)
Pages	454-468
Conference dates	May 5-8, 2015
Conference location	Beijing, China
Place of publication	Cham
Publisher	Springer
Abstract	RSA is a public-key cryptosystem widely used for end-to-end authentication and key exchange in various Internet protocols, such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA are much more time-consuming. This puts performance pressure on service providers that use secure protocols and hinders these protocols from being more widely adopted. Graphics Processing Units (GPUs) are increasingly used for data-parallel general-purpose computing, and they often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelizing Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and we construct a parallel RSA implementation based on this approach, combined with other optimization techniques at both the algorithmic and implementation levels. The performance evaluation shows that our RSA implementation achieves record-breaking latency among RSA decryption implementations on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughput of our implementation reaches 5,244 decryptions per second for RSA-2048 and 34,981 for RSA-1024, which is much higher than that of existing integer-based implementations. The peak throughput of our implementation is slightly lower than that of the fastest floating-point-based implementation, while its latency is 3 times lower.
Keywords	CRT; CUDA; GPGPU; Montgomery multiplication; RSA
DOI	10.1007/978-3-319-17533-1_31
Indexed in	CPCI-S
Language	English
WOS research area	Computer Science
WOS categories	Computer Science, Information Systems; Computer Science, Theory & Methods
WOS accession number	WOS:000363247500031
Scopus EID	2-s2.0-84942523725
Document type	Conference paper
Item identifier	https://repository.uic.edu.cn/handle/39GCC9TT/13516
Collection	Personal research output produced outside this institution
Corresponding author	Guan, Zhi
Author affiliations	1. Institute of Software, School of EECS, Peking University, China; 2. MoE Key Lab of High Confidence Software Technologies (PKU), China; 3. MoE Key Lab of Network and Software Security Assurance (PKU), China; 4. School of Software and Microelectronics, Peking University, China
Recommended citation (GB/T 7714)	Yang, Yang, Guan, Zhi, Sun, Huiping, et al. Accelerating RSA with fine-grained parallelism using GPU[C]//Javier Lopez, Yongdong Wu. Cham: Springer, 2015: 454-468.
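The abstract builds on two standard components: Montgomery multiplication and, per the keywords, CRT-based RSA decryption. The following minimal sketch shows both running sequentially on a CPU in Python. It is not the authors' fine-grained SIMT GPU implementation. The function names (`montgomery_setup`, `mont_mul`, `mont_pow`, `rsa_crt_decrypt`) are hypothetical, and using Garner's formula for the CRT recombination is an assumption, not a detail taken from the paper.

```python
# Sequential sketch of Montgomery multiplication and RSA-CRT decryption.
# Hypothetical helper names; this is NOT the paper's GPU (SIMT) code.

def montgomery_setup(n: int, r_bits: int):
    """Precompute constants for Montgomery arithmetic modulo odd n, with R = 2**r_bits > n."""
    assert n % 2 == 1 and (1 << r_bits) > n
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to convert into Montgomery form
    return r, n_prime, r2


def mont_mul(a: int, b: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery product (REDC): returns a*b*R^-1 mod n for 0 <= a, b < n."""
    r_mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> r_bits              # exact division by R; u < 2n
    return u - n if u >= n else u


def mont_pow(base: int, exp: int, n: int) -> int:
    """Left-to-right square-and-multiply exponentiation using Montgomery products."""
    r_bits = n.bit_length()                # R = 2**bit_length(n) > n
    r, n_prime, r2 = montgomery_setup(n, r_bits)
    x = mont_mul(base % n, r2, n, r_bits, n_prime)  # base * R mod n
    acc = r % n                                     # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, r_bits, n_prime)
    return mont_mul(acc, 1, n, r_bits, n_prime)     # convert back out of Montgomery form


def rsa_crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """RSA decryption via CRT: two half-size exponentiations, then Garner recombination."""
    dp = d % (p - 1)
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = mont_pow(c % p, dp, p)
    m2 = mont_pow(c % q, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


if __name__ == "__main__":
    # Toy key: p = 61, q = 53, n = 3233, e = 17, d = 2753.
    p, q, e, d = 61, 53, 17, 2753
    c = pow(65, e, p * q)
    print(rsa_crt_decrypt(c, p, q, d))  # prints 65
```

With CRT, an RSA-1024 decryption becomes two exponentiations modulo 512-bit primes, and RSA-2048 becomes two modulo 1024-bit primes. The multi-precision product and reduction inside `mont_mul` is the kind of per-operation work that the abstract describes splitting across GPU threads.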