Title | Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency |
Author(s) | |
Publication Date | 2024-12-28 |
Conference | HPCCT 2024: 2024 8th High Performance Computing and Cluster Technologies Conference (HPCCT) |
Proceedings Title | HPCCT '24: Proceedings of the 2024 8th High Performance Computing and Cluster Technologies Conference |
ISBN | 9798400716881 |
Pages | 18-23 |
Conference Dates | July 5-7, 2024 |
Conference Location | Beijing, China |
Publisher | ACM |
Abstract | HyperAttention is a recently proposed attention mechanism that addresses the otherwise unavoidable quadratic time complexity of attention over extended contexts, maintaining near-linear time even when the attention matrix has large entries or a large stable rank. HyperAttention speeds up both inference and training of large language models, with only a slight performance degradation and a significant speedup compared to FlashAttention. To further improve the accuracy of HyperAttention while preserving its speed, we optimise each module of the original algorithm. Our main contribution is to replace the locality-sensitive hashing (LSH) that HyperAttention uses to detect large entries with k-means indexing, further reducing the model's perplexity. In addition, we apply the method to chatglm3-6b and baichuan-7b, as well as chatglm3-6b-32k, which supports long contexts, to address the lack of practical evaluation of HyperAttention, and we report the test results. The experiments reveal that although the improved algorithm has a longer but relatively stable running time, it significantly reduces perplexity. Notably, with 10 to 15 replaced layers, the increase in running time is smaller than the reduction in perplexity, achieving a favourable balance between performance and efficiency. Moreover, we analyse the feasibility of other sampling methods for improving accuracy and discuss strategies for selecting the patched layers to which the fast algorithm is applied. |
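The abstract's core idea (replacing LSH with k-means for locating the large entries of the attention matrix) can be illustrated with a minimal sketch. This is not the paper's implementation: the cluster count `k`, the plain Lloyd's k-means, and the fallback to full attention for empty clusters are all illustrative assumptions. Each query attends only to the keys in its nearest cluster, i.e. the keys most likely to yield large entries in QK^T.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns centroids and per-row cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old centroid for an empty cluster.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids, labels

def clustered_attention(Q, K, V, k=4):
    """For each query, attend only over keys sharing its nearest k-means
    cluster -- the candidates most likely to produce large QK^T entries."""
    centroids, key_labels = kmeans(K, k)
    # Route each query to the cluster of its nearest key centroid.
    dist = np.linalg.norm(Q[:, None, :] - centroids[None, :, :], axis=-1)
    q_labels = dist.argmin(axis=1)
    out = np.zeros((len(Q), V.shape[1]))
    for i, q in enumerate(Q):
        idx = np.flatnonzero(key_labels == q_labels[i])
        if idx.size == 0:          # no keys in this cluster: fall back to all keys
            idx = np.arange(len(K))
        scores = K[idx] @ q / np.sqrt(Q.shape[1])
        w = np.exp(scores - scores.max())   # numerically stable softmax
        w /= w.sum()
        out[i] = w @ V[idx]
    return out

rng = np.random.default_rng(1)
Q = rng.normal(size=(32, 16))
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
out = clustered_attention(Q, K, V, k=4)
print(out.shape)  # (32, 16)
```

Because each query scores only the keys in one cluster rather than all keys, the per-query cost scales with the cluster size instead of the full sequence length, which is the efficiency the abstract attributes to the indexing step.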
Keywords | HyperAttention; K-means; Transformer |
DOI | 10.1145/3705956.3705968 |
URL | View source |
Language | English |
Scopus Accession Number | 2-s2.0-85216584804 |
Document Type | Conference paper |
Item Identifier | https://repository.uic.edu.cn/handle/39GCC9TT/12552 |
Collection | BNU-HKBU United International College |
Corresponding Author | Xie, Xinyi |
Author Affiliations | 1. Faculty of Science and Technology, Beijing Normal University, Hong Kong Baptist University United International College, Zhuhai, China 2. School of Information Science and Technology, Xiamen University Tan Kan Kee College, Zhangzhou, China 3. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, China 4. School of Information Science and Technology, Xi'an Jiaotong University, Xi'an, China 5. Jinling College, Nanjing University, Hebei, China |
First Author Affiliation | Faculty of Science and Technology |
Corresponding Author Affiliation | Faculty of Science and Technology |
Recommended Citation (GB/T 7714) | Xie Xinyi, Ding Yi, Jiang Chao, et al. Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency[C]. ACM, 2024: 18-23. |
Files in This Item | No files associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.