Research Output Details

Title: Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency
Authors: Xie, Xinyi; Ding, Yi; Jiang, Chao; et al.
Publication Date: 2024-12-28
Conference: HPCCT 2024: 2024 8th High Performance Computing and Cluster Technologies Conference (HPCCT)
Proceedings: HPCCT '24: Proceedings of the 2024 8th High Performance Computing and Cluster Technologies Conference
ISBN: 9798400716881
Pages: 18-23
Conference Dates: July 5-7, 2024
Conference Location: Beijing, China
Publisher: ACM
Abstract

HyperAttention is a new attention mechanism proposed to address the quadratic time complexity of exact attention and the restrictive conditions under which linear-time approximations hold for extended contexts: it allows the algorithm to maintain linear time complexity even when the attention matrix has large entries or a large stable rank. HyperAttention speeds up both inference and training of large language models, with only a slight performance degradation and a significant speedup compared to FlashAttention. To further improve the accuracy of HyperAttention while maintaining its speed, we optimised each module of the original algorithm. Our main contribution is to replace the locality-sensitive hashing that HyperAttention uses to detect large entries with k-means indexing, further reducing the model’s perplexity. In addition, we apply the method to chatglm3-6b and baichuan-7b, as well as chatglm3-6b-32k, which supports long contexts, to make up for the lack of practical application results for HyperAttention, and we report test results. The experiments reveal that although the improved algorithm has a longer but relatively stable running time, it significantly reduces perplexity. Notably, with 10 to 15 replacement layers, the increase in running time is outweighed by the reduction in perplexity, achieving a favourable balance between performance and efficiency. Moreover, we analyse the feasibility of other sampling methods for improving accuracy and discuss the strategy for selecting the layers to which the fast algorithm is applied.
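The central change described in the abstract, swapping locality-sensitive hashing for k-means when locating the large entries of the attention matrix, can be illustrated with a small sketch. The Python snippet below is not the authors' implementation; it is a minimal, self-contained illustration under the assumption that queries and keys assigned to the same k-means cluster account for the dominant attention entries, and all names (kmeans, kmeans_block_attention, n_clusters) are hypothetical.

```python
# Minimal sketch (NOT the paper's implementation): cluster queries and keys
# with k-means and restrict attention to same-cluster pairs, approximating
# the "large entries" of the attention matrix that HyperAttention detects.
import numpy as np

def kmeans(x, n_clusters, n_iter=20, seed=0):
    """Plain k-means on row vectors; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # assign each vector to its nearest center
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # recompute centers (keep the old center if a cluster is empty)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

def kmeans_block_attention(q, k, v, n_clusters=8):
    """Approximate attention: only keys in the same k-means cluster as a
    query contribute, i.e. where the dominant entries are assumed to lie.
    Shapes: q, k of (n, d); v of (n, d_v)."""
    n, d = q.shape
    labels = kmeans(np.concatenate([q, k], axis=0), n_clusters)
    q_lab, k_lab = labels[:n], labels[n:]
    out = np.zeros_like(v)
    for c in range(n_clusters):
        qi = np.where(q_lab == c)[0]
        ki = np.where(k_lab == c)[0]
        if len(qi) == 0 or len(ki) == 0:
            continue
        scores = q[qi] @ k[ki].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[qi] = weights @ v[ki]
    return out

# Toy usage on random data
q = np.random.randn(64, 16)
k = np.random.randn(64, 16)
v = np.random.randn(64, 16)
print(kmeans_block_attention(q, k, v).shape)  # (64, 16)
```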

Keywords: Hyper Attention; K-means; Transformer
DOI: 10.1145/3705956.3705968
URL: View source
Language: English
Scopus Accession Number: 2-s2.0-85216584804
Document Type: Conference Paper
Item Identifier: https://repository.uic.edu.cn/handle/39GCC9TT/12552
Collection: Beijing Normal University-Hong Kong Baptist University United International College
Corresponding Author: Xie, Xinyi
Author Affiliations:
1. Faculty of Science and Technology, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
2. School of Information Science and Technology, Xiamen University Tan Kah Kee College, Zhangzhou, China
3. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, China
4. School of Information Science and Technology, Xi'an Jiaotong University, Xi'an, China
5. Jinling College, Nanjing University, Hebei, China
First Author Affiliation: Faculty of Science and Technology
Corresponding Author Affiliation: Faculty of Science and Technology
Recommended Citation
GB/T 7714
Xie, Xinyi, Ding, Yi, Jiang, Chao, et al. Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency[C]. ACM, 2024: 18-23.
Files in This Item
There are no files associated with this item.