Title | Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency |
Author(s) | |
Publication Date | 2024-12-28 |
Conference | HPCCT 2024: 2024 8th High Performance Computing and Cluster Technologies Conference (HPCCT) |
Proceedings Title | HPCCT '24: Proceedings of the 2024 8th High Performance Computing and Cluster Technologies Conference |
ISBN | 9798400716881 |
Pages | 18-23 |
Conference Dates | July 5-7, 2024 |
Conference Location | Beijing, China |
Publisher | ACM |
Abstract | HyperAttention is a recently proposed attention mechanism that addresses the otherwise unavoidable quadratic time complexity of attention over extended contexts, maintaining near-linear time even when the attention matrix has large entries or a large stable rank. HyperAttention speeds up both inference and training of large language models, with only a slight performance degradation and a significant speedup compared to FlashAttention. To further improve the accuracy of HyperAttention while preserving its speed, we optimise each module of the original algorithm. Our main contribution is to replace the locality-sensitive hashing (LSH) that HyperAttention uses to detect large entries with k-means indexing, further reducing the model's perplexity. In addition, we apply the method to chatglm3-6b and baichuan-7b, as well as chatglm3-6b-32k, which supports long contexts, to address the lack of practical evaluation of HyperAttention, and we report the test results. The experiments reveal that although the improved algorithm has a longer but relatively stable running time, it significantly reduces perplexity. Notably, with 10 to 15 replaced layers, the increase in running time is smaller than the reduction in perplexity, achieving a favourable balance between performance and efficiency. Moreover, we analyse the feasibility of other sampling methods for improving accuracy and discuss strategies for selecting the patched layers to which the fast algorithm is applied. |
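The abstract's core idea (replacing LSH with k-means for locating the large entries of the attention matrix) can be illustrated with a minimal sketch. This is not the paper's implementation: the cluster count `k`, the plain Lloyd's k-means, and the fallback to full attention for empty clusters are all illustrative assumptions. Each query attends only to the keys in its nearest cluster, i.e. the keys most likely to yield large entries in QK^T.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns centroids and per-row cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old centroid for an empty cluster.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids, labels

def clustered_attention(Q, K, V, k=4):
    """For each query, attend only over keys sharing its nearest k-means
    cluster -- the candidates most likely to produce large QK^T entries."""
    centroids, key_labels = kmeans(K, k)
    # Route each query to the cluster of its nearest key centroid.
    dist = np.linalg.norm(Q[:, None, :] - centroids[None, :, :], axis=-1)
    q_labels = dist.argmin(axis=1)
    out = np.zeros((len(Q), V.shape[1]))
    for i, q in enumerate(Q):
        idx = np.flatnonzero(key_labels == q_labels[i])
        if idx.size == 0:          # no keys in this cluster: fall back to all keys
            idx = np.arange(len(K))
        scores = K[idx] @ q / np.sqrt(Q.shape[1])
        w = np.exp(scores - scores.max())   # numerically stable softmax
        w /= w.sum()
        out[i] = w @ V[idx]
    return out

rng = np.random.default_rng(1)
Q = rng.normal(size=(32, 16))
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
out = clustered_attention(Q, K, V, k=4)
print(out.shape)  # (32, 16)
```

Because each query scores only the keys in one cluster rather than all keys, the per-query cost scales with the cluster size instead of the full sequence length, which is the efficiency the abstract attributes to the indexing step.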
Keywords | HyperAttention; K-means; Transformer |
DOI | 10.1145/3705956.3705968 |
URL | View source |
Language | English |
Scopus Accession Number | 2-s2.0-85216584804 |
Document Type | Conference paper |
Item Identifier | https://repository.uic.edu.cn/handle/39GCC9TT/12552 |
Collection | BNU-HKBU United International College |
Corresponding Author | Xie, Xinyi |
Author Affiliations | 1. Faculty of Science and Technology, Beijing Normal University, Hong Kong Baptist University United International College, Zhuhai, China 2. School of Information Science and Technology, Xiamen University Tan Kan Kee College, Zhangzhou, China 3. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, China 4. School of Information Science and Technology, Xi'an Jiaotong University, Xi'an, China 5. Jinling College, Nanjing University, Hebei, China |
First Author Affiliation | Faculty of Science and Technology |
Corresponding Author Affiliation | Faculty of Science and Technology |
Recommended Citation (GB/T 7714) | Xie Xinyi, Ding Yi, Jiang Chao, et al. Enhancing HyperAttention: A Novel Approach for Improved Algorithmic Efficiency[C]. ACM, 2024: 18-23. |
Files in This Item | No files associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.