Publication status | Published |
Title | Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport |
Authors | Shen, Xin; Lam, Wai; Ma, Shumin; Wang, Huadong |
Publication date | 2024-05-15 |
Journal | Natural Language Engineering |
ISSN/eISSN | 1351-3249 |
Volume | 30 |
Issue | 3 |
Pages | 525-553 |
Abstract | Recently, neural abstractive text summarization (NATS) models based on sequence-to-sequence architecture have drawn a lot of attention. Real-world texts that need to be summarized range from short news with dozens of words to long reports with thousands of words. However, most existing NATS models are not good at summarizing long documents, due to the inherent limitations of their underlying neural architectures. In this paper, we focus on the task of long document summarization (LDS). Based on the inherent section structures of source documents, we divide an abstractive LDS problem into several smaller-sized problems. In this setting, providing a less-biased target summary as the supervision for each section is vital for the model's performance. As a preliminary, we formally describe the section-to-summary-sentence (S2SS) alignment for LDS. Based on this, we propose a novel NATS framework for the LDS task. Our framework, named UOTSumm, is built on the theory of unbalanced optimal transport (UOT). It jointly learns three targets in a unified training objective: the optimal S2SS alignment, a section-level NATS summarizer, and the number of aligned summary sentences for each section. In this way, UOTSumm learns the text alignment directly from summarization data, without resorting to any biased tool such as ROUGE. UOTSumm can be easily adapted to most existing NATS models, and we implement two versions of it, with and without the pretrain-finetune technique. We evaluate UOTSumm on three publicly available LDS benchmarks: PubMed, arXiv, and GovReport. UOTSumm clearly outperforms its counterparts that use ROUGE for the text alignment. When combined with UOTSumm, the performance of two vanilla NATS models improves by a large margin. Moreover, UOTSumm achieves better or comparable performance when compared with some recent strong baselines. |
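The abstract describes learning a section-to-summary-sentence alignment via unbalanced optimal transport, where marginal constraints are relaxed so that sections and summary sentences need not carry equal mass. As a rough illustration of the underlying machinery (not the paper's actual training objective), the following is a minimal sketch of entropically regularized unbalanced OT solved with generalized Sinkhorn iterations; the cost matrix and the `eps`/`tau` values are illustrative assumptions.

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.5, tau=100.0, n_iters=500):
    """Entropic unbalanced OT via generalized Sinkhorn iterations.

    C   : (m, n) cost matrix between source and target items
    a,b : source / target mass vectors (need not sum to the same total)
    eps : entropic regularization strength
    tau : KL marginal-relaxation strength; large tau approaches
          the balanced (hard-constraint) setting
    """
    K = np.exp(-C / eps)        # Gibbs kernel
    fi = tau / (tau + eps)      # damping exponent from the KL relaxation
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]   # transport plan P
```

In the alignment setting sketched by the abstract, rows would index document sections and columns summary sentences, with the cost being, for example, one minus an embedding similarity; the row sums of the resulting plan would then indicate how much summary mass each section receives.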
Keywords | Abstractive text summarization; Long document summarization; Optimal transport; Text alignment |
DOI | 10.1017/S1351324923000177 |
Indexed in | SCIE ; SSCI ; A&HCI |
Language | English |
WOS Research Areas | Computer Science ; Linguistics |
WOS Categories | Computer Science, Artificial Intelligence ; Linguistics ; Language & Linguistics |
WOS Accession Number | WOS:001007722000001 |
Scopus EID | 2-s2.0-85193901816 |
Document type | Journal article |
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/11668 |
Collection | Faculty of Science and Technology |
Corresponding author | Shen, Xin |
Affiliations | 1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong; 2. Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, China; 3. Department of Computer Science and Technology, Tsinghua University, Beijing, China |
Recommended citation (GB/T 7714) | Shen Xin, Lam Wai, Ma Shumin, et al. Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport[J]. Natural Language Engineering, 2024, 30(3): 525-553. |
APA | Shen, Xin, Lam, Wai, Ma, Shumin, & Wang, Huadong. (2024). Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport. Natural Language Engineering, 30(3), 525-553. |
MLA | Shen, Xin, et al. "Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport." Natural Language Engineering 30.3 (2024): 525-553. |
Files in this item | No files associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.