Publication status | Published |
Title | Record matching over query results from multiple web databases |
Authors | Su, Weifeng; Wang, Jiying; Lochovsky, Frederick H. |
Publication date | 2010-04-01 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
ISSN/eISSN | 1041-4347 |
Volume | 22, Issue: 4, Pages: 578-589 |
Abstract | Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised and require the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results generated dynamically on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the presumed nonduplicate records from the same source can be used as training examples, relieving users of the burden of manually labeling training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply. © 2010 IEEE. |
Keywords | Data deduplication; Data integration; Duplicate detection; Query result record; Record linkage; Record matching; SVM; Web database |
DOI | 10.1109/TKDE.2009.90 |
URL | View source |
Indexed by | SCIE |
Language | English |
WOS research areas | Computer Science ; Engineering |
WOS categories | Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Engineering, Electrical & Electronic |
WOS accession number | WOS:000274654800009 |
Scopus accession number | 2-s2.0-77649261370 |
Document type | Journal article |
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/6633 |
Collection | Faculty of Science and Technology |
Corresponding author | Su, Weifeng |
Author affiliations | 1. Computer Science and Technology Program, BNU-HKBU United International College, Tangjiawan, Zhuhai, 28, Jinfeng Road, China; 2. Shenzhen Key Laboratory of Intelligent Media and Speech, PKU-HKUST Shenzhen Hong Kong Institution, Hong Kong; 3. Department of Computer Science, City University of Hong Kong, Kowloon, Tat Chee Avenue, Hong Kong; 4. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Clear Water Bay, Hong Kong |
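First author affiliation | Beijing Normal-Hong Kong Baptist University |
Corresponding author affiliation | Beijing Normal-Hong Kong Baptist University |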
Recommended citation (GB/T 7714) | Su, Weifeng, Wang, Jiying, Lochovsky, Frederick H. Record matching over query results from multiple web databases[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 578-589. |
APA | Su, Weifeng, Wang, Jiying, & Lochovsky, Frederick H. (2010). Record matching over query results from multiple web databases. IEEE Transactions on Knowledge and Data Engineering, 22(4), 578-589. |
MLA | Su, Weifeng, et al. "Record matching over query results from multiple web databases." IEEE Transactions on Knowledge and Data Engineering 22.4 (2010): 578-589. |
Files in this item | There are no files associated with this item. |
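
The abstract describes an iterative loop in which same-source pairs serve as presumed nonduplicates and two classifiers cooperate to find cross-source duplicates. The following is a minimal sketch of that loop, not the UDD implementation from the paper. Everything concrete in it is an assumption made for illustration: the word-set Jaccard field similarity, the `field_weights` weighting rule, the `0.85` threshold, and in particular the nearest-centroid rule, which is swapped in as a standard-library stand-in for the SVM classifier named in the abstract.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Word-set Jaccard similarity of two strings, in [0, 1] (assumed measure)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


def sim_vector(r, s):
    """Per-field similarities between two records (tuples of strings)."""
    return tuple(jaccard(f, g) for f, g in zip(r, s))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def field_weights(negatives, positives, n_fields):
    """Hypothetical weighting: favour fields that differ on nonduplicates
    and agree on the duplicates found so far. Weights sum to 1."""
    neg = mean_vector(negatives) if negatives else [0.0] * n_fields
    pos = mean_vector(positives) if positives else [1.0] * n_fields
    raw = [(1.0 - n) * p + 1e-9 for n, p in zip(neg, pos)]
    return [r / sum(raw) for r in raw]


def find_duplicates(source_a, source_b, threshold=0.85, max_rounds=10):
    """Return sorted (index_in_a, index_in_b) pairs judged to be duplicates."""
    if not source_a or not source_b:
        return []
    n_fields = len(source_a[0])
    # Presumed nonduplicates: pairs within one already-deduplicated source.
    negatives = [sim_vector(r, s)
                 for src in (source_a, source_b)
                 for r, s in combinations(src, 2)]
    # Potential duplicates: every cross-source pair.
    candidates = {(i, j): sim_vector(r, s)
                  for (i, r), (j, s) in product(enumerate(source_a),
                                                enumerate(source_b))}
    duplicates = {}
    for _ in range(max_rounds):
        before = len(duplicates)
        # Classifier 1: weighted sum of field similarities against a threshold.
        weights = field_weights(negatives, list(duplicates.values()), n_fields)
        for key, vec in list(candidates.items()):
            if sum(w * s for w, s in zip(weights, vec)) >= threshold:
                duplicates[key] = candidates.pop(key)
        # Classifier 2: nearest centroid, a stand-in for the SVM, trained on
        # the duplicates found so far and the presumed nonduplicates.
        if duplicates and candidates and negatives:
            pos_c = mean_vector(list(duplicates.values()))
            neg_c = mean_vector(negatives)
            for key, vec in list(candidates.items()):
                if sq_dist(vec, pos_c) < sq_dist(vec, neg_c):
                    duplicates[key] = candidates.pop(key)
        if len(duplicates) == before:  # no new duplicates: stop iterating
            break
    return sorted(duplicates)


# Made-up query results from two hypothetical sources: (title, author).
source_a = [("Data Mining Concepts", "Han"),
            ("Database Systems", "Ullman"),
            ("Introduction to Algorithms", "Cormen Leiserson")]
source_b = [("data mining concepts", "han"),
            ("Operating Systems Design", "Tanenbaum"),
            ("Introduction to Algorithms Third Edition", "Cormen")]
print(find_duplicates(source_a, source_b))  # [(0, 0), (2, 2)]
```

In this toy run the threshold classifier finds only the exact match `(0, 0)`; the second classifier, trained on that match and on the same-source pairs, then picks up the near match `(2, 2)`. A nearest-centroid rule is far more permissive than a tuned SVM, so the sketch only illustrates the cooperation between the two stages.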