Research Output Details

Publication status: Published
Title: Record matching over query results from multiple web databases
Authors: Su, Weifeng; Wang, Jiying; Lochovsky, Frederick H.
Publication date: 2010-04-01
Journal: IEEE Transactions on Knowledge and Data Engineering
ISSN/eISSN: 1041-4347
Volume: 22, Issue: 4, Pages: 578-589
Abstract

Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These methods are not applicable for the Web database scenario, where the records to match are query results dynamically generated on-the-fly. Such records are query-dependent and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the presumed nonduplicate records from the same source can be used as training examples alleviating the burden of users having to manually label training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply. © 2010 IEEE.
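The idea of scoring record pairs by a weighted sum of per-field similarities, with same-source records serving as presumed nonduplicate examples, can be sketched as follows. This is a minimal illustration under stated assumptions, not the UDD implementation: the weighting heuristic, the 0.85 threshold, the use of `difflib` for string similarity, and all function names are hypothetical, and the SVM classifier and the iterative cooperation between the two classifiers are omitted.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a real field-level measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_vector(r1: dict, r2: dict, fields: list) -> list:
    """One similarity score per field for a pair of records."""
    return [field_similarity(r1[f], r2[f]) for f in fields]


def weights_from_nonduplicates(vectors: list) -> list:
    """Assumed heuristic: a field that looks dissimilar across known
    nonduplicate pairs discriminates well, so it receives more weight."""
    n_fields = len(vectors[0])
    means = [sum(v[i] for v in vectors) / len(vectors) for i in range(n_fields)]
    raw = [1.0 - m for m in means]
    total = sum(raw) or 1.0  # avoid division by zero
    return [x / total for x in raw]


def is_duplicate(vec: list, weights: list, threshold: float = 0.85) -> bool:
    """Weighted sum of field similarities compared with an assumed threshold."""
    return sum(w * s for w, s in zip(weights, vec)) >= threshold


def find_duplicates(source_a: list, source_b: list, fields: list) -> list:
    """Return (i, j) index pairs judged to be cross-source duplicates."""
    # Records from the same source are presumed nonduplicates of each other.
    nondup = [
        similarity_vector(r1, r2, fields)
        for source in (source_a, source_b)
        for r1, r2 in combinations(source, 2)
    ]
    if nondup:
        weights = weights_from_nonduplicates(nondup)
    else:
        weights = [1.0 / len(fields)] * len(fields)  # fall back to uniform weights
    return [
        (i, j)
        for i, ra in enumerate(source_a)
        for j, rb in enumerate(source_b)
        if is_duplicate(similarity_vector(ra, rb, fields), weights)
    ]


# Made-up query results from two hypothetical Web databases.
FIELDS = ["title", "author"]
source_a = [
    {"title": "Data Mining Concepts", "author": "J. Han"},
    {"title": "Database Systems", "author": "H. Garcia-Molina"},
]
source_b = [
    {"title": "Data Mining: Concepts", "author": "J. Han"},
    {"title": "Operating Systems", "author": "A. Tanenbaum"},
]
duplicates = find_duplicates(source_a, source_b, FIELDS)
print(duplicates)
```

Because the weights are derived only from the current query's results, nothing has to be learned in advance from earlier queries, which is the property the abstract emphasizes.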

Keywords: Data deduplication; Data integration; Duplicate detection; Query result record; Record linkage; Record matching; SVM; Web database
DOI: 10.1109/TKDE.2009.90
Indexed by: SCIE
Language: English
WOS research areas: Computer Science; Engineering
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS accession number: WOS:000274654800009
Scopus EID: 2-s2.0-77649261370
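Citation statistics
Times cited: 17 [WOS]
Document type: Journal article
Item identifier: https://repository.uic.edu.cn/handle/39GCC9TT/6633
Collection: Faculty of Science and Technology
Corresponding author: Su, Weifeng
Author affiliations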
1. Computer Science and Technology Program, BNU-HKBU United International College, Tangjiawan, Zhuhai, 28, Jinfeng Road, China
2. Shenzhen Key Laboratory of Intelligent Media and Speech, PKU-HKUST Shenzhen Hong Kong Institution, Hong Kong
3. Department of Computer Science, City University of Hong Kong, Kowloon, Tat Chee Avenue, Hong Kong
4. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Clear Water Bay, Hong Kong
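First author affiliation: BNU-HKBU United International College
Corresponding author affiliation: BNU-HKBU United International College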
Recommended citation
GB/T 7714: Su, Weifeng, Wang, Jiying, Lochovsky, Frederick H. Record matching over query results from multiple web databases[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 578-589.
APA: Su, Weifeng, Wang, Jiying, & Lochovsky, Frederick H. (2010). Record matching over query results from multiple web databases. IEEE Transactions on Knowledge and Data Engineering, 22(4), 578-589.
MLA: Su, Weifeng, et al. "Record matching over query results from multiple web databases". IEEE Transactions on Knowledge and Data Engineering 22.4 (2010): 578-589.