Details of Research Outputs

Status: Published
Title: Record matching over query results from multiple web databases
Creator: Su, Weifeng; Wang, Jiying; Lochovsky, Frederick H.
Date Issued: 2010-04-01
Source Publication: IEEE Transactions on Knowledge and Data Engineering
ISSN: 1041-4347
Volume: 22, Issue: 4, Pages: 578-589
Abstract

Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These methods are not applicable for the Web database scenario, where the records to match are query results dynamically generated on-the-fly. Such records are query-dependent and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the presumed nonduplicate records from the same source can be used as training examples alleviating the burden of users having to manually label training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply. © 2010 IEEE.
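The weighted component similarity summing idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it covers only the weighted-sum classifier, and omits the SVM classifier and the iterative cooperation between the two. The function names, the use of `difflib` as the field similarity measure, the weighting rule, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1].

    difflib's ratio is a stand-in, not the measure used in the paper.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_vector(r1, r2):
    """One similarity score per field for a pair of records."""
    return [field_similarity(x, y) for x, y in zip(r1, r2)]


def weights_from_nonduplicates(nondup_vectors):
    """Hypothetical weighting rule (an assumption, not the paper's formula):
    a field that looks similar even across known non-duplicates is a poor
    discriminator, so it receives a smaller weight. Weights sum to 1.
    """
    n_fields = len(nondup_vectors[0])
    raw = []
    for k in range(n_fields):
        mean_sim = sum(v[k] for v in nondup_vectors) / len(nondup_vectors)
        raw.append(1.0 - mean_sim)
    total = sum(raw)
    if total == 0:
        return [1.0 / n_fields] * n_fields
    return [w / total for w in raw]


def weighted_similarity(vector, weights):
    """Weighted sum of the per-field similarities."""
    return sum(w * s for w, s in zip(weights, vector))


def find_duplicates(source_a, source_b, threshold=0.85):
    """Flag cross-source record pairs whose weighted similarity reaches the threshold.

    Records within one source are presumed non-duplicates (same-source
    duplicates are assumed to have been removed already) and are used
    as the training examples for the field weights.
    """
    nondup = [
        similarity_vector(r1, r2)
        for src in (source_a, source_b)
        for i, r1 in enumerate(src)
        for r2 in src[i + 1:]
    ]
    weights = weights_from_nonduplicates(nondup)
    return [
        (a, b)
        for a in source_a
        for b in source_b
        if weighted_similarity(similarity_vector(a, b), weights) >= threshold
    ]
```

In this sketch, each record is a tuple of field strings with the same field order in both sources, and each source needs at least two records so that some non-duplicate pairs are available for weighting.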

Keyword: Data deduplication; Data integration; Duplicate detection; Query result record; Record linkage; Record matching; SVM; Web database
DOI: 10.1109/TKDE.2009.90
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS ID: WOS:000274654800009
Scopus ID: 2-s2.0-77649261370
Citation statistics
Cited Times: 17 [WOS]
Document Type: Journal article
Identifier: http://repository.uic.edu.cn/handle/39GCC9TT/6633
Collection: Faculty of Science and Technology
Corresponding Author: Su, Weifeng
Affiliation
1. Computer Science and Technology Program, BNU-HKBU United International College, Tangjiawan, Zhuhai, 28, Jinfeng Road, China
2. Shenzhen Key Laboratory of Intelligent Media and Speech, PKU-HKUST Shenzhen Hong Kong Institution, Hong Kong
3. Department of Computer Science, City University of Hong Kong, Kowloon, Tat Chee Avenue, Hong Kong
4. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Clear Water Bay, Hong Kong
First Author Affiliation: Beijing Normal-Hong Kong Baptist University
Corresponding Author Affiliation: Beijing Normal-Hong Kong Baptist University
Recommended Citation
GB/T 7714
Su, Weifeng, Wang, Jiying, Lochovsky, Frederick H. Record matching over query results from multiple web databases[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 578-589.
APA
Su, Weifeng, Wang, Jiying, & Lochovsky, Frederick H. (2010). Record matching over query results from multiple web databases. IEEE Transactions on Knowledge and Data Engineering, 22(4), 578-589.
MLA
Su, Weifeng, et al. "Record matching over query results from multiple web databases". IEEE Transactions on Knowledge and Data Engineering 22.4 (2010): 578-589.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.