Title | An improved system for sentence-level novelty detection in textual streams |
Creator | |
Date Issued | 2015 |
Conference Name | 2015 International Conference on Smart and Sustainable City and Big Data, ICSSC 2015 |
Source Publication | IET Conference Publications |
Volume | 2015 |
Issue | CP672 |
Pages | 1-6 |
Conference Date | July 26-27, 2015 |
Conference Place | Shanghai |
Abstract | Novelty detection in news events has long been a difficult problem. A number of models have performed well on specific data streams, but certain issues remain far from solved, particularly in large data streams from the WWW, where the unpredictability of new terms requires the vector space model to adapt. We present a novel event detection system based on incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting combined with Locality Sensitive Hashing (LSH). The system adapts efficiently and effectively to new terms appearing in the data streams by continually updating the vector space model. In terms of miss probability, the proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluated on a benchmark dataset from Google News. |
Keyword | Big data; First story detection; Locality sensitive hashing; Novelty detection; Text mining |
DOI | 10.2139/ssrn.2828008 |
Language | English |
Scopus ID | 2-s2.0-84964296808 |
Document Type | Conference paper |
Identifier | http://repository.uic.edu.cn/handle/39GCC9TT/11008 |
Collection | Research outside affiliated institution |
Affiliation | 1. International Doctoral Innovation Centre, University of Nottingham, Ningbo, United Kingdom 2. School of Computer Science, University of Nottingham, United Kingdom |
Recommended Citation GB/T 7714 | Fu, Xinyu, Ch'ng, Eugene, Aickelin, Uwe, et al. An improved system for sentence-level novelty detection in textual streams[C], 2015: 1-6. |
Files in This Item: | There are no files associated with this item. |
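
The abstract describes incremental TF-IDF weighting combined with LSH for first story detection. As a rough illustration only, and not the authors' implementation, the idea might be sketched as below. Everything here is an assumption made for the example: the class name `IncrementalTfidfLsh`, the random-hyperplane LSH family, the smoothed IDF formula, the whitespace tokenizer, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


class IncrementalTfidfLsh:
    """Hypothetical sketch: incremental TF-IDF plus random-hyperplane LSH."""

    def __init__(self, num_bits=8, threshold=0.5, seed=0):
        self.num_bits = num_bits          # length of the LSH signature (assumed)
        self.threshold = threshold        # cosine-distance cut-off for novelty (assumed)
        self.rng = random.Random(seed)
        self.doc_count = 0                # sentences seen so far
        self.doc_freq = Counter()         # term -> number of sentences containing it
        self.planes = {}                  # term -> hyperplane coefficients, created lazily
        self.buckets = defaultdict(list)  # signature -> stored TF-IDF vectors

    def _tfidf(self, tokens):
        # Weights use the statistics available at arrival time; vectors that
        # were stored earlier are not re-weighted (a simplification).
        tf = Counter(tokens)
        vec = {t: c * math.log(1 + self.doc_count / self.doc_freq[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def _signature(self, vec):
        # A previously unseen term extends every hyperplane with a fresh
        # Gaussian coefficient, so the vocabulary can grow with the stream.
        sums = [0.0] * self.num_bits
        for term, weight in vec.items():
            if term not in self.planes:
                self.planes[term] = [self.rng.gauss(0, 1)
                                     for _ in range(self.num_bits)]
            for i, coeff in enumerate(self.planes[term]):
                sums[i] += weight * coeff
        return tuple(s >= 0 for s in sums)

    def process(self, sentence):
        """Return (is_novel, distance to the nearest sentence in the same bucket)."""
        tokens = sentence.lower().split()
        self.doc_count += 1
        self.doc_freq.update(set(tokens))
        vec = self._tfidf(tokens)
        sig = self._signature(vec)
        best = 0.0
        for other in self.buckets[sig]:
            # Vectors are unit length, so the dot product is the cosine similarity.
            sim = sum(w * other.get(t, 0.0) for t, w in vec.items())
            best = max(best, sim)
        self.buckets[sig].append(vec)
        distance = 1.0 - best
        return distance > self.threshold, distance


detector = IncrementalTfidfLsh()
for text in ["earthquake hits city", "earthquake hits city", "stock markets rally"]:
    print(text, detector.process(text))
```

Only sentences that hash to the same signature are compared, which is what keeps the per-sentence cost low. A practical system would likely use several hash tables and bound the bucket sizes, neither of which is shown here.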
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.