Clustering for data matching

Authors: Apeh, E.T.

Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume: 4251 LNAI - I

Pages: 1216-1225

eISSN: 1611-3349

ISBN: 9783540465355

ISSN: 0302-9743

DOI: 10.1007/11892960_146

Abstract:

The problem of matching data has as one of its major bottlenecks the rapid deterioration in performance of time and accuracy, as the amount of data to be processed increases. One reason for this deterioration in performance is the cost incurred by data matching systems when comparing data records to determine their similarity (or dissimilarity). Approaches such as blocking and concatenation of data attributes have been used to minimize the comparison cost. In this paper, we analyse and present Keyword and Digram clustering as alternatives for enhancing the performance of data matching systems. We compare the performance of these clustering techniques in terms of potential savings in performing comparisons and their accuracy in correctly clustering similar data. Our results on a sampled London Stock Exchange listed companies database show that using the clustering techniques can lead to improved accuracy as well as time savings in data matching systems. © Springer-Verlag Berlin Heidelberg 2006.
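
As a rough illustration of the comparison savings the abstract describes, the sketch below groups records by shared keywords or shared digrams (character pairs) so that only records sharing a key are compared. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the Dice similarity measure, and the sample company names are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def keywords(text):
    """Split a record into lowercase whitespace-separated tokens."""
    return set(text.lower().split())


def digrams(text):
    """Return the set of adjacent character pairs, ignoring case and non-alphanumerics."""
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def candidate_pairs(records, key_fn):
    """Pair up only those records that share at least one key (keyword or digram)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for key in key_fn(rec):
            index[key].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def dice(a, b):
    """Dice coefficient between two sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Invented sample records, for illustration only.
records = [
    "Acme Holdings PLC",
    "Acme Holdings",
    "Borealis Mining PLC",
    "Cedar Foods Group",
]

all_pairs = len(records) * (len(records) - 1) // 2
kw_pairs = candidate_pairs(records, keywords)
print(f"exhaustive comparisons: {all_pairs}")
print(f"keyword-clustered comparisons: {len(kw_pairs)}")

# Score only the candidate pairs, here with digram overlap.
for i, j in sorted(kw_pairs):
    score = dice(digrams(records[i]), digrams(records[j]))
    print(records[i], "|", records[j], "|", round(score, 2))
```

Passing `digrams` instead of `keywords` as `key_fn` gives a digram-based grouping; on real data it would likely need a minimum-overlap threshold, since short character pairs are shared by many unrelated records.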

Source: Scopus

Clustering for data matching

Authors: Apeh, E.T. and Gabrys, B.

Journal: KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS

Volume: 4251

Pages: 1216-1225

eISSN: 1611-3349

ISSN: 0302-9743

Source: Web of Science (Lite)

Clustering for Data Matching

Authors: Apeh, E. and Gabrys, B.

Editors: Howlett, R.J. and Jain, L.C.

Volume: 1

Pages: 1216-1225

Publisher: Springer

Place of Publication: Berlin

DOI: 10.1007/11892960_146

http://www.springerlink.com/content/dwht56u3431u1505/?p=60bb72cc550a4b8fb747103f79a860b9&pi=0

Source: Manual

Clustering for Data Matching

Authors: Apeh, E.T. and Gabrys, B.

Editors: Howlett, R.J. and Jain, L.C.

Journal: KES (1)

Volume: 4251

Pages: 1216-1225

Publisher: Springer

https://doi.org/10.1007/11892960

Source: DBLP