A Method for Automatic Standardization of Text Attributes without Reference Data Sets
- Łukasz Ciszak
Data cleaning is an important step in information systems, as low-quality data may seriously impact the business processes that rely on a given system. Textual attributes are the most prone to errors, especially during the data input stage. In this article, we propose a novel approach to the automatic correction of text attribute values. The method combines approaches based on textual similarity with those using data distribution features. Unlike existing methods in the area, our approach does not require third-party reference data. Experiments performed on real-world address data show that the method can clean the data effectively and with high accuracy.
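The abstract names two ingredients, textual similarity and data distribution features, without giving the algorithm itself. The following is only a minimal sketch of how such a combination might look, not the chapter's actual method: a rare value is replaced by a much more frequent value that it closely resembles, so no external reference dictionary is needed. The function name `standardize`, the thresholds `min_similarity` and `min_ratio`, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def standardize(values, min_similarity=0.8, min_ratio=5.0):
    """Illustrative sketch: map rare values to a similar, far more frequent value.

    Hypothetical parameters (not taken from the chapter):
      min_similarity -- minimum SequenceMatcher ratio to treat two strings as variants
      min_ratio      -- how many times more frequent the replacement must be
    """
    counts = Counter(values)
    distinct = list(counts)
    mapping = {}
    for value in distinct:
        best, best_score = None, 0.0
        for candidate in distinct:
            if candidate == value:
                continue
            # Distribution feature: the candidate must dominate in frequency.
            if counts[candidate] < min_ratio * counts[value]:
                continue
            # Textual similarity: case-insensitive character-level ratio.
            score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
            if score >= min_similarity and score > best_score:
                best, best_score = candidate, score
        # Values with no dominant look-alike are left unchanged.
        mapping[value] = best if best is not None else value
    return [mapping[v] for v in values]
```

Under these assumptions, a column containing many occurrences of a correctly spelled city name and a few misspellings of it would have the misspellings mapped to the dominant spelling, while values without a frequent look-alike stay untouched. The thresholds would need tuning on real data.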
- Krzysztof A. Cyran, Stanisław Kozielski, James F. Peters [et al.] (eds.): Man-Machine Interactions, Advances in Intelligent and Soft Computing, vol. 59, 2009, Springer-Verlag Berlin Heidelberg, 690 p., ISBN 978-3-642-00562-6
- Keywords in English
- data quality, data cleaning, attribute standardization
- DOI:10.1007/978-3-642-00563-3_51
- http://link.springer.com/chapter/10.1007/978-3-642-00563-3_51
- (en) English