Preprocessing for classification of thermograms in breast cancer detection

Łukasz Neumann, Robert Marek Nowak, Rafał Okuniewski, Witold Oleszkiewicz, Paweł Cichosz, Dariusz Jagodziński, Mateusz Matysiewicz

Abstract

The performance of binary breast cancer classification suffers from a high imbalance between the classes. In this article we present a preprocessing module designed to counteract the disproportion in the number of training examples per class. The module is based on standardization, the Synthetic Minority Oversampling Technique (SMOTE) and undersampling. We show how each algorithm influences classification accuracy. The results indicate that the described module improves the overall Area Under the Curve (AUC) by up to 10% on the tested dataset. Furthermore, we propose other methods of dealing with imbalanced datasets in breast cancer classification.
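The abstract names the three preprocessing steps but not their parameters or exact ordering. The following is a minimal pure-Python sketch, assuming a binary problem, z-score standardization, the standard SMOTE interpolation scheme and random undersampling of the majority class. The function names and the `oversample_factor`, `k` and `seed` parameters are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from statistics import mean, pstdev


def standardize(X):
    """Z-score every feature (column): subtract its mean and divide by its
    population standard deviation. Constant columns are only centred."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def smote(minority, n_new, k=5, rng=random):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance, brute-force search)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=random):
    """Keep a uniformly random subset of n_keep majority samples."""
    return rng.sample(majority, n_keep)


def balance(X, y, oversample_factor=2.0, k=5, seed=0):
    """Hypothetical pipeline for a binary problem: standardize features,
    grow the minority class with SMOTE, then undersample the majority
    class down to the enlarged minority size."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("expected exactly two classes")
    rng = random.Random(seed)
    Xs = standardize(X)  # in practice, fit mean/std on training data only
    minority_label = min(labels, key=y.count)
    majority_label = labels[1] if minority_label == labels[0] else labels[0]
    minority = [x for x, c in zip(Xs, y) if c == minority_label]
    majority = [x for x, c in zip(Xs, y) if c == majority_label]

    n_new = int(len(minority) * (oversample_factor - 1))
    minority = minority + smote(minority, n_new, k=k, rng=rng)
    n_keep = min(len(majority), len(minority))
    majority = random_undersample(majority, n_keep, rng=rng)

    X_out = minority + majority
    y_out = [minority_label] * len(minority) + [majority_label] * len(majority)
    return X_out, y_out


if __name__ == "__main__":
    X = [[i, i % 3] for i in range(20)] + [[100 + i, 50 + i] for i in range(5)]
    y = [0] * 20 + [1] * 5
    Xb, yb = balance(X, y, oversample_factor=2.0, k=3)
    print(yb.count(0), yb.count(1))  # 10 10
```

Running SMOTE before undersampling means the majority class is reduced to the enlarged minority rather than to the original one, so less majority data is discarded. Resampling of this kind should be applied to training data only, so that synthetic samples never leak into the data used to compute the AUC.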
Affiliation (ISE, The Institute of Electronic Systems): Robert Marek Nowak, Paweł Cichosz, Dariusz Jagodziński
Pages: 100313A-1 to 100313A-8
Publication size in sheets: 0.5
Book: Romaniuk Ryszard (ed.): Proc. SPIE 10031, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2016, vol. 10031, 2016, SPIE, ISBN 9781510604858, 781510604865 (electronic), 1170 p., DOI: 10.1117/12.2257157
DOI: 10.1117/12.2249307
URL: http://dx.doi.org/10.1117/12.2249307
Language: English
Score (nominal): 15
Ministerial score (2013-2016): 15.0 (27-03-2017, BookChapterMatConf)
Citation count*: 7 (2018-02-21)



* The presented citation count is obtained through analysis of information available on the Internet and is close to the number calculated by the Publish or Perish system.