Fast Discovery of Generalized Sequential Patterns
Marzena Kryszkiewicz, Łukasz Skonieczny
Abstract
Knowledge in the form of generalized sequential patterns has many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization consists in a more selective identification of the nodes to be visited while traversing the hash tree that stores candidate generalized sequential patterns. It exploits the fact that the elements of candidate sequences are stored as ordered sets of items. To further reduce the number of visited nodes in the hash tree, we propose to use not only the windowSize and maxGap parameters, as in the original GSP, but also the minGap parameter. As a result of our optimization, the number of candidates that require the final, time-consuming verification may be considerably decreased. In the experiments we carried out, our optimized variant of GSP was several times faster than standard GSP.
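The abstract refers to the final, time-consuming verification of candidates against data sequences under the windowSize, maxGap and minGap constraints. As a minimal sketch, assuming the containment definition used by GSP, a data sequence contains a candidate if each candidate element can be matched, in order, to a range of consecutive transactions such that: the range spans at most windowSize; the range starts more than minGap after the previous element's range ends; and the range ends at most maxGap after the previous element's range starts. The Python below checks this by plain backtracking. The functions `contains` and `support` and the snake_case parameter names are hypothetical; the code implements neither GSP's hash tree nor the authors' optimized traversal, only the check that the hash tree tries to avoid for as many candidates as possible.

```python
from typing import FrozenSet, Sequence, Set, Tuple

# A data sequence: (time, itemset) transactions sorted by strictly increasing time.
DataSequence = Sequence[Tuple[float, FrozenSet[str]]]
# A candidate sequence: an ordered list of elements (itemsets).
Candidate = Sequence[Set[str]]


def contains(d: DataSequence, cand: Candidate, min_gap: float = 0.0,
             max_gap: float = float("inf"), window_size: float = 0.0) -> bool:
    """Backtracking check of GSP-style containment with time constraints
    (min_gap ~ minGap, max_gap ~ maxGap, window_size ~ windowSize)."""
    times = [t for t, _ in d]
    items = [set(s) for _, s in d]
    n = len(d)

    def match(i: int, prev_l: int, prev_u: int) -> bool:
        if i == len(cand):
            return True  # every candidate element has been matched
        start = prev_u + 1 if i > 0 else 0  # ranges must not overlap
        for l in range(start, n):
            if i > 0 and times[l] - times[prev_l] > max_gap:
                break  # this l (and any u >= l) would violate maxGap
            if i > 0 and times[l] - times[prev_u] <= min_gap:
                continue  # must start strictly more than minGap after previous range ends
            union: Set[str] = set()
            for u in range(l, n):
                if times[u] - times[l] > window_size:
                    break  # transactions l..u no longer fit in one window
                if i > 0 and times[u] - times[prev_l] > max_gap:
                    break  # range would end too long after previous range started
                union |= items[u]
                if set(cand[i]) <= union and match(i + 1, l, u):
                    return True
        return False

    return match(0, -1, -1)


def support(db: Sequence[DataSequence], cand: Candidate, **constraints: float) -> int:
    """Number of data sequences in db that contain cand."""
    return sum(1 for d in db if contains(d, cand, **constraints))
```

For example, with `d = [(1, frozenset({"a"})), (2, frozenset({"b"})), (5, frozenset({"c"}))]`, the candidate `[{"a"}, {"c"}]` is contained with no constraints but not with `max_gap=3`, and `[{"a", "b"}]` is contained only once `window_size` is at least 1.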
|Publication size in sheets||0.75|
|Book||Bembenik Robert, Skonieczny Łukasz, Protaziuk Grzegorz M., Kryszkiewicz Marzena, Rybiński Henryk (eds.): Intelligent Methods and Big Data in Industrial Applications, Studies in Big Data, vol. 40, 2019, Springer International Publishing, ISBN 978-3-319-77603-3, [978-3-319-77604-0], 376 p., DOI:10.1007/978-3-319-77604-0|
|Keywords in English||data mining, sequential patterns, generalized sequential patterns, GSP|
|project||Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-06-2017, end date 31-10-2018, II/2017/DS/1, Completed|
|Score|| = 15.0, BookChapterSeriesAndMatConf|