Fast Discovery of Generalized Sequential Patterns

Marzena Kryszkiewicz, Łukasz Skonieczny


Knowledge in the form of generalized sequential patterns has many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization consists of a more selective identification of the nodes to be visited while traversing a hash tree that stores candidates for generalized sequential patterns. It is based on the fact that the elements of candidate sequences are stored as ordered sets of items. To further reduce the number of visited nodes in the hash tree, we propose to use not only the windowSize and maxGap parameters, as in the original GSP, but also the minGap parameter. As a result of our optimization, the number of candidates that require the final, time-consuming verification can be considerably reduced. In our experiments, the optimized variant of GSP was several times faster than standard GSP.
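The windowSize, minGap and maxGap parameters come from the original GSP definition of when a data sequence contains a candidate sequence; that containment test is the final verification the abstract refers to. As a hedged illustration only, the sketch below checks this definition by exhaustive search. The function name `contains`, the (time, item set) input layout and the brute-force strategy are assumptions of this sketch, not the authors' implementation; it does not model the hash tree, the paper's selective traversal, or taxonomies.

```python
def contains(data_seq, candidate, window_size, min_gap, max_gap):
    """Brute-force GSP-style containment test (hypothetical helper).

    data_seq:  list of (time, set_of_items) transactions sorted by time.
    candidate: list of item sets, i.e. the elements of the candidate sequence.
    Returns True if there exist indices l1 <= u1 < l2 <= u2 < ... such that
    element i is covered by the union of transactions l_i..u_i and the
    windowSize, minGap and maxGap constraints all hold.
    """
    m, n = len(data_seq), len(candidate)

    def search(i, prev_l, prev_u):
        if i == n:  # every candidate element has been matched
            return True
        start = 0 if prev_u is None else prev_u + 1
        for l in range(start, m):
            t_l = data_seq[l][0]
            # minGap: this element must start strictly more than min_gap
            # after the end of the previous element's window.
            if prev_u is not None and t_l - data_seq[prev_u][0] <= min_gap:
                continue
            covered = set()
            for u in range(l, m):
                t_u = data_seq[u][0]
                # windowSize: the window l..u may span at most window_size.
                if t_u - t_l > window_size:
                    break
                # maxGap: this window must end within max_gap of the
                # start of the previous element's window.
                if prev_l is not None and t_u - data_seq[prev_l][0] > max_gap:
                    break
                covered |= data_seq[u][1]
                if candidate[i] <= covered and search(i + 1, l, u):
                    return True
        return False

    return search(0, None, None)


if __name__ == "__main__":
    data = [(1, {"a"}), (2, {"b"}), (5, {"c"})]
    print(contains(data, [{"a"}, {"c"}], window_size=0, min_gap=0, max_gap=10))  # True
    print(contains(data, [{"a"}, {"c"}], window_size=0, min_gap=0, max_gap=3))   # False
    print(contains(data, [{"a", "b"}], window_size=1, min_gap=0, max_gap=10))    # True
```

The search is exponential in the worst case, which is why GSP tries to call such a test on as few candidates as possible; in the example, a maxGap of 3 rejects the pattern because "c" occurs 4 time units after "a".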
Authors: Marzena Kryszkiewicz (FEIT / IN, The Institute of Computer Science), Łukasz Skonieczny (FEIT / IN, The Institute of Computer Science)
Publication size in sheets: 0.75
Book: Bembenik Robert, Skonieczny Łukasz, Protaziuk Grzegorz M., Kryszkiewicz Marzena, Rybiński Henryk (eds.): Intelligent Methods and Big Data in Industrial Applications, Studies in Big Data, vol. 40, 2019, Springer, ISBN 978-3-319-77603-3, [978-3-319-77604-0], 376 p., DOI:10.1007/978-3-319-77604-0
Keywords in English: data mining, sequential patterns, generalized sequential patterns, GSP
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-06-2017, end date 31-10-2018, II/2017/DS/1, Completed
WEiTI statutory activity (Działalność statutowa)
Language: en (English)
File: 20170138.pdf (615.1 KB)
Score (nominal): 20
Score source: publisherList
Score: Ministerial score = 20.0, 02-02-2020, ChapterFromConference