Domain Specific Features Driven Information Extraction from Web Pages of Scientific Conferences

Piotr Andruszkiewicz, Rafał Hazan

Abstract

In this paper we describe information extraction from web pages of scientific conferences. We enrich previously known features with new features specific to this domain and show their importance in the extraction process. Moreover, we investigate various data representation models, e.g., models based on single tokens or on sequences, in order to find the best configuration for this task, and we establish a new baseline on a publicly available corpus.
Authors: Piotr Andruszkiewicz (FEIT / IN, The Institute of Computer Science), Rafał Hazan (FEIT / IN, The Institute of Computer Science)
Pages: 405-417
Publication size in sheets: 0.6
Book: Gelbukh Alexander (ed.): Computational Linguistics and Intelligent Text Processing. 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017, Revised Selected Papers, Part I, Lecture Notes in Computer Science, vol. 10761, 2018, Springer International Publishing, ISBN 978-3-319-77112-0, [978-3-319-77113-7], 608 p., DOI: 10.1007/978-3-319-77113-7
DOI: 10.1007/978-3-319-77113-7_32
URL: https://link.springer.com/chapter/10.1007%2F978-3-319-77113-7_32
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-06-2017, end date 31-10-2018, II/2017/DS/1, Completed
WEiTI statutory activity (Działalność statutowa)
Language: en (English)
File: andruszkiewicz_hazan_Domain Specific Features_cicling2017-1.pdf (222.25 KB)
Score (nominal): 15
Score: Ministerial score = 15.0, BookChapterSeriesAndMatConf; Ministerial score (2013-2016) = 15.0, BookChapterSeriesAndMatConf