Automatic Extraction of Profiles from Web Pages

Piotr Andruszkiewicz, Beata Nachyła

Abstract

Web pages are usually unstructured, so Information Extraction from them is not trivial. In this paper we describe the process of Information Extraction using researchers' home pages as an example. For this purpose we applied SVM, CRF, and MLN models. The analysis covers English-language texts only.
Author Piotr Andruszkiewicz (FEIT / IN)
- The Institute of Computer Science
Author Beata Nachyła (FEIT / IN)
- The Institute of Computer Science
Pages 415-431
Publication size in sheets 0.8
Book Bembenik Robert, Skonieczny Łukasz, Rybiński Henryk, Kryszkiewicz Marzena, Niezgódka Marek (eds.): Intelligent Tools for Building a Scientific Information Platform: Advanced Architectures and Solutions, Studies in Computational Intelligence, vol. 467, 2013, Heidelberg New York Dordrecht London, Springer-Verlag Berlin Heidelberg, ISBN 978-3-642-35646-9, [978-3-642-35647-6], 548 p., DOI:10.1007/978-3-642-35647-6
Fronf Matter.pdf / 149 KB / No licence information
Keywords in English Information Extraction, probabilistic graphical models, SVM, CRF, MLN, researcher profiling
ASJC Classification 1702 Artificial Intelligence
DOI 10.1007/978-3-642-35647-6_25
Project Establishment of the universal, open, hosting and communication, repository platform for network resources of knowledge to be used by science, education and open knowledge society. Project leader: Kryszkiewicz Marzena, Phone: +48 22 234 7701, start date 16-08-2010, planned end date 16-08-2013, end date 31-10-2013, WEiTI/2012/PS/1, Completed
BG PW Projects financed by NCRD [Projekty finansowane przez NCBiR (NCBR)]
Language en (English)
Score (nominal) 5
Score Ministerial score = 5.0, 15-06-2020, MonographChapterAuthor
Publication indicators GS Citations = 2.0; Scopus SNIP (Source Normalised Impact per Paper): 2013 = 0.54
Citation count* 2 (2015-05-14)
* The presented citation count is obtained through analysis of Internet information and is close to the number calculated by the Publish or Perish system.