HıLεX: A System for Semantic Information Extraction from Web Documents

Bibliographic Details
Published in: Enterprise Information Systems, pp. 194-209
Main Authors: Ruffolo, Massimo; Manna, Marco
Format: Book Chapter
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg, 2008
Series: Lecture Notes in Business Information Processing
Summary: Recognizing and extracting meaningful information from unstructured Web documents, taking their semantics into account, is an important problem in information and knowledge management. This paper describes HıLεX, a system implementing a novel logic-based approach to information extraction from unstructured documents. The approach adopted in HıLεX is founded on a new two-dimensional representation of documents and relies heavily on DLP+, an extension of disjunctive logic programming for ontology representation and reasoning that has recently been implemented on top of the DLV reasoning environment. Unlike previous systems, which are mainly syntactic, HıLεX combines semantic and syntactic knowledge for powerful information extraction. Ontologies, representing the semantics of the information to be extracted, are encoded in DLP+, while the extraction patterns are expressed using regular expressions and an ad hoc two-dimensional grammar. Executing the DLP+ reasoning modules that encode the grammar expressions yields the actual extraction of information from the input document. HıLεX supports semantic information extraction from both HTML pages and flat text documents by means of concise and very expressive extraction patterns.
ISBN: 3540775803, 9783540775805
ISSN: 1865-1348, 1865-1356
DOI: 10.1007/978-3-540-77581-2_13