Modeling, encoding and querying multi-structured documents

Bibliographic Details
Published in Information processing & management Vol. 48; no. 5; pp. 931-955
Main Authors Portier, Pierre-Édouard; Chatti, Noureddine; Calabretto, Sylvie; Egyed-Zsigmond, Elöd; Pinon, Jean-Marie
Format Journal Article
Language English
Published Kidlington Elsevier Ltd 01.09.2012
Elsevier
Elsevier Science Ltd
Summary: The issue of multi-structured documents became prominent with the emergence of the digital humanities as a field of practice. Many distinct structures may be defined simultaneously on the same original content to serve different documentary tasks. For example, a document may have both a structure for the logical organization of content (logical structure) and a structure expressing a set of content formatting rules (physical structure). In this paper, we present MSDM, a generic model for multi-structured documents, in which several important features are established. We also address the problem of efficiently encoding multi-structured documents by introducing MultiX, a new XML formalism based on the MSDM model. Finally, we propose a library of XQuery functions for querying MultiX documents. We illustrate all of these contributions with a use case based on a fragment of an old manuscript.
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2011.11.004