A data structure for representing multi-version texts online

Bibliographic Details
Published in International journal of human-computer studies Vol. 67; no. 6; pp. 497-514
Main Authors Schmidt, Desmond; Colomb, Robert
Format Journal Article
Language English
Published Oxford Elsevier Ltd 01.06.2009
Elsevier
Summary: The digitisation of cultural heritage and linguistics texts has long been troubled by the problem of how to represent overlapping structures arising from different markup perspectives (‘overlapping hierarchies’) or from different versions of the same work (‘textual variation’). These two problems can be reduced to one by observing that every case of overlapping hierarchies is also a case of textual variation. Overlapping textual structures can be accurately modelled either as a minimally redundant directed graph, or, more practically, as an ordered list of pairs, each containing a set of versions and a fragment of text or data. This ‘pairs-list’ representation is provably equivalent to the graph representation. It can record texts consisting of thousands of versions or perspectives without becoming overloaded with data, and the most common operations on variant text, e.g. comparison between two versions, can be performed in linear time. This representation also separates variation or other overlapping structures from the document content, leading to a simplification of markup suitable for wiki-like web applications.
ISSN: 1071-5819, 1095-9300
DOI: 10.1016/j.ijhcs.2009.02.001
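
Illustration: the summary describes a 'pairs-list' as an ordered list of pairs, each holding a set of versions and a fragment of text. The sketch below is one possible reading of that description and is not taken from the article itself. The sample sentences, the integer version identifiers, and the names `PAIRS`, `read_version` and `compare` are all assumptions made for illustration. It shows how a single version might be read out, and how two versions might be compared in one pass over the list, which is linear in the number of pairs.

```python
from typing import FrozenSet, List, Tuple

# One pair = (set of version ids, text fragment). Hypothetical layout.
Pair = Tuple[FrozenSet[int], str]

# Invented three-version example:
#   version 1: "The quick brown fox"
#   version 2: "The quick red fox"
#   version 3: "The slow red fox"
PAIRS: List[Pair] = [
    (frozenset({1, 2, 3}), "The "),
    (frozenset({1, 2}), "quick "),
    (frozenset({3}), "slow "),
    (frozenset({1}), "brown "),
    (frozenset({2, 3}), "red "),
    (frozenset({1, 2, 3}), "fox"),
]


def read_version(pairs: List[Pair], version: int) -> str:
    """Concatenate, in order, every fragment whose version set contains `version`."""
    return "".join(text for versions, text in pairs if version in versions)


def compare(pairs: List[Pair], a: int, b: int) -> List[Tuple[str, str]]:
    """Label each fragment as shared, only in `a`, or only in `b`.

    Makes one pass over the list; fragments belonging to neither
    version are skipped.
    """
    result: List[Tuple[str, str]] = []
    for versions, text in pairs:
        in_a, in_b = a in versions, b in versions
        if in_a and in_b:
            result.append(("shared", text))
        elif in_a:
            result.append(("only_a", text))
        elif in_b:
            result.append(("only_b", text))
    return result


if __name__ == "__main__":
    print(read_version(PAIRS, 2))
    for label, text in compare(PAIRS, 1, 3):
        print(f"{label:7} {text!r}")
```

In this sketch, text common to several versions is stored once and tagged with every version that contains it, so adding a version mostly means extending existing version sets rather than copying text.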