A data structure for representing multi-version texts online
Published in | International Journal of Human-Computer Studies, Vol. 67, no. 6, pp. 497-514 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Oxford: Elsevier Ltd, 01.06.2009 |
Summary: | The digitisation of cultural heritage and linguistics texts has long been troubled by the problem of how to represent overlapping structures arising from different markup perspectives (‘overlapping hierarchies’) or from different versions of the same work (‘textual variation’). These two problems can be reduced to one by observing that every case of overlapping hierarchies is also a case of textual variation. Overlapping textual structures can be accurately modelled either as a minimally redundant directed graph, or, more practically, as an ordered list of pairs, each containing a set of versions and a fragment of text or data. This ‘pairs-list’ representation is provably equivalent to the graph representation. It can record texts consisting of thousands of versions or perspectives without becoming overloaded with data, and the most common operations on variant text, e.g. comparison between two versions, can be performed in linear time. This representation also separates variation or other overlapping structures from the document content, leading to a simplification of markup suitable for wiki-like web applications. |
---|---|
ISSN: | 1071-5819 1095-9300 |
DOI: | 10.1016/j.ijhcs.2009.02.001 |
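
The ‘pairs-list’ representation described in the summary can be illustrated with a minimal sketch: an ordered list of pairs, each holding a set of versions and a fragment of text. This is not the authors' implementation. The names `Pair`, `read_version` and `compare_versions`, the integer version identifiers, and the sample text are all assumptions made for illustration. Comparing two versions takes a single pass over the list, which is one plausible reading of the linear-time claim.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One entry of the pairs-list: a set of versions plus a text fragment."""
    versions: FrozenSet[int]  # hypothetical integer version identifiers
    text: str


def read_version(pairs: List[Pair], version: int) -> str:
    """Rebuild one version by concatenating, in order, the fragments it owns."""
    return "".join(p.text for p in pairs if version in p.versions)


def compare_versions(pairs: List[Pair], a: int, b: int) -> List[Tuple[str, str]]:
    """Label each fragment as shared by both versions or unique to one.

    A single pass over the list, so the cost grows linearly with its length.
    Fragments that belong to neither version are skipped.
    """
    result = []
    for p in pairs:
        in_a, in_b = a in p.versions, b in p.versions
        if in_a and in_b:
            result.append(("both", p.text))
        elif in_a:
            result.append(("only_a", p.text))
        elif in_b:
            result.append(("only_b", p.text))
    return result


# Invented sample: three versions of one short sentence.
pairs = [
    Pair(frozenset({1, 2, 3}), "The quick "),
    Pair(frozenset({1}), "brown"),
    Pair(frozenset({2, 3}), "red"),
    Pair(frozenset({1, 2, 3}), " fox"),
    Pair(frozenset({3}), " jumps"),
]

print(read_version(pairs, 1))
print(read_version(pairs, 3))
print(compare_versions(pairs, 1, 3))
```

Text shared by several versions is stored once, in a pair whose version set names all of them, so adding a further version mostly adds identifiers to existing sets and not new copies of the text.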