Improving accessibility of digitization outputs: EODOPEN project research findings

Bibliographic Details
Published in Digital library perspectives Vol. 40; no. 2; pp. 187-211
Main Authors Kavčič Čolić, Alenka, Hari, Andreja
Format Journal Article
Language English
Published Emerald Publishing Limited 14.05.2024
Abstract
Purpose: The current predominant delivery format resulting from digitization is PDF, which is not well suited to blind and partially sighted readers or to people who read on mobile devices. To meet the needs of both communities, as well as broader ones, alternative file formats are required. Drawing on the findings of the eBooks-On-Demand-Network Opening Publications for European Netizens (EODOPEN) project research, this study aims to improve access to digitized content for these communities.

Design/methodology/approach: In 2022, the authors conducted research on the digitization experiences of 13 EODOPEN partners at their organizations. The authors distributed the same sample of English-language scans with different characteristics and, in accordance with the Web Content Accessibility Guidelines (WCAG), created 24 criteria to analyze the partners' digitization workflows, output formats and optical character recognition (OCR) quality.

Findings: The authors present the results of a trial implementation among EODOPEN partners, covering their digitization workflows, the delivery file formats used and the resulting OCR quality by type of output file format. Partners that used the OCR tool ABBYY FineReader Professional and produced scanning outputs in tagged PDF and PDF/UA formats achieved better results against the set criteria.

Research limitations/implications: The trial implementations were limited to the organizations of the 13 project partners.

Originality/value: This research paper can be a valuable contribution to the field of mass digitization practices, particularly in terms of improving the accessibility of the output delivery file formats.
Author_xml – sequence: 1
  givenname: Alenka
  surname: Kavčič Čolić
  fullname: Kavčič Čolić, Alenka
  email: alenka.kavcic@nuk.uni-lj.si
– sequence: 2
  givenname: Andreja
  surname: Hari
  fullname: Hari, Andreja
  email: andreja.hari@nuk.uni-lj.si
ContentType Journal Article
Copyright Alenka Kavčič Čolić and Andreja Hari.
DOI 10.1108/DLP-09-2023-0080
EISSN 2059-5816
ISSN 2059-5816
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords Digitization
Blind and partially sighted
Accessibility
Mobile technology users
Language English
License Published in Asia Pacific Journal of Innovation and Entrepreneurship. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at
OpenAccessLink https://www.emerald.com/insight/content/doi/10.1108/DLP-09-2023-0080/full/html
PageCount 25
PublicationDate 2024-05-14
Publisher Emerald Publishing Limited