Improving accessibility of digitization outputs: EODOPEN project research findings
Published in | Digital library perspectives Vol. 40; no. 2; pp. 187 - 211 |
---|---|
Main Authors | Kavčič Čolić, Alenka; Hari, Andreja |
Format | Journal Article |
Language | English |
Published | Emerald Publishing Limited, 14.05.2024 |
Online Access | https://www.emerald.com/insight/content/doi/10.1108/DLP-09-2023-0080/full/html |
Abstract | Purpose
The current predominant delivery format resulting from digitization is PDF, which is not appropriate for the blind, partially sighted and people who read on mobile devices. To meet the needs of both communities, as well as broader ones, alternative file formats are required. With the findings of the eBooks-On-Demand-Network Opening Publications for European Netizens project research, this study aims to improve access to digitized content for these communities.
Design/methodology/approach
In 2022, the authors studied the digitization experiences of 13 EODOPEN partner organizations. Each partner received the same sample of English-language scans with varying characteristics. Drawing on the Web Content Accessibility Guidelines (WCAG), the authors defined 24 criteria for analyzing the partners’ digitization workflows, output formats and optical character recognition (OCR) quality.
Findings
The authors present the results of a trial implementation among EODOPEN partners, covering their digitization workflows, the delivery file formats they use and the resulting OCR quality for each type of output file format. Partners that used the OCR tool ABBYY FineReader Professional and produced scanning outputs in tagged PDF and PDF/UA formats achieved better results against the set criteria.
Research limitations/implications
The trial implementations were limited to the organizations of 13 project partners.
Originality/value
This research paper contributes to the field of mass digitization practices, particularly by improving the accessibility of output delivery file formats. |
---|---|
Author | Hari, Andreja; Kavčič Čolić, Alenka |
Author_xml | 1. Kavčič Čolić, Alenka (alenka.kavcic@nuk.uni-lj.si); 2. Hari, Andreja (andreja.hari@nuk.uni-lj.si) |
ContentType | Journal Article |
Copyright | Alenka Kavčič Čolić and Andreja Hari. |
DOI | 10.1108/DLP-09-2023-0080 |
EISSN | 2059-5816 |
EndPage | 211 |
ISSN | 2059-5816 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Keywords | Digitization; Blind and partially sighted; Accessibility; Mobile technology users |
Language | English |
License | Published in Asia Pacific Journal of Innovation and Entrepreneurship. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at |
OpenAccessLink | https://www.emerald.com/insight/content/doi/10.1108/DLP-09-2023-0080/full/html |
PageCount | 25 |
PublicationDate | 2024-05-14 |
PublicationTitle | Digital library perspectives |
PublicationYear | 2024 |
Publisher | Emerald Publishing Limited |
StartPage | 187 |
Title | Improving accessibility of digitization outputs: EODOPEN project research findings |
URI | https://www.emerald.com/insight/content/doi/10.1108/DLP-09-2023-0080/full/html |
Volume | 40 |