Modelling an IT solution to anonymise selected data processed in digital documents

Bibliographic Details
Published in	2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS) Vol. 30; pp. 715 - 719
Main Authors	Probierz, Barbara; Jach, Tomasz; Kozak, Jan; Pacud, Radoslaw; Turek, Tomasz
Format	Conference Proceeding; Journal Article
Language	English
Published	Polish Information Processing Society 01.01.2022
Subjects	Companies; Computational modeling; Computer science; Data models; Distributed databases; Law; Task analysis
ISSN	2300-5963
DOI	10.15439/2022F49

Summary:	Access to real legal documents is important for both the development of science and the judiciary. At the same time, protecting information about the citizens or organizations that appear in these documents is crucial and required by law. Therefore, the documents should be anonymised before they are distributed. Unfortunately, no tool can automatically anonymise documents perfectly while preserving the main concept of the document, especially for documents written in an inflectional language. The aim of this article is to show how important (and at the same time how difficult) it is to identify the personal or corporate data of a client, as well as other related personal data, in documents that are subject to legal protection. We conducted research aimed at assessing the usefulness of IT techniques, as well as decision rules and patterns, in the anonymisation of legal documents. The research used a set of real legal documents written in Polish, in which we identified selected types of data that need to be anonymised. The results obtained were then assessed by field experts. Additionally, to verify the effectiveness of the proposed solution, we conducted research on a set of 50,000 false identities with names, company names, addresses and other confidential information. The collection was created using Fake Name Generator. The results from both experiments confirmed that the solution we proposed is accurate even in the case of real legal documents.
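
The summary mentions patterns and decision rules without describing them, so the following is only a minimal Python sketch of how such a combination might look, not the authors' method. The pattern set (an 11-digit PESEL number, a Polish postal code, an e-mail address), the placeholder labels, and the function names are all assumptions made for illustration. The one decision rule shown accepts an 11-digit number as a PESEL only if its check digit is valid. Names in inflected forms, which the summary singles out as the hard case, are not handled here.

```python
import re

# Hypothetical patterns chosen for illustration; they are not taken from the paper.
PATTERNS = [
    ("PESEL", re.compile(r"\b\d{11}\b")),
    ("POSTAL_CODE", re.compile(r"\b\d{2}-\d{3}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def pesel_checksum_ok(value: str) -> bool:
    """Decision rule: accept an 11-digit string only if its PESEL check digit matches."""
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    total = sum(w * int(d) for w, d in zip(weights, value))
    return (10 - total % 10) % 10 == int(value[10])


# Optional rule per pattern label; labels without a rule always match.
RULES = {"PESEL": pesel_checksum_ok}


def anonymise(text: str) -> str:
    """Replace every pattern match that passes its decision rule with a [LABEL] placeholder."""
    for label, pattern in PATTERNS:
        rule = RULES.get(label, lambda _value: True)

        def replace(match, label=label, rule=rule):
            value = match.group(0)
            return f"[{label}]" if rule(value) else value

        text = pattern.sub(replace, text)
    return text


if __name__ == "__main__":
    sample = "PESEL 44051401359, 40-001 Katowice, jan@example.com, sygn. 12345678901"
    print(anonymise(sample))
```

In this sketch the second 11-digit number is left untouched because it fails the check-digit rule, which illustrates how a rule can reduce false positives that a bare pattern would produce.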