Modelling an IT solution to anonymise selected data processed in digital documents
Published in | 2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS), Vol. 30, pp. 715–719 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding; Journal Article |
Language | English |
Published | Polish Information Processing Society, 01.01.2022 |
ISSN | 2300-5963 |
DOI | 10.15439/2022F49 |
Summary: | Allowing access to real legal documents is important both for the development of science and for the judiciary. On the other hand, protecting information about the citizens or organizations that appear in these documents is crucial and required by law. Therefore, the data should be anonymised before the documents are distributed. Unfortunately, there is no perfect tool that can automatically anonymise documents in such a way that the main concept of the document is preserved, especially in the case of documents written in an inflectional language. The aim of this article is to show how important (and, at the same time, how difficult) it is to identify the personal or corporate data of a client, as well as other related personal data, in documents that are subject to legal protection. We conducted research aimed at assessing the usefulness of IT techniques, as well as decision rules and patterns, in the anonymisation of legal documents. The research used a set of real legal documents written in Polish, in which we identified selected types of data that need to be anonymised. Finally, the obtained results were assessed by field experts. Additionally, to verify the effectiveness of the proposed solution, we conducted research on a set of 50,000 false identities with names, company names, addresses and other confidential information. The collection was created using Fake Name Generator. The results obtained from both experiments confirmed that the proposed solution is accurate even in the case of real legal documents. |
---|---|
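The summary stresses that anonymisation is harder in an inflectional language such as Polish, where the same surname appears in many case forms. As a minimal sketch, assuming a simple pattern-based approach, the Python snippet below masks Polish-format postal codes and several case forms of known surnames. This is not the authors' implementation: the patterns, the surname stems, the list of endings, the placeholder tokens and the `anonymise` function are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules for illustration only; these are NOT the rules from the paper.

# Polish postal codes have the form NN-NNN (e.g. 00-950).
POSTAL_CODE = re.compile(r"\b\d{2}-\d{3}\b")

# Hypothetical surname stems, e.g. taken from a client register.
SURNAME_STEMS = ["Kowalsk", "Nowak"]

# Assumed, non-exhaustive case endings covering e.g.
# Kowalski/Kowalskiego/Kowalskiemu/Kowalskim, Kowalska/Kowalskiej/Kowalską,
# Nowak/Nowaka/Nowakowi/Nowakiem/Nowaku.
ENDINGS = r"(?:iego|iemu|iej|iem|im|owi|i|a|ą|u)?"

# Stem + optional ending, bounded by word boundaries so that a different
# surname sharing the stem (e.g. "Nowakowska") is not matched.
SURNAME = re.compile(
    r"\b(?:" + "|".join(map(re.escape, SURNAME_STEMS)) + r")" + ENDINGS + r"\b"
)


def anonymise(text: str) -> str:
    """Replace postal codes and inflected forms of known surnames with placeholders."""
    text = POSTAL_CODE.sub("[POSTAL_CODE]", text)
    text = SURNAME.sub("[SURNAME]", text)
    return text


if __name__ == "__main__":
    print(anonymise("Pozew Jana Kowalskiego przeciwko Nowakowi, ul. Polna 1, 00-950 Warszawa."))
    # prints: Pozew Jana [SURNAME] przeciwko [SURNAME], ul. Polna 1, [POSTAL_CODE] Warszawa.
```

A fixed list of endings like this is brittle (first names such as "Jana" are left untouched, and forms outside the list are missed); it only illustrates why inflection complicates rule-based matching.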