A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity
Format | Journal Article |
---|---|
Language | English |
Published | 24.05.2023 |
DOI | 10.48550/arxiv.2305.15006 |
Summary: | Machine-readable representations of privacy policies are door openers for a
broad variety of novel privacy-enhancing and, in particular,
transparency-enhancing technologies (TETs). In order to generate such
representations, transparency information needs to be extracted from written
privacy policies. However, the manual annotation and extraction processes this
requires are laborious and demand expert knowledge. Fully automated
annotation, in turn, has so far not succeeded because error rates in the
specific domain of privacy policies are too high. As a result, a lack of
properly annotated privacy policies and corresponding machine-readable
representations persists and continues to hinder the development and
establishment of novel technical approaches that foster policy perception and
data subject informedness.
In this work, we present a prototype system for a 'Human-in-the-Loop'
approach to privacy policy annotation that combines ML-generated suggestions
with final human annotation decisions. We propose an ML-based suggestion
system specifically tailored to the constraint of data scarcity prevalent in
the domain of privacy policy annotation. On this basis, we provide meaningful
predictions to users, thereby streamlining the annotation process. We also
evaluate our approach through a prototypical implementation to show that our
ML-based extraction approach outperforms other recently used extraction models
for legal documents. |