A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity

Bibliographic Details
Main Authors	Gebauer, Michael; Maschhur, Faraz; Leschke, Nicola; Grünewald, Elias; Pallas, Frank
Format	Journal Article
Language	English
Published	24.05.2023
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computers and Society
DOI	10.48550/arxiv.2305.15006

Abstract	Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information needs to be extracted from written privacy policies. However, the corresponding manual annotation and extraction processes are laborious and require expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded due to overly high error rates in the specific domain of privacy policies. As a result, a lack of properly annotated privacy policies and corresponding machine-readable representations persists and continues to hinder the development and establishment of novel technical approaches fostering policy perception and data subject informedness. In this work, we present a prototype system for a "Human-in-the-Loop" approach to privacy policy annotation that combines ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the constraint of data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation and show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.
Copyright	http://creativecommons.org/licenses/by/4.0
Open Access Link	https://arxiv.org/abs/2305.15006
Resource Type	preprint