Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.11.2023 |
Summary: | Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 GKC-CI annotations from 16 ground truth privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research. |
---|---|
DOI: | 10.48550/arxiv.2311.02192 |