Extracting JSON Schemas with Tagged Unions

Bibliographic Details
Published in arXiv.org
Main Authors Klessinger, Stefan; Klettke, Meike; Störl, Uta; Scherzinger, Stefanie
Format Paper
Language English
Published Ithaca Cornell University Library, arXiv.org 12.06.2023
Summary: With data lakes and schema-free NoSQL document stores, extracting a descriptive schema from JSON data collections is an acute challenge. In this paper, we target the discovery of tagged unions, a JSON Schema design pattern where the value of one property of an object (the tag) conditionally implies subschemas for sibling properties. We formalize these implications as conditional functional dependencies and capture them using the JSON Schema operators if-then-else. We further motivate our heuristics to avoid overfitting. Experiments with our prototype implementation are promising, and show that this form of tagged unions can successfully be detected in real-world GeoJSON and TopoJSON datasets. In discussing future work, we outline how our approach can be extended further.
ISSN: 2331-8422
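
The tagged-union pattern described in the summary can be illustrated with a minimal Python sketch. This is not the authors' algorithm or prototype: the function names (`infer_schema`, `tagged_union`) are hypothetical, the subschema inference is deliberately coarse (it only inspects the first array element), and none of the paper's heuristics against overfitting are modeled. The sketch checks whether each value of a tag property determines exactly one subschema for a sibling property, and if so emits one `if`/`then` pair per tag value (the `else` branch is omitted for brevity).

```python
import json
from collections import defaultdict


def infer_schema(value):
    """Coarse, illustrative subschema: JSON type plus array item type."""
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assumption: the first element is representative of all items.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, dict):
        return {"type": "object"}
    return {"type": "null"}


def tagged_union(docs, tag, sibling):
    """Return an allOf of if/then rules if `tag` determines the subschema
    of `sibling` across `docs`; otherwise return None."""
    by_tag = defaultdict(set)
    for doc in docs:
        if tag in doc and sibling in doc:
            # Serialize subschemas so they can be collected in a set.
            key = json.dumps(infer_schema(doc[sibling]), sort_keys=True)
            by_tag[doc[tag]].add(key)

    # The dependency holds only if every tag value maps to one subschema.
    if any(len(schemas) != 1 for schemas in by_tag.values()):
        return None
    # If all tag values imply the same subschema, there is no union to report.
    if len({next(iter(s)) for s in by_tag.values()}) < 2:
        return None

    return {
        "allOf": [
            {
                "if": {"properties": {tag: {"const": value}}, "required": [tag]},
                "then": {"properties": {sibling: json.loads(next(iter(schemas)))}},
            }
            for value, schemas in sorted(by_tag.items())
        ]
    }
```

Applied to GeoJSON-like geometry objects, where `type` is the tag and `coordinates` the sibling, a `Point` would imply an array of numbers and a `LineString` an array of arrays of numbers, so `tagged_union(docs, "type", "coordinates")` would return two `if`/`then` rules for such a collection.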