Supercombinator set acquired from context-free grammar samples

Bibliographic Details
Published in	Computer Languages, Systems & Structures, Vol. 54, pp. 1-19
Main Authors	Sičák, Michal; Kollár, Ján
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.12.2018
Subjects	Abstract grammars; Context-free grammars; Supercombinators
Summary:	• We present an algorithm that transforms context-free grammars into a single set of supercombinators.
• We evaluate the algorithm on 62,008 grammar samples obtained from the Groningen Meaning Bank.
• We find the limit of the supercombinator set, which for our sample set is the sequence of Catalan numbers.
• We show how to identify the most common structures in the input grammars.

We present an algorithm that transforms context-free grammars into a non-redundant set of supercombinators. The set consists of interconnected lambda-calculus supercombinators enriched with grammar operations. It is scalable and can be extended with new supercombinators created from further grammars. We describe the algorithm in detail and then apply it to 62,008 grammar samples to determine the properties and limits of the resulting supercombinator set. We show that the number of possible supercombinators in this set has a theoretical maximum, given by the sequence of Catalan numbers. We show that in some cases this limit can be reached, provided that the input data source is large enough and the size of the supercombinators admitted into the final set is bounded. We also describe a further benefit of the algorithm: it identifies the most frequently recurring structures in the input set.
ISSN:	1477-8424 1873-6866
DOI:	10.1016/j.cl.2018.04.001
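
The summary names the sequence of Catalan numbers as the theoretical limit of the supercombinator set. For readers unfamiliar with that sequence, the following minimal sketch computes its first terms from the standard closed form C_n = binom(2n, n) / (n + 1). The function names `catalan` and `catalan_prefix` are hypothetical, and how the article maps the index n onto supercombinator size is not stated in this record, so no such mapping is assumed here.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number using C_n = binom(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The division is always exact, so integer division is safe.
    return comb(2 * n, n) // (n + 1)


def catalan_prefix(count: int) -> list[int]:
    """Return the first `count` Catalan numbers, starting at C_0."""
    return [catalan(i) for i in range(count)]


if __name__ == "__main__":
    # Illustrative only: prints the start of the sequence the summary refers to.
    print(catalan_prefix(8))
```

Catalan numbers count, among other things, the distinct shapes of binary trees with n internal nodes, which is one common reason they appear as upper bounds on the number of distinct tree-like structures of a given size.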