A greedy search tree heuristic for symbolic regression

Bibliographic Details
Published in Information sciences Vol. 442-443; pp. 18-32
Main Author Olivetti de França, Fabrício
Format Journal Article
Language English
Published Elsevier Inc 01.05.2018
Summary: Symbolic Regression tries to find a mathematical expression that describes the relationship of a set of explanatory variables to a measured variable. The main objective is to find a model that minimizes the error and, optionally, also minimizes the expression size. A smaller expression can be seen as an interpretable model, which is considered a reliable decision model. This is often performed with Genetic Programming, which represents its solutions as expression trees. The shortcoming of this algorithm lies in this representation, which defines a rugged search space and contains expressions of any size and difficulty. These properties make it challenging to find the optimal solution under computational constraints. This paper introduces a new data structure, called Interaction-Transformation (IT), that constrains the search space in order to exclude a region of larger and more complicated expressions. In order to test this data structure, a heuristic called SymTree was also introduced. The obtained results show evidence that SymTree is capable of obtaining the optimal solution whenever the target function is within the search space of the IT data structure, and of obtaining competitive results when it is not. Overall, the algorithm found a good compromise between accuracy and simplicity for all the generated models.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2018.02.040
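
The summary names the Interaction-Transformation (IT) data structure without spelling out its form. As a rough illustration only, the sketch below assumes an IT expression is a weighted sum of terms, where each term applies a single-variable transformation to an "interaction" (a product of the input variables raised to integer exponents). The names `ITTerm`, `interaction`, `evaluate`, and `identity`, and the example expression, are hypothetical and are not taken from the paper; this is not the SymTree heuristic itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ITTerm:
    """One term of an assumed IT expression (hypothetical structure)."""
    weight: float                         # linear coefficient of the term
    transform: Callable[[float], float]   # single-variable transformation
    exponents: Sequence[int]              # one integer exponent per variable


def interaction(x: Sequence[float], exponents: Sequence[int]) -> float:
    """Product of the variables, each raised to its exponent."""
    result = 1.0
    for value, k in zip(x, exponents):
        result *= value ** k
    return result


def evaluate(terms: Sequence[ITTerm], x: Sequence[float]) -> float:
    """Weighted sum of transformed interactions."""
    return sum(t.weight * t.transform(interaction(x, t.exponents))
               for t in terms)


def identity(z: float) -> float:
    return z


# Hypothetical expression: 2 * x0 * x1 + 3 * sin(x0^2)
expr = [
    ITTerm(weight=2.0, transform=identity, exponents=(1, 1)),
    ITTerm(weight=3.0, transform=math.sin, exponents=(2, 0)),
]
value = evaluate(expr, (1.5, 2.0))
```

Under this assumed form, every term is a flat composition of one transformation with one interaction, so nested compositions such as sin(cos(x0)) cannot be written. That would be one way a representation could exclude larger and more complicated expressions from the search space, as the summary describes.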