Conference Paper (published)

Benchmarks That Matter for Genetic Programming

Details

Citation

Woodward J, Martin S & Swan J (2014) Benchmarks That Matter for Genetic Programming. In: Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion. GECCO Comp '14. GECCO 2014: Genetic and Evolutionary Computation Conference, Vancouver, BC, Canada, 12.07.2014-16.07.2014. New York, NY, USA: ACM, pp. 1397-1404. http://doi.acm.org/10.1145/2598394.2609875; https://doi.org/10.1145/2598394.2609875

Abstract
There have been several papers published relating to the practice of benchmarking in machine learning, and in Genetic Programming (GP) in particular. In addition, GP has been accused of targeting over-simplified 'toy' problems that do not reflect the complexity of the real-world applications for which GP is ultimately intended. There are also theoretical results that relate the performance of an algorithm to a probability distribution over problem instances, and so the current debate concerning benchmarks spans from the theoretical to the empirical. The aim of this article is to consolidate an emerging theme arising from these papers and suggest that benchmarks should not be arbitrarily selected but should instead be drawn from an underlying probability distribution that reflects the problem instances to which the algorithm is likely to be applied in the real world. These probability distributions are effectively dictated by the application domains themselves (essentially data-driven) and should thus re-engage the owners of the originating data. Properly founded benchmarking leads to the suggestion of meta-learning as a methodology for designing algorithms automatically rather than manually. A secondary motive is to reduce the number of research papers that propose new algorithms but do not state in advance what their purpose is (i.e. in what context they should be applied). To put the current practice of GP benchmarking in a particularly harsh light, one might ask what the performance of an algorithm on Koza's lawnmower problem (a favourite toy problem of the GP community) has to say about its performance on a very real-world cancer data set: the two are completely unrelated.

Keywords
evolutionary computation; function optimization; genetic programming; hyper-heuristics; machine learning; meta-learning; no free lunch theorems

Status: Published
Title of series: GECCO Comp '14
Publication date: 31/12/2014
Publication date online: 31/07/2014
Related URLs: http://www.sigevo.org/gecco-2014/
Publisher: ACM
Publisher URL: http://doi.acm.org/10.1145/2598394.2609875
Place of publication: New York, NY, USA
ISBN: 978-1-4503-2881-4
Conference: GECCO 2014: Genetic and Evolutionary Computation Conference
Conference location: Vancouver, BC, Canada
Dates