Conference Paper (published)

Evaluating Explanations for Software Patches Generated by Large Language Models

Details

Citation

Sobania D, Geiger A, Callan J, Brownlee A, Hanna C, Moussa R, Zamorano López M, Petke J & Sarro F (2023) Evaluating Explanations for Software Patches Generated by Large Language Models. In: Symposium on Search-Based Software Engineering - Challenge Track, San Francisco, CA, USA, 08.12.2023-08.12.2023.

Abstract
Large language models (LLMs) have recently been integrated into a variety of applications, including software engineering tasks. In this work, we study the use of LLMs to enhance the explainability of software patches. In particular, we evaluate the performance of GPT-3.5 in explaining patches generated by the search-based automated program repair system ARJA-e for 30 bugs from the popular Defects4J benchmark. We also investigate the performance achieved when explaining the corresponding patches written by software developers. We find that, on average, 84% of the LLM explanations for machine-generated patches were correct and 54% were complete for the studied categories in at least 1 out of 3 runs. Furthermore, we find that the LLM generates more accurate explanations for machine-generated patches than for human-written ones.

Keywords
Large Language Models; Software Patches; AI Explainability; Program Repair; Genetic Improvement

Status: Accepted
Funders: Engineering and Physical Sciences Research Council and European Commission (Horizon 2020)
URL: http://hdl.handle.net/1893/35519
Conference: Symposium on Search-Based Software Engineering - Challenge Track
Conference location: San Francisco, CA, USA
Dates: 08.12.2023-08.12.2023

People (1)

Dr Sandy Brownlee

Senior Lecturer in Computing Science, Computing Science and Mathematics - Division