Information-theoretic detection of unusual source code changes

DOI to cite the version on EPub Bayreuth: https://doi.org/10.15495/EPub_UBT_00008932
URN to cite this document: urn:nbn:de:bvb:703-epub-8932-4

Title data

Torres, Adriano ; Baltes, Sebastian ; Treude, Christoph ; Wagner, Markus:
Information-theoretic detection of unusual source code changes.
In: Empirical Software Engineering. Vol. 30 (2025). - 153.
ISSN 1573-7616
DOI of the publisher's version: https://doi.org/10.1007/s10664-025-10644-y

Format: PDF
Name: s10664-025-10644-y.pdf
Version: Published Version
Available under License Creative Commons BY 4.0: Attribution
Download (2MB)

Project information

Project title: Open Access Publizieren
Project ID: No information

Abstract

The code base of software projects evolves essentially through inserting and removing information to and from the source code. We can measure this evolution via the elements of information (tokens, words, nodes) of the respective representation of the code. In this work, we approach the measurement of the information content of the source code of open-source projects from an information-theoretic standpoint. Our focus is on the entropy of two fundamental representations of code: tokens and abstract syntax tree nodes, from which we derive definitions of textual and structural entropy. We proceed with an empirical assessment where we evaluate the evolution patterns of the entropy of 95 actively maintained open-source projects. We calculate the statistical relationships between our derived entropy metrics and classic methods of measuring code complexity and learn that entropy may capture different dimensions of complexity than classic metrics. Finally, we conduct entropy-based anomaly detection of unusual changes to demonstrate that our approach may effectively recognise unusual source code change events with over 60% precision, and lay the groundwork for improvements to information-theoretic measurement of source code evolution, thus paving the way for a new approach to statically gauging program complexity throughout its development.
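
To make the idea of textual and structural entropy concrete, the following is a minimal sketch, not the authors' implementation. It assumes that both quantities are the Shannon entropy of the empirical frequency distribution of tokens and of abstract syntax tree node types, respectively; the abstract does not state the exact definitions. The function names `shannon_entropy`, `textual_entropy`, and `structural_entropy` are hypothetical, and Python source is used only because the standard library ships a tokenizer and a parser for it.

```python
import ast
import io
import math
import tokenize
from collections import Counter

# Token types that carry layout rather than content; excluding them is an
# assumption of this sketch, not something the abstract specifies.
_LAYOUT_TOKENS = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
    tokenize.COMMENT,
}


def shannon_entropy(items):
    """Return the Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over symbols of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def textual_entropy(source):
    """Entropy over the token strings of a Python source string (assumed definition)."""
    reader = io.StringIO(source).readline
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type not in _LAYOUT_TOKENS
    ]
    return shannon_entropy(tokens)


def structural_entropy(source):
    """Entropy over AST node type names of a Python source string (assumed definition)."""
    tree = ast.parse(source)
    node_types = [type(node).__name__ for node in ast.walk(tree)]
    return shannon_entropy(node_types)


if __name__ == "__main__":
    before = "x = 1\n"
    after = "x = 1\nfor i in range(3):\n    x += i\n"
    print("textual:", textual_entropy(before), "->", textual_entropy(after))
    print("structural:", structural_entropy(before), "->", structural_entropy(after))
```

Comparing the two values before and after a change gives one simple per-change signal; how the paper turns such signals into anomaly detection is not described in the abstract, so that step is left out here.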

Further data

Item Type: Article in a journal
DDC Subjects: 000 Computer Science, information, general works > 004 Computer science
Institutions of the University: Faculties > Faculty of Mathematics, Physics and Computer Science > Department of Computer Science > Former Professors > Chair Applied Computer Science I - Univ.-Prof. Dr. Sebastian Baltes
Language: English
Originates at UBT: Yes
URN: urn:nbn:de:bvb:703-epub-8932-4
Date Deposited: 27 Feb 2026 10:46
Last Modified: 27 Feb 2026 10:47
URI: https://epub.uni-bayreuth.de/id/eprint/8932
