ON THE DISTRIBUTION OF SOURCE CODE FILE SIZES

Israel Herraiz, Daniel M. German, Ahmed E. Hassan

2011

Abstract

Source code size is an estimator of software effort. Size is also often used to calibrate models and equations to estimate the cost of software. The distribution of source code file sizes has been shown in the literature to be a lognormal distribution. In this paper, we measure the size of a large collection of software (the Debian GNU/Linux distribution version 5.0.2), and we find that the statistical distribution of its source code file sizes follows a double Pareto distribution. This means that large files occur more often than the lognormal distribution predicts, so the previously proposed models underestimate the cost of software.
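The tail argument in the abstract can be illustrated with a minimal sketch that compares the probability of exceeding a given file size under a lognormal and under a simple two-sided power law (one common form of the double Pareto). All parameter values below (`MU`, `SIGMA`, `X0`, `ALPHA`, `BETA`) are illustrative assumptions, not values fitted to the Debian data, and the parametrization is not necessarily the one used in the paper.

```python
import math

def lognormal_survival(x, mu, sigma):
    """P(X > x) for a lognormal with log-mean mu and log-std sigma."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2)))

def double_pareto_survival(x, x0, alpha, beta):
    """P(X > x) for a two-sided power law with break point x0.

    Density is proportional to (x/x0)**(beta-1) below x0 and
    (x/x0)**(-alpha-1) above it (assumed form, for illustration).
    """
    if x < x0:
        return 1.0 - alpha / (alpha + beta) * (x / x0) ** beta
    return beta / (alpha + beta) * (x / x0) ** (-alpha)

# Hypothetical parameters: both distributions have median 100 lines.
MU, SIGMA = math.log(100.0), 1.0
X0, ALPHA, BETA = 100.0, 1.5, 1.5

for size in (1_000, 10_000, 100_000):
    ln = lognormal_survival(size, MU, SIGMA)
    dp = double_pareto_survival(size, X0, ALPHA, BETA)
    print(f"{size:>7} lines: lognormal {ln:.2e}, double Pareto {dp:.2e}")
```

Under these assumed parameters the power-law tail assigns more probability to very large files than the lognormal does, and the gap widens as the size grows. A cost model calibrated on the lognormal would therefore expect fewer large files than the heavier-tailed distribution implies.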


Paper Citation


in Harvard Style

Herraiz I., German D. and Hassan A. (2011). ON THE DISTRIBUTION OF SOURCE CODE FILE SIZES. In Proceedings of the 6th International Conference on Software and Database Technologies - Volume 2: ICSOFT, ISBN 978-989-8425-77-5, pages 5-14. DOI: 10.5220/0003426200050014

in Bibtex Style

@conference{icsoft11,
author={Israel Herraiz and Daniel M. German and Ahmed E. Hassan},
title={ON THE DISTRIBUTION OF SOURCE CODE FILE SIZES},
booktitle={Proceedings of the 6th International Conference on Software and Database Technologies - Volume 2: ICSOFT},
year={2011},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003426200050014},
isbn={978-989-8425-77-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Software and Database Technologies - Volume 2: ICSOFT
TI - ON THE DISTRIBUTION OF SOURCE CODE FILE SIZES
SN - 978-989-8425-77-5
AU - Herraiz I.
AU - German D.
AU - Hassan A.
PY - 2011
SP - 5
EP - 14
DO - 10.5220/0003426200050014