A Comparison of Software Cost Estimating Methods

Liming Wu
University of Calgary
Email: wul@cpsc.ucalgary.ca

Abstract

Practitioners have expressed concern over their inability to accurately estimate the costs associated with software development. This concern has become even more pressing as development costs continue to increase. Considerable research is now directed at constructing, evaluating, and selecting better software cost estimation models and tools for specific software development projects. This essay gives an overview of cost estimation models and discusses their advantages and disadvantages. Finally, guidelines for selecting appropriate cost estimation models are given, and a combination of methods is recommended.

Table of Contents

1. Introduction
2. Expert Judgment Method
3. Estimating by Analogy
4. Top-Down and Bottom-Up Methods
5. Algorithmic Method
6. The Selection and Use of Estimation Methods
7. Conclusions
References

1. Introduction

Surveys indicate that nearly one-third of projects overrun their budgets and are delivered late, and that two-thirds of all major projects substantially overrun their original estimates. The accurate prediction of software development costs is critical for making good management decisions and for determining how much effort and time a project will require; it matters to project managers as well as to system analysts and developers. Without a reasonably accurate cost estimation capability, project managers cannot determine how much time and manpower a project should take, which means the software portion of the project is out of control from its beginning; system analysts cannot make realistic hardware-software trade-off analyses during the system design phase; and software project personnel cannot tell managers and customers that their proposed budget and schedule are unrealistic. This leads to optimistic over-promising on software development, with inevitable overruns and performance compromises as a consequence. Indeed, huge overruns resulting from inaccurate estimates are believed to occur frequently.

The overall process of developing a cost estimate for software is not different from the process for estimating any other element of cost. There are, however, aspects of the process that are peculiar to software estimating. Some of the unique aspects of software estimating are driven by the nature of software as a product; others are created by the nature of the estimating methodologies. Software cost estimation is a continuing activity that starts at the proposal stage and continues through the lifetime of a project. Continual cost estimation ensures that spending remains in line with the budget.

Cost estimation is one of the most challenging tasks in project management: accurately estimating the resources and schedule required for a software development project. The software estimation process includes estimating the size of the software product to be produced, estimating the effort required, developing preliminary project schedules, and finally, estimating the overall cost of the project.

It is very difficult to estimate the cost of software development. Many of the problems that plague the development effort itself are responsible for the difficulty encountered in estimating that effort. One of the first steps in any estimate is to understand and define the system to be estimated. Software, however, is intangible, invisible, and intractable. It is inherently more difficult to understand and estimate a product or process that cannot be seen and touched. Software grows and changes as it is written. When hardware design has been inadequate, or when hardware fails to perform as expected, the "solution" is often attempted through changes to the software. This change may occur late in the development process, and sometimes results in unanticipated software growth.

After more than 20 years of research, many software cost estimation methods are available, including algorithmic methods, estimating by analogy, the expert judgment method, the price-to-win method, the top-down method, and the bottom-up method. No one method is necessarily better or worse than another; in fact, their strengths and weaknesses are often complementary. Understanding their strengths and weaknesses is very important when you want to estimate your projects.


2. Expert Judgment Method

Expert judgment techniques involve consulting a software cost estimation expert, or a group of experts, to use their experience and understanding of the proposed project to arrive at an estimate of its cost.

Generally speaking, a group consensus technique, the Delphi technique, is the best way to proceed. The strengths and weaknesses of expert judgment are complementary to those of the algorithmic method.

To provide a sufficiently broad communication bandwidth for the experts to exchange the volume of information necessary to calibrate their estimates with those of the other experts, the wideband Delphi technique was introduced as a refinement of the standard Delphi technique.

The estimating steps of this method are:

  1. The coordinator presents each expert with a specification and an estimation form.
  2. The coordinator calls a group meeting in which the experts discuss estimation issues with the coordinator and each other.
  3. The experts fill out forms anonymously.
  4. The coordinator prepares and distributes a summary of the estimates on an iteration form.
  5. The coordinator calls a group meeting, focusing specifically on having the experts discuss points where their estimates varied widely.
  6. The experts fill out forms, again anonymously, and steps 4 to 6 are iterated for as many rounds as appropriate.

The wideband Delphi technique has subsequently been used in a number of studies and cost estimation activities. It has been highly successful in combining the free-discussion advantage of the group meeting technique with the anonymous-estimation advantage of the standard Delphi technique.
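To make the iteration concrete, here is a minimal Python sketch of the bookkeeping a coordinator might do between rounds. The summary fields and the 10% convergence threshold are illustrative assumptions rather than part of the wideband Delphi definition, and the estimates shown are made up.

    # Sketch of a coordinator's round-by-round summary in a wideband Delphi
    # exercise. The 10% convergence threshold is an illustrative assumption.

    def summarize_round(estimates):
        """Summary circulated to the experts on the iteration form."""
        ordered = sorted(estimates)
        return {"low": ordered[0], "median": ordered[len(ordered) // 2],
                "high": ordered[-1]}

    def has_converged(estimates, tolerance=0.10):
        """Stop iterating when the spread is within +/-10% of the median."""
        s = summarize_round(estimates)
        return (s["high"] - s["low"]) <= 2 * tolerance * s["median"]

    # Three rounds of anonymous estimates (person-months) from five experts.
    rounds = [[12, 30, 18, 45, 22], [18, 28, 20, 35, 24], [23, 26, 24, 27, 25]]
    for i, estimates in enumerate(rounds, start=1):
        print(f"Round {i}: {summarize_round(estimates)}",
              "converged" if has_converged(estimates) else "iterate again")

In this made-up run the spread narrows from 12-45 person-months to 23-27 over three rounds, which is the behavior the group meetings and anonymous iteration forms are designed to produce.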

The advantages of this method are:

  1. Experts can factor in exceptional conditions that algorithmic models handle poorly, such as exceptional personnel, exceptional teamwork, and the impact of new technologies, architectures, and applications.
  2. It is fast and requires little detailed project data.

The disadvantages include:

  1. The estimate is only as good as the experts' opinions, which are hard to document, quantify, and reproduce.
  2. Estimates may be biased toward optimism or pessimism.


3. Estimating by Analogy

Estimating by analogy means comparing the proposed project to previously completed similar projects for which the project development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. This method can be used either at the system level or at the component level.

Estimating by analogy is relatively straightforward. In some respects, it is a systematic form of expert judgment, since experts often search for analogous situations to inform their opinions.

The steps using estimating by analogy are:

  1. Characterizing the proposed project.
  2. Selecting the most similar completed projects, whose characteristics have been stored in a historical database.
  3. Deriving the estimate for the proposed project from the most similar completed projects by analogy.

The main advantages of this method are:

  1. The estimates are based on actual project characteristic data.
  2. The estimator's past experience and knowledge, which are not easy to quantify, can be brought to bear.
  3. The differences between the completed and the proposed projects can be identified and their impacts estimated.

However, there are also some problems with this method:

  1. Using this method, we have to determine how best to describe projects. The choice of variables must be restricted to information that is available at the point the prediction is required. Possibilities include the type of application domain, the number of inputs, the number of distinct entities referenced, the number of screens, and so forth.
  2. Even once we have characterized the project, we have to determine how to measure similarity and how much confidence to place in the analogies. Too few analogies might lead to maverick projects being used; too many might dilute the effect of the closest analogies. Martin Shepperd et al. introduced a method of finding analogies by measuring Euclidean distance in an n-dimensional space, where each dimension corresponds to one variable; values are standardized so that each dimension contributes equal weight to the process of finding analogies (see the sketch after this list). Generally speaking, two analogies are the most effective.
  3. Finally, we have to derive an estimate for the new project using the known effort values of the analogous projects. Possibilities include means and weighted means, the latter giving more influence to the closer analogies.
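The following Python sketch illustrates the mechanics described above: projects are characterized as vectors, each dimension is standardized to contribute equal weight, the two nearest completed projects are found by Euclidean distance, and their mean effort becomes the estimate. The feature set and project data are hypothetical.

    import math

    # Analogy-based estimation: standardize each dimension, find the k
    # nearest completed projects, average their known efforts.

    def standardize(points):
        """Min-max standardize every dimension to [0, 1]."""
        dims = range(len(points[0]))
        lows = [min(p[d] for p in points) for d in dims]
        highs = [max(p[d] for p in points) for d in dims]
        return [[(p[d] - lows[d]) / ((highs[d] - lows[d]) or 1) for d in dims]
                for p in points]

    def estimate_by_analogy(history, target, k=2):
        """history: list of (characteristics, actual effort); target: new project."""
        scaled = standardize([features for features, _ in history] + [target])
        scaled_target = scaled[-1]
        distances = [(math.dist(scaled[i], scaled_target), history[i][1])
                     for i in range(len(history))]
        nearest = sorted(distances)[:k]
        return sum(effort for _, effort in nearest) / k  # unweighted mean

    # Hypothetical history: (inputs, entities, screens) -> person-months.
    history = [((20, 8, 10), 14.0), ((45, 20, 30), 41.0), ((25, 10, 12), 17.0)]
    print(estimate_by_analogy(history, (22, 9, 11)))  # 15.5, from the two closest

A weighted mean, as mentioned in point 3, would replace the final average with weights inversely proportional to distance.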

It has been suggested that estimating by analogy is superior to estimation via algorithmic models in at least some circumstances. It is a more intuitive method, so it is easier to understand the reasoning behind a particular prediction.


4. Top-Down and Bottom-Up Methods

4.1 Top-Down Estimating Method

The top-down estimating method is also called the macro model. Using the top-down method, an overall cost estimate for the project is derived from the global properties of the software project, and the project is then partitioned into various low-level components. The leading method taking this approach is the Putnam model. Top-down estimating is more applicable to early cost estimation, when only global properties are known; it is very useful in the early phases of software development because no detailed information is yet available.

The advantages of this method are:

  1. It focuses on system-level activities such as integration, documentation, and configuration management, so the cost of system-level functions is not missed.
  2. It requires minimal project detail and is usually faster and easier to apply, which suits the early phases when little detail is available.

The disadvantages are:

  1. It often does not identify difficult low-level technical problems that are likely to escalate costs.
  2. It provides no detailed basis for justifying decisions or estimates.

Because it provides a global view of the software project, it usually embodies some effective features, such as the cost-time trade-off capability that exists in the Putnam model.

4.2 Bottom-Up Estimating Method

Using the bottom-up estimating method, the cost of each software component is estimated, and the results are then combined to arrive at an estimated cost for the overall project. The method aims to construct the estimate of a system from the knowledge accumulated about the small software components and their interactions. The leading method taking this approach is COCOMO's detailed model.

The advantages:

  1. It is more stable, because estimation errors in the various components have a chance to balance out.
  2. It is based on detailed knowledge of each component, and the estimate can be tied to the people who will do the work.

The disadvantages:

  1. It may overlook many of the system-level costs (integration, configuration management, documentation, and project management) associated with software development.
  2. It requires more estimating effort, and the detailed component information it needs may not be available in the early phases.


5. Algorithmic Method

5.1 General Discussion

The algorithmic method is designed to provide mathematical equations with which to perform software estimation. These equations are based on research and historical data, and they use inputs such as source lines of code (SLOC), the number of functions to perform, and other cost drivers such as language, design methodology, skill levels, and risk assessments. Algorithmic methods have been studied extensively, and many models have been developed, such as the COCOMO models, the Putnam model, and function point based models.

General advantages:

  1. It generates repeatable estimates.
  2. It is easy to modify input data and to refine and customize formulas.
  3. It is efficient and able to support a family of estimates or a sensitivity analysis.
  4. It is objectively calibrated to previous experience.

General disadvantages:

  1. It is unable to deal with exceptional conditions, such as exceptional personnel, exceptional teamwork, or an exceptional match between skill levels and tasks.
  2. Poor sizing inputs and inaccurate cost driver ratings will result in inaccurate estimates.
  3. Some experience and factors cannot easily be quantified.

5.2 COCOMO Models

One very widely used algorithmic software cost model is the Constructive Cost Model (COCOMO). The basic COCOMO model has a very simple form:

MAN-MONTHS = K1 * (KDSI)^K2

where KDSI is thousands of delivered source instructions, and K1 and K2 are two parameters that depend on the application and development environment.
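For concreteness, the following Python sketch evaluates this equation with the (K1, K2) coefficient pairs Boehm published for the three basic COCOMO development modes; the 32 KDSI project size is a made-up example.

    # Basic COCOMO: effort = K1 * (KDSI)^K2, with Boehm's published
    # coefficients for the three development modes.

    MODES = {
        "organic":      (2.4, 1.05),  # small team, familiar environment
        "semidetached": (3.0, 1.12),  # intermediate case
        "embedded":     (3.6, 1.20),  # tight hardware/operational constraints
    }

    def basic_cocomo_effort(kdsi, mode):
        k1, k2 = MODES[mode]
        return k1 * kdsi ** k2  # person-months

    for mode in MODES:
        print(f"{mode}: {basic_cocomo_effort(32, mode):.0f} person-months")
    # organic ~91, semidetached ~146, embedded ~230: because K2 > 1, doubling
    # size more than doubles effort (a diseconomy of scale).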

Estimates from the basic COCOMO model can be made more accurate by taking into account other factors concerning the required characteristics of the software to be developed, the qualifications and experience of the development team, and the software development environment. Some of these factors are:

  1. Complexity of the software
  2. Required reliability
  3. Size of the database
  4. Required efficiency (memory and execution time)
  5. Analyst and programmer capability
  6. Experience of the team in the application area
  7. Experience of the team with the programming language and computer
  8. Use of tools and software engineering practices

Many of these factors affect the required person-months by an order of magnitude or more. COCOMO assumes that the system and software requirements have already been defined and that these requirements are stable. This is often not the case.

The COCOMO model is a regression model, based on the analysis of 63 selected projects. The primary input is KDSI. The problems are:

  1. In the early phases of the system life cycle, the size estimate carries great uncertainty, so an accurate cost estimate cannot be arrived at.
  2. The cost estimation equation is derived from the analysis of those 63 projects, and it usually has problems outside of that particular environment. For this reason, recalibration is necessary.

According to Kemerer's research, the average error for all versions of the model is 601%, and the detailed and intermediate models seem not much better than the basic model.

The first version of the COCOMO model was originally developed in 1981. It has been experiencing increasing difficulties in estimating the costs of software developed to new life-cycle processes and capabilities, including rapid-development process models, reuse-driven approaches, object-oriented approaches, and software process maturity initiatives.

For these reasons, the newest version, COCOMO 2.0, was developed. The major new modeling capabilities of COCOMO 2.0 are a tailorable family of software size models involving object points, function points, and source lines of code; nonlinear models for software reuse and reengineering; an exponent-driver approach for modeling relative software diseconomies of scale; and several additions, deletions, and updates to the previous COCOMO effort-multiplier cost drivers. The new model also serves as a framework for an extensive data collection and analysis effort to further refine and calibrate its estimation capabilities.


5.3 Putnam Model

Another popular software cost model is the Putnam model. The form of this model is:

Size = C * B^(1/3) * T^(4/3), equivalently, technical constant C = Size / (B^(1/3) * T^(4/3))

Total person-months B = (1/T^4) * (Size/C)^3

where:

  T = required development time in years
  Size = software size, estimated in LOC
  C = a technology constant that depends on the development environment, determined from historical data of past projects. Ratings: C = 2,000 (poor), C = 8,000 (good), C = 12,000 (excellent).

The Putnam model is very sensitive to the development time: decreasing the development time can greatly increase the person-months needed for development.
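The fourth-power dependence on T is easy to see numerically. In the following Python sketch, the project size and the technology constant are illustrative values, and B is taken in the person-month units used above.

    # Effort/schedule trade-off in the Putnam equations above.

    def putnam_effort(size_loc, c, t_years):
        """B = (1/T^4) * (Size/C)^3."""
        return (size_loc / c) ** 3 / t_years ** 4

    size, c = 50_000, 8_000  # hypothetical 50 KLOC project, "good" environment
    for t in (2.5, 2.0, 1.5):
        print(f"T = {t} years -> B = {putnam_effort(size, c, t):.1f}")
    # Compressing the schedule from 2.5 to 1.5 years multiplies effort by
    # (2.5/1.5)^4, i.e. roughly 7.7x.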

One significant problem with the Putnam model is that it is based on knowing, or being able to estimate accurately, the size (in lines of code) of the software to be developed. There is often great uncertainty in software size, which can result in inaccurate cost estimates. According to Kemerer's research, the error percentage of SLIM, a Putnam model based method, is 772.87%.


5.4 Function Point Analysis Based Methods

The two algorithmic models above require the estimator to estimate the number of SLOC in order to obtain person-month and duration estimates. Function point analysis is another method of quantifying the size and complexity of a software system, in terms of the functions that the system delivers to the user. A number of proprietary models for cost estimation have adopted a function point type of approach, such as ESTIMACS and SPQR/20.

The function point measurement method was developed by Allan Albrecht at IBM and published in 1979. Albrecht argued that function points offer several significant advantages over SLOC counts as a size measurement. There are two steps in counting function points:

  • Counting the user functions. The raw function counts are arrived at by considering a linear combination of five basic software components: external inputs, external outputs, external inquiries, logical internal files, and external interfaces, each at one of three complexity levels: simple, average, or complex. The sum of these numbers, weighted according to the complexity level, is the function count (FC).
  • Adjusting for environmental processing complexity. The final function point count is arrived at by multiplying FC by an adjustment factor that is determined by considering 14 aspects of processing complexity. This adjustment factor allows FC to be modified by at most ±35% (see the Python sketch below).
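The two steps can be expressed directly in Python. The component-type weights below are Albrecht's published simple/average/complex weights, and the adjustment formula 0.65 + 0.01 * (sum of the 14 ratings, each rated 0-5) is what bounds the adjustment to ±35%; the component counts for the example system are hypothetical.

    # Function point counting: weighted component counts (step 1), then the
    # processing-complexity adjustment (step 2).

    WEIGHTS = {  # (simple, average, complex)
        "external_input":     (3, 4, 6),
        "external_output":    (4, 5, 7),
        "external_inquiry":   (3, 4, 6),
        "internal_file":      (7, 10, 15),
        "external_interface": (5, 7, 10),
    }
    LEVEL = {"simple": 0, "average": 1, "complex": 2}

    def function_counts(components):
        """Step 1: FC = weighted sum over the five component types."""
        return sum(WEIGHTS[kind][LEVEL[level]] * n
                   for (kind, level), n in components.items())

    def function_points(fc, ratings):
        """Step 2: FP = FC * (0.65 + 0.01 * sum of 14 ratings), i.e. +/-35%."""
        assert len(ratings) == 14 and all(0 <= r <= 5 for r in ratings)
        return fc * (0.65 + 0.01 * sum(ratings))

    counts = {("external_input", "average"): 12, ("external_output", "complex"): 6,
              ("external_inquiry", "simple"): 8, ("internal_file", "average"): 4,
              ("external_interface", "simple"): 2}
    fc = function_counts(counts)              # 48 + 42 + 24 + 40 + 10 = 164
    print(fc, function_points(fc, [3] * 14))  # 164 and 164 * 1.07 = 175.48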
The collection of function point data has two primary motivations: the desire by managers to monitor levels of productivity, and its use in estimating software development cost.

There are some cost estimation methods based on a function point type of measurement, such as ESTIMACS and SPQR/20. SPQR/20 is based on a modified function point method: whereas traditional function point analysis is based on evaluating 14 factors, SPQR/20 separates complexity into three categories: complexity of algorithms, complexity of code, and complexity of data structures. ESTIMACS is a proprietary system designed to give development cost estimates at the conception stage of a project, and it contains a module that estimates function points as a primary input for estimating cost.

The advantages of function point analysis based models are:

  1. Function points can be estimated from requirements specifications or design specifications, making it possible to estimate development cost in the early phases of development.
  2. Function points are independent of the language, tools, or methodologies used for implementation.
  3. Non-technical users understand better what function points are measuring, since function points are based on the system user's external view of the system.

From Kemerer's research, the mean error percentage of ESTIMACS is only 85.48%. Considering the 601% error of COCOMO and the 772.87% error of SLIM, I think function point based cost estimation methods are the better approach, especially in the early phases of development.

6. The Selection and Use of Estimation Methods

6.1 The Selection of Estimation Methods

From the above comparison, we know that no one method is necessarily better or worse than another; in fact, their strengths and weaknesses are often complementary. Based on experience, a combination of models with analogy or expert judgment estimation is recommended to obtain reliable, accurate cost estimates for software development.

For known projects and project parts, we should use the expert judgment or analogy method if suitable similarities can be found, since under those circumstances it is fast and reliable. For large, lesser-known projects, it is better to use an algorithmic model, and many researchers recommend estimation models that do not require SLOC as an input. I think COCOMO 2.0 is the first candidate, because COCOMO 2.0 can use not only source lines of code (SLOC) but also object points and unadjusted function points as metrics for sizing a project. If we approach cost estimation by parts, we may use expert judgment for the known parts. This way we can take advantage of both the rigor of models and the speed of expert judgment or analogy. Because the advantages and disadvantages of each technique are complementary, a combination will reduce the negative effect of any one technique, augment their individual strengths, and help to cross-check one method against another.

6.2 Use of Estimation Methods

It is very common to apply some cost estimation method to estimate the cost of software development. What we have to note is that it is very important to continually re-estimate cost and to compare targets against actual expenditure at each major milestone. This keeps the status of the project visible and helps to identify necessary corrections to budget and schedule as soon as they occur.

At every estimation and re-estimation point, iteration is an important tool for improving estimation quality. The estimator can use several estimation techniques and check whether their estimates converge; large divergences point to assumptions that need to be re-examined, as sketched below.
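A trivial Python sketch of such a cross-check follows; the 20% divergence threshold is an assumption for illustration, not a published rule, and the estimates are made up.

    # Cross-check estimates from several techniques and flag divergence.

    def cross_check(estimates, threshold=0.20):
        """estimates: {technique: person-months}. Returns (mean, spread, flag)."""
        values = list(estimates.values())
        mean = sum(values) / len(values)
        spread = (max(values) - min(values)) / mean
        return mean, spread, spread > threshold

    mean, spread, diverged = cross_check(
        {"expert judgment": 26, "analogy": 30, "COCOMO": 41})
    print(f"mean = {mean:.1f} PM, spread = {spread:.0%}, re-examine: {diverged}")
    # Here the spread is about 46% of the mean: the techniques disagree, so
    # the inputs and assumptions should be revisited before committing.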

It is also very important to compare actual cost and time against the estimates, even if only one or two techniques are used, because doing so provides the feedback necessary to improve estimation quality in the future. In general, a historical database for cost estimation should be set up for future use.

Identifying the goals of the estimation process is very important, because the goals will influence the effort spent estimating, the accuracy required, and the models used. Tight schedules with high risks require more accurate estimates than loosely defined projects with relatively open-ended schedules. Estimators should look at the quality of the data upon which estimates are based and at the various objectives.

6.3 Model Calibration

The act of calibration standardizes a model. Many models are developed for specific situations and are, by definition, calibrated to those situations; such models are usually not useful outside their particular environment. Calibration increases the accuracy of a general model by making it, temporarily, a specific model for whatever product it has been calibrated to; in a sense, calibration is customizing a generic model. Items that can be calibrated in a model include product types, operating environments, labor rates and factors, various relationships between functional cost items, and even the method of accounting used by a contractor. All general models should be standardized (i.e., calibrated), unless used by an experienced modeler with the appropriate education, skills, and tools, and experience in the technology being modeled.

Calibration is the process of determining the deviation from a standard in order to compute correction factors; for cost estimating models, the standard is historical actual cost. The calibration procedure is theoretically very simple: run the model with normal inputs (known parameters such as software lines of code) against items for which the actual costs are known, compare the estimates with the actual costs, and let the average deviation become a correction factor for the model. In essence, the calibration factor obtained is really good only for the type of inputs used in the calibration runs. For a general total-model calibration, a wide range of components with actual costs needs to be used. Better yet, numerous calibrations should be performed with different types of components in order to obtain a set of calibration factors for the various estimating situations that can be expected.
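The following Python sketch captures that procedure; treating the mean actual-to-estimate ratio as the correction factor is one reasonable reading of "average deviation", and the model and project data are purely illustrative.

    # Calibration: run the model on projects with known actual costs and turn
    # the average deviation into a multiplicative correction factor.

    def calibrate(model, history):
        """history: list of (inputs, actual cost). Mean actual/estimate ratio."""
        ratios = [actual / model(inputs) for inputs, actual in history]
        return sum(ratios) / len(ratios)

    def calibrated(model, factor):
        """The generic model made temporarily specific by the factor."""
        return lambda inputs: factor * model(inputs)

    generic = lambda kloc: 2.4 * kloc ** 1.05        # hypothetical generic model
    history = [(10, 32.0), (24, 80.0), (50, 170.0)]  # (KLOC, actual person-months)

    factor = calibrate(generic, history)             # about 1.18 on this data
    local = calibrated(generic, factor)
    print(f"factor = {factor:.2f}, 30 KLOC -> {local(30):.1f} person-months")

As the text notes, the factor is only good for inputs resembling the calibration set; a broad calibration would use a wide range of components, or several factors for different component types.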


7. Conclusions

The accurate prediction of software development costs is critical for making good management decisions and for determining how much effort and time a project will require, for project managers as well as system analysts and developers. There are many software cost estimation methods available, including algorithmic methods, estimating by analogy, the expert judgment method, the top-down method, and the bottom-up method. No one method is necessarily better or worse than another; in fact, their strengths and weaknesses are often complementary. Understanding their strengths and weaknesses is very important when you want to estimate your projects.

Which estimation method should be used for a specific project depends on the environment of the project. According to the weaknesses and strengths of the methods, you can choose which to use; I think a combination of the expert judgment or analogy method with COCOMO 2.0 is the best approach. For known projects and project parts, we should use the expert judgment or analogy method if suitable similarities can be found, since under those circumstances it is fast and reliable. For large, lesser-known projects, it is better to use an algorithmic model such as COCOMO 2.0, which is expected to be available in early 1997. Until COCOMO 2.0 is available, ESTIMACS or other function point based methods are highly recommended, especially in the early phases of the software life cycle, because in those phases SLOC-based methods carry great size uncertainty. If there is great uncertainty in size, reuse, cost drivers, and so on, the analogy method or the wideband Delphi technique should be considered the first candidate. COCOMO 2.0 has capabilities for dealing with current software processes and serves as a framework for an extensive data collection and analysis effort to further refine and calibrate the model's estimation capabilities; in general, COCOMO 2.0 should become very popular. Dr. Barry Boehm and his students are now developing COCOMO 2.0, and they expect to have it calibrated and usable in early 1997.

Some recommendations:

  1. Do not depend on a single cost or schedule estimate.
  2. Use several estimating techniques or cost models, compare the results, and determine the reasons for any large variations.
  3. Document the assumptions made when making the estimates.
  4. Monitor the project to detect when assumptions that turn out to be wrong jeopardize the accuracy of the estimate.
  5. Improve the software process: an effective software process can increase the accuracy of cost estimation in a number of ways.
  6. Maintain a historical database.

References:

  1. Londeix, B. "Cost Estimation for Software Development", Addison-Wesley, 1987
  2. Boehm, B.W. "Software Engineering Economics", Prentice-Hall, 1981
  3. Shepperd, M. "Effort Estimation Using Analogy", IEEE, 1996
  4. Kemerer, C.F. "An Empirical Validation of Software Cost Estimation Models", CACM, May 1987
  5. Albrecht, A.J. et al. "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation", IEEE Transactions on Software Engineering, Nov 1983
  6. Lederer, A.L. and Prasad, J. "Nine Management Guidelines for Better Cost Estimating", CACM, Vol. 35, No. 2, Feb 1992
  7. Boehm, B.W. "An Overview of the COCOMO 2.0 Software Cost Model"
  8. Shaw, M.L.G. "Lecture Notes on Software Cost Estimation Model"
  9. SoftStar System Co. "COCOMO Model and SoftStar System"
