Winter 2002
Software Cost Estimation
by Samuel Lee (samuel.lee@telusplanet.net)
Department of Computer Science
1. Abstract

Software projects are notorious for going past their deadline, going over budget, or both. The problem lies in estimating the amount of effort required to develop a project. The cost estimate usually depends on the size estimate of the project, which may use lines of code or function points as metrics (see Size Estimation). There are several different techniques for performing software cost estimation, including expert judgement and algorithmic models. Estimation by expert judgement is a common way of estimating the effort required for a project. Unfortunately, this method does not emphasize re-estimation during the project life cycle, which is an important part of project tracking because it allows the estimates to be improved as the project proceeds. The quality of a cost estimation model is attributed not so much to the initial estimate as to the speed at which the estimates converge to the actual cost of the project. COCOMO is a popular algorithmic model for cost estimation whose cost factors can be tailored to the individual development environment, which is important for the accuracy of the cost estimates. More than one method of cost estimation should be used so that the estimates can be compared. This is especially important for unique projects. Cost estimation must be done more diligently throughout the project life cycle so that in the future there are fewer surprises and unforeseen delays in the release of a product.
2. Introduction

Studies within the last few years have shown that projects often cost a great deal more than initially anticipated. IBM's Consulting Group surveyed 24 leading companies in 1994 and found that 55% of the software developed cost more than the initial cost estimates (Hussein, 2002a). The Standish Group also studied 8,380 projects in the United States in 1994 and found that 53% of the software projects that were completed cost 189% of the original estimates (Hussein, 2002a). Although these numbers are from a few years ago, they likely have not changed much recently (Hussein, 2002b). The difficulty of making accurate cost estimates can be attributed to a number of reasons: the wrong cost estimation processes may be used, no processes may be used, or the nature of the problem may not allow for accurate cost estimation.

Cost estimation is an often overlooked project management practice. It can be defined as the approximate judgement of the costs for a project. Cost estimation will never be an exact science because too many variables are involved in calculating a cost estimate: human, technical, environmental, and political. Furthermore, any process that involves a significant human factor can never be exact because humans are far too complex to be entirely predictable. In addition, software development for any fair-sized project will inevitably include a number of tasks whose difficulty is hard to judge because of the complexity of software systems.

Cost is usually measured in terms of effort. The most common metric is person-months or person-years (also called man-months or man-years). One person-month is the amount of work one person performs in one month. It is important that the specific characteristics of the development environment are taken into account when comparing the effort of two or more projects, because no two development environments are the same.
A clear example of a difference between development environments is the amount of time people work in different countries: the typical workweek in North America is 40 hours, while in Europe it is 35 hours (Londeix, 1987). Thus, when comparing a project from North America with a project from Europe, a conversion factor would have to be used to allow for an accurate comparison. Different variables can be used for cost estimation, which makes it difficult to compare projects if standard models or tools are not used. For example, a cost estimate can include factors from management, development (e.g., training, quality assurance), and other areas specific to an organization.

2.2 Cost Estimation and Project Planning

Cost estimation is an important tool that can affect the planning and budgeting of a project. Because a project has a finite number of resources, not all of the features in a requirements document can always be included in the final product. A cost estimate done at the beginning of a project will help determine which features can be included within the resource constraints of the project (e.g., time). Requirements can be prioritized to ensure that the most important features are included in the product. The risk of a project is reduced when the most important features are included at the beginning, because the complexity of a project increases with its size, which means there is more opportunity for mistakes as development progresses. Thus, cost estimation can have a big impact on the life cycle and schedule of a project.

Cost estimation can also have an important effect on resource allocation. It is prudent for a company to allocate better resources, such as more experienced personnel, to costly projects. Manpower loading is a term used to measure the number of engineering and management personnel allocated to a project in a given amount of time.
Most of the time, it is worse for a company if a costly project fails than if a less costly project fails. When tools are used for estimation, management and developers can even experiment with trading off some resources (or factors) against others while keeping the cost of the project constant. For example, one tradeoff may be to invest in a more powerful integrated development environment (IDE) so that the number of personnel working on a project could be reduced. Cost estimation has a large impact on project planning and management.

2.3 Cost Estimation During the Software Life Cycle

Cost estimation should be done throughout the entire life cycle. The first time cost estimation can be done is at the beginning of the project, after the requirements have been outlined. Cost estimation may even be done more than once at the beginning of the project. For example, several companies may bid on a contract based on some preliminary or initial requirements, and once a company wins the bid, a second round of estimation could be done with more refined and detailed requirements. Doing cost estimation during the entire life cycle allows the estimate to be refined because more data becomes available. Periodic re-estimation is a way to gauge the progress of the project and whether deadlines can be met.

Effective monitoring and control of the software costs is required for the verification and improvement of the accuracy of the estimates. Tools are available to help organize and manage the cost estimates and the data that is captured during the development process. People are less likely to gather data if the process is cumbersome or tedious, so using tools that are efficient and easy to use will save time. The best tool to buy is not always the most expensive one, but rather the one that is most suited to the development environment.
Some thought should be given to the level of detail at which the metrics will be gathered, as well as to which metrics may be used in the future for comparison with other projects. The metrics that are gathered will be highly dependent upon the organization's development and organizational practices.

The success of a cost estimation method is not necessarily the accuracy of the initial estimates, but rather the rate at which the estimates converge to the actual cost. An organization that does a great deal of contract work would place more importance on the initial estimates. In general, however, a method is better if it converges quickly to the actual cost of the project. At the end of the project, all estimation methods have the opportunity to converge to the actual cost because enough information is available.

The people who do the cost estimates could be either directly or indirectly responsible for the implementation of a project, such as a developer or a manager, respectively. Someone who has knowledge of the organization and previous projects could use an analogy-based approach to compare the current project with previous projects, which is a common method of estimation for small organizations and small projects. The historical data is often limited to the memory of the estimator. In this case, the estimator would need to be experienced and would likely have been with the company for a while.

Some people believe it is better if the estimates are done by outsiders so that there is less chance of bias. It is true that people outside an organization will likely have to deal with fewer company politics than people within the organization. For example, a developer may want to please the manager and so give an estimate that is overly optimistic. The disadvantage of an outside estimate is that the estimator would have less knowledge of the development environment, especially if the person is from outside the company.
An empirical method of estimation would then be required, such as the Constructive Cost Model (COCOMO), which is discussed in more detail in section 5. Empirical methods of estimation can be used by all types of estimators. There may be some resistance to using an empirical method because it may be questioned whether a model could outperform an expert. Accurate estimators are rare in our experience, so it is best to get the opinion of several people or tools.

To give the reader a better idea of how software cost estimation fits into the development process, we will outline the general steps for doing cost estimation. The steps are not numbered because they are not completely discrete from one another. As well, although they generally follow a logical order, some of the steps can fit into several parts of the development process. Although this may at first seem confusing, the steps are straightforward enough that there should not be any difficulty in envisioning how they fit into the development process.

The first and most important step is to establish a cost estimate plan (Pressman, 2001). The plan should state what data will be gathered, why the data is being gathered, and the goal of the cost estimation process. Determining which data is to be gathered essentially fixes the level of detail of the metrics. This decision can influence the amount of decomposition of the tasks. There is no point in gathering data that will not be used: it will seem unnecessary to, and create more work for, the people who have to collect and manage the data. Although it may seem like a good idea to gather metrics that will not be used in the near future but could possibly be used later, this is a waste of resources at the time. A fair amount of thought should be put into the cost estimation plan, much like the requirements for a project.
The second step is to perform a cost estimation based on the requirements. Decomposition of the project can be done at this time if a lower level of abstraction is needed for the data. Keep in mind that it is important to use more than one method of estimation because there is no perfect technique. If the estimates from the different methods vary widely, then the information used to make the estimates should be re-evaluated (Humphrey, 1990).

During the life cycle, re-estimates should be done to allow for refinement of the cost estimates. The re-estimates could be done at major milestones during the project or at specific time intervals, depending on the situation. Changes may have to be made to the project if the cost estimates either increase or decrease.

At the end of the project, a final assessment of the results of the entire cost estimation process should be done. This allows a company to refine its estimation process in the future using the data points that were obtained, and also allows the developers to review the development process.

The remainder of this document consists of the following sections. The cost estimation process is outlined in section 3, which includes two different views of the estimation process. Six methods of doing cost estimation are described in section 4, along with the advantages and disadvantages of each method. A popular empirical method of estimation is discussed in section 5. Finally, section 6 includes a summary of some of the main issues of the cost estimation process.
3. Cost Estimation Process

In order to understand the outputs of the software cost estimation process, we must first understand what the process is. By definition, the software cost estimation process is a set of techniques and procedures used to derive the software cost estimate. There is usually a set of inputs to the process, which the process uses to generate or calculate a set of outputs.
Figure 1: Classical view of software estimation process (Vigder and Kark, 1994)

Most software cost estimation models view the estimation process as a function that is computed from a set of cost drivers. In most cost estimation techniques the primary cost driver is believed to be the software requirements. As illustrated in Figure 1, in the classical view of the software estimation process, the software requirements are the primary input to the process and also form the basis for the cost estimate. The cost estimate is then adjusted according to a number of other cost drivers to arrive at the final estimate. So what is a cost driver? A cost driver is anything that may or will affect the cost of the software, such as design methodology, skill levels, risk assessment, personnel experience, programming language, or system complexity. In the classical view, the estimation process generates three outputs: effort, duration, and loading. The following is a brief description of the outputs:
In the classical view, the outputs (loading, duration, and effort) are usually computed as fixed numbers, with or without a tolerance. In reality, the cost estimation process is more complex than what is shown in Figure 1. Much of the data that is input to the process is modified or refined during the software cost estimation process.
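As a rough illustration of how the three outputs relate, the sketch below derives an average loading from an effort figure and a duration, and re-expresses effort between environments with different workweeks (the 40-hour and 35-hour weeks mentioned in the introduction). The function names, the simplifying assumption that staffing is constant over the project, and the sample numbers are all hypothetical; in practice loading usually varies over the life cycle.

```python
def average_loading(effort_pm: float, duration_months: float) -> float:
    """Average number of people on the project, assuming constant staffing."""
    if duration_months <= 0:
        raise ValueError("duration must be positive")
    return effort_pm / duration_months


def convert_effort(effort_pm: float,
                   from_hours_per_week: float,
                   to_hours_per_week: float) -> float:
    """Re-express person-months measured under one workweek in terms of another.

    The conversion keeps the total number of person-hours the same.
    """
    if from_hours_per_week <= 0 or to_hours_per_week <= 0:
        raise ValueError("hours per week must be positive")
    return effort_pm * from_hours_per_week / to_hours_per_week


if __name__ == "__main__":
    # Hypothetical project: 60 person-months of effort over 12 months.
    print("average loading:", average_loading(60.0, 12.0), "people")

    # Hypothetical comparison: 70 person-months measured with a 35-hour week,
    # re-expressed for an environment with a 40-hour week.
    print("equivalent effort:", convert_effort(70.0, 35, 40), "person-months")
```

The conversion only normalizes working hours; it says nothing about the other environmental differences discussed in the introduction.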
Figure 2: Actual Cost Estimation Process (Vigder and Kark, 1994)

In the actual cost estimation process there are other inputs and constraints that need to be considered besides the cost drivers. One of the primary constraints on the software cost estimate is the financial constraint, which is the amount of money that can be budgeted or allocated to the project. There are other constraints, such as manpower constraints and date constraints. Another input is the architecture, which defines the components that make up the system and the interrelationships between these components. Some companies will have a certain software process or an existing architecture in place; for these companies the software cost estimates must be based on these criteria.

There are very few cases where the software requirements stay fixed. How, then, do we deal with requirement changes, ambiguities, or inconsistencies? During the estimation process, an experienced estimator will detect the ambiguities and inconsistencies in the requirements and, as part of the estimation process, will try to resolve them by modifying the requirements. If the ambiguities or inconsistencies stay unresolved, the accuracy of the estimate will suffer accordingly.

The cost estimation accuracy tells us how accurate our estimates are when using a particular model or technique. We can assess the performance of the software estimation technique by:
Each of the error calculation techniques has advantages and disadvantages. For example, absolute error does not take the size of the project into account, and mean magnitude of relative error masks any systematic bias (it does not show whether the estimates tend to be over or under).
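The two measures named above can be sketched in a few lines. The example below is a minimal illustration, not a prescribed procedure: the function names and the sample (estimate, actual) pairs are hypothetical, and the signed mean relative error is included only as one possible way to expose the bias that the mean magnitude of relative error hides.

```python
from statistics import mean


def absolute_error(estimate: float, actual: float) -> float:
    """Size of the miss in the original units (e.g. person-months)."""
    return abs(estimate - actual)


def relative_error(estimate: float, actual: float) -> float:
    """Signed miss as a fraction of the actual value (positive = overestimate)."""
    return (estimate - actual) / actual


def mmre(pairs) -> float:
    """Mean magnitude of relative error over (estimate, actual) pairs."""
    return mean(abs(relative_error(e, a)) for e, a in pairs)


def mean_signed_relative_error(pairs) -> float:
    """Average signed relative error; a value far from zero suggests bias."""
    return mean(relative_error(e, a) for e, a in pairs)


if __name__ == "__main__":
    # Hypothetical (estimate, actual) efforts in person-months.
    projects = [(120, 100), (80, 100), (550, 500)]
    for est, act in projects:
        print(est, act, absolute_error(est, act), round(relative_error(est, act), 3))
    print("MMRE:", round(mmre(projects), 3))
    print("mean signed relative error:", round(mean_signed_relative_error(projects), 3))
```

In this sample the third project has the largest absolute error (50 person-months) but the smallest relative error, which is the size effect described above.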
5. COCOMO

COCOMO stands for Constructive Cost Model. It is a software cost estimation model that was first published in 1981 by Barry Boehm (Boehm, 2001). It is an algorithmic approach to estimating the cost of a software project. By using COCOMO you can calculate the amount of effort and the time schedule for a project. From these calculations you can then find out how much staffing is required to complete the project on time. COCOMO's main size metric is lines of code, measured in thousands (denoted KLOC for COCOMO II, or KDSI for COCOMO 81); function points (FP) or object points (OP) can also be used. COCOMO also lets you explore 'what if' scenarios: by adjusting certain factors you can see how a project's time and effort estimates change (Boehm, 2001).

There have been a few different versions of COCOMO; the two discussed in this report are COCOMO 81 and COCOMO II. The equations on which COCOMO is based are also shown; in real-world use, however, you would most likely use one of the free or commercial COCOMO tools available (Softstar Systems, 2002).

COCOMO 81 was the first version of COCOMO. It was modeled around software practices of the 1980s. It has been found that on average it produces estimates that are within 20% of the actual values 68% of the time. COCOMO 81 has three different models that can be used throughout a project's life cycle (Boehm, 2001):
Within each of these models there are also three different modes. The mode you choose will depend on your work environment, and the size and constraints of the project itself. The modes are:
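There are two main equations that are used to calculate effort and schedule time (measured in months). They are:

Equation 1: $PM = a \cdot (KDSI)^{b} \cdot EAF$

Equation 2: $TDEV = c \cdot (PM)^{d}$

Where:
PM is the effort in person-months
TDEV is the schedule (development) time in months
KDSI is the size of the project in thousands of delivered source instructions
EAF is the effort adjustment factor
a, b, c, and d are constants that depend on the mode (see Table 1)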
Table 1 – List of Constants Based on Mode
The EAF is used to tailor your estimate based on conditions of the development environment. For the basic model it is not used and is simply set to 1. For the intermediate model there are 15 different cost drivers that can be used to calculate your EAF. They are grouped into 4 categories: product attributes, computer attributes, personnel attributes, and project attributes (see Table 2). Each cost driver is rated on a scale from Very Low to Extra High depending on how that cost driver will affect your development. These ratings are based on a statistical analysis of historical data collected from 63 past projects. To calculate the EAF from the cost drivers, you choose a value for each cost driver and multiply them all together. The resulting number is your EAF.

Table 2. List of 15 cost drivers and their ratings for COCOMO 81 (Wu, 1997).
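The two equations and the EAF calculation can be sketched as follows. This is an illustration under stated assumptions, not a replacement for a calibrated tool: the mode names and the coefficients are the commonly published COCOMO 81 values for the basic and intermediate models and may not match the values in Table 1, which is not reproduced here, and the function names and the sample multipliers are hypothetical rather than taken from Table 2.

```python
from math import prod

# Commonly published COCOMO 81 coefficients (a, b, c, d) per mode.
# Assumption: these may differ from the document's Table 1.
COEFFICIENTS = {
    "basic": {
        "organic": (2.4, 1.05, 2.5, 0.38),
        "semi-detached": (3.0, 1.12, 2.5, 0.35),
        "embedded": (3.6, 1.20, 2.5, 0.32),
    },
    "intermediate": {
        "organic": (3.2, 1.05, 2.5, 0.38),
        "semi-detached": (3.0, 1.12, 2.5, 0.35),
        "embedded": (2.8, 1.20, 2.5, 0.32),
    },
}


def effort_adjustment_factor(multipliers) -> float:
    """EAF is the product of the chosen cost driver values (1 if none are given)."""
    return prod(multipliers)


def estimate(kdsi: float, mode: str, model: str = "basic", multipliers=()):
    """Return (effort in person-months, schedule in months, average staff)."""
    a, b, c, d = COEFFICIENTS[model][mode]
    # The basic model does not use cost drivers, so its EAF is fixed at 1.
    eaf = 1.0 if model == "basic" else effort_adjustment_factor(multipliers)
    pm = a * kdsi ** b * eaf      # Equation 1
    tdev = c * pm ** d            # Equation 2
    return pm, tdev, pm / tdev


if __name__ == "__main__":
    # Hypothetical 32 KDSI project in organic mode, basic model.
    pm, tdev, staff = estimate(32, "organic")
    print(f"basic: {pm:.1f} PM, {tdev:.1f} months, {staff:.1f} people")

    # Same project with two illustrative cost driver values (not from Table 2).
    pm, tdev, staff = estimate(32, "organic", "intermediate", [1.15, 0.86])
    print(f"intermediate: {pm:.1f} PM, {tdev:.1f} months, {staff:.1f} people")
```

With these coefficients the basic organic case works out to roughly 91 person-months over about 14 months. Changing one multiplier at a time is a simple way to try the 'what if' scenarios mentioned earlier.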
The advanced model of COCOMO 81 goes one step further than the intermediate model in that it uses cost drivers that are rated differently depending on the phase the project is currently in.

One of the problems with using a model like COCOMO 81 today is that it does not match the development environment of the late 1990s and 2000s. It was created at a time when batch jobs were the norm, programs were run on mainframes, and compile times were measured in hours, not seconds. It is outdated for use in today's development environment (rapid application development, 4th generation languages, etc.).

COCOMO II was published in 1997 as an updated model intended to address these problems with COCOMO 81. The main objectives of COCOMO II were set out when it was first published. They are:
For the most part, estimates are obtained in much the same way as in COCOMO 81. The main changes are in the number and type of cost drivers, and in the calculation of equation variables rather than the use of constants (for a detailed look at the specific differences between COCOMO 81 and COCOMO II, see Boehm et al., 1998). The equations still use lines of code as their main metric; however, you can also use function points and object points to do estimates. The lines-of-code metric used is now LOC. There are standards set out by the SEI for the proper counting of lines; for example, an if/then/else statement would be counted as one line (there are automated tools that will do the counting for you when you want to collect data from your own code).

COCOMO II again has three models, but they are different from the ones for COCOMO 81. They are:
In COCOMO II there are 17 cost drivers that are used in the Post-Architecture model. They are used in the same way as in COCOMO 81 to calculate the EAF. The cost drivers are not the same ones as in COCOMO 81; they are better suited to the software development environment of the 1990s and 2000s. They are grouped together as shown in Table 3. We will not go into specific details on all of the cost drivers here, as that information can be found in the paper "Cost Models for Future Software Life Cycle Processes: COCOMO 2.0" (Boehm et al., 1995). The cost drivers for COCOMO II are again rated on a scale from Very Low to Extra High, in the same way as in COCOMO 81.

Table 3 – List of COCOMO II's Cost Drivers (Boehm et al., 1995).
For a COCOMO model to be accurate it must be calibrated using historical data. COCOMO 81 was calibrated using 63 data points from past projects (Boehm, 2001). The calibration can be done using a company's own data, but for the most part it requires more data than a single company would have. The calibration involves doing a statistical analysis of the data and then adjusting all cost driver values. Because a proper calibration is needed, standard calibrations are released. COCOMO II has gone through two calibrations, COCOMO II.1997 and COCOMO II.1998. COCOMO II.1997 was based on 83 data points and was found to come within 20% of the actual values only 46% of the time. The COCOMO II.1998 calibration, which was based on 161 data points, was found to come within 30% of the actual values 75% of the time (Boehm, Chulani, and Clark, 1997). Users can also submit data from their own projects to be used in future calibrations. Whether you use the released calibrations or your own, it is important to continue collecting historical data so that it can be used to further increase the accuracy of your estimation results in the future.

COCOMO is no doubt the most popular method for doing software cost estimation. The estimates are relatively easy to do by hand, and there are also tools available that allow you to calculate more complex estimates. Calibration is one of the most important things that needs to be done in order to get accurate estimates from COCOMO. Even though COCOMO may be the most popular estimation method, it is recommended that you always use another method of estimation to verify your results. The other method should differ significantly from COCOMO. This way your project is examined from more than one angle, and something that you may have overlooked when using COCOMO is not overlooked again.
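The accuracy figures quoted in this section, such as "within 20% of the actual values 46% of the time", can be computed from an organization's own historical data to judge how well a calibration fits. The sketch below shows one way to do this; the function name `pred` and the sample history are hypothetical and are not taken from the COCOMO data sets.

```python
def pred(pairs, level: float = 0.20) -> float:
    """Fraction of projects whose estimate is within `level` of the actual value.

    `pairs` is a sequence of (estimate, actual) values in the same unit.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (estimate, actual) pair is required")
    hits = sum(1 for est, act in pairs if abs(est - act) / act <= level)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical history of (estimated, actual) effort in person-months.
    history = [(110, 100), (150, 100), (95, 100), (75, 100)]
    print("within 20%:", pred(history, 0.20))
    print("within 30%:", pred(history, 0.30))
```

Recomputing this figure as new projects finish gives a simple check on whether continued data collection is actually improving the estimates.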
7. References

Softstar Systems (2002). Answers to Frequently Asked Questions. http://www.softstarsystems.com/faq.htm.
NASA JSC. Basic COCOMO Software Cost Model. http://www.jsc.nasa.gov/bu2/COCOMO.html.
Boehm, B., Clark, B., Horowitz, E., Madachy, R., Selby, R., and Westland, C. (1995). Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering. http://sunset.usc.edu/research/COCOMOII/Docs/stc.pdf.
Boehm, B., Chulani, S., and Clark, B. (1997). Calibration Results of COCOMO II.1997. http://sunset.usc.edu/publications/TECHRPTS/1998/usccse98-502/CalPostArch.pdf.
Boehm, B., Chulani, S., and Clark, B. (1997). Calibrating the COCOMO II Post Architecture Model. http://sunset.usc.edu/Research_Group/Sunita/down/calpap.pdf.
Boehm, B., Chulani, S., and Reifer, D. (1998). The Rosetta Stone: Making COCOMO 81 Files Work With COCOMO II. http://sunset.usc.edu/publications/TECHRPTS/1998/usccse98-516/usccse98-516.pdf.
Boehm, B. (2001). COCOMO Website. http://sunset.usc.edu/research/COCOMOII/cocomo_main.html.
Chulani, S. (1998). Software Development Cost Estimation Approaches – A Survey. IBM Research.
Humphrey, W.S. (1990). Managing the Software Process. Addison-Wesley Publishing Company, New York, NY.
Hussein, A. (2002a). Introduction to Software Process Management. University of Calgary, Calgary, Canada. http://sern.ucalgary.ca/courses/SENG/621/W01/intro.ppt.
Hussein, A. (2002b). University of Calgary, Calgary, Canada. Personal Communication on 10 January 2002.
Londeix, B. (1987). Cost Estimation for Software Development. Addison-Wesley Publishing Company, New York, NY.
Pressman, R.S. (2001). Software Engineering: A Practitioner's Approach. McGraw-Hill Higher Education, New York, NY.
Vigder, M. R. and Kark, A. W. (1994). Software Cost Estimation and Control. Software Engineering Institute for Information Technology. http://wwwsel.iit.nrc.ca/seldocs/cpdocs/NRC37116.pdf.
Wu, L. (1997). The Comparison of the Software Cost Estimating Methods. University of Calgary, Calgary, Canada. http://sern.ucalgary.ca/courses/seng/621/W97/wul/seng621_11.html.
Figure 1. Classical view of software estimation process (Vigder and Kark, 1994)
Figure 2. Actual Cost Estimation Process (Vigder and Kark, 1994)
Table 1. List of Constants Based on Mode
Table 2. List of 15 cost drivers and their ratings for COCOMO 81
Table 3. List of COCOMO II's Cost Drivers
Updated 21 Feb 2002