Practitioners have expressed concern over their inability to accurately estimate the costs associated with software development. This concern has become even more pressing as these costs continue to increase. Considerable effort is now directed at constructing, evaluating, and selecting better software cost estimation models and tools for specific software development projects. This essay gives an overview of cost estimation models and then discusses their advantages and disadvantages. Finally, guidelines for selecting appropriate cost estimation models are given and a combination method is recommended.
Surveys have found that nearly one-third of projects overrun their budget and are delivered late, and that two-thirds of all major projects substantially overrun their original estimates. Accurate prediction of software development costs is critical for making good management decisions and for determining how much effort and time a project requires, both for project managers and for system analysts and developers. Without a reasonably accurate cost estimation capability, project managers cannot determine how much time and manpower the project should take, which means the software portion of the project is out of control from its beginning; system analysts cannot make realistic hardware-software tradeoff analyses during the system design phase; and software project personnel cannot tell managers and customers that a proposed budget and schedule are unrealistic. This may lead to optimistic over-promising on software development, with inevitable overruns and performance compromises as a consequence. In practice, large overruns resulting from inaccurate estimates are believed to occur frequently.
The overall process of developing a cost estimate for software
is not different from the process for estimating any other element
of cost. There are, however, aspects of the process that are peculiar
to software estimating. Some of the unique aspects of software
estimating are driven by the nature of software as a product.
Other problems are created by the nature of the estimating methodologies.
Software cost estimation is a continuing activity which starts at the proposal stage and continues through the lifetime of a project. Continual cost estimation ensures that spending remains in line with the budget.
Cost estimation is one of the most challenging tasks in project management: accurately estimating the resources and schedules required for software development projects. The software estimation process includes estimating the size of the software product to be produced, estimating the effort required, developing preliminary project schedules, and finally, estimating the overall cost of the project.
It is very difficult to estimate the cost of software development.
Many of the problems that plague the development effort itself
are responsible for the difficulty encountered in estimating that
effort. One of the first steps in any estimate is to understand
and define the system to be estimated. Software, however, is intangible,
invisible, and intractable. It is inherently more difficult to
understand and estimate a product or process that cannot be seen
and touched. Software grows and changes as it is written. When
hardware design has been inadequate, or when hardware fails to
perform as expected, the "solution" is often attempted
through changes to the software. This change may occur late in
the development process, and sometimes results in unanticipated
software growth.
After 20 years of research, many software cost estimation methods are available, including algorithmic methods, estimating by analogy, the expert judgment method, the price-to-win method, the top-down method, and the bottom-up method. No one method is necessarily better or worse than the others; in fact, their strengths and weaknesses are often complementary. Understanding these strengths and weaknesses is very important when you want to estimate your projects.
Expert judgment techniques involve consulting a software cost estimation expert, or a group of experts, to use their experience and understanding of the proposed project to arrive at an estimate of its cost.
Generally speaking, a group consensus technique, the Delphi technique, is the best way to do this. Its strengths and weaknesses are complementary to those of the algorithmic method.
To provide a sufficiently broad communication bandwidth for the experts to exchange the volume of information necessary to calibrate their estimates with those of the other experts, the wideband Delphi technique was introduced as a refinement of the standard Delphi technique.
The estimating steps using this method:
The wideband Delphi technique has subsequently been used in a number of studies and cost estimation activities. It has been highly successful in combining the free-discussion advantage of the group meeting technique with the anonymous-estimation advantage of the standard Delphi technique.
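To make the mechanics concrete, the following toy sketch (in Python, and not part of the original technique description) shows one way the anonymous estimates from each round could be summarized and checked for convergence; the 10% convergence threshold and the sample figures are assumptions chosen purely for illustration.

# Toy sketch: summarizing rounds of anonymous Wideband Delphi estimates.
# The convergence threshold and the sample data are illustrative assumptions.
def delphi_round_summary(estimates_pm, tolerance=0.10):
    """Summarize one round of effort estimates given in person-months."""
    lo, hi = min(estimates_pm), max(estimates_pm)
    median = sorted(estimates_pm)[len(estimates_pm) // 2]
    converged = (hi - lo) <= tolerance * median
    return {"low": lo, "median": median, "high": hi, "converged": converged}

# Example: three successive rounds from five experts; the spread narrows
# after each group discussion until the round converges.
rounds = [
    [12, 30, 18, 45, 22],
    [20, 28, 22, 30, 24],
    [24, 25, 26, 25, 25],
]
for i, r in enumerate(rounds, start=1):
    print("round", i, delphi_round_summary(r))

The coordinator would feed each summary back to the experts before the next round; the estimate is accepted once the spread is acceptably small.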
The advantages of this method are:
The disadvantages include:
Estimating by analogy means comparing the proposed project to previously completed similar projects for which the development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. This method can be used either at the system level or at the component level.
Estimating by analogy is relatively straightforward. In some respects it is a systematic form of expert judgment, since experts often search for analogous situations to inform their opinions.
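As one illustration of the idea, the following sketch (an assumption for illustration, not a published analogy model) picks the most similar completed project using a simple distance over two attributes and scales its actual effort by relative size.

import math

# Illustrative past-project data: (size in KLOC, team experience 1-5,
# actual effort in person-months). Values are made up for the example.
past_projects = [
    (10, 3, 28),
    (25, 4, 60),
    (40, 2, 130),
]

def estimate_by_analogy(size_kloc, team_experience):
    """Reuse the actual effort of the nearest completed project,
    scaled linearly by relative size (a simple, common adjustment)."""
    def distance(project):
        return math.hypot(project[0] - size_kloc,
                          (project[1] - team_experience) * 10)
    nearest = min(past_projects, key=distance)
    return nearest[2] * (size_kloc / nearest[0])

print(estimate_by_analogy(size_kloc=30, team_experience=4))  # about 72 person-months

Real analogy tools use richer attribute sets and similarity measures, but the structure, finding similar completed projects and adjusting their actual data, is the same.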
The steps using estimating by analogy are:
The main advantages of this method are:
However, there are also some problems with this method:
It has been argued that estimating by analogy is a superior technique to estimation via algorithmic models in at least some circumstances. It is a more intuitive method, so it is easier to understand the reasoning behind a particular prediction.
The top-down estimating method is also called the macro model. Using the top-down estimating method, an overall cost estimate for the project is derived from the global properties of the software project, and the project is then partitioned into various low-level components. The leading method using this approach is the Putnam model. This method is more applicable to early cost estimation, when only global properties are known. It is very useful in the early phase of software development because no detailed information is available.
The advantages of this method are:
The disadvantages are:
Because it provides a global view of the software project, it usually embodies some effective features, such as the cost-time trade-off capability that exists in the Putnam model.
Using the bottom-up estimating method, the cost of each software component is estimated and the results are then combined to arrive at an estimated cost for the overall project. It aims at constructing the estimate of a system from the knowledge accumulated about the small software components and their interactions. The leading method using this approach is COCOMO's detailed model.
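A minimal sketch of the bottom-up idea follows; the component names, figures, and the 20% integration allowance are assumptions for illustration, not COCOMO's detailed model itself.

# Bottom-up estimation: sum per-component estimates, then add an
# allowance for integration and system test (assumed at 20%).
component_effort_pm = {
    "user interface": 6,
    "database layer": 9,
    "reporting": 4,
}
integration_overhead = 0.20

total_effort = sum(component_effort_pm.values()) * (1 + integration_overhead)
print(round(total_effort, 1), "person-months")  # 22.8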
The advantages:
The disadvantages:
The algorithmic method is designed to provide mathematical equations to perform software estimation. These mathematical equations are based on research and historical data, and use inputs such as Source Lines of Code (SLOC), number of functions to perform, and other cost drivers such as language, design methodology, skill levels, risk assessments, etc. Algorithmic methods have been studied extensively and many models have been developed, such as the COCOMO models, the Putnam model, and function point based models.
General advantages:
General disadvantages:
One very widely used algorithmic software cost model is the Constructive
Cost Model (COCOMO). The
basic COCOMO model has a very simple form:
MAN-MONTHS = K1 * (Thousands of Delivered Source Instructions)^K2
Where K1 and K2 are two parameters dependent on the application
and development environment.
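As a hedged illustration, the following sketch evaluates this basic form in Python using the commonly published COCOMO 81 "organic" mode constants (K1 = 2.4, K2 = 1.05, and 2.5/0.38 for the schedule equation); treat these values as assumptions for the example rather than figures taken from this essay.

# Basic COCOMO: MAN-MONTHS = K1 * (KDSI)^K2, with organic-mode constants
# assumed for illustration.
def basic_cocomo_effort(kdsi, k1=2.4, k2=1.05):
    """Effort in person-months from thousands of delivered source instructions."""
    return k1 * kdsi ** k2

def basic_cocomo_schedule(effort_pm, c1=2.5, c2=0.38):
    """Nominal development time in months for the given effort."""
    return c1 * effort_pm ** c2

effort = basic_cocomo_effort(32)                      # a 32 KDSI project
print(round(effort, 1), "person-months")              # about 91
print(round(basic_cocomo_schedule(effort), 1), "months")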
Estimates from the basic COCOMO model can be made more accurate by taking into account other factors concerning the required characteristics of the software to be developed, the qualification and experience of the development team, and the software development environment. Some of these factors are:
Complexity of the software
Many of these factors affect the person months required by an order of magnitude or more. COCOMO assumes that the system and software requirements have already been defined, and that these requirements are stable. This is often not the case.
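In the spirit of the intermediate form of COCOMO, such factors enter the estimate as effort multipliers applied to the nominal result; the multiplier values in this short sketch are illustrative assumptions, not calibrated ratings.

# Adjusting a nominal estimate with cost-driver multipliers (illustrative values).
cost_driver_multipliers = {
    "product complexity (high)": 1.15,
    "analyst capability (high)": 0.86,
    "required reliability (nominal)": 1.00,
}

nominal_effort_pm = 91.0  # e.g. the basic-model result above
adjusted_effort_pm = nominal_effort_pm
for multiplier in cost_driver_multipliers.values():
    adjusted_effort_pm *= multiplier

print(round(adjusted_effort_pm, 1), "person-months")  # about 90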
The COCOMO model is a regression model. It is based on the analysis of 63 selected projects. The primary input is KDSI. The problems are:
The first version of the COCOMO model was originally developed in 1981. It has since experienced increasing difficulty in estimating the cost of software developed with new life cycle processes and capabilities, including rapid-development process models, reuse-driven approaches, object-oriented approaches, and the software process maturity initiative.
For these reasons, the newest version, COCOMO 2.0, was developed. The major new modeling capabilities of COCOMO 2.0 are a tailorable family of software size models, involving object points, function points, and source lines of code; nonlinear models for software reuse and reengineering; an exponent-driver approach for modeling relative software diseconomies of scale; and several additions, deletions, and updates to previous COCOMO effort-multiplier cost drivers. This new model also serves as a framework for an extensive current data collection and analysis effort to further refine and calibrate the model's estimation capabilities.
Another popular software cost model is the Putnam model. The form
of this model is:
Technical constant C = Size / (B^(1/3) * T^(4/3))
Total Person Months B = (1 / T^4) * (Size / C)^3
T = required development time in years
Size is estimated in LOC
Where C is a parameter dependent on the development environment, determined on the basis of historical data from past projects.
Rating: C = 2,000 (poor), C = 8,000 (good), C = 12,000 (excellent).
The Putnam model is very sensitive to the development time: decreasing
the development time can greatly increase the person-months needed
for development.
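A small numeric check of this sensitivity, using the effort form B = (Size / C)^3 / T^4 (the size and C values below are assumptions chosen only to show the ratios):

# Schedule compression under the Putnam form: effort grows as 1/T^4.
def putnam_effort(size_loc, c, t_years):
    return (size_loc / c) ** 3 / t_years ** 4

size, c = 100_000, 8_000           # 100 KLOC, "good" environment rating
baseline = putnam_effort(size, c, t_years=2.5)
for t in (2.5, 2.0, 1.5):
    ratio = putnam_effort(size, c, t) / baseline
    print("T =", t, "years ->", round(ratio, 1), "x the baseline effort")

Compressing the schedule from 2.5 to 1.5 years multiplies the required effort by roughly (2.5/1.5)^4, or nearly eight times.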
One significant problem with the Putnam model is that it is based on knowing, or being able to estimate accurately, the size (in lines of code) of the software to be developed. There is often great uncertainty in the software size, which may result in inaccurate cost estimates. According to Kemerer's research, the error percentage of SLIM, a Putnam model based method, is 772.87%.
The above two algorithmic models require the estimators to estimate the number of SLOC in order to obtain person-month and duration estimates. Function Point Analysis is another method of quantifying the size and complexity of a software system in terms of the functions that the system delivers to the user.
A number of proprietary models for cost estimation have adopted
a function point type of approach, such as ESTIMACS
and SPQR/20.
The function point measurement method was developed by Allan Albrecht at IBM and published in 1979. He argued that function points offer several significant advantages over SLOC counts as a size measurement. There are two steps in counting function points:
The collection of function point data has two primary motivations.
One is the desire by managers to monitor levels of productivity.
Another use of it is in the estimation of software development
cost.
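As a hedged sketch of how the two counting steps combine, the following example uses the commonly cited average-complexity weights for the five function types and the standard value adjustment factor formula (0.65 + 0.01 times the sum of the 14 ratings); the counts and ratings themselves are made up for illustration.

# Function point sketch: unadjusted count times a value adjustment factor.
function_counts = {            # (count, average-complexity weight)
    "external inputs":     (20, 4),
    "external outputs":    (15, 5),
    "external inquiries":  (10, 4),
    "internal files":      (6, 10),
    "external interfaces": (2, 7),
}

ufp = sum(count * weight for count, weight in function_counts.values())

gsc_ratings = [3] * 14                 # 14 general system characteristics, rated 0-5
vaf = 0.65 + 0.01 * sum(gsc_ratings)   # value adjustment factor

adjusted_fp = ufp * vaf
print(ufp, round(vaf, 2), round(adjusted_fp, 1))   # 269 1.07 287.8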
There are some cost estimation methods which are based on a function point type of measurement, such as ESTIMACS and SPQR/20. SPQR/20 is based on a modified function point method. Whereas traditional function point analysis is based on evaluating 14 factors, SPQR/20 separates complexity into three categories: complexity of algorithms, complexity of code, and complexity of data structures. ESTIMACS is a proprietary system designed to give a development cost estimate at the conception stage of a project; it contains a module which estimates function points as a primary input for estimating cost.
The advantages of function point analysis based model are:
From the above comparison, we can see that no one method is necessarily better or worse than the others; in fact, their strengths and weaknesses are often complementary. Based on experience, it is recommended that a combination of models with analogy or expert judgment estimation methods be used to obtain reliable, accurate cost estimates for software development.
For known projects and project parts, we should use the expert judgment method or the analogy method if sufficiently similar past projects can be found, since under these circumstances they are fast and reliable. For large, less well known projects, it is better to use an algorithmic model. In this case, many researchers recommend estimation models that do not require SLOC as an input. I think COCOMO 2.0 is the first candidate because it can use not only source lines of code (SLOC) but also object points and unadjusted function points as metrics for sizing a project. If we approach
cost estimation by parts, we may use expert judgment for some
known parts. This way we can take advantage of both: the rigor
of models and the speed of expert judgment or analogy. Because
the advantages and disadvantages of each technique are complementary,
a combination will reduce the negative effect of any one technique,
augment their individual strengths and help to cross-check one
method against another.
It is common to apply one or more of these cost estimation methods to estimate the cost of software development. What we have to note is that it is very important to continually re-estimate cost and to compare targets against actual expenditure at each major milestone. This keeps the status of the project visible and helps to identify necessary corrections to budget and schedule as soon as they become necessary.
At every estimation and re-estimation point, iteration is an important tool to improve estimation quality. The estimator can use several estimation techniques and check whether their estimates converge. The other advantages are as follows:
It is also very important to compare actual cost and time against the estimates, even if only one or two techniques are used. This provides the necessary feedback to improve estimation quality in the future. In general, a historical database for cost estimation should be set up for future use.
Identifying the goals of the estimation process is very important
because it will influence the effort spent in estimating, its
accuracy, and the models used. Tight schedules with high risks
require more accurate estimates than loosely defined projects
with a relatively open-ended schedule. The estimators should look
at the quality of the data upon which estimates are based and
at the various objectives.
The act of calibration standardizes a model. Many models are developed for specific situations and are, by definition, calibrated to that situation. Such models usually are not useful outside of their particular environment. So calibration is needed to increase the accuracy of one of these general models by making it temporarily a specific model for whatever product it has been calibrated for. Calibration is, in a sense, customizing a generic model. Items which can be calibrated in a model include product types, operating environments, labor rates and factors, various relationships between functional cost items, and even the method of accounting used by a contractor. All general models should be standardized (i.e., calibrated), unless used by an experienced modeler with the appropriate education, skills, tools, and experience in the technology being modeled.
Calibration is the process of determining the deviation from a standard in order to compute correction factors. For cost estimating models, the standard is historical actual cost. The calibration procedure is theoretically very simple: run the model with normal inputs (known parameters such as software lines of code) against items for which the actual costs are known. These estimates are then compared with the actual costs, and the average deviation becomes a correction factor for the model. In essence, the calibration factor obtained is really valid only for the type of inputs that were used in the calibration runs. For a general total-model calibration, a wide range of components with actual costs needs to be used. Better yet, numerous calibrations should be performed with different types of components in order to obtain a set of calibration factors for the various possible estimating situations.
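A minimal sketch of that procedure, with made-up estimate/actual pairs standing in for the historical data:

# Calibration: compare model estimates with actual costs for completed items
# and use the average deviation as a correction factor for future estimates.
completed_items = [
    # (model estimate, actual cost) in person-months; values are assumed
    (50, 62),
    (30, 33),
    (80, 96),
]

ratios = [actual / estimate for estimate, actual in completed_items]
correction_factor = sum(ratios) / len(ratios)        # about 1.18 here

def calibrated_estimate(raw_model_estimate):
    return raw_model_estimate * correction_factor

print(round(correction_factor, 2))
print(round(calibrated_estimate(40), 1))   # corrected estimate for a new item

As noted above, such a factor is only trustworthy for inputs similar to those used in the calibration runs, so separate factors are often kept for different classes of components.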
Accurate prediction of software development costs is a critical issue for making good management decisions and for determining how much effort and time a project requires, both for project managers and for system analysts and developers. There are many software cost estimation methods available, including algorithmic methods, estimating by analogy, the expert judgment method, the top-down method, and the bottom-up method. No one method is necessarily better or worse than the others; in fact, their strengths and weaknesses are often complementary. Understanding these strengths and weaknesses is very important when you want to estimate your projects.
For a specific project to be estimated, which estimation methods should be used depends on the environment of the project. Based on the weaknesses and strengths of the methods, you can choose which to use. I think a combination of the expert judgment or analogy method with COCOMO 2.0 is the best approach that you can choose. For known projects and project parts, we should use the expert judgment method or the analogy method if sufficiently similar past projects can be found, since under these circumstances they are fast and reliable. For large, less well known projects, it is better to use an algorithmic model such as COCOMO 2.0, which will be available in early 1997. If COCOMO 2.0 is not available, ESTIMACS or other function point based methods are highly recommended, especially in the early phase of the software life-cycle, because SLOC-based methods carry great uncertainty about size at that stage. If there is great uncertainty in size, reuse, cost drivers, etc., the analogy method or the wideband Delphi technique should be considered as the first candidate.
In addition, COCOMO 2.0 has capabilities to deal with current software processes and serves as a framework for an extensive current data collection and analysis effort to further refine and calibrate the model's estimation capabilities. In general, COCOMO 2.0 is likely to become very popular. Dr. Barry Boehm and his students are currently developing COCOMO 2.0; they expect to have it calibrated and usable in early 1997.
Some recommendations:
wul@cpsc.ucalgary.ca 4-Mar-97