Research Project 1: Extending the Geometric Stratification Method
In spite of the explosion in electronic data collection, sampling is still important today; if
anything, it is even more important than heretofore. The sheer volume of the data collected has
remained tantalisingly ahead of computer capacity: in data mining, for example, the fundamental
challenge is to mine data sets so large that they do not fit into computer memory. Data mining has
a wide variety of applications, ranging from predicting consumer behaviour to identifying
fraudulent credit card transactions. The time is ripe to invest in research in this area, rather
than cede the initiative to others.
Rarely is a survey carried out without stratification, in which a population is divided into
homogeneous subsets called strata and independent samples are taken from each stratum. The
delineation of stratum boundaries is a problem that has eluded researchers for decades, even a
colossus such as Cochran. It was first addressed fifty years ago, but no objective and practicable
solution was found until Gunning and Horgan (2004) developed geometric stratification, a
surprisingly easy method of obtaining definitive optimum boundaries.
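As we understand Gunning and Horgan (2004), the geometric method places the boundaries in a geometric progression between the smallest value a and the largest value b of the stratification variable: k_h = a r^h for h = 0, ..., L, with common ratio r = (b/a)^(1/L). The following is a minimal sketch, assuming a strictly positive stratification variable; the function names geometric_boundaries and assign_strata are hypothetical and not taken from the original work.

```python
import bisect


def geometric_boundaries(a: float, b: float, n_strata: int) -> list[float]:
    """Boundaries k_0 < k_1 < ... < k_L forming a geometric progression from a to b.

    Assumes a is the smallest and b the largest value of a positive
    stratification variable; r = (b / a) ** (1 / L), so k_h = a * r**h.
    """
    if a <= 0 or b <= a or n_strata < 1:
        raise ValueError("need 0 < a < b and at least one stratum")
    r = (b / a) ** (1.0 / n_strata)
    ks = [a * r ** h for h in range(n_strata + 1)]
    ks[0], ks[-1] = a, b  # pin the endpoints exactly to avoid rounding drift
    return ks


def assign_strata(values: list[float], ks: list[float]) -> list[int]:
    """0-based stratum index h with k_h <= x < k_{h+1}; the maximum b joins the top stratum."""
    last = len(ks) - 2
    return [min(bisect.bisect_right(ks, x) - 1, last) for x in values]


if __name__ == "__main__":
    values = [1, 2, 3, 40, 150, 1000]  # hypothetical skewed population
    ks = geometric_boundaries(min(values), max(values), 3)  # roughly [1, 10, 100, 1000]
    print(assign_strata(values, ks))  # [0, 0, 0, 1, 2, 2]
```

The only inputs are the two extreme values and the number of strata, which is what makes the method so much simpler than iterative or search-based alternatives.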
While the geometric method is beginning to replace the older, more complicated methods for
obtaining stratum boundaries (see, for example, Gonzalo et al. 2007; Keskinturk and Er 2007; Kozak
and Verma 2006), recent research on the method has generated some controversy and highlighted some
potential limitations. For example, Kozak and Verma (2006), while acknowledging that the geometric
method is very simple compared with the random search method of Kozak (2004) and the iterative
procedure of Lavallée and Hidiroglou (1988), have challenged its optimality and called for further
studies on real skewed populations. They have also pointed out that the geometric method does not
take into account the take-all stratum, which is usually constructed when a positively skewed
population is stratified. Keskinturk and Er (2007) have observed that, with the geometric method,
some strata may be empty and/or the sample sizes allocated to some strata may exceed their
population sizes.
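Both problems observed by Keskinturk and Er (2007) are easy to reproduce. The sketch below is illustrative only: it assumes Neyman allocation, n_h = n N_h S_h / sum_j N_j S_j, as the allocation rule (the text above does not specify one), uses population standard deviations in place of S_h, and the function name diagnose is hypothetical. It flags strata that are empty and strata whose allocation exceeds their population size.

```python
import bisect
import statistics


def geometric_boundaries(a, b, n_strata):
    # Geometric progression k_h = a * r**h with r = (b / a) ** (1 / L).
    r = (b / a) ** (1.0 / n_strata)
    ks = [a * r ** h for h in range(n_strata + 1)]
    ks[0], ks[-1] = a, b
    return ks


def diagnose(values, n_strata, n):
    """Stratum sizes N_h and Neyman allocations n_h, with empty/overflow flags."""
    ks = geometric_boundaries(min(values), max(values), n_strata)
    strata = [[] for _ in range(n_strata)]
    for x in values:
        h = min(bisect.bisect_right(ks, x) - 1, n_strata - 1)
        strata[h].append(x)
    # An empty stratum has no variability to allocate against.
    weights = [len(s) * statistics.pstdev(s) if s else 0.0 for s in strata]
    total = sum(weights)
    report = []
    for s, w in zip(strata, weights):
        # Fall back to proportional allocation if every stratum has zero spread.
        n_h = n * w / total if total > 0 else n * len(s) / len(values)
        report.append({"N_h": len(s), "n_h": n_h,
                       "empty": not s, "overflow": n_h > len(s)})
    return report


if __name__ == "__main__":
    population = list(range(1, 11)) + [100, 1000]  # hypothetical skewed population
    for h, row in enumerate(diagnose(population, n_strata=2, n=6)):
        print(h, row)
```

In this toy population the top stratum holds only two highly variable units, so Neyman allocation asks for almost six units from it; this is exactly the situation in which a take-all stratum would normally be formed.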
It is these issues that this project will address. We intend to modify and extend the geometric
stratification method so that it provides an optimum take-all stratum and guarantees both that the
remaining strata are non-empty and that the sample sizes allocated to each of these strata do not
exceed the stratum sizes.
References:
· Gunning, P. and Horgan, J.M. (2007). Improving the Lavallée and Hidiroglou Algorithm for Stratification of Skewed Populations, Journal of Statistical Computation and Simulation, Vol. 77, No. 4, pp. 277-291.
· Gunning, P. and Horgan, J.M. (2007). Choosing the Strata when Auditing: An Example using Excel, Irish Accounting Review, Vol. 14, No. 1, pp. 53-65.
· Gunning, P. and Horgan, J.M. (2004). A Simple Algorithm for Stratifying Skewed Populations, Survey Methodology, Vol. 30, No. 2, pp. 177-185.
· Gunning, P., Horgan, J.M. and Keogh, G. (2007). An Implementation Strategy for Efficient Convergence of the Lavallée and Hidiroglou Stratification Algorithm, Journal of Official Statistics (in press).
· Gunning, P., Horgan, J.M. and Keogh, G. (2006). Efficient Pareto Stratification, Mathematical Proceedings of the Royal Irish Academy, Vol. 106A, No. 2, pp. 131-138.
· Gonzalo, M., Barbara, G., Mitas, G. and Passamonti, S. (2007). Muestras Equilibradas en Poblaciones Finitas: Un Estudio Comparativo en Muestras de Explotaciones Agropecuarias [Balanced Samples in Finite Populations: A Comparative Study on Samples of Agricultural Holdings], Undécimas Jornadas de Ciencias Económicas y Estadística, November, pp. 4-10.
· Horgan, J.M. (2006). Stratification of Skewed Populations: A Review, International Statistical Review, Vol. 74, No. 1, pp. 67-76.
· Horgan, J.M. (2003). A List-Sequential Sampling Scheme with Applications in Financial Auditing, IMA Journal of Management Mathematics, Vol. 14, No. 1, pp. 31-48.
· Horgan, J.M. (2003). A Criterion for an Efficient Ratio Estimator with Poisson Sampling, Journal of Information and Technology, Vol. 2, pp. 1-18.
· Keskinturk, T. and Er, S. (2007). A Genetic Algorithm Approach to Determine Stratum Boundaries and Sample Sizes of Each Stratum in Stratified Sampling, Computational Statistics and Data Analysis (to appear).
· Kozak, M. (2004). Optimum Allocation using Random Search Methods in Agricultural Surveys, Statistics in Transition, Vol. 6, No. 5, pp. 797-806.
· Kozak, M. and Verma, M.R. (2006). Geometric Versus Optimization Approach to Stratification: A Comparison of Efficiency, Survey Methodology, Vol. 32, No. 2, pp. 157-183.
· Lavallée, P. and Hidiroglou, M. (1988). On the Stratification of Skewed Populations, Survey Methodology, Vol. 14, pp. 33-43.
Research Project 2: SSIA: Statistical Strategies in Auditing
In recent years financial auditing has had to confront ever larger populations of accounts and
increasingly ingenious attempts at fraud. The report of the Panel on Audit Effectiveness (2000),
for example, claims that it is high time to make audits more effective and to improve their
precision. Over the past few years, several instances of major misstatement have resulted in
massive declines in the market capitalisation of the affected companies and significant damage to
the reputation of the auditing profession, whose members have been challenged more than ever
before to produce accurate assessments of risk.
Financial auditing means examining the financial statements and recorded transactions of large
organisations to obtain sufficient evidence to make authoritative statements about the accuracy of
those statements. Auditors often face situations where the data are too voluminous to be audited
cost-effectively and an examination of all records is impractical. To assess such data, it is
necessary to develop strategies for making accurate estimates of the true amounts misstated from
less than a complete enumeration. Because accounting data are highly skewed and contain few but
possibly large errors, standard sampling and estimation techniques are ill-suited: the usual
sampling methods are inefficient, and confidence intervals based on the central limit theorem may
have poor coverage even for large samples, so the usual error estimators based on classical
large-sample normal theory are unreliable.
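The coverage problem can be seen in a small Monte Carlo experiment. The sketch below assumes a hypothetical population in which 1% of items carry exponentially distributed misstatements (mean 500) and the rest are error-free, draws simple random samples of 100 items, and records how often the textbook interval mean ± 1.96 s/sqrt(n) contains the true mean error. Whenever a sample contains no errors, s = 0 and the interval collapses to zero, so it misses; with these settings that happens in roughly a third of samples. The function name and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def normal_ci_coverage(pop_size=10_000, error_rate=0.01, n=100,
                       reps=2_000, z=1.96, seed=1):
    """Monte Carlo coverage of the interval mean +/- z * s / sqrt(n).

    Hypothetical population: most items have error 0, a small fraction
    carry large, skewed (exponential) misstatements.
    """
    rng = random.Random(seed)
    n_err = int(pop_size * error_rate)
    errors = [rng.expovariate(1 / 500) for _ in range(n_err)]
    population = errors + [0.0] * (pop_size - n_err)
    true_mean = statistics.fmean(population)
    covered = 0
    for _ in range(reps):
        sample = rng.sample(population, n)  # simple random sampling without replacement
        m = statistics.fmean(sample)
        # Finite population correction ignored (sampling fraction is 1%).
        half = z * statistics.stdev(sample) / math.sqrt(n)
        if m - half <= true_mean <= m + half:
            covered += 1
    return covered / reps


if __name__ == "__main__":
    print(f"estimated coverage of a nominal 95% interval: {normal_ci_coverage():.3f}")
```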
In this project we seek to develop improved sampling and estimation strategies for statistical
auditing. Horgan (1996-2007) has developed new strategies using the widely known non-classical
Stringer bound for estimating the true error amount. This bound, though reliable, has been found to
be conservative. Several modifications have been proposed in the literature (for example, Horgan
1997; Pap and van Zuijlen 1996), but none has yet succeeded in reducing the conservatism to a
satisfactory level. Recently, Bentkus, Geuze and van Zuijlen (2006) suggested a completely
different approach for obtaining bounds using Hoeffding's inequalities, and noted that the approach
is relevant to the auditing problem. They called for further investigations with real data and
with more practical designs incorporating stratification and probability proportional to size
(PPS) selection. Chen, Chen and Rao (2003) and Chen and Qin (2006) have proposed empirical
likelihood methodology as a possible way of obtaining estimates for populations with low error
rates. Although they assumed simple random sampling, they illustrated how the methodology could be
extended to PPS and stratified sampling, designs which are frequently used in auditing. It is these
avenues that we intend to explore in order to develop new and improved statistical strategies for
financial auditing.
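To make the contrast between the two kinds of bound concrete, here is a sketch under stated assumptions: monetary-unit (PPS) sampling of n dollar units with replacement from a population with book total B, overstatement taints in [0, 1], and a Poisson approximation for the Stringer confidence factors (a binomial version is also common). The Hoeffding function is the classical one-sided Hoeffding inequality for bounded variables, not the refined inequalities of Bentkus, Geuze and van Zuijlen (2006). All function names are hypothetical.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def poisson_upper_limit(k: int, alpha: float) -> float:
    """Upper 1 - alpha limit for a Poisson mean after k events: lam with P(X <= k; lam) = alpha."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(k, hi) > alpha:  # the CDF decreases in lam, so bracket the root
        hi *= 2.0
    for _ in range(100):  # bisection
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def stringer_bound(book_total, n, taints, alpha=0.05):
    """Stringer upper bound on total overstatement from a monetary-unit sample.

    Bound = (B / n) * [lam_0 + sum_i (lam_i - lam_{i-1}) * t_(i)],
    with taints t_(1) >= t_(2) >= ... and Poisson limits lam_i.
    """
    t = sorted(taints, reverse=True)
    lams = [poisson_upper_limit(i, alpha) for i in range(len(t) + 1)]
    factor = lams[0] + sum((lams[i] - lams[i - 1]) * t[i - 1]
                           for i in range(1, len(t) + 1))
    return book_total * factor / n


def hoeffding_bound(book_total, n, taints, alpha=0.05):
    """Classical Hoeffding: mean taint <= tbar + sqrt(ln(1/alpha) / (2n)) for taints in [0, 1]."""
    tbar = sum(taints) / n  # the n - len(taints) clean units have taint 0
    return book_total * (tbar + math.sqrt(math.log(1 / alpha) / (2 * n)))


if __name__ == "__main__":
    B, n = 1_000_000.0, 100  # hypothetical book total and sample size
    taints = [0.4, 1.0, 0.1]  # hypothetical observed taints
    print(f"Stringer:  {stringer_bound(B, n, taints):,.0f}")
    print(f"Hoeffding: {hoeffding_bound(B, n, taints):,.0f}")
```

With no observed errors the classical Hoeffding bound in this sketch is about four times the Stringer bound, which suggests why refined Hoeffding-type inequalities are needed before such an approach can compete in auditing applications.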
References:
· Bentkus, V., Geuze, G.D.C. and van Zuijlen, M.C.A. (2006). Optimal Hoeffding-like Inequalities under a Symmetry Assumption, Statistics: A Journal of Theoretical and Applied Statistics, Vol. 40, No. 2, pp. 159-164.
· Chen, J., Chen, S.X. and Rao, J.N.K. (2003). Empirical Likelihood Based Confidence Intervals for the Mean of a Population Containing Many Zeros, The Canadian Journal of Statistics, Vol. 31, No. 1, pp. 53-68.
· Chen, S.X. and Qin, J. (2006). An Empirical Likelihood Method in Mixture Models with Incomplete Classifications, Statistica Sinica, Vol. 16, pp. 1101-1115.
· Gunning, P. and Horgan, J.M. (2007). Choosing the Strata when Auditing: An Example using Excel, Irish Accounting Review, Vol. 14, No. 1, pp. 53-65.
· Gunning, P., Horgan, J.M. and Yancey, W. (2004). Geometric Stratification of Accounting Data, Journal de Contaduria y Administracion, Vol. 214, pp. 11-21.
· Horgan, J.M. (2003). A List-Sequential Sampling Scheme with Applications in Financial Auditing, IMA Journal of Management Mathematics, Vol. 14, No. 1, pp. 31-48.
· Horgan, J.M. (2003). A Criterion for an Efficient Ratio Estimator with Poisson Sampling, Journal of Information and Technology, Vol. 2, pp. 1-18.
· Horgan, J.M. (2001). A Fixed-Sample-Size Without-Replacement Plan for Substantive Tests, Proceedings of the American Accounting Association Western Region Meeting, May, pp. 46-47.
· Horgan, J.M. (2000). Auditing Y2K: A Sampling Fig Leaf, Australian CPA, November, pp. 46-47.
· Horgan, J.M. (1999). Hand On: Here comes the Bug, Accountancy Ireland, Vol. 31, No. 2, April, pp. 20-21.
· Horgan, J.M. (1998). Stabilised Sieve Sampling: A Point Estimator Analysis, Journal of Business and Economic Statistics, American Statistical Association, Vol. 16, No. 1, pp. 42-51.
· Horgan, J.M. (1998). A Comparison of Lahiri Sampling with Traditional MUS Methods, Irish Accounting Review, Vol. 5, Spring, pp. 57-82.
· Horgan, J.M. (1997). Stabilising the Sample Size Using PPS, Auditing: A Journal of Practice and Theory, American Accounting Association, Vol. 16, No. 2, pp. 40-51.
· Horgan, J.M. (1996). The Moment Bound with Unrestricted Random, Cell and Sieve Sampling of Monetary Units, Journal of Accounting and Business Research, Vol. 26, No. 3, pp. 215-223.
· Pap, G. and van Zuijlen, M.C.A. (1996). On the Asymptotic Behaviour of the Stringer Bound, Statistica Neerlandica, Vol. 50, No. 3, pp. 267-289.
· Panel on Audit Effectiveness (2000). Report on Audit Effectiveness, Public Oversight Board.