Research Project 1: Extending the Geometric Stratification Method

 

In spite of the explosion in electronic data collection, sampling is still important today; if anything, it is even more important than before. The sheer volume of data collected has stayed tantalisingly ahead of computer capacity: in data mining, for example, the fundamental challenge is to mine data sets so large that they do not fit into computer memory. Data mining has a wide variety of applications, ranging from predicting consumer behaviour to identifying fraudulent credit card transactions. The time is ripe to invest in research in this area, rather than cede the initiative to others.

 

Rarely is a survey carried out without stratification, in which a population is divided into homogeneous subsets called strata and independent samples are taken from each stratum. The delineation of stratum boundaries is a problem that has eluded researchers for decades, including a colossus such as Cochran. It was first addressed fifty years ago, but no objective and practicable solution was found until Gunning and Horgan (2004) developed geometric stratification, a surprisingly simple method of obtaining definitive optimum boundaries.
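The geometric rule is short enough to state in code. In Gunning and Horgan (2004) the L + 1 boundaries form a geometric progression between the smallest value a and the largest value b of the stratification variable: k_h = a r^h, with r = (b/a)^(1/L). The Python below is a minimal sketch of that rule, assuming a strictly positive stratification variable. The function names `geometric_boundaries` and `stratum_counts`, and the convention that each stratum is closed on the left (with the maximum placed in the top stratum), are our own illustrative assumptions.

```python
import bisect

# Illustrative sketch of the geometric rule; function names and the
# left-closed stratum convention are assumptions, not taken from the paper.

def geometric_boundaries(values, n_strata):
    """Boundaries k_0 < ... < k_L with k_h = a * r**h and r = (b / a)**(1 / L)."""
    if n_strata < 1:
        raise ValueError("need at least one stratum")
    a, b = min(values), max(values)
    if a <= 0:
        raise ValueError("the geometric rule needs a strictly positive minimum")
    r = (b / a) ** (1.0 / n_strata)
    bounds = [a * r ** h for h in range(n_strata + 1)]
    bounds[-1] = b  # pin the top boundary to the maximum to avoid rounding drift
    return bounds

def stratum_counts(values, bounds):
    """Units per stratum: stratum h holds k_h <= x < k_{h+1}; the top stratum also holds b."""
    n_strata = len(bounds) - 1
    counts = [0] * n_strata
    for x in values:
        # search only the interior boundaries k_1 .. k_{L-1}
        h = bisect.bisect_right(bounds, x, 1, n_strata) - 1
        counts[h] += 1
    return counts

values = [1, 5, 12, 40, 90, 150, 600, 2500, 9000, 10000]  # made-up, skewed values
bounds = geometric_boundaries(values, 4)  # approximately [1, 10, 100, 1000, 10000]
print(stratum_counts(values, bounds))     # -> [2, 3, 2, 3]
```

Only the two extreme values and the number of strata enter the rule, which is why it needs no iteration; the counting step is where empty strata would show up.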

 

While the geometric method is beginning to replace older, more complicated methods for obtaining stratum boundaries (see, for example, Gonzalo et al. 2007, Keskinturk and Er 2007, and Kozak and Verma 2006), recent research on the geometric method has generated some controversy and highlighted some potential limitations. For example, Kozak and Verma (2006), while acknowledging that the geometric method is very simple compared with the random search method of Kozak (2004) and the iterative procedure of Lavallée and Hidiroglou (1988), have challenged its optimality and called for further studies on real skewed populations. They have also pointed out that the geometric method does not provide for a take-all stratum, which is usually constructed when a positively skewed population is stratified. Keskinturk and Er (2007) have observed that, with the geometric method, some strata may be empty, and the sample sizes allocated to some strata may exceed their population sizes.

 

It is these issues that this project will address. We intend to modify and extend the geometric stratification method so that it provides an optimum take-all stratum and guarantees that the remaining strata are non-empty and that the sample size drawn from each of them does not exceed the stratum size.
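The allocation problems noted by Keskinturk and Er (2007) appear once a total sample size n is shared out across the strata. The sketch below is not the extension proposed in this project; it is a common textbook remedy, assuming Neyman allocation n_h = n N_h S_h / Σ N_j S_j. Any stratum whose allocation exceeds its size N_h is made take-all (n_h = N_h), the rest of the sample is re-allocated among the other strata, and empty strata are reported rather than sampled. The function name `neyman_take_all` is hypothetical, and allocations are left unrounded.

```python
from statistics import pstdev

# Illustrative sketch (assumed method): Neyman allocation with capping,
# a textbook remedy, not the optimum take-all stratum sought in the project.

def neyman_take_all(strata, n):
    """Return (allocation, take_all, empty) for a list of strata (lists of values)."""
    sizes = [len(s) for s in strata]
    if not 0 < n <= sum(sizes):
        raise ValueError("n must be positive and no larger than the population")
    sds = [pstdev(s) if s else 0.0 for s in strata]
    empty = [h for h, size in enumerate(sizes) if size == 0]
    alloc = [0.0] * len(strata)
    take_all = []
    active = [h for h, size in enumerate(sizes) if size > 0]
    remaining = n
    while active:
        weight = sum(sizes[h] * sds[h] for h in active)
        if weight > 0:
            share = {h: remaining * sizes[h] * sds[h] / weight for h in active}
        else:  # all remaining strata have zero spread: fall back to proportional allocation
            total = sum(sizes[h] for h in active)
            share = {h: remaining * sizes[h] / total for h in active}
        over = [h for h in active if share[h] > sizes[h]]
        if not over:
            for h in active:
                alloc[h] = share[h]
            break
        for h in over:  # cap at N_h, make the stratum take-all, re-allocate the rest
            alloc[h] = float(sizes[h])
            take_all.append(h)
            remaining -= sizes[h]
            active.remove(h)
    return alloc, sorted(take_all), empty

small = [1, 2, 3, 4, 5] * 20       # 100 small accounts (made-up)
medium = [10, 20, 30, 40, 50] * 4  # 20 medium accounts (made-up)
large = [100, 1000, 5000]          # 3 very large accounts (made-up)
alloc, take_all, empty = neyman_take_all([small, medium, large, []], n=20)
print([round(a, 2) for a in alloc], take_all, empty)
# -> [5.67, 11.33, 3.0, 0.0] [2] [3]
```

In this example the three very large accounts would first receive about 18.75 of the 20 units, more than the stratum holds, so they become take-all. Choosing such a stratum optimally, rather than by capping after the fact, is the harder problem the project addresses.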

 

References:

 

·        Gunning, P. and Horgan, J.M. (2007). Improving the Lavallée and Hidiroglou Algorithm for Stratification of Skewed Populations, Journal of Statistical Computation and Simulation, Vol. 77, No. 4, pp. 277-291.

·        Gunning, P. and Horgan, J.M. (2007). Choosing the Strata when Auditing: An Example using Excel, Irish Accounting Review, Vol. 14, No. 1, pp. 53-65.

·        Gunning, P. and Horgan, J.M. (2004). A Simple Algorithm for Stratifying Skewed Populations, Survey Methodology, Vol. 30, No. 2, pp. 177-185.

·        Gunning, P., Horgan, J.M. and Keogh, G. (2007). An Implementation Strategy for Efficient Convergence of the Lavallée and Hidiroglou Stratification Algorithm, Journal of Official Statistics (in press).

·        Gunning, P., Horgan, J.M. and Keogh, G. (2006). Efficient Pareto Stratification, Mathematical Proceedings of the Royal Irish Academy, Vol. 106A, No. 2, pp. 131-138.

·        Gonzalo, M., Barbara, G., Mitas, G. and Passamonti, S. (2007). Muestras Equilibradas en Poblaciones Finitas: Un Estudio Comparativo en Muestras de Explotaciones Agropecuarias, Undecimas Jornadas de Ciencias Economicas y Estadistica, November, pp. 4-10.

·        Horgan, J.M. (2006). Stratification of Skewed Populations: A Review, International Statistical Review, Vol. 74, No. 1, pp. 67-76.

·        Horgan, J.M. (2003). A List-Sequential Sampling Scheme with Applications in Financial Auditing, IMA Journal of Management Mathematics, Vol. 14, No. 1, pp. 31-48.

·        Horgan, J.M. (2003). A Criterion for an Efficient Ratio Estimator with Poisson Sampling, Journal of Information and Technology, Vol. 2, pp. 1-18.

·        Keskinturk, T. and Er, S. (2007). A Genetic Algorithm Approach to Determine Stratum Boundaries and Sample Sizes of Each Stratum in Stratified Sampling, Computational Statistics and Data Analysis (to appear).

·        Kozak, M. (2004). Optimum Allocation using Random Search Methods in Agricultural Surveys, Statistics in Transition, Vol. 6, No. 5, pp. 797-806.

·        Kozak, M. and Verma, M.R. (2006). Geometric Versus Optimization Approach to Stratification: A Comparison of Efficiency, Survey Methodology, Vol. 32, No. 2, pp. 157-183.

·        Lavallée, P. and Hidiroglou, M. (1988). On the Stratification of Skewed Populations, Survey Methodology, Vol. 14, pp. 33-43.

 

 

Research Project 2: SSIA: Statistical Strategies in Auditing

 

In recent years financial auditing has had to confront ever larger populations of accounts and ever more ingenious attempts at fraud. The report of the Panel on Audit Effectiveness (2000), for example, argues that it is high time to make today's audits more effective and more precise. Over the past few years, several major misstatements have resulted in massive declines in the market capitalisation of the affected companies and significant damage to the reputation of the auditing profession, whose members are challenged more than ever before to produce accurate assessments of risk.

 

Financial auditing means examining the financial statements and recorded transactions of large organisations to obtain sufficient evidence to make authoritative statements about their accuracy. Auditors often face data too voluminous to audit cost-effectively in full, so that examining every record is impractical. Assessing such data calls for strategies that give accurate estimates of the true misstated amount from less than a complete enumeration. Accounting data are typically highly skewed, with few but possibly large errors, so standard sampling and estimation techniques are unsuitable: the usual sampling methods are inefficient, and confidence intervals based on the central limit theorem may have poor coverage even for large samples, which makes the usual error estimates based on classical large-sample normal theory unreliable.
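A small simulation makes the coverage problem concrete. The Python below is an illustration under stated assumptions, not audit data: a hypothetical population of 10,000 accounts of which only 20 are misstated, simple random sampling without replacement of n = 100, and the textbook normal-theory interval (sample mean ± 1.96 standard errors, ignoring the finite population correction). Only about one sample in five contains any misstated account; in the others the estimated standard error is zero and the interval collapses onto zero, so the empirical coverage falls far short of the nominal 95%. The function name `normal_interval_coverage` is hypothetical.

```python
import math
import random

# Illustrative simulation with a made-up population; not real accounting data.

def normal_interval_coverage(population, n, reps=2000, z=1.96, seed=1):
    """Share of simulated samples whose normal-theory interval covers the true mean."""
    rng = random.Random(seed)
    true_mean = math.fsum(population) / len(population)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)  # simple random sampling without replacement
        m = math.fsum(sample) / n
        s = math.sqrt(math.fsum((x - m) ** 2 for x in sample) / (n - 1))
        half_width = z * s / math.sqrt(n)  # finite population correction ignored
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps

# Hypothetical population: 9,980 correct accounts (error 0) and 20 large errors
errors = [0.0] * 9980 + [500.0 + 100.0 * i for i in range(20)]
print(normal_interval_coverage(errors, n=100))  # far below the nominal 0.95
```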

 

In this project we seek to develop improved sampling and estimation strategies for statistical auditing. Horgan (1996-2007) has developed new strategies using the widely known non-classical Stringer bound, an upper confidence bound on the true error amount. This bound, though reliable, has been found to be conservative. Several modifications have been proposed in the literature (for example, Horgan 1997, Pap and van Zuijlen 1996), but none has yet reduced the conservatism to a satisfactory level. Recently, Bentkus, Geuze and van Zuijlen (2006) have suggested a completely different approach, obtaining bounds from Hoeffding-like inequalities, and noted that the approach is relevant to the auditing problem. They have called for further investigation with real data and with more practical designs incorporating stratification and probability-proportional-to-size (PPS) selection. Chen, Chen and Rao (2003) and Chen and Qin (2006) have proposed empirical likelihood methodology as a way of obtaining estimates for populations with low error rates. While they assumed simple random sampling, they illustrated how the methodology could be extended to PPS and stratified sampling, designs that are frequently used in auditing. It is these avenues that we intend to explore in order to develop new and improved statistical strategies for financial auditing.
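For reference, the Stringer bound in its usual monetary-unit-sampling form is straightforward to compute. Suppose n monetary units are sampled from a population of book value Y. Let t_(1) ≥ t_(2) ≥ ... ≥ t_(m) be the observed taintings (each error divided by the book value of its item, taken here to lie between 0 and 1), and let p_u(j) be the one-sided 1 - α upper confidence limit for a binomial proportion after j errors in n trials. The bound is then Y [p_u(0) + Σ_{j=1}^{m} (p_u(j) - p_u(j-1)) t_(j)]. The sketch below assumes exact (Clopper-Pearson) binomial limits found by bisection; Poisson approximations are also common in practice. The function names are illustrative, and the code does not implement the Hoeffding-like or empirical likelihood approaches.

```python
import math

# Illustrative sketch of the Stringer bound; the exact binomial limits and
# the function names are assumptions made for this example.

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return math.fsum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def binom_upper_limit(k, n, alpha, iters=100):
    """One-sided 1 - alpha Clopper-Pearson upper limit: solve P(X <= k | n, p) = alpha."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # the CDF is decreasing in p, so bisection applies
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def stringer_bound(book_value, n, taintings, alpha=0.05):
    """Stringer upper bound on total overstatement error (taintings in [0, 1])."""
    if any(not 0 <= x <= 1 for x in taintings):
        raise ValueError("taintings are assumed to lie in [0, 1]")
    t = sorted(taintings, reverse=True)  # largest tainting first
    p = [binom_upper_limit(j, n, alpha) for j in range(len(t) + 1)]
    factor = p[0] + math.fsum((p[j] - p[j - 1]) * t[j - 1] for j in range(1, len(t) + 1))
    return book_value * factor

Y, n = 1_000_000, 100
print(round(stringer_bound(Y, n, [])))          # no errors: Y * (1 - alpha**(1/n))
print(round(stringer_bound(Y, n, [0.4, 0.1])))  # two partially tainted items
```

Because every increment p_u(j) - p_u(j-1) is weighted by the largest remaining tainting, the bound behaves as if the worst errors were repeated, which is one source of the conservatism noted above.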

 

References:

 

·        Bentkus, V., Geuze, G.D.C. and van Zuijlen, M.C.A. (2006). Optimal Hoeffding-like Inequalities under a Symmetry Assumption, Journal of Theoretical and Applied Statistics, Vol. 40, No. 2, pp. 159-164.

·        Chen, J., Chen, S.X. and Rao, J.N.K. (2003). Empirical Likelihood Based Confidence Intervals for the Mean of a Population Containing Many Zeros, The Canadian Journal of Statistics, Vol. 31, No. 1, pp. 53-68.

·        Chen, S.X. and Qin, J. (2006). An Empirical Likelihood Method in Mixture Models with Incomplete Classifications, Statistica Sinica, Vol. 65, No. 1, pp. 1101-1115.

·        Gunning, P. and Horgan, J.M. (2007). Choosing the Strata when Auditing: An Example using Excel, Irish Accounting Review, Vol. 14, No. 1, pp. 53-65.

·        Gunning, P., Horgan, J.M. and Yancey, W. (2004). Geometric Stratification of Accounting Data, Journal de Contaduria y Administracion, Vol. 214, pp. 11-21.

·        Horgan, J.M. (2003). A List-Sequential Sampling Scheme with Applications in Financial Auditing, IMA Journal of Management Mathematics, Vol. 14, No. 1, pp. 31-48.

·        Horgan, J.M. (2003). A Criterion for an Efficient Ratio Estimator with Poisson Sampling, Journal of Information and Technology, Vol. 2, pp. 1-18.

·        Horgan, J.M. (2001). A Fixed-Sample-Size Without-Replacement Plan for Substantive Tests, Proceedings of the American Accounting Association Western Region Meeting, May, pp. 46-47.

·        Horgan, J.M. (2000). Auditing Y2K: A Sampling Fig Leaf, Australian CPA, November, pp. 46-47.

·        Horgan, J.M. (1999). Hand On: Here comes the Bug, Accountancy Ireland, April, Vol. 31, No. 2, pp. 20-21.

·        Horgan, J.M. (1998). Stabilised Sieve Sampling: A Point Estimator Analysis, Journal of Business and Economic Statistics, American Statistical Association, Vol. 16, No. 1, pp. 42-51.

·        Horgan, J.M. (1998). A Comparison of Lahiri Sampling with Traditional MUS Methods, Irish Accounting Review, Spring, Vol. 5, pp. 57-82.

·        Horgan, J.M. (1997). Stabilising the Sample Size Using PPS, Auditing: A Journal of Practice and Theory, American Accounting Association, Vol. 16, No. 2, pp. 40-51.

·        Horgan, J.M. (1996). The Moment Bound with Unrestricted Random, Cell and Sieve Sampling of Monetary Units, Journal of Accounting and Business Research, Vol. 26, No. 3, pp. 215-223.

·        Pap, G. and van Zuijlen, M.C.A. (1996). On the Asymptotic Behaviour of the Stringer Bound, Statistica Neerlandica, Vol. 50, No. 3, pp. 267-289.

·        Panel on Audit Effectiveness (2000). Report on Audit Effectiveness, Public Oversight Board.