
29 March 2024 | Original Article

Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection

Authors: Qianying Jin, Kristiaan Kerstens, Ignace Van de Woestyne

Published in: OR Spectrum


Abstract

Effective methods for determining the boundary of the normal class are valuable for detecting anomalies in commercial or security applications, a problem known as anomaly detection. This contribution proposes a nonparametric frontier-based classification (NPFC) method for anomaly detection. Relaxing the convexity assumption commonly used in the literature yields a nonconvex-NPFC method, whose nonconvex nonparametric frontier turns out to provide a more conservative boundary enveloping the normal class. Because it reflects the monotonic relation between the characteristic variables and class membership, the proposed NPFC method takes a more general form that incorporates both input-like and output-like characteristic variables. In addition, by allowing some of the training observations to be misclassified, the convex- and nonconvex-NPFC methods are extended from a hard nonparametric frontier to a soft one, which also provides a more conservative boundary enclosing the normal class. Both simulation studies and a real-life data set are used to evaluate the proposed NPFC methods and to compare them with some well-established methods in the literature. The results show that the proposed NPFC methods have competitive classification performance and consistent advantages in detecting abnormal samples, especially the nonconvex-NPFC methods.
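
The abstract does not spell out the NPFC models, so the following is only an illustrative sketch of one standard nonconvex frontier construction, a free-disposal (step-shaped) hull around the normal training observations. It is not the authors' formulation. The function names, the toy data, and the direction of the monotonic relation (smaller input-like values and larger output-like values push a point towards the frontier) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# One observation: (input-like values, output-like values). Hypothetical layout.
Observation = Tuple[Sequence[float], Sequence[float]]


def in_free_disposal_hull(x: Sequence[float], y: Sequence[float],
                          training: List[Observation]) -> bool:
    """Return True if some training observation dominates (x, y).

    Assumed dominance rule: a training point (x_k, y_k) dominates (x, y)
    when x_k <= x and y_k >= y componentwise. No convex combinations of
    training points are formed, so the hull is nonconvex.
    """
    for x_k, y_k in training:
        inputs_ok = all(a <= b for a, b in zip(x_k, x))
        outputs_ok = all(a >= b for a, b in zip(y_k, y))
        if inputs_ok and outputs_ok:
            return True
    return False


def classify(x: Sequence[float], y: Sequence[float],
             training: List[Observation]) -> str:
    """Label a point 'normal' if it lies inside the hull, else 'anomaly'."""
    return "normal" if in_free_disposal_hull(x, y, training) else "anomaly"


# Toy normal class with one input-like and one output-like variable.
normal_training: List[Observation] = [((2.0,), (3.0,)), ((4.0,), (6.0,))]

label_inside = classify((3.0,), (2.5,), normal_training)   # dominated by (2, 3)
label_outside = classify((3.0,), (5.0,), normal_training)  # dominated by no point
label_between = classify((3.0,), (4.0,), normal_training)  # dominated by no point
```

In this toy setting the point (3, 4) is rejected by the step-shaped hull, although it lies below the line segment joining the two training points and would therefore fall inside a convex hull built from the same data with the same dominance rule. That is one way to picture how a nonconvex frontier can draw a tighter boundary around the normal class than a convex one.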

Footnotes
1. The data set is collected from the Statlib data archive at: http://lib.stat.cmu.edu/datasets/
Metadata
Title
Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection
Authors
Qianying Jin
Kristiaan Kerstens
Ignace Van de Woestyne
Publication date
29 March 2024
Publisher
Springer Berlin Heidelberg
Published in
OR Spectrum
Print ISSN: 0171-6468
Electronic ISSN: 1436-6304
DOI
https://doi.org/10.1007/s00291-024-00751-5