Administrative Consulting

Hierarchical clustering methods in a task to find abnormal observations based on groups with broken symmetry

https://doi.org/10.22394/1726-1139-2020-5-116-127

Abstract

The work addresses the pressing problem of identifying and interpreting anomalous observations in the study of socio-economic processes. The proposed method detects anomalous observations using a clustering approach. Clustering is performed with hierarchical methods, a family of data-ordering algorithms that build dendrograms consisting of groups of observed points. For mixed data containing both numeric and categorical variables, the Gower distance is proposed as the metric for distances between elements. Clustering quality is evaluated with the within-cluster sum of squared metric distances between objects and the average silhouette width. These indicators are used to select the optimal number of clusters and to evaluate the quality of the resulting partition. The dendrogram can also be used to study the symmetry groups of cluster systems and the causes of symmetry breaking. Anomalies are detected by analyzing the results of hierarchical clustering and identifying dendrogram branches that appear at the initial levels of tree construction and do not branch further. The method makes it possible to interpret clustering results more accurately with respect to Type I and Type II errors, in the form of anomalous observations in the data set. With the described method, socio-economic systems can be investigated effectively and their development managed.
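The pipeline outlined in the abstract (Gower distance, hierarchical clustering, selection of the number of clusters by average silhouette width and within-cluster sum of squares, then flagging isolated branches) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Python with NumPy and SciPy, equal weights for all variables in the Gower distance, average linkage, and one possible reading of the anomaly rule, namely that an observation is flagged when it forms a one-element branch after the tree is cut into the selected number of clusters. The function names (`gower_distance`, `within_ss`, `mean_silhouette`, `isolated_leaves`) and the toy data are hypothetical, invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_distance(num, cat):
    """Pairwise Gower distances for mixed data (assumption: equal variable weights).

    Numeric columns contribute |x_i - x_j| / range; categorical columns
    contribute 0 if the categories match and 1 otherwise.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # a constant column adds zero distance anyway
    d_num = (np.abs(num[:, None, :] - num[None, :, :]) / ranges).sum(axis=2)
    d_cat = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return (d_num + d_cat) / (num.shape[1] + cat.shape[1])


def within_ss(D, labels):
    """Sum over clusters of squared pairwise distances divided by 2 * cluster size.

    For Euclidean distances this equals the usual within-cluster sum of squares.
    """
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        total += (D[np.ix_(idx, idx)] ** 2).sum() / (2 * len(idx))
    return total


def mean_silhouette(D, labels):
    """Average silhouette width computed from a precomputed distance matrix."""
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue  # singleton clusters get silhouette 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] == 0, so divide by n - 1
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != own)
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s.mean()


def isolated_leaves(Z, n_clusters):
    """Indices that form one-element branches when the tree is cut into n_clusters.

    Assumption: this is one reading of "branches at the initial levels of the
    tree that do not branch further".
    """
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    ids, counts = np.unique(labels, return_counts=True)
    return [int(i) for i in np.flatnonzero(np.isin(labels, ids[counts == 1]))]


# Hypothetical toy data: two compact groups and one remote observation.
num = [[1.0, 10], [1.1, 11], [0.9, 10.5], [5.0, 50], [5.2, 52], [4.9, 51], [20.0, 200]]
cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"], ["c"]]

D = gower_distance(num, cat)
Z = linkage(squareform(D, checks=False), method="average")  # Ward assumes Euclidean data

scores = {}
for k in range(2, 5):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = mean_silhouette(D, labels)
    print(k, round(within_ss(D, labels), 4), round(scores[k], 3))

best_k = max(scores, key=scores.get)
anomalies = isolated_leaves(Z, best_k)
print("best k:", best_k, "isolated observations:", anomalies)
```

On this toy data the silhouette criterion picks three clusters, and the remote seventh observation (index 6) is the only one left as a one-element branch. Average linkage is used here because the Gower distance is not Euclidean; other linkage rules or a minimum-size threshold other than one may suit real data better.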

About the Authors

A. N. Kislyakov
Russian Presidential Academy of National Economy and Public Administration (Vladimir Branch)
Russian Federation

Associate Professor, Chair of Information Technology, Vladimir Branch of RANEPA; PhD in Technical Sciences

Vladimir



S. V. Polyakov
Russian Presidential Academy of National Economy and Public Administration (Vladimir Branch)
Russian Federation

Associate Professor, Chair of Information Technology, Vladimir Branch of RANEPA; PhD in Technical Sciences

Vladimir




For citations:


Kislyakov A.N., Polyakov S.V. Hierarchical clustering methods in a task to find abnormal observations based on groups with broken symmetry. Administrative Consulting. 2020;(5):116-127. (In Russ.) https://doi.org/10.22394/1726-1139-2020-5-116-127


ISSN 1726-1139 (Print)
ISSN 1816-8590 (Online)