Advances in K-means Clustering: A Data Mining Thinking

By Junjie Wu

Nearly all of us know the K-means algorithm in the fields of data mining and business intelligence. However, the ever-emerging data with extremely complex characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions: it builds theoretical frameworks for K-means distances and K-means-based consensus clustering, identifies the "dangerous" uniform effect and the zero-value dilemma of K-means, adapts the right measures for cluster validity, and integrates K-means with SVMs for rare class analysis. The book not only enriches the clustering and optimization theories, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based received the 2010 National Excellent Doctoral Dissertation Award, the highest honor given to no more than one hundred PhD theses per year in China.
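For readers who want to fix notation before the excerpts below, here is a minimal sketch of the standard Lloyd-style K-means iteration. It is an illustration only, not the book's implementation; the random initialization scheme and the toy data in the main block are assumptions made for the example.

```python
# A minimal Lloyd-style K-means sketch (illustration only, not the book's
# implementation). The random initialization and the toy data are assumptions.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # under squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (50, 2))])
    labels, centers = kmeans(X, k=2)
    print(np.bincount(labels), centers)
```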



Best data mining books

Data Mining in Agriculture (Springer Optimization and Its Applications)

Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environment-related fields. The book presents both theoretical and practical insights, with a focus on introducing the context of each data mining technique intuitively, with abundant concrete examples represented graphically and with algorithms written in MATLAB®.

Data Mining: Foundations and Practice

This book contains valuable studies in data mining from both foundational and practical perspectives. The foundational studies help to lay a solid groundwork for data mining as a scientific discipline, while the practical studies may lead to new data mining paradigms and algorithms.

Big Data Analytics and Knowledge Discovery: 17th International Conference, DaWaK 2015, Valencia, Spain, September 1-4, 2015, Proceedings

This book constitutes the refereed proceedings of the 17th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2015, held in Valencia, Spain, in September 2015. The 31 revised full papers presented were carefully reviewed and selected from 90 submissions. The papers are organized in topical sections: similarity measure and clustering; data mining; social computing; heterogeneous networks and data; data warehouses; stream processing; applications of big data analysis; and big data.

Understanding Complex Urban Systems: Integrating Multidisciplinary Data in Urban Models

This book is devoted to the modeling and understanding of complex urban systems. This second volume of Understanding Complex Urban Systems focuses on the challenges of the modeling tools, concerning, for example, the quality and quantity of data and the selection of an appropriate modeling approach. It is meant to support urban decision-makers, including municipal politicians, spatial planners, and citizen groups, in choosing a suitable modeling approach for their particular modeling requirements.

Additional resources for Advances in K-means Clustering: A Data Mining Thinking

Sample text

The right-hand side of Eq. (2.6) is also equal to $d(C_1, C_1)$, as there is no cross-cluster term, so the equality holds. When $k = 2$, by Eq. (2.2), proving Eq. (2.6) is equivalent to proving the following equation:

$$2\,d(C_1, C_2) = \frac{n_2}{n_1}\, d(C_1, C_1) + \frac{n_1}{n_2}\, d(C_2, C_2) + 2 n_1 n_2\, \lVert m_1 - m_2 \rVert^2 .$$

If we substitute $m_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i$, $m_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} y_i$, and

$$d(C_1, C_1) = 2\sum_{1 \le i < j \le n_1} \lVert x_i - x_j \rVert^2 = 2(n_1 - 1)\sum_{i=1}^{n_1} \lVert x_i \rVert^2 - 4\sum_{1 \le i < j \le n_1} x_i^{\top} x_j ,$$

$$d(C_2, C_2) = 2\sum_{1 \le i < j \le n_2} \lVert y_i - y_j \rVert^2 = 2(n_2 - 1)\sum_{i=1}^{n_2} \lVert y_i \rVert^2 - 4\sum_{1 \le i < j \le n_2} y_i^{\top} y_j ,$$

$$2\,d(C_1, C_2) = 2\sum_{\substack{1 \le i \le n_1 \\ 1 \le j \le n_2}} \lVert x_i - y_j \rVert^2 = 2 n_2 \sum_{i=1}^{n_1} \lVert x_i \rVert^2 + 2 n_1 \sum_{i=1}^{n_2} \lVert y_i \rVert^2 - 4\sum_{\substack{1 \le i \le n_1 \\ 1 \le j \le n_2}} x_i^{\top} y_j$$

into Eq.
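To make the algebra above concrete, here is a small numerical check of the reconstructed k = 2 identity, taking $d(A, B)$ as the sum of squared Euclidean distances over all cross pairs. This is a verification sketch only; the random clusters C1 and C2 and the use of NumPy are illustrative assumptions.

```python
# Numerical sanity check of the k = 2 identity:
#   2 d(C1,C2) = (n2/n1) d(C1,C1) + (n1/n2) d(C2,C2) + 2 n1 n2 ||m1 - m2||^2,
# with d(A,B) = sum over x in A, y in B of ||x - y||^2. The random data are
# an illustrative assumption; the identity does not depend on the distribution.
import numpy as np

def d(A, B):
    # Sum of squared Euclidean distances over all cross pairs.
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum()

rng = np.random.default_rng(0)
C1 = rng.normal(size=(7, 3))   # n1 = 7 points in R^3
C2 = rng.normal(size=(4, 3))   # n2 = 4 points in R^3
n1, n2 = len(C1), len(C2)
m1, m2 = C1.mean(axis=0), C2.mean(axis=0)

lhs = 2 * d(C1, C2)
rhs = (n2 / n1) * d(C1, C1) + (n1 / n2) * d(C2, C2) \
      + 2 * n1 * n2 * np.sum((m1 - m2) ** 2)
print(np.isclose(lhs, rhs))   # expected: True
```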

As can be seen, every data set has a significant number of true clusters that disappear after K-means clustering.

[Fig. 2.7 The percentage of the disappeared true clusters in highly imbalanced data (axes: Entropy; Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission.]

[Fig. 2.8 The percentage of the disappeared true clusters in relatively balanced data, on data sets including la2, hitech, ohscal, pendigits, and letter (axes: Entropy; Percentage of Classes Disappeared (%)). © 2009 IEEE. Reprinted, with permission.]
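The uniform effect described above can be reproduced on a toy example: run K-means with k = 2 on a highly imbalanced two-Gaussian data set and compare the recovered cluster sizes with the true class sizes. The sketch below uses scikit-learn's KMeans, synthetic Gaussians, and a 950/50 class split as assumptions; it is not the experimental setup of the book.

```python
# A small illustration of the uniform effect: on a highly imbalanced synthetic
# data set, K-means tends to produce clusters of comparable size, so the small
# true class tends to be absorbed into a piece of the large class.
# scikit-learn and the synthetic data are assumptions made for this sketch.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# One large true class (950 points) and one small one (50 points).
big   = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(950, 2))
small = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(50, 2))
X = np.vstack([big, small])
y_true = np.array([0] * 950 + [1] * 50)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Compare true class sizes with what K-means recovers: the two K-means
# clusters are typically far more balanced than the true 950/50 split,
# i.e. the large class gets split rather than the small class separated.
print("true class sizes     :", np.bincount(y_true))
print("K-means cluster sizes:", np.bincount(labels))
```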

Applied numerical linear algebra. Soc. Ind. App. Math. 32, 206–216 (1997)
7. A new shared nearest neighbor clustering algorithm and its applications. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications at the 2nd SIAM International Conference on Data Mining (2002)
8. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.

