logo-Irisa

Belief relational clustering and its application to community detection

Kuang Zhou

PhD's student in team Druid - Directors: Arnaud MARTIN and Quan PAN

PhD's defense - July 5th 2016

The talk
img-logoPdfsmall The slides (Pdf)
The Jury
Abstract
index video of the PhD's defenses

img-exposeSoutenanceKuangZhou2016

Cliquer sur l'image pour lancer la vidéo (fichier téléchargeable (MP4) img-flecheHAUT

The members of the Jury:

Thierry DENOEUX, Professeur, Université Technologique de Compiègne
Christine LARGERON, Professeur, Université Jean Monet
Frédéric DAMBREVILLE, Expert senior HDR, DGA-MI
Zhun-ga LIU, Associate professor, Northwestern Polytechnical University, China
Jacques NICOLAS Directeur de recherche, INRIA
Arnaud MARTIN, Professeur, Université de Rennes 1 / directeur de thèse
Quan PAN, Professeur, Northwestern Polytechnical University, China / directeur de thèse

img-flecheHAUT

Abstract:

Clustering, also called unsupervised learning, is an important technique in the field of data mining. According to the type of data sets, clustering algorithms can be divided into two kinds. One is for  object data in the distance space, where the objects to be clustered are described by feature vectors. The other is for proximity data, where only the relationship values such as similarities or dissimilarities between objects are known. The latter is a more general case, as the relationship could also be got for the data represented by feature vectors. On the contrary,  many real-world data sets can only be represented by relational data for which object-based clustering algorithms could not be applied directly.

Communities are groups of nodes (vertices) which probably share common properties and/or play similar roles within the graph. They can extract specific structures from complex networks, and consequently community detection has attracted considerable attention crossing many areas where systems are often represented as graphs. Community detection is in fact a clustering problem on graphs, and the available information in this problem is often in the form of similarities or dissimilarities (between nodes).

We consider in this work to represent graphs as relational data, and propose models for the corresponding relational data clustering. Four approaches are brought forward to handle the community detection problem under different scenarios.

We start with a basic situation where nodes in the graph are clustered based on the dissimilarities and propose a new c-partition clustering approach named  Median Evidential C-Means (MECM) algorithm.
This approach  extends the median clustering methods in the framework of belief function theory. Moreover, a community detection scheme based on MECM is  presented. The proposed approach could provide credal partitions for data sets with only known dissimilarities. The dissimilarity measure could be neither symmetric nor fulfilling any metric requirements. It is only required to be of  intuitive meaning. Thus it expands application scope of credal partitions.  In addition, some practical issues about how to apply the method into community detection problems such as how to determine the initial prototypes and the optimum community number in the sense of credal partitions are discussed.  This makes the approach appropriate for graph partitions and enables us to gain a better understanding of the analysed networks, especially for the uncertain and imprecise structures.

In MECM, one single representative object in the original data set is used to describe each of the individual  classes. However,  in some cases the way of using only one node to describe a community may not be sufficient enough.  In order to capture various aspects of the community structures,  more members rather than one should  be referred as the prototypes of an individual group. Motivated by this idea,  a  imilarity-based Multiple Prototype (SMP) community detection approach is proposed. The centrality values are used as the criterion to select multiple prototypes to characterize each community. The prototype weights are derived to describe the degree of representativeness of objects for their own communities. Then the similarity
between each node and community is defined, and the nodes are partitioned into divided communities according to these similarities. Crisp and fuzzy partitions could be obtained by the application of SMP.

Following, we extend SMP in the framework of belief functions to get credal partitions so that we can gain a better understanding of the data structure. The prototype weights are incorporate into the objective function of evidential clustering.
The mass membership and the prototype weights could be updated alternatively during the optimization process. In this case, each cluster could be described using multiple weighted prototypes. As we will show, the prototype weights could also provide us some useful information for structure analysis of the data sets.

With the increasing size of social networks in real world, community detection approaches should be fast. The Label Propagation Algorithm (LPA) is known to be one of the near-linear solutions and benefits of easy implementation, thus it forms a good basis for efficient community detection methods. We extend the original update rule and propagation criterion of LPA in the framework of belief functions. A new community detection approach, called Semi-supervised Evidential Label Propagation (SELP), is proposed as an enhanced version of the conventional LPA. One of the advantages of SELP is that it can take use of the available prior knowledge about the community labels of some individuals. This is very common in real practice. For instance, in the co-authorship network, some domain experts are very easy to be labeled as  their research interests
are well-known to everyone. In SELP, the nodes are divided into two parts. One contains the labeled nodes, and the other includes the unlabeled ones.  The community labels are propagated from the labeled nodes to the unlabeled ones step by step according to the proposed evidential label propagation rule.

The performance of the  proposed approaches is evaluated using benchmark graph data sets and generated graphs. Our experimental results illustrate the effectiveness of the proposed clustering algorithms and community detection approaches.

img-logoFlecheHaut
© 2016 Pôle audiovisuel Inria Rennes- Bretagne Atlantique