**Abstract:**

Clustering, also called unsupervised learning, is an important technique in the field of data mining. According to the type of data sets, clustering algorithms can be divided into two kinds. One is for object data in the distance space, where the objects to be clustered are described by feature vectors. The other is for proximity data, where only the relationship values such as similarities or dissimilarities between objects are known. The latter is a more general case, as the relationship could also be got for the data represented by feature vectors. On the contrary, many real-world data sets can only be represented by relational data for which object-based clustering algorithms could not be applied directly.

Communities are groups of nodes (vertices) which probably share common properties and/or play similar roles within the graph. They can extract specific structures from complex networks, and consequently community detection has attracted considerable attention crossing many areas where systems are often represented as graphs. Community detection is in fact a clustering problem on graphs, and the available information in this problem is often in the form of similarities or dissimilarities (between nodes).

We consider in this work to represent graphs as relational data, and propose models for the corresponding relational data clustering. Four approaches are brought forward to handle the community detection problem under different scenarios.

We start with a basic situation where nodes in the graph are clustered based on the dissimilarities and propose a new c-partition clustering approach named Median Evidential C-Means (MECM) algorithm.

This approach extends the median clustering methods in the framework of belief function theory. Moreover, a community detection scheme based on MECM is presented. The proposed approach could provide credal partitions for data sets with only known dissimilarities. The dissimilarity measure could be neither symmetric nor fulfilling any metric requirements. It is only required to be of intuitive meaning. Thus it expands application scope of credal partitions. In addition, some practical issues about how to apply the method into community detection problems such as how to determine the initial prototypes and the optimum community number in the sense of credal partitions are discussed. This makes the approach appropriate for graph partitions and enables us to gain a better understanding of the analysed networks, especially for the uncertain and imprecise structures.

In MECM, one single representative object in the original data set is used to describe each of the individual classes. However, in some cases the way of using only one node to describe a community may not be sufficient enough. In order to capture various aspects of the community structures, more members rather than one should be referred as the prototypes of an individual group. Motivated by this idea, a imilarity-based Multiple Prototype (SMP) community detection approach is proposed. The centrality values are used as the criterion to select multiple prototypes to characterize each community. The prototype weights are derived to describe the degree of representativeness of objects for their own communities. Then the similarity

between each node and community is defined, and the nodes are partitioned into divided communities according to these similarities. Crisp and fuzzy partitions could be obtained by the application of SMP.

Following, we extend SMP in the framework of belief functions to get credal partitions so that we can gain a better understanding of the data structure. The prototype weights are incorporate into the objective function of evidential clustering.

The mass membership and the prototype weights could be updated alternatively during the optimization process. In this case, each cluster could be described using multiple weighted prototypes. As we will show, the prototype weights could also provide us some useful information for structure analysis of the data sets.

With the increasing size of social networks in real world, community detection approaches should be fast. The Label Propagation Algorithm (LPA) is known to be one of the near-linear solutions and benefits of easy implementation, thus it forms a good basis for efficient community detection methods. We extend the original update rule and propagation criterion of LPA in the framework of belief functions. A new community detection approach, called Semi-supervised Evidential Label Propagation (SELP), is proposed as an enhanced version of the conventional LPA. One of the advantages of SELP is that it can take use of the available prior knowledge about the community labels of some individuals. This is very common in real practice. For instance, in the co-authorship network, some domain experts are very easy to be labeled as their research interests

are well-known to everyone. In SELP, the nodes are divided into two parts. One contains the labeled nodes, and the other includes the unlabeled ones. The community labels are propagated from the labeled nodes to the unlabeled ones step by step according to the proposed evidential label propagation rule.

The performance of the proposed approaches is evaluated using benchmark graph data sets and generated graphs. Our experimental results illustrate the effectiveness of the proposed clustering algorithms and community detection approaches.