Novel distance measures of hesitant fuzzy sets and their applications in clustering analysis

Distance and similarity measures play an important role in clustering, pattern recognition, decision-making and other scientific fields. Most existing hesitant fuzzy distance measures do not consider the hesitance degree, and those that do consider only the degree of dispersion or only the number of hesitant fuzzy values. To address these shortcomings, a new hesitance degree is defined, which has better accuracy and applicability. Several hesitant fuzzy distance measures based on the proposed hesitance degree are then introduced, which overcome some shortcomings of the existing distance measures. Finally, the new hesitant fuzzy distance is applied to the hierarchical hesitant fuzzy k-means clustering algorithm, and an example is given to illustrate the effectiveness of the proposed method.


Introduction
The theory of fuzzy sets proposed by Zadeh [1] has achieved great success in various fields. Since then, scholars have proposed many new theories and approaches for handling uncertainty and imprecision, such as intuitionistic fuzzy sets (IFS) [2], interval-valued intuitionistic fuzzy sets [3], linguistic variables [4], type-2 fuzzy sets [5], fuzzy multisets [6] and picture fuzzy sets (PFS) [7]. With the growing complexity and uncertainty of real-life problems, it is hard to establish a single degree of membership for a fuzzy set. To address this, Torra [8] introduced the concept of the hesitant fuzzy set (HFS), which permits the membership to take a set of possible values. As an extension of the fuzzy set, the hesitant fuzzy set can better model the hesitation of decision makers who waver between several possible values. Since its introduction, the hesitant fuzzy set has received extensive attention and produced rich research results. For example, Zhang [9] proposed the hesitant fuzzy power average operator, in which the weight of each piece of hesitant fuzzy information depends on the degree of support it receives from the other hesitant fuzzy information. Considering that attributes may be related to each other in realistic decision-making problems, Zhu [10,11] proposed the hesitant fuzzy Bonferroni mean operator and the hesitant fuzzy Bonferroni geometric operator. Wei [12] considered the priority relationship between attributes and proposed the hesitant fuzzy prioritized operator. Xu et al. [13] introduced a hesitant fuzzy TOPSIS method based on the principle of maximum deviation and applied it to multi-attribute decision-making problems. Liao et al. [14] presented a hesitant fuzzy VIKOR multi-attribute decision-making method that considers the psychological preference of decision makers. Wang et al. [15] introduced the prospect value function of hesitant fuzzy elements based on prospect theory and a distance measure, and then proposed a TOPSIS-based multi-attribute decision-making method that considers the risk preference of the decision maker. Hesitant fuzzy sets have also been applied to other fields such as cluster analysis [16][17][18][19], decision analysis [20][21][22][23] and pattern recognition [24][25][26][27].
The distance measure is one of the important research directions in the theory of hesitant fuzzy sets, and many results on hesitant fuzzy distances have been obtained so far. For instance, Xu and Xia [28] first proposed a variety of hesitant fuzzy distance measures and discussed their properties. Building on the hesitant fuzzy distance measure of Xu, Tong [29] introduced a hybrid hesitant fuzzy distance measure that considers the preference of decision makers, and Peng [30] presented a generalized hesitant fuzzy cooperative weighted distance measure. Although these hesitant fuzzy distance measures have many merits, they require corresponding hesitant fuzzy elements to have the same length. When the lengths of the hesitant fuzzy elements are not equal, elements must be added to meet this requirement. However, this inevitably changes the original information of the hesitant fuzzy elements, that is, it alters what the experts actually expressed. To overcome this shortcoming, Tang et al. [31] proposed a distance measure that does not depend on the length of the hesitant fuzzy element. However, unless the length of the hesitant fuzzy element is 1, the distance between two identical hesitant fuzzy elements is not equal to 0, which is counterintuitive. Later, some researchers further considered the hesitance degree of the hesitant fuzzy element in the distance measure. Zhang and Xu [18] proposed the concept of a hesitation index, which is determined by the degree of dispersion of the hesitant fuzzy values in the hesitant fuzzy element, and proposed a series of distance and similarity measures that incorporate the hesitation index of hesitant fuzzy sets. Li et al. [32] proposed the concept of a hesitance degree, which is determined by the number of hesitant fuzzy values in the hesitant fuzzy element, and proposed a series of hesitant fuzzy distance measures containing this hesitance degree. However, each of these hesitance degrees considers only the degree of dispersion or only the number of hesitant fuzzy values in the hesitant fuzzy element, so neither discriminates sufficiently between hesitant fuzzy elements.

Methods
According to the above analysis, the existing hesitant fuzzy distance measures have different shortcomings. To overcome them, we first define a new hesitance degree that considers both the degree of dispersion and the number of hesitant fuzzy values in a hesitant fuzzy element, and then put forward some distance measures based on the proposed hesitance degree. The distance is defined separately for hesitant fuzzy elements of equal length and of unequal length, which avoids the distortion of the original information caused by adding elements when the lengths differ. Further, we apply the new hesitant fuzzy distance to hierarchical hesitant fuzzy K-means clustering.
The paper is organized as follows. In the Preliminaries section, some concepts related to hesitant fuzzy sets are introduced. In the section on new hesitant fuzzy distance measures, a new hesitance degree and some new hesitant fuzzy distance measures are proposed, and their properties are discussed. The new distance measure is then applied to the hierarchical hesitant fuzzy K-means clustering algorithm, followed by an illustrative example and a comparative analysis. The final section concludes the paper.

Preliminaries
Definition 1 [8] Given a fixed set X, then a hesitant fuzzy set (HFS) on X is in terms of a function that when applied to X returns a subset of [0,1].
For convenience, Xia and Xu [33] express an HFS by the mathematical symbol
$$E = \{\langle x, h_E(x)\rangle \mid x \in X\}, \quad (1)$$
where $h_E(x)$ is a set of some different values in [0,1], representing the possible membership degrees of the element $x \in X$ to E. We call $h = h_E(x)$ a hesitant fuzzy element (HFE) and H the set of all HFEs.
For ease of comparison, we arrange the elements in $h_E(x_i)$ in increasing order, and let $h_E^{\sigma(j)}(x_i)$ be the jth smallest value in $h_E(x_i)$.
Li [32] put forward the axiomatic definition of distance measure for hesitant fuzzy sets (HFSs).
Definition 2 [32]. Let A, B and C be three HFSs on X. Then, d is called a hesitant fuzzy distance measure for HFSs if it satisfies the following properties: (1) $0 \le d(A, B) \le 1$; (2) $d(A, B) = 0$ if and only if $A = B$; (3) $d(A, B) = d(B, A)$; (4) $d(A, B) + d(B, C) \ge d(A, C)$.
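A candidate distance can be screened numerically against these axioms before a formal proof is attempted. The following is a minimal sketch, not part of the original paper: `check_distance_axioms` is a hypothetical helper that tests symmetry and the triangle inequality from Definition 2 on a finite sample, together with boundedness in [0, 1] and d(A, A) = 0, which are assumed here to be the remaining axioms. Passing the check does not prove that d is a distance measure; failing it does give a counterexample.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def check_distance_axioms(samples: Sequence[T],
                          d: Callable[[T, T], float],
                          tol: float = 1e-12) -> bool:
    """Return True if d satisfies the axioms on every sample combination.

    Checked (up to tol): 0 <= d(a, b) <= 1, d(a, a) = 0, symmetry,
    and the triangle inequality d(a, b) + d(b, c) >= d(a, c).
    """
    for a, b in product(samples, repeat=2):
        if not (-tol <= d(a, b) <= 1 + tol):
            return False                      # outside [0, 1]
        if abs(d(a, b) - d(b, a)) > tol:
            return False                      # not symmetric
    if any(abs(d(a, a)) > tol for a in samples):
        return False                          # d(a, a) must be 0
    return all(d(a, b) + d(b, c) + tol >= d(a, c)
               for a, b, c in product(samples, repeat=3))
```

For instance, the absolute difference on membership degrees passes the check on the sample 0.1, 0.4, 0.9, whereas the squared difference fails the triangle inequality on the same sample.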
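Because the number of values in different HFEs may differ, Xu and Xia extend the shorter HFE by repeating the same value until both have the same length before comparing them. Let $l(h_E(x_i))$ be the number of values in $h_E(x_i)$, and $l_{x_i} = \max\{l(h_A(x_i)), l(h_B(x_i))\}$. Xu and Xia [28] proposed a series of hesitant fuzzy set distances as follows:
Definition 3 [28]. Let A and B be two HFSs on $X = \{x_1, x_2, \ldots, x_n\}$. Then, the hesitant normalized Hamming distance is defined as
$$d_{hnh}(A, B) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{l_{x_i}}\sum_{j=1}^{l_{x_i}}\left|h_A^{\sigma(j)}(x_i) - h_B^{\sigma(j)}(x_i)\right|\right], \quad (2)$$
the hesitant normalized Euclidean distance as
$$d_{hne}(A, B) = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{l_{x_i}}\sum_{j=1}^{l_{x_i}}\left|h_A^{\sigma(j)}(x_i) - h_B^{\sigma(j)}(x_i)\right|^{2}\right)\right]^{1/2}, \quad (3)$$
and the generalized hesitant normalized distance as
$$d_{ghn}(A, B) = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{l_{x_i}}\sum_{j=1}^{l_{x_i}}\left|h_A^{\sigma(j)}(x_i) - h_B^{\sigma(j)}(x_i)\right|^{\lambda}\right)\right]^{1/\lambda}, \quad (4)$$
where $\lambda > 0$.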
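The padding step is what later sections criticize, so it helps to see it in executable form. The sketch below is an illustration under stated assumptions rather than the authors' code: it takes the generalized hesitant normalized distance to be the per-element mean of $|h_A^{\sigma(j)}(x_i) - h_B^{\sigma(j)}(x_i)|^{\lambda}$, averaged over X and raised to the power $1/\lambda$, and it assumes the repeated value is either the minimum (pessimistic) or the maximum (optimistic) of the shorter HFE, since the text only says "the same value". The names `pad` and `generalized_hesitant_normalized_distance` are hypothetical.

```python
from typing import List, Sequence

HFE = Sequence[float]   # one hesitant fuzzy element (non-empty)
HFS = Sequence[HFE]     # one HFE per element x_i of X


def pad(h: HFE, length: int, optimistic: bool = False) -> List[float]:
    """Sort h in increasing order and extend it to `length` values.

    Assumption: the filler is min(h) (pessimistic) or max(h) (optimistic).
    """
    values = sorted(h)
    filler = values[-1] if optimistic else values[0]
    return sorted(values + [filler] * (length - len(values)))


def generalized_hesitant_normalized_distance(a: HFS, b: HFS,
                                             lam: float = 1.0,
                                             optimistic: bool = False) -> float:
    """lam=1 gives the Hamming form, lam=2 the Euclidean form."""
    if len(a) != len(b):
        raise ValueError("A and B must be defined on the same X")
    total = 0.0
    for h_a, h_b in zip(a, b):
        l_xi = max(len(h_a), len(h_b))
        pa = pad(h_a, l_xi, optimistic)
        pb = pad(h_b, l_xi, optimistic)
        total += sum(abs(u - v) ** lam for u, v in zip(pa, pb)) / l_xi
    return (total / len(a)) ** (1.0 / lam)
```

With A = [{0.3, 0.5}] and B = [{0.3, 0.5, 0.6}], the pessimistic padding turns {0.3, 0.5} into {0.3, 0.3, 0.5} while the optimistic one gives {0.3, 0.5, 0.5}, so the two choices yield different distances for the same expert data. This dependence on the filler is the information distortion that a length-independent measure aims to avoid.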
To measure the deviation within each HFE of an HFS, Zhang and Xu [18] proposed the concept of the hesitance degree of an HFS.
Definition 4 [18]. Let H be an HFS on a reference set X, denoted by $H = \{\langle x_i, h_H(x_i)\rangle \mid x_i \in X\}$. Then, the hesitance degree of $x_i$ in H can be defined as follows: where $l_h$ is the number of elements in $h_H(x_i)$.
In general, the larger the spread among the possible values in an HFE, the larger the hesitance degree of that HFE. By considering the impact of the hesitance degree of HFEs, Zhang and Xu proposed a new method for measuring the distance between HFSs:
Definition 5 [18]. Let A and B be two HFSs on X. Then, the hesitant normalized Hamming distance including hesitance degree between A and B is defined as: the hesitant normalized Euclidean distance including hesitance degree is defined as: the generalized hesitant normalized distance including hesitance degree is defined as:

Here $h_A^{\sigma(j)}(x_i)$ and $h_B^{\sigma(j)}(x_i)$ are the jth values in $h_A(x_i)$ and $h_B(x_i)$, respectively, and $h_Z(h_A(x_i))$ and $h_Z(h_B(x_i))$ denote the hesitance degrees of the two HFEs $h_A(x_i)$ and $h_B(x_i)$, respectively.
Li [32] defined a hesitance degree based on the number of hesitant fuzzy values in a hesitant fuzzy element, and proposed a series of hesitant fuzzy distance measures.
Definition 6 [32]. Let H be an HFS on $X = \{x_1, x_2, \ldots, x_n\}$. Then, the hesitance degree of $x_i$ in H can be defined as follows: where $l(h_H(x_i))$ is the length of $h_H(x_i)$.
Therefore, the hesitance degree of the HFS H is defined as:
Definition 7. Let $M_1, M_2, \ldots, M_m$ be a set of HFSs on $X = \{x_1, x_2, \ldots, x_n\}$. Then, for any $M_k$ and $M_t$, $k, t = 1, 2, \ldots, m$, the normalized Hamming distance including hesitance degree between $M_k$ and $M_t$ is defined as follows: the normalized Euclidean distance including hesitance degree between $M_k$ and $M_t$ is defined as follows: the normalized generalized distance including hesitance degree between $M_k$ and $M_t$ is defined as follows:
To relax the restriction that corresponding hesitant fuzzy elements must have the same length, Tang et al. [31] proposed a series of distance measures.

Definition 8. Let A and B be two HFSs on X. Then, the hesitant normalized Hamming distance between A and B is defined as: the normalized Euclidean distance between A and B is defined as follows: the normalized generalized distance between A and B is defined as follows: where $\lambda > 0$.

Some New hesitant fuzzy distance measures
According to the above analysis, each existing method considers only the number of hesitant fuzzy values or only their degree of dispersion, which is one-sided. Therefore, by considering both simultaneously, we propose a new hesitance degree as follows.

. . . , $l_{h_A}$. Then, the hesitance degree of $x_i$ in A can be defined as follows:
where $l_{h_A}$ is the length of $h_A(x_i)$ and g is the minimum accuracy of the values in the hesitant fuzzy element. Therefore, the hesitance degree of the HFS A is defined as:
Remark 1. If n is the number of digits after the decimal point of the values in the hesitant fuzzy element, then $g = 1/10^n$. For example, let h = {0.2, 0.3} be a hesitant fuzzy element; then the minimum accuracy is g = 1/10 = 0.1. If h = {0.25, 0.36}, then the minimum accuracy is $g = 1/10^2 = 0.01$.
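The rule in Remark 1 can be computed directly from the written form of the values. The sketch below is an illustration only: `minimum_accuracy` is a hypothetical helper, the values are passed as strings so that the number of written decimal digits is preserved, and taking the largest digit count when the values of one element differ in precision is an assumption, since Remark 1 only shows elements with uniform precision.

```python
from decimal import Decimal
from typing import Iterable


def minimum_accuracy(values: Iterable[str]) -> Decimal:
    """Return g = 1 / 10**n, with n the largest number of decimal digits.

    Assumption: when values have mixed precision, the finest one decides.
    """
    # as_tuple().exponent is -n for a value written with n decimal digits.
    n = max(max(0, -Decimal(v).as_tuple().exponent) for v in values)
    return Decimal(1).scaleb(-n)   # 1 shifted n places to the right


print(minimum_accuracy(["0.2", "0.3"]))    # 0.1
print(minimum_accuracy(["0.25", "0.36"]))  # 0.01
```

Using strings rather than floats is a deliberate choice here: a float such as 0.3 carries no record of how many digits the expert wrote, whereas `Decimal("0.30")` keeps two.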
Next, we use a numerical example to illustrate the advantages of the proposed hesitance degree in processing hesitant fuzzy information.

Example 1
Let $h_1 = \{0.3, 0.5\}$, $h_2 = \{0.5, 0.6\}$ and $h_3 = \{0.3, 0.5, 0.6\}$ be three hesitant fuzzy elements, with g = 0.1 and θ = μ = 0.5. Their hesitance degrees are calculated by the different formulas respectively. The result calculated by formula (5) is as follows: the result calculated by formula (9) is as follows: the result calculated by formula (17) is as follows: From the above results, we can find that $h_Z(\{0.3$, Obviously, the results calculated by formula (5) and formula (9) are unreasonable. However, $h(h_1) \ne h(h_2) \ne h(h_3)$. That is to say, the proposed hesitance degree can clearly distinguish the hesitance degrees of the hesitant fuzzy elements $h_1$, $h_2$ and $h_3$, which is consistent with intuition. Therefore, the proposed hesitance degree is more reasonable than the existing hesitance degrees mentioned above.
Based on the proposed hesitance degree, we propose some new distance measures that can compare HFEs of equal or unequal length, which avoids destroying the original information by adding elements when the lengths are unequal.
Definition 10. Let $h_A(x_i)$ and $h_B(x_j)$ be two HFEs. Then, the normalized Hamming distance between $h_A(x_i)$ and $h_B(x_j)$ is defined as: The normalized Euclidean distance between $h_A(x_i)$ and $h_B(x_j)$ is defined as: The Hausdorff metric distance between $h_A(x_i)$ and $h_B(x_j)$ is defined as: The normalized generalized distance between $h_A(x_i)$ and $h_B(x_j)$ is defined as: where $\lambda > 0$, $\alpha, \beta \in [0, 1]$, $\alpha + \beta = 1$, and $l_{h_A}$ and $l_{h_B}$ are the lengths of the HFEs $h_A(x_i)$ and $h_B(x_j)$, respectively. In particular, if λ = 1, formula (22) reduces to formula (19); if λ = 2, formula (22) reduces to formula (20); and if λ → ∞, formula (22) reduces to formula (21).
(1) By Definition 10, we have Since the following equation holds: On the condition that 1 ≤ λ ≤ +∞, we can infer from Lemma 1 that $d_{hllg}(A, C) \le d_{hllg}(A, B) + d_{hllg}(B, C)$. Therefore, Property (4) is verified. This completes the proof of Theorem 1.
From Table 1, it can be seen that $d_{hllh}(h_1, h_1) = 0$ and $d_{hllh}(h_1$, , which is consistent with intuition. This means that the results based on the proposed distance measure are more reasonable than those of the above-mentioned distance measures.
On the other hand, we compare the characteristics of the proposed distance measure with those of the existing distance measures.The results are shown in Table 2.
From Table 2, it can be seen that the proposed distance measure has all listed characteristics, but the mentioned distance measures do not have all of them.This means that the proposed distance measure is superior to the existing distance measures above in many complex situations.

The description of clustering Algorithm
Recently, many studies have focused on the clustering analysis of HFSs. Chen and Xu [35] studied the clustering of hesitant fuzzy sets based on the K-means clustering algorithm, using the result of hierarchical clustering as the initial clusters. Zhang and Xu [36] proposed a novel hesitant fuzzy agglomerative hierarchical clustering algorithm. The algorithm treats each of the given HFSs as a unique cluster and then compares each pair of HFSs by using the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are merged, and the process is repeated until the desired number of clusters is reached.
We study the hierarchical hesitant fuzzy K-means clustering algorithm and use the new distance measure to calculate the distance between hesitant fuzzy sets. The specific steps of the hierarchical hesitant fuzzy K-means clustering algorithm are as follows:
Step 1. (Hierarchical clustering) Consider each hesitant fuzzy set $A_i$ (i = 1, 2, . . ., n) as an independent cluster $\{A_1\}, \{A_2\}, \ldots, \{A_n\}$. Then calculate the distance between $A_i$ and $A_j$, denoted by $d_{ij} = d(A_i, A_j)$. The two clusters with the smallest distance are merged by the average function, which is given as follows: This iterative process is repeated until all clusters are aggregated into one cluster.
Step 2. According to the given number of clusters, select the corresponding result of Step 1 as the initial clusters, then calculate the distance between each hesitant fuzzy set $A_i$ (i = 1, 2, . . ., n) and the center of each cluster. Finally, assign $A_i$ to the cluster with the closest cluster center.
Step 3. Recalculate the new cluster centers through the average function of the hesitant fuzzy sets.
Step 4. Repeat Steps 2 and 3 until all cluster centers are stable.
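The four steps above can be sketched as a generic procedure. The code below is a minimal illustration, not the authors' implementation: the distance `dist` and the averaging function `center` are supplied by the caller, because the proposed distance and the average function are not reproduced here, and the names `hierarchical_levels` and `hierarchical_k_means` are hypothetical. Two simplifying assumptions are labelled in the comments: ties for the closest pair are resolved by taking the first pair found, and stability is detected through unchanged cluster memberships, which implies unchanged centers.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def hierarchical_levels(items: Sequence[T],
                        dist: Callable[[T, T], float],
                        center: Callable[[List[T]], T]) -> Dict[int, List[List[int]]]:
    """Step 1: merge the two closest clusters until one cluster remains.

    Returns {number_of_clusters: clusters}, each cluster a list of item indices.
    """
    clusters = [[i] for i in range(len(items))]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        centers = [center([items[i] for i in c]) for c in clusters]
        # Assumption: on a tie, min() keeps the first closest pair.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: dist(centers[pq[0]], centers[pq[1]]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


def hierarchical_k_means(items: Sequence[T], c: int,
                         dist: Callable[[T, T], float],
                         center: Callable[[List[T]], T],
                         max_iter: int = 100) -> List[List[int]]:
    """Steps 2-4: start from the c-cluster level and iterate until stable."""
    clusters = hierarchical_levels(items, dist, center)[c]
    for _ in range(max_iter):
        centers = [center([items[i] for i in cl]) for cl in clusters]
        new: List[List[int]] = [[] for _ in centers]
        for i, item in enumerate(items):
            nearest = min(range(len(centers)),
                          key=lambda k: dist(item, centers[k]))
            new[nearest].append(i)
        new = [cl for cl in new if cl]          # drop clusters left empty
        if sorted(new) == sorted(clusters):     # memberships (hence centers) stable
            break
        clusters = new
    return clusters
```

As a toy usage, with plain numbers standing in for hesitant fuzzy sets, `dist = lambda x, y: abs(x - y)` and `center = lambda xs: sum(xs) / len(xs)`, the items 0.10, 0.14, 0.50, 0.56, 0.90 with c = 3 end up grouped by index as {0, 1}, {2, 3} and {4}. Replacing these two callables with a hesitant fuzzy distance and a hesitant fuzzy averaging function gives the algorithm described in the text.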

Illustrative example
A specific example (adapted from Ref. [35]) is given below to illustrate the above algorithm. The proposed hesitant fuzzy distance is applied to the hierarchical hesitant fuzzy K-means clustering algorithm. Five tourism resources need to be evaluated and classified. Experts give the corresponding evaluation information (g = 0.1, θ = μ = 0.5, α = β = 0.5) for the tourism resources from six aspects, namely scale, environmental conditions, integrity, service, tourist routes and convenient transportation, which are expressed as $X = \{x_1, x_2, \ldots, x_6\}$. The evaluation information of the five tourism resources is represented by the hesitant fuzzy sets $A_i$ (i = 1, 2, 3, 4, 5), which are listed in Table 3.
Step 1. Consider each hesitant fuzzy set $A_i$ (i = 1, 2, 3, 4, 5) as an independent cluster: $\{A_1\}$, $\{A_2\}$, $\{A_3\}$, $\{A_4\}$ and $\{A_5\}$. Using formula (21), calculate the distance between each hesitant fuzzy set and the other four hesitant fuzzy sets: Obviously, $\{A_2\}$ and $\{A_3\}$ are the two closest clusters, so calculate the new cluster $\{A_2, A_3\}$ by formula (25). Therefore, the hesitant fuzzy sets $A_i$ (i = 1, 2, 3, 4, 5) are divided into the following four clusters: $\{A_1\}$, $\{A_2, A_3\}$, $\{A_4\}$ and $\{A_5\}$. Continue to calculate the distance between each cluster and the other three clusters: Because $\{A_2, A_3\}$ and $\{A_5\}$ are the two closest clusters, the hesitant fuzzy sets are divided into the following three clusters: $\{A_2, A_3, A_5\}$, $\{A_1\}$ and $\{A_4\}$. Calculate the new cluster and the distances between each cluster and the other clusters: Since $\{A_1\}$ and $\{A_4\}$ are the two closest clusters, the hesitant fuzzy sets are divided into two clusters: $\{A_2, A_3, A_5\}$ and $\{A_1, A_4\}$.
In the end, the two clusters are merged into one cluster: $\{A_1, A_2, A_3, A_4, A_5\}$.
Step 2. Assuming that the number of clusters c = 3 is given, then according to the result of Step 1, $c_1 = \{A_1\}$, $c_2 = \{A_2, A_3, A_5\}$ and $c_3 = \{A_4\}$ are selected as the initial clusters. Next, calculate the distance between each hesitant fuzzy set $A_i$ (i = 1, 2, . . ., 5) and each initial cluster $c_j$ (j = 1, 2, 3) as follows:

Comparative analysis
To illustrate the performance of the proposed method, we make a comparative analysis with the hierarchical hesitant fuzzy k-means clustering algorithm introduced by Chen et al. [35].

Results and Discussion
According to the above analysis, the comparison result is shown in Table 4.
From Table 4, we can find that there are two different clustering results using Chen's method introduced in [35].It is very difficult to decide which one to choose in the clustering process.And even if it can be selected correctly, it will increase the complexity of the algorithm.However, a unique clustering result can be obtained by the proposed method.

Category | Ref. [35], case 1 | Ref. [35], case 2 | The proposed method
Moreover, the result is the same as the best one obtained by Chen's method. Therefore, the hierarchical hesitant fuzzy k-means clustering method based on the proposed distance measure is more reasonable and effective.
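The ambiguity discussed above arises whenever more than one pair of clusters attains the minimum distance, because the merge step then has several equally valid choices. The sketch below shows one way to detect such ties before merging; it is an illustration only, and `closest_pairs` is a hypothetical helper. In the usage that follows, the tie at 0.2222 mirrors the value reported for Chen's method, where case 1 merges $A_2$ with $A_3$ and case 2 merges $A_3$ with $A_4$, while the third distance is invented purely for the demonstration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def closest_pairs(dist: Dict[Pair, float], tol: float = 1e-9) -> List[Pair]:
    """Return every pair whose distance equals the minimum, up to tol."""
    smallest = min(dist.values())
    return [pair for pair, value in dist.items()
            if abs(value - smallest) <= tol]


# 0.2222 is the tied minimum from the comparison; 0.3100 is a made-up filler.
distances = {("A2", "A3"): 0.2222, ("A3", "A4"): 0.2222, ("A1", "A2"): 0.3100}
ties = closest_pairs(distances)
print(ties)             # [('A2', 'A3'), ('A3', 'A4')]
print(len(ties) > 1)    # True: the merge step is ambiguous
```

When the returned list has more than one entry, every candidate merge has to be followed and the resulting partitions compared, which is the extra work the two-case analysis of Chen's method requires.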

Conclusions
Because the existing hesitance degrees do not take into account both the degree of dispersion and the number of hesitant fuzzy values in a hesitant fuzzy element, a new hesitance degree is defined in this paper, which has better accuracy and applicability. We have elaborated the important role of the hesitance degree in hesitant fuzzy distance measures. Further, we proposed some hesitant fuzzy distance measures based on the new hesitance degree, which can overcome the shortcomings of the existing distance measures. Moreover, we applied the new hesitant fuzzy distance to the hierarchical hesitant fuzzy k-means clustering algorithm and presented an example to illustrate the effectiveness of the proposed method. In addition, we compared the proposed approach with the existing hierarchical hesitant fuzzy k-means clustering algorithm and found that the clustering algorithm based on the new distance measure is more reasonable. The proposed distance measure avoids distortion of the original information and has higher resolution. Therefore, it can help decision makers obtain a unique result in practical problems. In the future, we will apply the proposed distance measure to multi-attribute group decision-making, extend this approach to the interval-valued environment, and develop the knowledge measure [37] for hesitant fuzzy sets.
2, 3, 4, 5 (i ≠ j) = 0.2222, there are two options when merging the two clusters into a new cluster. Therefore, the following two cases are considered.
Case 1: The hesitant fuzzy sets $A_i$ (i = 1, 2, 3, 4, 5) are divided into the following four clusters: $\{A_1\}$, $\{A_2, A_3\}$, $\{A_4\}$ and $\{A_5\}$. Calculating the distances between each cluster and the other three clusters, $d(\{A_2, A_3\}, A_5)$ is the shortest distance. Merging $\{A_2, A_3\}$ and $\{A_5\}$ into a new cluster, the hesitant fuzzy sets are divided into three clusters: $\{A_2, A_3, A_5\}$, $\{A_1\}$ and $\{A_4\}$. Calculating the new cluster and the distances between each cluster and the other clusters, $d(A_1, A_4)$ is the shortest distance. Therefore, the hesitant fuzzy sets are divided into the following two clusters: $\{A_2, A_3, A_5\}$ and $\{A_1, A_4\}$. In the end, the two clusters are merged into one cluster: $\{A_1, A_2, A_3, A_4, A_5\}$.
Case 2: The hesitant fuzzy sets $A_i$ (i = 1, 2, 3, 4, 5) are divided into the following four clusters: $\{A_1\}$, $\{A_2\}$, $\{A_3, A_4\}$ and $\{A_5\}$. Calculating the distances between each cluster and the other three clusters, $d(A_2, A_5)$ is the shortest distance. Merging $\{A_2\}$ and $\{A_5\}$ into a new cluster, the hesitant fuzzy sets are divided into three clusters: $\{A_1\}$, $\{A_3, A_4\}$ and $\{A_2, A_5\}$. Calculating the new cluster and the distances between each cluster and the other clusters, $d(\{A_3, A_4\}, \{A_2, A_5\})$ is the shortest distance. Therefore, the hesitant fuzzy sets are divided into two clusters: $\{A_1\}$ and $\{A_2, A_3, A_4, A_5\}$. In the end, the two clusters are merged into one cluster: $\{A_1, A_2, A_3, A_4, A_5\}$.
Obviously, the clustering results obtained in the two cases are different. Next, we analyze the quality of the clustering results of the two cases. Generally, the average distance $d_\rho$ is an indicator of the quality of clustering results: the smaller the $d_\rho$, the better the clustering result. The calculation process is as follows:
$d(\{A_2, A_3\}, A_2) = 0.1531$, $d(\{A_2, A_3\}, A_3) = 0.0714$, $d_\rho(\{A_2, A_3\}) = (0.1531 + 0.0714)/2 = 0.1123$
$d(\{A_3, A_4\}, A_3) = 0.1750$, $d(\{A_3, A_4\}, A_4) = 0.1069$, $d_\rho(\{A_3, A_4\}) = (0.1750 + 0.1069)/2 = 0.1410$
Proof. As $d_{hllh}$, $d_{hlle}$ and $d_{hd}$ are special cases of $d_{hllg}$, here we only prove that $d_{hllg}$ is a distance measure. According to Definition 10, it can be obtained easily that Property
m}, k, t ∈ I. Then, $d_{hllh}(A_k, A_t)$, $d_{hlle}(A_k, A_t)$ and $d_{hllg}(A_k, A_t)$ are hesitant fuzzy distances.

Table 2
The characteristic comparisons with existing distance measures

Table 3
Hesitant fuzzy assessment information
$\{A_1\}$, $\{A_2, A_3\}$, $\{A_4\}$ and $\{A_5\}$. Continue to calculate the distance between each cluster and the other three clusters: