Agglomerative Best Similarity/Distance.

This module contains classes for finding the best similarity/distance equation for Agglomerative Clustering.

class measurenary.agglomerative_best.AgglomerativeBestMeasure(show_result: bool = False, result_count: int = 5)[source]

A class that finds the best-performing similarity/distance measure for Agglomerative Clustering.

Parameters
  • show_result (bool, optional) – If True, show the result. The default is False.

  • result_count (int, optional) – The number of results to print. The default is 5.

fit(df: pandas.core.frame.DataFrame, n_clusters=2, affinity='all', linkage='all', use_sampling='none', sample_rate=0.1, **kwargs)[source]

Fit data with Agglomerative Clustering.

Parameters
  • df (pandas.DataFrame) – DataFrame to fit with Agglomerative Clustering

  • n_clusters (int, optional) – Number of clusters to generate, by default 2

  • affinity (str, optional) – Type of affinity to use, by default ‘all’

  • linkage (str, optional) – Type of linkage to use, by default ‘all’

  • use_sampling (str, optional) – Sampling method used to reduce computation time by fitting on a subset of the data. Use ‘none’ to disable sampling, ‘random’ for random sampling, or ‘stratified’ for stratified sampling, by default ‘none’

  • sample_rate (float, optional) – Fraction of the data to keep in the sample, from 0 to 1, by default 0.1
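
The difference between the ‘random’ and ‘stratified’ options can be illustrated with a small standard-library sketch. This is not the library's implementation: the helper names `random_sample` and `stratified_sample`, the `strata` argument, and the `seed` parameter are all hypothetical, and how the library chooses its strata is not documented here.

```python
import random
from collections import defaultdict


def random_sample(rows, sample_rate, seed=0):
    """Keep about sample_rate of the rows, drawn uniformly at random."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    k = max(1, round(len(rows) * sample_rate))
    return random.Random(seed).sample(rows, k)


def stratified_sample(rows, strata, sample_rate, seed=0):
    """Sample each stratum separately so group proportions are preserved.

    `strata` is a hypothetical list of group keys, one per row.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if len(rows) != len(strata):
        raise ValueError("rows and strata must have the same length")
    groups = defaultdict(list)
    for row, key in zip(rows, strata):
        groups[key].append(row)
    rng = random.Random(seed)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        k = max(1, round(len(members) * sample_rate))
        sample.extend(rng.sample(members, k))
    return sample


# 100 rows: the first 80 belong to group "a", the last 20 to group "b".
rows = list(range(100))
strata = ["a"] * 80 + ["b"] * 20

picked_random = random_sample(rows, 0.1)
picked_stratified = stratified_sample(rows, strata, 0.1)

# Stratified sampling always keeps the 80/20 split; random sampling may not.
from_a = sum(1 for r in picked_stratified if r < 80)
print(len(picked_random), len(picked_stratified), from_a)
```

With a sample_rate of 0.1, both helpers keep 10 of the 100 rows, but only the stratified version guarantees that 8 come from the larger group and 2 from the smaller one.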

Returns

Return type

None
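
As a rough illustration of what affinity='all' and linkage='all' imply, the sketch below expands both options into the list of combinations to try. The candidate names are an assumption borrowed from scikit-learn's AgglomerativeClustering, and `expand` and `candidate_pairs` are hypothetical helpers; the set of measures the library actually evaluates may differ.

```python
from itertools import product

# Assumed candidate names (taken from scikit-learn, not from this library).
AFFINITIES = ["euclidean", "manhattan", "cosine"]
LINKAGES = ["ward", "complete", "average", "single"]


def expand(choice, candidates):
    """Turn 'all' into the full candidate list, or validate a single name."""
    if choice == "all":
        return list(candidates)
    if choice not in candidates:
        raise ValueError(f"unknown option: {choice!r}")
    return [choice]


def candidate_pairs(affinity="all", linkage="all"):
    """List every (affinity, linkage) pair that a search would evaluate."""
    pairs = []
    for aff, link in product(expand(affinity, AFFINITIES),
                             expand(linkage, LINKAGES)):
        # In scikit-learn, Ward linkage only accepts Euclidean distance,
        # so the other Ward combinations are skipped.
        if link == "ward" and aff != "euclidean":
            continue
        pairs.append((aff, link))
    return pairs


pairs = candidate_pairs()
for aff, link in pairs:
    print(aff, link)
```

Passing a specific name for either argument narrows the search to that value, which is presumably why both parameters default to ‘all’.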

get_result(csv: bool = False) → pandas.core.frame.DataFrame[source]

Return the results of the best similarity equation matched with the best linkage.

Returns

result_df – DataFrame containing the results of the best similarity equation matched with the best linkage

Return type

pandas.DataFrame