Agglomerative Best Similarity/Distance

This module contains classes to find the best similarity/distance equation based on Agglomerative Clustering.

class measurenary.agglomerative_best.AgglomerativeBestMeasure(show_result: bool = False, result_count: int = 5)

A class that finds the best-performing similarity/distance measure for Agglomerative Clustering.

Parameters

show_result (bool, optional) – Whether to print the results. The default is False.
result_count (int, optional) – The number of results to print. The default is 5.

fit(df: pandas.core.frame.DataFrame, n_clusters=2, affinity='all', linkage='all', use_sampling='none', sample_rate=0.1, **kwargs)

Fit data with Agglomerative Clustering.

Parameters

df (pandas.DataFrame) – DataFrame to fit with Agglomerative Clustering.
n_clusters (int, optional) – Number of clusters to generate, by default 2.
affinity (str, optional) – Type of affinity to use, by default 'all'.
linkage (str, optional) – Type of linkage to use, by default 'all'.
use_sampling (str, optional) – Sampling method used to reduce computation time by fitting on a subset of the data: 'none' for no sampling, 'random' for random sampling, or 'stratified' for stratified sampling. By default 'none'.
sample_rate (float, optional) – Fraction of the data to keep when sampling, between 0 and 1. By default 0.1.
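The difference between the 'random' and 'stratified' options can be illustrated with a small stand-alone sketch. This is not the library's implementation: the helper names `random_sample` and `stratified_sample`, the `seed` argument, and the choice of a `group` field as the stratification key are all assumptions made for illustration. The documentation does not say which column the library stratifies on.

```python
import random
from collections import defaultdict


def random_sample(rows, sample_rate, seed=0):
    """Keep about sample_rate of the rows, chosen uniformly at random."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * sample_rate))
    return rng.sample(rows, k)


def stratified_sample(rows, strata_key, sample_rate, seed=0):
    """Sample each stratum separately so group proportions are preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[strata_key(row)].append(row)
    sample = []
    for members in groups.values():
        # Every stratum contributes at least one row.
        k = max(1, round(len(members) * sample_rate))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 80 rows in group "a", 20 rows in group "b".
rows = [{"id": i, "group": "a" if i < 80 else "b"} for i in range(100)]

random_subset = random_sample(rows, sample_rate=0.1)
stratified_subset = stratified_sample(rows, lambda r: r["group"], sample_rate=0.1)
```

With a sample rate of 0.1, both subsets contain 10 rows. The stratified subset always holds 8 rows from group "a" and 2 from group "b", whereas the random subset only matches those proportions on average.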

Return type: None
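A plausible reading of affinity='all' and linkage='all' is that every combination is tried and compared. The sketch below enumerates such a search space under the assumption that the option names are those of scikit-learn's AgglomerativeClustering; the lists, the `expand_options` helper, and the filtering rule are illustrative and are not taken from this library's source.

```python
from itertools import product

# Assumption: option names follow scikit-learn's AgglomerativeClustering.
AFFINITIES = ["euclidean", "l1", "l2", "manhattan", "cosine"]
LINKAGES = ["ward", "complete", "average", "single"]


def expand_options(affinity="all", linkage="all"):
    """Expand 'all' into every (affinity, linkage) pair worth trying."""
    affinities = AFFINITIES if affinity == "all" else [affinity]
    linkages = LINKAGES if linkage == "all" else [linkage]
    # In scikit-learn, ward linkage only accepts Euclidean distance,
    # so other pairings with ward are dropped.
    return [
        (a, l)
        for a, l in product(affinities, linkages)
        if not (l == "ward" and a != "euclidean")
    ]


combos = expand_options()
cosine_only = expand_options(affinity="cosine")
```

Here `combos` holds 16 pairs (20 combinations minus the 4 invalid ward pairings), and `cosine_only` holds the 3 linkages that can be combined with cosine distance.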

get_result(csv: bool = False) → pandas.core.frame.DataFrame

Return the best similarity/distance measures together with the linkage each performs best with.

Returns: result_df – DataFrame containing the best similarity/distance measures and their best-matching linkage.
Return type: pandas.DataFrame