Agglomerative Best Similarity/Distance
This module contains classes that find the best similarity/distance measure to use with Agglomerative Clustering.
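As a rough illustration of the idea behind this module, the sketch below runs agglomerative clustering once for every combination of a few distance measures and linkages, then ranks the combinations by a quality score. Everything in it is an assumption made for illustration: the three metrics, the naive merge loop, the use of the silhouette coefficient as the ranking criterion, and the names `agglomerate`, `silhouette` and `search_best` are hypothetical and are not measurenary's implementation.

```python
import math
from itertools import combinations, product

# Candidate distance functions (an assumed, illustrative set).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0  # treat zero vectors as maximally distant

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine}
# A linkage turns the pairwise point distances between two clusters into one cluster distance.
LINKAGES = {"single": min, "complete": max, "average": lambda ds: sum(ds) / len(ds)}


def agglomerate(points, n_clusters, metric, linkage):
    """Naive bottom-up clustering: repeatedly merge the two closest clusters."""
    dist, link = METRICS[metric], LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = link([dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected by the pop
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels, metric):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    dist = METRICS[metric]
    n = len(points)
    scores = []
    for i in range(n):
        same = [dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def search_best(points, n_clusters=2):
    """Try every metric/linkage pair; return (score, metric, linkage) tuples, best first."""
    results = []
    for metric, linkage in product(METRICS, LINKAGES):
        labels = agglomerate(points, n_clusters, metric, linkage)
        results.append((silhouette(points, labels, metric), metric, linkage))
    return sorted(results, reverse=True)


points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
print(search_best(points)[0])
```

Scoring each clustering with the metric that produced it keeps the comparison self-consistent, but other criteria (or an external label column) are equally plausible; the real selection rule should be taken from the library source.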
- class measurenary.agglomerative_best.AgglomerativeBestMeasure(show_result: bool = False, result_count: int = 5)
A class that finds the best similarity/distance measure to use with Agglomerative Clustering.
- Parameters
show_result (bool, optional) – True if you want to show the result. The default is False.
result_count (int, optional) – The number of results to print. The default is 5.
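To make the two constructor options concrete, here is a minimal sketch of how a ranked list of results might be trimmed to `result_count` entries and printed only when `show_result` is True. The helper `report`, the `(score, affinity, linkage)` tuple layout and the example scores are hypothetical illustrations; the actual output format of `AgglomerativeBestMeasure` is not documented here.

```python
def report(results, show_result=False, result_count=5):
    """Hypothetical helper: rank (score, affinity, linkage) tuples, keep the top ones,
    and print them only if show_result is True."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)[:result_count]
    if show_result:
        for rank, (score, affinity, linkage) in enumerate(ranked, start=1):
            print(f"{rank}. affinity={affinity}, linkage={linkage}, score={score:.3f}")
    return ranked


# Made-up scores, for illustration only.
results = [(0.42, "cosine", "single"), (0.91, "euclidean", "average"), (0.88, "manhattan", "complete")]
report(results, show_result=True, result_count=2)
```

Returning the trimmed list as well as printing it keeps the ranking usable programmatically when `show_result` is False.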
- fit(df: pandas.core.frame.DataFrame, n_clusters=2, affinity='all', linkage='all', use_sampling='none', sample_rate=0.1, **kwargs)
Fit data with Agglomerative Clustering.
- Parameters
df (pandas.DataFrame) – DataFrame to fit with Agglomerative Clustering.
n_clusters (int, optional) – Number of clusters to generate. The default is 2.
affinity (str, optional) – Type of affinity to use. The default is ‘all’.
linkage (str, optional) – Type of linkage to use. The default is ‘all’.
use_sampling (str, optional) – Sampling method used to reduce computation time by fitting on a sample of the data: ‘none’ for no sampling, ‘random’ for random sampling, or ‘stratified’ for stratified sampling. The default is ‘none’.
sample_rate (float, optional) – Sampling rate that determines the sample size, as a value from 0 to 1. The default is 0.1.
- Return type
None
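A minimal sketch of the three `use_sampling` modes, assuming `sample_rate` is the fraction of rows to keep and that stratified sampling keeps that fraction within each group. The function `sample_rows`, its `strata` and `seed` arguments, and the rounding rule are hypothetical; the documentation does not say which column measurenary stratifies on. Plain Python lists stand in for the DataFrame so the sketch has no dependencies; with pandas, `DataFrame.sample(frac=...)` covers the random case directly.

```python
import random
from collections import defaultdict


def sample_rows(rows, sample_rate=0.1, use_sampling="none", strata=None, seed=0):
    """Hypothetical sketch of 'none' / 'random' / 'stratified' sampling."""
    if not 0 < sample_rate <= 1:  # assumption: a rate of 0 would leave nothing to cluster
        raise ValueError("sample_rate must be in (0, 1]")
    rows = list(rows)
    if use_sampling == "none":
        return rows
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    if use_sampling == "random":
        k = max(1, round(len(rows) * sample_rate))
        return rng.sample(rows, k)
    if use_sampling == "stratified":
        if strata is None or len(strata) != len(rows):
            raise ValueError("stratified sampling needs one stratum label per row")
        groups = defaultdict(list)
        for row, label in zip(rows, strata):
            groups[label].append(row)
        sample = []
        for members in groups.values():
            # Same fraction from every group, but at least one row so no group disappears.
            k = max(1, round(len(members) * sample_rate))
            sample.extend(rng.sample(members, k))
        return sample
    raise ValueError(f"unknown use_sampling: {use_sampling!r}")


labels = ["a"] * 80 + ["b"] * 20
print(len(sample_rows(range(100), 0.1, "stratified", strata=labels)))
```

Stratifying preserves the group proportions of the full data in the sample, which matters when one group is small; plain random sampling at a low rate can under-represent or miss such a group entirely.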