Usage

Installation

To use Measurenary, first install it using pip:

(.venv) $ pip install measurenary

Quickstart

Assume we have a binary pandas DataFrame with 9 features (A to I) and 1 target column.:

>> import pandas as pd

>> df = pd.DataFrame(data={
'A': [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
'B': [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
'C': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
'D': [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
'E': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
'F': [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
'G': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'H': [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
'I': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'Target': [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]})

>> df
   A  B  C  D  E  F  G  H  I  Target
0  1  0  0  0  0  0  0  0  0     0
1  1  0  0  0  0  0  0  1  0     0
2  0  1  0  0  0  0  0  1  0     0
3  0  1  0  0  0  0  0  1  0     1
4  0  0  1  0  0  1  0  1  0     1
5  0  0  1  0  0  1  0  1  0     1
6  0  0  0  1  0  1  0  0  0     1
7  0  0  0  1  0  1  0  0  0     1
8  0  0  0  0  1  1  0  0  0     1
9  0  0  0  0  1  0  0  0  0     0

The form of data that can be processed by this class is limited only in the form of an example. If you have other forms of data, make sure to transform the data so that it has n binary feature columns and 1 target column.

measurenary.AgglomerativeBestSimilarity

You can use the measurenary.AgglomerativeBestMeasure() class to fit your data with best similarity/distance equation based on Agglomerative Clustering. This class also can find best linkage method based on the used similarity/distance. Make sure your target class are in the last column of your dataframe.

In this case, we already have a binary dataframe with target class in the last and have 2 cluster. So we can pass it to measurenary.AgglomerativeBestMeasure() class:

>> from measurenary import AgglomerativeBestMeasure

>> aggbs = AgglomerativeBestMeasure()

>> aggbs.fit(df, n_clusters=2)

100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.06s/it]

To get the result, we can call get_result() method to our object. It return a pandas DataFrame with 5 best similarity/distance recommendation:

>> result = aggbs.get_result()

>> print(result.head())
   linkage           equation  homogeneity  completeness  v_measure   adjusted_rand_index
0  complete          sim gower     0.628236      0.609987   0.618977             0.013972
1  complete         sim stiles     0.573438      0.631777   0.601196             0.013972
2   average         sim peirce     0.432538      0.432538   0.432538             0.010967
3  complete  sim fager_mcgowan     0.331560      0.445928   0.380332             0.010204
4   average  sim fager_mcgowan     0.331560      0.445928   0.380332             0.005673

Or, if you want to calculate based on pair of your target, you can use measurenary.PairBestMeasure() class instead.

measurenary.PairBestMeasure

This class uses the pair of combinations of each row to determine whether the combination has a large similarity value or not and with the appropriate target or not. If the target is the same, then the two lines should have a large similarity value. Make sure your target class are in the last column of your dataframe.

In our case, we can just pass our data to the class and use fit() method to find the recommended equations:

>> from measurenary import PairBestMeasure

>> pbs = PairBestMeasure(show_result=True, result_count=5)

>> bs.fit(df, use_seed=True, num_sample=2)

1it [00:00, 96.36it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 21.34it/s]

to get our result, we can call get_result() method to our object. It return a pandas DataFrame with 5 best similarity/distance recommendation:

>> print(bs.get_result())

final 5 best similarity:
               sim/dis name  mean_auc
0          jaccard similarity       1.0
1            gower similarity       1.0
2     hellinger dissimilarity       1.0
3  pearson_heron_1 similarity       1.0
4           dice_2 similarity       1.0