# Welcome to OutRank's documentation!
All functions and methods can be searched for via the search bar on the left.
This tool enables fast screening of feature-feature interactions. Its purpose is to give the user quick insight into potential redundancies or anomalies in the data.
It operates in _mini batches_: it traverses the raw data incrementally, refining the rankings as it goes. The core operation, interaction ranking, outputs triplets that look as follows:

```
featureA featureB 0.512
featureA featureC 0.125
```
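
As a quick illustration of how such rankings can be consumed downstream, the sketch below parses triplets of this form and sorts them by score. The file name `rankings.tsv` and the whitespace-separated layout are assumptions based on the snippet above, not a documented output format.

```python
# A minimal sketch, assuming whitespace-separated triplets
# (featureA featureB score) in a hypothetical file 'rankings.tsv'.
pairs = []
with open('rankings.tsv') as f:
    for line in f:
        feat_a, feat_b, score = line.split()
        pairs.append((feat_a, feat_b, float(score)))

# Show the highest-scoring interactions first
pairs.sort(key=lambda triplet: triplet[2], reverse=True)
for feat_a, feat_b, score in pairs[:10]:
    print(f'{feat_a} x {feat_b}: {score:.3f}')
```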
## Setup

```bash
pip install outrank
```
and test a minimal cycle with
```bash
outrank --task selftest
```
If this passes, you can be fairly certain OutRank will perform as intended. OutRank's primary use case is as a CLI tool; begin exploring with
```bash
outrank --help
```
## Example use cases

A minimal showcase of feature ranking on a generic CSV is given in this example.
More examples demonstrating OutRank's capabilities are also available.
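
For orientation, a full ranking run from the command line might look like the sketch below. The flag names and values here are assumptions rather than documented defaults, so verify them against `outrank --help` before use.

```bash
# A hedged sketch of a ranking run; confirm flags via `outrank --help`
outrank \
    --task all \
    --data_path path/to/your/data \
    --data_source csv-raw \
    --heuristic MI-numba-randomized
```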
## OutRank as a Python library

Once installed, _OutRank_ can be used like any other Python library. For example, generic feature ranking algorithms can be accessed as follows:
```python
import numpy as np

from outrank.algorithms.feature_ranking.ranking_mi_numba import (
    mutual_info_estimator_numba,
)

# Some minimal synthetic data (NumPy vectors)
a = np.array([1, 0, 0, 0, 1, 1, 1, 0], dtype=np.int32)

# Three candidate vectors with increasing similarity to `a`
lowest = np.array(np.random.permutation(a), dtype=np.int32)
medium = np.array([1, 1, 0, 0, 1, 1, 1, 1], dtype=np.int32)
high = np.array([1, 0, 0, 0, 1, 1, 1, 1], dtype=np.int32)

lowest_score = mutual_info_estimator_numba(
    a, lowest, np.float32(1.0), False,
)
medium_score = mutual_info_estimator_numba(
    a, medium, np.float32(1.0), False,
)
high_score = mutual_info_estimator_numba(
    a, high, np.float32(1.0), False,
)

# Scores should increase with similarity to `a`
scores = [lowest_score, medium_score, high_score]
sorted_score_indices = np.argsort(scores)
assert np.sum(np.array([0, 1, 2]) - sorted_score_indices) == 0
```
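
The closing assertion simply verifies that the estimated mutual information increases with the overlap between each candidate vector and `a`, i.e. that the three scores come back in ascending order. The third and fourth arguments tune the estimator itself; their exact semantics are best checked in the function's docstring.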
## Creating a simple dataset

```python
from outrank.algorithms.synthetic_data_generators.cc_generator import (
    CategoricalClassification,
)

cc = CategoricalClassification()

# Creates a simple dataset of 9 features and 10k samples,
# with each feature having a cardinality of 35
X = cc.generate_data(
    9,
    10000,
    cardinality=35,
    ensure_rep=True,
    random_values=True,
    low=0,
    high=40,
)

# Creates target labels via clustering
y = cc.generate_labels(X, n=2, class_relation='cluster')
```
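
As a quick sanity check, the generated arrays can be inspected directly; the snippet below uses plain NumPy and makes no further assumptions about the generator's API.

```python
import numpy as np

X = np.asarray(X)
y = np.asarray(y)

# Basic sanity checks: array dimensions and the class balance
# of the labels produced by the clustering step
print('X shape:', X.shape)
print('y shape:', y.shape)
classes, counts = np.unique(y, return_counts=True)
print('label distribution:', dict(zip(classes.tolist(), counts.tolist())))
```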