Metrics

sklego.metrics.correlation_score(column)

The correlation score measures how strongly the estimator's predictions correlate with a given column. This is especially useful in situations where “fairness” is a concern.

correlation_score takes the column on which to calculate the correlation and returns a metric function.

Usage: correlation_score('gender')(clf, X, y)

Parameters

column – Name of the column (when X is a dataframe) or the index of the column (when X is a numpy array).

Returns

A function that calculates the negative correlation between estimator.predict(X) and X[column]. The correlation is negated because grid search treats larger scores as better, while correlation with the column should typically be penalized.
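The returned metric can be sketched as a closure over column. This is a minimal illustration, not the sklego source: correlation_score_sketch and the LinearInFirstColumn estimator are hypothetical names, and the sketch assumes Pearson correlation (np.corrcoef) and negates its absolute value so that strong negative correlation is penalized too.

```python
import numpy as np


def correlation_score_sketch(column):
    """Hypothetical re-implementation: returns a scorer (estimator, X, y_true) -> float."""

    def metric(estimator, X, y_true=None):
        # Index a NumPy array by position; anything else (e.g. a DataFrame) by name.
        values = X[:, column] if isinstance(X, np.ndarray) else np.asarray(X[column])
        y_hat = np.asarray(estimator.predict(X), dtype=float)
        # Pearson correlation between the predictions and the column.
        corr = np.corrcoef(y_hat, values.astype(float))[0, 1]
        # Assumption: negate the absolute value so that any correlation lowers the score.
        return -abs(float(corr))

    return metric


class LinearInFirstColumn:
    """Hypothetical stand-in for a fitted estimator."""

    def predict(self, X):
        return 2 * X[:, 0] + 1


X = np.array([[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]])
score = correlation_score_sketch(0)(LinearInFirstColumn(), X, None)
print(score)  # approximately -1.0: the predictions are a linear function of column 0
```

Because the scorer has the (estimator, X, y) shape, it can be passed wherever a callable scorer is accepted, such as the scoring argument of a grid search.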

sklego.metrics.equal_opportunity_score(sensitive_column, positive_target=1)

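The equal opportunity score compares, among samples whose true label is the positive class (y = 1), the probability of a positive prediction when the sensitive attribute (column) is true with the same probability when it is false. It takes the smaller of the two ratios, so the score lies between 0 and 1, where 1 means both groups have the same true positive rate: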

\[\min \left(\frac{P(\hat{y}=1 | z=1, y=1)}{P(\hat{y}=1 | z=0, y=1)}, \frac{P(\hat{y}=1 | z=0, y=1)}{P(\hat{y}=1 | z=1, y=1)}\right)\]

This is especially useful in situations where “fairness” is a concern.

Usage: equal_opportunity_score('gender')(clf, X, y)

Source: M. Hardt, E. Price and N. Srebro (2016), Equality of Opportunity in Supervised Learning

Parameters
  • sensitive_column – Name of the column containing the binary sensitive attribute (when X is a dataframe) or the index of the column (when X is a numpy array).

  • positive_target – The class label associated with a positive outcome.

Returns

A function (clf, X, y_true) -> float that calculates the equal opportunity score, with the sensitive attribute z taken from sensitive_column.
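A minimal sketch of how the returned function could evaluate the formula above, assuming a 0/1-encoded sensitive column. It is not the sklego source: equal_opportunity_score_sketch and the ColumnPredictor stand-in are hypothetical, and both the choice to return 0.0 when one group gets no true positives and the choice to raise when a group has no positive-label samples are assumptions.

```python
import numpy as np


def equal_opportunity_score_sketch(sensitive_column, positive_target=1):
    """Hypothetical re-implementation: returns a scorer (estimator, X, y_true) -> float."""

    def metric(estimator, X, y_true):
        z = X[:, sensitive_column] if isinstance(X, np.ndarray) else np.asarray(X[sensitive_column])
        y_true = np.asarray(y_true)
        y_hat = np.asarray(estimator.predict(X))
        positive = y_true == positive_target
        in_z1, in_z0 = positive & (z == 1), positive & (z == 0)
        if not in_z1.any() or not in_z0.any():
            raise ValueError("both groups need at least one positive-label sample")
        # P(y_hat = positive | z = 1, y = positive) and the same for z = 0.
        tpr_z1 = np.mean(y_hat[in_z1] == positive_target)
        tpr_z0 = np.mean(y_hat[in_z0] == positive_target)
        # Assumption: a group with no true positives at all yields a score of 0.
        if tpr_z1 == 0 or tpr_z0 == 0:
            return 0.0
        return float(min(tpr_z1 / tpr_z0, tpr_z0 / tpr_z1))

    return metric


class ColumnPredictor:
    """Hypothetical stand-in estimator that 'predicts' the values stored in column 1."""

    def predict(self, X):
        return X[:, 1]


# Column 0 is the sensitive attribute z, column 1 holds the predictions.
X = np.array([[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]])
y = np.array([1, 1, 1, 1, 1, 1, 1, 0])
score = equal_opportunity_score_sketch(0)(ColumnPredictor(), X, y)
# TPR is 2/4 for z = 1 and 1/3 for z = 0, so the score is (1/3) / (1/2) = 2/3.
```

The last row has y = 0, so it is ignored entirely: equal opportunity only looks at samples whose true label is positive.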

sklego.metrics.p_percent_score(sensitive_column, positive_target=1)

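The p_percent score compares the probability of a positive prediction when the sensitive attribute (column) is true with the same probability when it is false. It takes the smaller of the two ratios, so the score lies between 0 and 1, where 1 means both groups receive positive predictions at the same rate: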

\[\min \left(\frac{P(\hat{y}=1 | z=1)}{P(\hat{y}=1 | z=0)}, \frac{P(\hat{y}=1 | z=0)}{P(\hat{y}=1 | z=1)}\right)\]
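As an illustration with made-up numbers: if 60% of the samples with z = 1 receive a positive prediction and 40% of those with z = 0 do, the score is

\[\min \left(\frac{0.6}{0.4}, \frac{0.4}{0.6}\right) = \min\left(1.5, 0.\overline{6}\right) \approx 0.67\]

Swapping the two groups gives the same value, which is why the definition takes the minimum of both ratios.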

This is especially useful in situations where “fairness” is a concern.

Usage: p_percent_score('gender')(clf, X, y)

Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification

Parameters
  • sensitive_column – Name of the column containing the binary sensitive attribute (when X is a dataframe) or the index of the column (when X is a numpy array).

  • positive_target – The class label associated with a positive outcome.

Returns

A function (clf, X, y_true) -> float that calculates the p percent score, with the sensitive attribute z taken from sensitive_column.
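The p percent calculation can be sketched the same way, except that y_true is not used: only the predictions and the sensitive column matter. This is a minimal illustration rather than the sklego source; p_percent_score_sketch and ColumnPredictor are hypothetical, the sensitive column is assumed to be 0/1-encoded, and returning 0.0 when one group never receives a positive prediction is an assumption.

```python
import numpy as np


def p_percent_score_sketch(sensitive_column, positive_target=1):
    """Hypothetical re-implementation: returns a scorer (estimator, X, y_true) -> float."""

    def metric(estimator, X, y_true=None):
        z = X[:, sensitive_column] if isinstance(X, np.ndarray) else np.asarray(X[sensitive_column])
        y_hat = np.asarray(estimator.predict(X))
        if not ((z == 1).any() and (z == 0).any()):
            raise ValueError("both groups must be present in X")
        # P(y_hat = positive | z = 1) and P(y_hat = positive | z = 0).
        p_z1 = np.mean(y_hat[z == 1] == positive_target)
        p_z0 = np.mean(y_hat[z == 0] == positive_target)
        # Assumption: a group that never gets a positive prediction yields a score of 0.
        if p_z1 == 0 or p_z0 == 0:
            return 0.0
        return float(min(p_z1 / p_z0, p_z0 / p_z1))

    return metric


class ColumnPredictor:
    """Hypothetical stand-in estimator that 'predicts' the values stored in column 1."""

    def predict(self, X):
        return X[:, 1]


# Column 0 is the sensitive attribute z, column 1 holds the predictions.
X = np.array([[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]])
score = p_percent_score_sketch(0)(ColumnPredictor(), X, None)
# 50% positive predictions for z = 1 vs 25% for z = 0, so the score is 0.25 / 0.5 = 0.5.
```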

sklego.metrics.subset_score(subset_picker: Callable, score: Callable, **kwargs)

Returns a function that applies the passed score only to a specific subset of the samples. The subset_picker is called with X and y_true and must return a one-dimensional boolean mask with one element per row of the data. Only rows with a True value are passed to score, so the mask acts as a filter.

This makes it easy to measure a metric on different slices of the population, which can give insight into model performance, either specifically for fairness or in general.

Usage: subset_score(lambda X, y_true: X['column'] == 'A', accuracy_score)(clf, X, y)

Parameters
  • subset_picker – Function that returns a boolean mask used to select the samples.

  • score – The score to apply to the subset.

  • kwargs – Additional keyword arguments passed to score.

Returns

A function that calculates the passed score on the selected subset.
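The returned function can be sketched as "mask, slice, predict, score". This is a minimal illustration under stated assumptions rather than the sklego source: subset_score_sketch, the accuracy helper (standing in for a metric such as accuracy_score) and the ColumnPredictor estimator are hypothetical. The sketch assumes X supports boolean row indexing, as NumPy arrays and pandas DataFrames do, and that predicting on the subset gives the same result as predicting on all rows and then filtering.

```python
import numpy as np


def subset_score_sketch(subset_picker, score, **kwargs):
    """Hypothetical re-implementation: returns a scorer (estimator, X, y_true) -> float."""

    def metric(estimator, X, y_true):
        y_true = np.asarray(y_true)
        mask = np.asarray(subset_picker(X, y_true), dtype=bool)
        if mask.shape != (len(y_true),):
            raise ValueError("subset_picker must return a 1-D mask with one entry per row")
        # Keep only the selected rows, then score the estimator on them.
        y_pred = estimator.predict(X[mask])
        return score(y_true[mask], y_pred, **kwargs)

    return metric


def accuracy(y_true, y_pred):
    """Hypothetical stand-in for a metric such as sklearn.metrics.accuracy_score."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


class ColumnPredictor:
    """Hypothetical stand-in estimator that 'predicts' the values stored in column 1."""

    def predict(self, X):
        return X[:, 1]


# Column 0 marks the group ('A' = 0, 'B' = 1); column 1 holds the predictions.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([1, 0, 1, 1])
score_a = subset_score_sketch(lambda X, y_true: X[:, 0] == 0, accuracy)(ColumnPredictor(), X, y)
score_b = subset_score_sketch(lambda X, y_true: X[:, 0] == 1, accuracy)(ColumnPredictor(), X, y)
```

Comparing score_a and score_b (1.0 vs 0.5 here) shows how the same metric can differ between slices of the population.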