Metrics
- sklego.metrics.correlation_score(column)
The correlation score measures how strongly the estimator's predictions correlate with a given column. This is especially useful in situations where “fairness” is a theme.
correlation_score takes the column on which to calculate the correlation and returns a metric function.
Usage: correlation_score('gender')(clf, X, y)
- Parameters
column – Name of the column (when X is a dataframe) or the index of the column (when X is a numpy array).
- Returns
A function which calculates the negative correlation between estimator.predict(X) and X[column]. The sign is flipped because grid search treats larger scores as better, and correlation is typically something to penalize.
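To make the returned metric function concrete, here is a minimal NumPy-only sketch of the idea. It is not the library's implementation: `ThresholdModel` and `correlation_score_sketch` are hypothetical names, the sketch only handles numpy arrays, and taking the absolute value of the Pearson correlation (so that correlation in either direction is penalized) is an assumption.

```python
import numpy as np


class ThresholdModel:
    """Hypothetical stand-in for a fitted estimator; only predict() is needed."""

    def predict(self, X):
        # Predict 1 when the second feature exceeds 0.5.
        return (X[:, 1] > 0.5).astype(int)


def correlation_score_sketch(column):
    """Return a scorer (estimator, X, y_true) -> float for a numpy array X."""

    def scorer(estimator, X, y_true=None):
        y_pred = estimator.predict(X)
        sensitive = X[:, column]
        # Assumption: negate the absolute Pearson correlation so larger is better.
        return -abs(np.corrcoef(y_pred, sensitive)[0, 1])

    return scorer


# Column 0 plays the role of the sensitive column in this toy data.
X_independent = np.array([[0.0, 0.1], [0.0, 0.9], [1.0, 0.2], [1.0, 0.8]])
X_dependent = np.array([[0.0, 0.1], [1.0, 0.9], [0.0, 0.2], [1.0, 0.8]])

model = ThresholdModel()
score_independent = correlation_score_sketch(0)(model, X_independent)
score_dependent = correlation_score_sketch(0)(model, X_dependent)
print(score_independent, score_dependent)
```

In the first array the predictions are unrelated to column 0, so the score is close to 0 (the best possible value). In the second the predictions match column 0 exactly, so the score is close to -1.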
- sklego.metrics.equal_opportunity_score(sensitive_column, positive_target=1)
The equal opportunity score calculates the ratio between the probability of a true positive outcome given the sensitive attribute (column) being true and the same probability given the sensitive attribute being false. It takes the smaller of this ratio and its inverse, so the score lies between 0 and 1, with 1 meaning both groups have the same true positive rate.
\[\min \left(\frac{P(\hat{y}=1 | z=1, y=1)}{P(\hat{y}=1 | z=0, y=1)}, \frac{P(\hat{y}=1 | z=0, y=1)}{P(\hat{y}=1 | z=1, y=1)}\right)\]
This is especially useful in situations where “fairness” is a theme.
Usage: equal_opportunity_score('gender')(clf, X, y)
Source: M. Hardt, E. Price and N. Srebro (2016), Equality of Opportunity in Supervised Learning
- Parameters
sensitive_column – Name of the column containing the binary sensitive attribute (when X is a dataframe) or the index of the column (when X is a numpy array).
positive_target – The label of the class associated with a positive outcome.
- Returns
a function (clf, X, y_true) -> float that calculates the equal opportunity score for z = sensitive_column
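The formula for the equal opportunity score can be worked through by hand on a toy dataset. The following is a NumPy-only sketch under stated assumptions, not the library's code: `ColumnModel` and `equal_opportunity_sketch` are hypothetical names, X must be a numpy array, each group must contain at least one actual positive, and returning 1.0 when both rates are zero is a choice made for this sketch.

```python
import numpy as np


class ColumnModel:
    """Hypothetical stand-in for a fitted classifier: predicts the value in column 1."""

    def predict(self, X):
        return X[:, 1].astype(int)


def equal_opportunity_sketch(sensitive_column, positive_target=1):
    """Build a scorer (clf, X, y_true) -> float following the formula above."""

    def scorer(clf, X, y_true):
        y_pred = np.asarray(clf.predict(X))
        y_true = np.asarray(y_true)
        z = np.asarray(X[:, sensitive_column]).astype(bool)
        positives = y_true == positive_target
        # True positive rate per group: P(y_hat = 1 | z, y = 1).
        # Assumes each group contains at least one actual positive.
        rate_z1 = np.mean(y_pred[z & positives] == positive_target)
        rate_z0 = np.mean(y_pred[~z & positives] == positive_target)
        low, high = sorted((rate_z1, rate_z0))
        # min(a/b, b/a) equals min(a, b) / max(a, b) for non-negative rates.
        # Returning 1.0 when both rates are zero is a choice made for this sketch.
        return 1.0 if high == 0 else float(low / high)

    return scorer


# Column 0 is the binary sensitive attribute z; column 1 drives the predictions.
X = np.array([[1, 1], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]])
y = np.array([1, 1, 1, 1, 0, 0])
score = equal_opportunity_sketch(0)(ColumnModel(), X, y)
print(score)
```

Here the z = 1 group has a true positive rate of 1/2 and the z = 0 group a rate of 2/2, so the score is 0.5.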
- sklego.metrics.p_percent_score(sensitive_column, positive_target=1)
The p_percent score calculates the ratio between the probability of a positive outcome given the sensitive attribute (column) being true and the same probability given the sensitive attribute being false. It takes the smaller of this ratio and its inverse, so the score lies between 0 and 1, with 1 meaning both groups receive positive predictions at the same rate.
\[\min \left(\frac{P(\hat{y}=1 | z=1)}{P(\hat{y}=1 | z=0)}, \frac{P(\hat{y}=1 | z=0)}{P(\hat{y}=1 | z=1)}\right)\]
This is especially useful in situations where “fairness” is a theme.
Usage: p_percent_score('gender')(clf, X, y)
Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification
- Parameters
sensitive_column – Name of the column containing the binary sensitive attribute (when X is a dataframe) or the index of the column (when X is a numpy array).
positive_target – The label of the class associated with a positive outcome.
- Returns
a function (clf, X, y_true) -> float that calculates the p percent score for z = sensitive_column
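As a worked instance of the p_percent formula, the sketch below computes the two positive-prediction rates on a toy numpy array. It is an illustration under assumptions, not the library's implementation: `ColumnModel` and `p_percent_sketch` are hypothetical names, both groups are assumed to be non-empty, and returning 1.0 when neither group receives a positive prediction is a choice made for this sketch.

```python
import numpy as np


class ColumnModel:
    """Hypothetical stand-in for a fitted classifier: predicts the value in column 1."""

    def predict(self, X):
        return X[:, 1].astype(int)


def p_percent_sketch(sensitive_column, positive_target=1):
    """Build a scorer (clf, X, y_true) -> float following the formula above."""

    def scorer(clf, X, y_true=None):
        y_pred = np.asarray(clf.predict(X))
        z = np.asarray(X[:, sensitive_column]).astype(bool)
        # Positive prediction rate per group: P(y_hat = 1 | z).
        # Assumes both groups are non-empty; y_true is not needed here.
        rate_z1 = np.mean(y_pred[z] == positive_target)
        rate_z0 = np.mean(y_pred[~z] == positive_target)
        low, high = sorted((rate_z1, rate_z0))
        # min(a/b, b/a) equals min(a, b) / max(a, b) for non-negative rates.
        # Returning 1.0 when both rates are zero is a choice made for this sketch.
        return 1.0 if high == 0 else float(low / high)

    return scorer


# Column 0 is the binary sensitive attribute z; column 1 drives the predictions.
X = np.array([[1, 1], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]])
score = p_percent_sketch(0)(ColumnModel(), X)
print(score)
```

The z = 1 group receives a positive prediction in 1 of 3 rows and the z = 0 group in 3 of 3 rows, so the score is 1/3.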
- sklego.metrics.subset_score(subset_picker: Callable, score: Callable, **kwargs)
Returns a method that applies the passed score only to a specific subset. The subset picker is a method that is passed the corresponding X and y_true and returns a one-dimensional boolean vector where every element corresponds to a row in the data. Only the elements with a True value are taken into account for the passed score, representing a filter.
This allows users to have an easy approach to measuring metrics over different slices of the population which can give insights into the model performance, either specifically for fairness or in general.
Usage: subset_score(lambda X, y_true: X['column'] == 'A', accuracy_score)(clf, X, y)
- Parameters
subset_picker – Method that returns a boolean mask that is used for slicing the samples
score – The score that needs to be applied to the subset
kwargs – Additional keyword arguments to pass to score
- Returns
a function that calculates the passed score for the subset
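The masking behaviour described above can be sketched in a few lines. This is an illustration rather than the library's code: `subset_score_sketch`, `ColumnModel`, and `accuracy` are hypothetical names, `accuracy` stands in for a scikit-learn metric such as accuracy_score, and calling the score as score(y_true, y_pred, **kwargs) is an assumption based on the usual scikit-learn metric convention.

```python
import numpy as np


class ColumnModel:
    """Hypothetical stand-in for a fitted classifier: predicts the value in column 1."""

    def predict(self, X):
        return X[:, 1].astype(int)


def accuracy(y_true, y_pred):
    """Hypothetical stand-in for a metric with a (y_true, y_pred) signature."""
    return float(np.mean(y_true == y_pred))


def subset_score_sketch(subset_picker, score, **kwargs):
    """Return a scorer that applies `score` only to rows selected by `subset_picker`."""

    def scorer(estimator, X, y_true):
        # One boolean per row; only rows marked True are scored.
        mask = np.asarray(subset_picker(X, y_true), dtype=bool)
        y_pred = np.asarray(estimator.predict(X))
        return score(np.asarray(y_true)[mask], y_pred[mask], **kwargs)

    return scorer


# Column 0 defines two slices of the population; column 1 drives the predictions.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([1, 0, 0, 1])
model = ColumnModel()

score_group_0 = subset_score_sketch(lambda X, y_true: X[:, 0] == 0, accuracy)(model, X, y)
score_group_1 = subset_score_sketch(lambda X, y_true: X[:, 0] == 1, accuracy)(model, X, y)
print(score_group_0, score_group_1)
```

On this toy data the predictions are all correct for the first slice and all wrong for the second, which is the kind of gap that a single overall accuracy of 0.5 would hide.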