Linear Model

class sklego.linear_model.BaseScipyMinimizeRegressor(alpha=0.0, l1_ratio=0.0, fit_intercept=True, copy_X=True, positive=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin, abc.ABC

Base class for regressors relying on scipy’s minimize method. Derive a class from this one and give it the function to be minimized.

alpha : float, default=0.0

Constant that multiplies the penalty terms.

l1_ratio : float, default=0.0

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

positive : bool, default=False

When set to True, forces the coefficients to be positive.

coef_ : np.array of shape (n_features,)

Estimated coefficients of the model.

intercept_ : float

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

fit(X, y, sample_weight=None)[source]

Fit the model using the SLSQP algorithm.

X : np.array of shape (n_samples, n_features)

The training data.

y : np.array, 1-dimensional

The target values.

sample_weight : Optional[np.array], default=None

Individual weights for each sample.

Fitted regressor.

predict(X)[source]

Predict using the linear model.

X : np.array, shape (n_samples, n_features)

Samples to get predictions of.

y : np.array, shape (n_samples,)

The predicted values.
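As an illustration of the kind of objective a subclass hands to scipy, the following is a minimal sketch, not sklego's implementation, of elastic-net-penalized least squares minimized with scipy.optimize.minimize and method="SLSQP". The helper name fit_penalized_least_squares is hypothetical, the intercept is omitted for brevity, and modelling positive=True through lower bounds of 0 is an assumption.

```python
import numpy as np
from scipy.optimize import minimize


def fit_penalized_least_squares(X, y, alpha=0.0, l1_ratio=0.0, positive=False):
    """Hypothetical helper: elastic-net least squares via SLSQP (no intercept)."""
    n_samples, n_features = X.shape

    def objective(w):
        residual = y - X @ w
        data_term = residual @ residual / (2 * n_samples)
        l1_term = alpha * l1_ratio * np.abs(w).sum()
        l2_term = 0.5 * alpha * (1 - l1_ratio) * (w @ w)
        return data_term + l1_term + l2_term

    # Assumption: positive=True is modelled with a lower bound of 0 per coefficient.
    bounds = [(0, None)] * n_features if positive else None
    result = minimize(objective, x0=np.zeros(n_features), method="SLSQP", bounds=bounds)
    return result.x


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 3.0])  # noise-free synthetic targets

coef = fit_penalized_least_squares(X, y)
coef_positive = fit_penalized_least_squares(X, y, positive=True)
```

With alpha=0.0 the sketch reduces to ordinary least squares, so coef should land close to the generating coefficients, while coef_positive stays within the non-negative bounds.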

class sklego.linear_model.DeadZoneRegressor(threshold=0.3, relative=False, effect='linear', n_iter=2000, stepsize=0.01, check_grad=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

fit(X, y)[source]

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.

  • y – array-like, shape=(n_samples,), target values.

Returns:

Returns an instance of self.

predict(X)[source]

Predict using DeadZoneRegressor.

Parameters:

X – array-like, shape=(n_samples, n_columns), data to predict on.

Returns:

Returns an array of predictions shape=(n_samples,)

class sklego.linear_model.DemographicParityClassifier(*args, multi_class='ovr', n_jobs=1, **kwargs)[source]

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on demographic parity (p% score).

Minimizes the Log loss while constraining the correlation between the specified sensitive_cols and the distance to the decision boundary of the classifier.

Only works for binary classification problems

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification

Parameters:
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.

  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).

  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

  • penalty – Used to specify the norm used in the penalization. Expects ‘none’ or ‘l1’

  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • max_iter – Maximum number of iterations taken for the solvers to converge.

  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.

  • multi_class – The method to use for multiclass predictions

  • n_jobs – The number of parallel jobs that should be used to fit multiclass models
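The quantity bounded by covariance_threshold can be sketched with plain numpy. This is an illustrative computation of the constraint term in the formula above, assuming a linear decision function d_θ(x) = θᵀx + intercept and a single sensitive attribute; the function name boundary_covariance and the toy data are hypothetical.

```python
import numpy as np


def boundary_covariance(X, z, theta, intercept=0.0):
    """Mean of (z_i - z_bar) * d_theta(x_i) for one sensitive attribute z."""
    # Assumption: the signed distance to the boundary is the linear score.
    distance = X @ theta + intercept
    return np.mean((z - z.mean()) * distance)


X = np.array([[1.0], [2.0], [3.0], [4.0]])
z = np.array([0.0, 0.0, 1.0, 1.0])  # hypothetical binary sensitive attribute
theta = np.array([1.0])

cov = boundary_covariance(X, z, theta)
covariance_threshold = 0.1
# The two inequalities in the formula are equivalent to |cov| <= c.
constraint_satisfied = abs(cov) <= covariance_threshold
```

In this toy case the score rises with the sensitive attribute, so the covariance is 0.5 and a threshold of 0.1 would be violated.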

class sklego.linear_model.EqualOpportunityClassifier(*args, multi_class='ovr', n_jobs=1, **kwargs)[source]

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on equal opportunity score.

Minimizes the Log loss while constraining the correlation between the specified sensitive_cols and the distance to the decision boundary of the classifier for those examples that have a y_true of 1.

Only works for binary classification problems

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{POS} \sum_{i=1}^{POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

where POS is the subset of the population where y_true = 1

Parameters:
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.

  • positive_target – The name of the class which is associated with a positive outcome

  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).

  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

  • penalty – Used to specify the norm used in the penalization. Expects ‘none’ or ‘l1’

  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • max_iter – Maximum number of iterations taken for the solvers to converge.

  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.

  • multi_class – The method to use for multiclass predictions

  • n_jobs – The number of parallel jobs that should be used to fit multiclass models
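The equal opportunity constraint restricts the same covariance to the samples with y_true equal to positive_target. A minimal numpy sketch follows, assuming a linear decision function and assuming that the mean of the sensitive attribute is taken over the positive subset (the formula above does not say which mean is meant); the function name and data are hypothetical.

```python
import numpy as np


def positive_class_boundary_covariance(X, y, z, theta, positive_target=1):
    """Covariance term of the constraint, computed on the positive subset only."""
    mask = y == positive_target
    distance = X[mask] @ theta  # assumption: linear score as distance
    z_pos = z[mask]
    # Assumption: z_bar is the mean over the positive subset.
    return np.mean((z_pos - z_pos.mean()) * distance)


X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, 1, 1, 0, 0])
z = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])  # hypothetical sensitive attribute
theta = np.array([1.0])

cov_pos = positive_class_boundary_covariance(X, y, z, theta)
```

Only the four rows with y equal to 1 contribute; the two negative rows are ignored by the constraint.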

class sklego.linear_model.FairClassifier(*args, **kwargs)[source]

Bases: sklego.linear_model.DemographicParityClassifier

Deprecated since version 0.4.0: Please use sklego.linear_model.DemographicParityClassifier instead

class sklego.linear_model.ImbalancedLinearRegression(alpha=0.0, l1_ratio=0.0, fit_intercept=True, copy_X=True, positive=False, overestimation_punishment_factor=1.0)[source]

Bases: sklego.linear_model.BaseScipyMinimizeRegressor

Linear regression where overestimating is overestimation_punishment_factor times worse than underestimating.

A value of overestimation_punishment_factor=5 implies that overestimations by the model are penalized with a factor of 5 while underestimations have a default factor of 1. The formula optimized for is

\[\frac{1}{2 N} \|s \circ (y - Xw) \|_2^2 + \alpha \cdot l_1 \cdot \|w\|_1 + \frac{\alpha}{2} \cdot (1-l_1)\cdot \|w\|_2^2\]

where \(\circ\) is component-wise multiplication and s is a vector with value overestimation_punishment_factor if y - Xw < 0, else 1.

ImbalancedLinearRegression fits a linear model to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Compared to normal linear regression, this approach allows for a different treatment of over or under estimations.
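The objective described above can be evaluated directly with numpy. This sketch follows the documented formula term by term (it is not taken from the sklego source), omits the intercept, and uses the hypothetical function name imbalanced_loss.

```python
import numpy as np


def imbalanced_loss(w, X, y, overestimation_punishment_factor=1.0,
                    alpha=0.0, l1_ratio=0.0):
    """Documented objective: scaled squared residuals plus elastic-net penalty."""
    residual = y - X @ w
    # s is the punishment factor where the model overestimates (y - Xw < 0), else 1.
    s = np.where(residual < 0, overestimation_punishment_factor, 1.0)
    data_term = np.sum((s * residual) ** 2) / (2 * len(y))
    l1_term = alpha * l1_ratio * np.abs(w).sum()
    l2_term = 0.5 * alpha * (1 - l1_ratio) * (w @ w)
    return data_term + l1_term + l2_term


X = np.array([[1.0], [1.0]])
y = np.array([0.0, 2.0])
w = np.array([1.0])  # predicts 1 for both rows: one over-, one underestimate

symmetric = imbalanced_loss(w, X, y, overestimation_punishment_factor=1.0)
over_punished = imbalanced_loss(w, X, y, overestimation_punishment_factor=5.0)
```

Both residuals have magnitude 1, yet raising the factor from 1 to 5 lifts the loss from 0.5 to 6.5, entirely because of the single overestimated sample.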

alpha : float, default=0.0

Constant that multiplies the penalty terms.

l1_ratio : float, default=0.0

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

positive : bool, default=False

When set to True, forces the coefficients to be positive.

overestimation_punishment_factor : float, default=1

Factor to punish overestimations more (if the value is larger than 1) or less (if the value is between 0 and 1).

coef_ : np.array of shape (n_features,)

Estimated coefficients of the model.

intercept_ : float

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

>>> import numpy as np
>>> from sklego.linear_model import ImbalancedLinearRegression
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4]) + 2*np.random.randn(100)
>>> over_bad = ImbalancedLinearRegression(overestimation_punishment_factor=50).fit(X, y)
>>> over_bad.coef_
array([0.36267036, 1.39526844, 3.4247146 , 3.93679175])
>>> under_bad = ImbalancedLinearRegression(overestimation_punishment_factor=0.01).fit(X, y)
>>> under_bad.coef_
array([0.73519586, 1.28698197, 2.61362614, 4.35989806])

class sklego.linear_model.LADRegression(alpha=0.0, l1_ratio=0.0, fit_intercept=True, copy_X=True, positive=False)[source]

Bases: sklego.linear_model.QuantileRegression

Least absolute deviation Regression.

LADRegression fits a linear model to minimize the residual sum of absolute deviations between the observed targets in the dataset, and the targets predicted by the linear approximation, i.e.

\[\frac{1}{N} \|y - Xw \|_1 + \alpha \cdot l_1 \cdot \|w\|_1 + \frac{\alpha}{2} \cdot (1-l_1)\cdot \|w\|_2^2\]

Compared to linear regression, this approach is robust to outliers. You can even optimize for the lowest MAPE (Mean Absolute Percentage Error) by passing np.abs(1/y_train) as the sample_weight keyword when fitting the regressor.
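The MAPE remark rests on a simple identity: weighting each absolute residual by |1/y| gives the absolute percentage error of that sample. The following sketch checks this numerically on made-up values; y_pred stands in for the predictions of any model, and the normalisation used during fitting may differ by a constant factor.

```python
import numpy as np

y_train = np.array([10.0, 20.0, 50.0, 100.0])
y_pred = np.array([12.0, 18.0, 55.0, 90.0])  # hypothetical predictions

# Weights suggested in the text above.
sample_weight = np.abs(1 / y_train)

weighted_abs_error = np.mean(sample_weight * np.abs(y_train - y_pred))
mape = np.mean(np.abs((y_train - y_pred) / y_train))
```

Because the two quantities coincide for any predictions, minimizing the weighted absolute error is the same as minimizing the MAPE.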

alpha : float, default=0.0

Constant that multiplies the penalty terms.

l1_ratio : float, default=0.0

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

positive : bool, default=False

When set to True, forces the coefficients to be positive.

coef_ : np.array of shape (n_features,)

Estimated coefficients of the model.

intercept_ : float

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

>>> import numpy as np
>>> from sklego.linear_model import LADRegression
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = LADRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = LADRegression(positive=True).fit(X, y)
>>> l.coef_
array([7.39575926e-18, 1.42423304e+00, 2.80467827e-17, 4.29789588e+00])

class sklego.linear_model.LowessRegression(sigma=1, span=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Performs LOWESS (locally weighted) regression. Note that prediction can be computationally expensive.

Parameters:
  • sigma – float, how wide we will smooth the data

  • span – float, what percentage of the data is to be used. Defaults to using all data.
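To make the role of sigma concrete, here is a minimal sketch of locally weighted linear regression. It assumes a Gaussian kernel with bandwidth sigma, ignores span, and may differ from sklego's implementation; the function name lowess_predict is hypothetical. A separate weighted least-squares problem is solved for every query point, which is why prediction is costly.

```python
import numpy as np


def lowess_predict(X_train, y_train, X_query, sigma=1.0):
    """Hypothetical sketch: one weighted linear fit per query point."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    predictions = []
    for x in X_query:
        # Assumption: Gaussian weights that decay with distance to the query point.
        squared_distance = np.sum((X_train - x) ** 2, axis=1)
        weights = np.exp(-squared_distance / (2 * sigma ** 2))
        root_w = np.sqrt(weights)
        # Weighted least squares via row scaling by sqrt(weights).
        beta, *_ = np.linalg.lstsq(design * root_w[:, None], y_train * root_w, rcond=None)
        predictions.append(np.concatenate([[1.0], x]) @ beta)
    return np.array(predictions)


X_train = np.linspace(0, 5, 30).reshape(-1, 1)
y_train = 2 * X_train[:, 0] + 1  # exactly linear toy data
X_query = np.array([[1.5], [3.0]])

preds = lowess_predict(X_train, y_train, X_query, sigma=1.0)
```

On exactly linear data every local fit recovers the same line, so the predictions match 2x + 1; on curved data a smaller sigma lets the fit follow local structure more closely.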

fit(X, y)[source]

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.

  • y – array-like, shape=(n_samples,), target values.

Returns:

Returns an instance of self.

predict(X)[source]

Predict using the LowessRegression.

Parameters:

X – array-like, shape=(n_samples, n_columns), data to predict on.

Returns:

Returns an array of predictions shape=(n_samples,)

class sklego.linear_model.ProbWeightRegression(non_negative=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

This regressor assumes that all input signals in X need to be reweighted with weights that sum up to one in order to predict y. This can be very useful in combination with sklego.meta.EstimatorTransformer because it allows you to construct an ensemble.

Parameters:

non_negative – boolean, default=True, setting that forces all weights to be >= 0
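The constraint that the weights sum to one can be illustrated with a small constrained least-squares problem. This sketch uses scipy.optimize.minimize with method="SLSQP" as a stand-in solver, which is an assumption and not necessarily what sklego uses; the function name fit_probability_weights and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_probability_weights(X, y, non_negative=True):
    """Hypothetical sketch: least squares with weights constrained to sum to 1."""
    n_features = X.shape[1]

    def objective(w):
        residual = y - X @ w
        return 0.5 * residual @ residual

    def gradient(w):
        return -X.T @ (y - X @ w)

    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0, None)] * n_features if non_negative else None
    result = minimize(
        objective,
        x0=np.full(n_features, 1.0 / n_features),  # start from uniform weights
        jac=gradient,
        method="SLSQP",
        bounds=bounds,
        constraints=constraints,
    )
    return result.x


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # e.g. predictions of three base models
y = X @ np.array([0.2, 0.3, 0.5])      # target is a convex combination of them

weights = fit_probability_weights(X, y)
```

When the columns of X are predictions of other estimators, the fitted weights can be read as each estimator's share in the ensemble.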

fit(X, y)[source]

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.

  • y – array-like, shape=(n_samples,), target values.

Returns:

Returns an instance of self.

predict(X)[source]

Predict using ProbWeightRegression.

Parameters:

X – array-like, shape=(n_samples, n_columns), data to predict on.

Returns:

Returns an array of predictions shape=(n_samples,)

class sklego.linear_model.QuantileRegression(alpha=0.0, l1_ratio=0.0, fit_intercept=True, copy_X=True, positive=False, quantile=0.5)[source]

Bases: sklego.linear_model.BaseScipyMinimizeRegressor

Compute Quantile Regression. This can be used for computing confidence intervals of linear regressions. QuantileRegression fits a linear model to minimize a weighted residual sum of absolute deviations between the observed targets in the dataset and the targets predicted by the linear approximation, i.e.

1 / (2 * n_samples) * switch * ||y - Xw||_1 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

where switch is a vector with value quantile if y - Xw > 0, else 1 - quantile. With the default quantile=0.5, the regressor is equivalent to LADRegression. Compared to linear regression, this approach is robust to outliers.

alpha : float, default=0.0

Constant that multiplies the penalty terms.

l1_ratio : float, default=0.0

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

positive : bool, default=False

When set to True, forces the coefficients to be positive.

quantile : float, between 0 and 1, default=0.5

The line output by the model will have a share of approximately quantile data points under it. A value of quantile=1 outputs a line that is above each data point, for example. quantile=0.5 corresponds to LADRegression.

coef_ : np.ndarray of shape (n_features,)

Estimated coefficients of the model.

intercept_ : float

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

>>> import numpy as np
>>> from sklego.linear_model import QuantileRegression
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = QuantileRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = QuantileRegression(quantile=0.8).fit(X, y)
>>> l.coef_
array([-1., 2., -3., 4.])
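The effect of the quantile parameter can be seen on the simplest possible model, a constant prediction. The sketch below uses the standard pinball-loss convention, in which underestimates (y - prediction > 0) are weighted by quantile and overestimates by 1 - quantile, matching the parameter description that quantile=1 yields a line above every data point. The function name pinball_loss is hypothetical and the normalisation is simplified to a plain mean, which does not change the minimizer.

```python
import numpy as np


def pinball_loss(y, y_pred, quantile=0.5):
    """Weighted mean absolute deviation with quantile-dependent weights."""
    residual = y - y_pred
    # Assumption: underestimates get weight `quantile`, overestimates `1 - quantile`.
    weight = np.where(residual > 0, quantile, 1 - quantile)
    return np.mean(weight * np.abs(residual))


y = np.arange(1, 100, dtype=float)           # the values 1, 2, ..., 99
candidates = np.arange(1, 100, dtype=float)  # constant predictions to try

losses = [pinball_loss(y, c, quantile=0.8) for c in candidates]
best_constant = candidates[int(np.argmin(losses))]
share_below = np.mean(y < best_constant)
```

The best constant is 80, which leaves roughly 80% of the data points below it, as the description of quantile suggests.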

fit(X, y, sample_weight=None)[source]

Fit the model using the SLSQP algorithm.

X : np.ndarray of shape (n_samples, n_features)

The training data.

y : np.ndarray, 1-dimensional

The target values.

sample_weight : Optional[np.ndarray], default=None

Individual weights for each sample.

Fitted regressor.