Decomposition

class sklego.decomposition.PCAOutlierDetection(n_components=None, threshold=None, variant='relative', whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.OutlierMixin

Detects outliers based on the reconstruction error from PCA.
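The underlying idea can be sketched in plain NumPy: fit principal axes on training data, project each row onto the first k axes, map it back, and measure how much was lost. Rows that the retained components describe poorly get a large error. This is a minimal illustration that assumes an absolute (L1) per-row error; `fit_pca` and `reconstruct` are hypothetical helpers, not part of sklego.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the column means and the top principal axes of X (hypothetical helper)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruct(X, mean, components):
    """Project X onto the principal axes and map it back to the original space."""
    scores = (X - mean) @ components.T
    return scores @ components + mean

rng = np.random.default_rng(0)
# Training data lying close to a 1-D line in 3-D space.
t = rng.normal(size=(200, 1))
X_train = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

mean, components = fit_pca(X_train, n_components=1)

X_new = np.array([
    [1.0, 2.0, -1.0],  # on the line: small reconstruction error
    [1.0, -2.0, 3.0],  # off the line: large reconstruction error
])
# Summed absolute error per row after the project-and-reconstruct round trip.
errors = np.abs(reconstruct(X_new, mean, components) - X_new).sum(axis=1)
print(errors)
```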

decision_function(X)
difference(X)

Computes the difference between the original and the reconstructed data, row by row.

Parameters

X – array-like, shape=(n_samples, n_columns): the data to compare with its reconstruction.

Returns

array, shape=(n_samples,): the difference for each row.
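The class signature exposes a `variant` parameter (default `'relative'`), but this reference does not spell out what each variant computes. The sketch below assumes one plausible reading: `'absolute'` returns the summed absolute reconstruction error per row, and `'relative'` divides that by the size of the row (here its sum of absolute values). Treat the exact normalisation as an assumption to check against the sklego source; `row_difference` is a hypothetical helper.

```python
import numpy as np

def row_difference(X, X_reconstructed, variant="relative"):
    """Per-row reconstruction error (hypothetical helper; normalisation is assumed)."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(np.asarray(X_reconstructed, dtype=float) - X).sum(axis=1)
    if variant == "relative":
        # Assumed: scale by the row's L1 norm so large-magnitude rows are not
        # penalised for errors that are small relative to their size.
        diff = diff / np.abs(X).sum(axis=1)
    elif variant != "absolute":
        raise ValueError(f"unknown variant: {variant!r}")
    return diff

X = np.array([[1.0, 1.0], [10.0, 10.0]])
X_rec = np.array([[1.5, 1.0], [10.5, 10.0]])

# Same absolute error on both rows; the relative variant weighs it less on the larger row.
print(row_difference(X, X_rec, variant="absolute"))
print(row_difference(X, X_rec, variant="relative"))
```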

fit(X, y=None)

Fit the model using X as training data.

Parameters
  • X – array-like, shape=(n_samples, n_columns): training data.

  • y – ignored; kept for pipeline compatibility.

Returns

self – the fitted estimator.

predict(X)

Predict if a point is an outlier.

Parameters

X – array-like, shape=(n_samples, n_columns): the data to classify.

Returns

array, shape=(n_samples,): the predicted labels, 1 for inliers and -1 for outliers.
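Given the `threshold` parameter, prediction amounts to comparing each row's difference with a cut-off. A minimal sketch, assuming a row is flagged as an outlier when its difference strictly exceeds the threshold; `predict_from_difference` is a hypothetical helper, not sklego's code.

```python
import numpy as np

def predict_from_difference(difference, threshold):
    """Map per-row differences to labels: 1 = inlier, -1 = outlier (hypothetical helper)."""
    difference = np.asarray(difference, dtype=float)
    labels = np.ones(difference.shape[0], dtype=int)
    # Assumed rule: rows reconstructed worse than the threshold are outliers.
    labels[difference > threshold] = -1
    return labels

labels = predict_from_difference([0.01, 0.2, 0.05, 0.9], threshold=0.1)
print(labels)
```

The 1 / -1 convention is the one scikit-learn uses for its own outlier detectors, such as `IsolationForest`, so the labels can be consumed by code written for those.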

score_samples(X)
transform(X)

Uses the underlying PCA method to transform the data.
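For scikit-learn's `PCA` with `whiten=False` (the default here too), transforming means centring the data and projecting it onto the retained principal axes. The sketch below reproduces that projection by hand and checks it against `PCA.transform`; it assumes scikit-learn is installed and only illustrates what the projection computes, not how sklego wires it up internally.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))

pca = PCA(n_components=2).fit(X)

# With whiten=False, transform is a centred projection onto components_.
manual = (X - pca.mean_) @ pca.components_.T
print(manual.shape, np.allclose(manual, pca.transform(X)))
```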

class sklego.decomposition.UMAPOutlierDetection(n_components=2, threshold=None, variant='relative', n_neighbors=15, min_dist=0.1, metric='euclidean', random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.OutlierMixin

Detects outliers based on the reconstruction error from UMAP.
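The reconstruction-error recipe is not tied to PCA: any reducer offering both `transform` and `inverse_transform` can be plugged in, and umap-learn's `UMAP` does provide an (approximate) `inverse_transform`. The sketch below writes the recipe against that duck-typed interface. To stay runnable without umap-learn it is exercised with scikit-learn's `PCA`; `reconstruction_error` is a hypothetical helper, not sklego's implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def reconstruction_error(reducer, X):
    """Summed absolute error after a round trip through a fitted reducer.

    Works with any object exposing transform() and inverse_transform().
    """
    X = np.asarray(X, dtype=float)
    X_back = reducer.inverse_transform(reducer.transform(X))
    return np.abs(X_back - X).sum(axis=1)

rng = np.random.default_rng(1)
# Training data lying exactly on a 2-D plane inside 5-D space.
basis = rng.normal(size=(2, 5))
X_train = rng.normal(size=(300, 2)) @ basis

reducer = PCA(n_components=2).fit(X_train)

in_plane = rng.normal(size=(1, 2)) @ basis
off_plane = in_plane + 5.0 * rng.normal(size=(1, 5))
errors = reconstruction_error(reducer, np.vstack([in_plane, off_plane]))
print(errors)
```

With umap-learn installed, a fitted `umap.UMAP(n_components=2)` should slot into the same helper, although UMAP's inverse is approximate and considerably slower than PCA's.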

difference(X)

Computes the difference between the original and the reconstructed data, row by row.

Parameters

X – array-like, shape=(n_samples, n_columns): the data to compare with its reconstruction.

Returns

array, shape=(n_samples,): the difference for each row.

fit(X, y=None)

Fit the model using X as training data.

Parameters
  • X – array-like, shape=(n_samples, n_columns): training data.

  • y – ignored; kept for pipeline compatibility.

Returns

self – the fitted estimator.

predict(X)

Predict if a point is an outlier.

Parameters

X – array-like, shape=(n_samples, n_columns): the data to classify.

Returns

array, shape=(n_samples,): the predicted labels, 1 for inliers and -1 for outliers.

transform(X)

Uses the underlying UMAP method to transform the data.