Pandas Utils

sklego.pandas_utils.add_lags(X, cols, lags, drop_na=True)

Appends lag column(s).

Parameters
  • X – array-like, shape=(n_samples, n_columns), training data.

  • cols – column name(s) or index (indices) of the column(s) to lag.

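  • lags – int or list of ints, the lag(s) to add for each col. In the examples below, a positive lag takes values from later rows and a negative lag takes values from earlier rows.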

  • drop_na – bool, remove rows that contain NA values, defaults to True.

Returns

pd.DataFrame | np.ndarray containing the original columns with the lag column(s) appended.

Example

>>> import pandas as pd
>>> from sklego.pandas_utils import add_lags
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9]],
...                    columns=['a', 'b', 'c'],
...                    index=[1, 2, 3])
>>> add_lags(df, 'a', [1]) 
   a  b  c  a1
1  1  2  3  4.0
2  4  5  6  7.0
>>> add_lags(df, ['a', 'b'], 2) 
   a  b  c  a2   b2
1  1  2  3  7.0  8.0
>>> import numpy as np
>>> X = np.array([[1, 2, 3],
...               [4, 5, 6],
...               [7, 8, 9]])
>>> add_lags(X, 0, [1])
array([[1, 2, 3, 4],
       [4, 5, 6, 7]])
>>> add_lags(X, 1, [-1, 1])
array([[4, 5, 6, 2, 8]])
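
The lag semantics shown in the examples above can be sketched in plain Python. This is only an illustration under stated assumptions, not sklego's implementation: `lag_rows` is a hypothetical helper that works on a list of rows, and it assumes that a lag of k pairs each row with the value found k rows later, which is what the documented outputs suggest.

```python
def lag_rows(rows, col, lags, drop_na=True):
    """Hypothetical helper: for each lag k, append the value of column
    `col` found k rows later (k may be negative to look backwards)."""
    n = len(rows)
    out = []
    for i, row in enumerate(rows):
        extra = []
        for k in lags:
            j = i + k
            # Rows without a partner k steps away get a missing value.
            extra.append(rows[j][col] if 0 <= j < n else None)
        if drop_na and None in extra:
            continue  # mimic drop_na=True: skip rows with a missing lag
        out.append(list(row) + extra)
    return out


X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(lag_rows(X, 0, [1]))      # [[1, 2, 3, 4], [4, 5, 6, 7]]
print(lag_rows(X, 1, [-1, 1]))  # [[4, 5, 6, 2, 8]]
```

With `drop_na=False` the sketch keeps every row and fills the missing lag values with `None`, which shows why the number of rows shrinks as the lags grow.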
sklego.pandas_utils.log_step(func=None, *, time_taken=True, shape=True, shape_delta=False, names=False, dtypes=False, print_fn=<built-in function print>, display_args=True, log_error=True)

Decorates a function that transforms a pandas dataframe so that each call emits automated logging statements.

Parameters
  • func – callable, function to log, defaults to None

  • time_taken – bool, log the time it took to run a function, defaults to True

  • shape – bool, log the shape of the output result, defaults to True

  • shape_delta – bool, log the difference in shape of input and output, defaults to False

  • names – bool, log the names of the columns of the result, defaults to False

  • dtypes – bool, log the dtypes of the results, defaults to False

  • print_fn – callable, print function (e.g. print or logger.info), defaults to print

  • display_args – bool, whether or not to print the arguments given to the function, defaults to True

  • log_error – bool, whether to add the Exception message to the log if the function fails, defaults to True.

Returns

the result of the function

Example

>>> import logging
>>> from sklego.pandas_utils import log_step
>>> @log_step
... def remove_outliers(df, min_obs=5):
...     pass
>>> @log_step(print_fn=logging.info, shape_delta=True)
... def remove_outliers(df, min_obs=5):
...     pass
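
The `func=None, *` signature is what lets the decorator be used both bare (`@log_step`) and with keyword arguments (`@log_step(...)`). A minimal sketch of that pattern follows. It is an assumption-laden illustration, not sklego's code: `simple_log_step` and `FakeFrame` are hypothetical names, only the `shape`, `time_taken` and `print_fn` options are mimicked, and the log format is invented.

```python
import functools
import time


def simple_log_step(func=None, *, time_taken=True, shape=True, print_fn=print):
    """Hypothetical, simplified stand-in for a log_step-style decorator."""
    if func is None:
        # Used as @simple_log_step(...): no function yet, so return a
        # decorator that remembers the keyword arguments.
        return functools.partial(
            simple_log_step, time_taken=time_taken, shape=shape, print_fn=print_fn
        )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        parts = [func.__name__]
        if shape:
            parts.append(f"shape={getattr(result, 'shape', None)}")
        if time_taken:
            parts.append(f"time={time.perf_counter() - start:.3f}s")
        print_fn(" ".join(parts))
        return result  # the caller still receives the function's result

    return wrapper


class FakeFrame:
    """Tiny stand-in for a dataframe; it only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


messages = []


@simple_log_step(print_fn=messages.append, time_taken=False)
def drop_first_row(frame):
    return FakeFrame((frame.shape[0] - 1, frame.shape[1]))


out = drop_first_row(FakeFrame((3, 2)))
print(messages)  # ['drop_first_row shape=(2, 2)']
```

Passing `print_fn=messages.append` here plays the same role as passing `logging.info` in the example above: any callable that accepts a string can receive the log line.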
sklego.pandas_utils.log_step_extra(*log_functions, print_fn=<built-in function print>, **log_func_kwargs)

Decorates a function that transforms a pandas dataframe so that each call logs the output of the supplied log functions.

Parameters
  • *log_functions

    callable(s), functions that take the output of the decorated function and turn it into a log. Note that the output of each log_function is cast to a string using str()

  • print_fn – callable, print function (e.g. print or logger.info), defaults to print

  • **log_func_kwargs

    keyword arguments to be passed to log_functions

Returns

the result of the function

Example

>>> from sklego.pandas_utils import log_step_extra
>>> @log_step_extra(lambda d: d["some_column"].value_counts())
... def remove_outliers(df, min_obs=5):
...     pass
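
How the log functions are applied to the output can be sketched as follows. This is a simplified illustration under assumptions, not sklego's implementation: `simple_log_step_extra` is a hypothetical name, the log format is invented, and plain lists of dicts stand in for a dataframe.

```python
import functools


def simple_log_step_extra(*log_functions, print_fn=print, **log_func_kwargs):
    """Hypothetical, simplified stand-in for a log_step_extra-style decorator."""
    if not log_functions:
        raise ValueError("Supply at least one log function")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Each log function receives the output; its return value is
            # cast to a string with str() before being logged.
            extras = " ".join(
                str(log_f(result, **log_func_kwargs)) for log_f in log_functions
            )
            print_fn(f"[{func.__name__}] {extras}")
            return result

        return wrapper

    return decorator


messages = []


@simple_log_step_extra(
    len,
    lambda rows: max(r["x"] for r in rows),
    print_fn=messages.append,
)
def keep_positive(rows):
    return [r for r in rows if r["x"] > 0]


kept = keep_positive([{"x": -1}, {"x": 2}, {"x": 5}])
print(messages)  # ['[keep_positive] 2 5']
```

Because every log function is called with the same keyword arguments, anything passed through `**log_func_kwargs` has to be accepted by all of them.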