Pandas Utils
- sklego.pandas_utils.add_lags(X, cols, lags, drop_na=True)
Appends lag column(s).
- Parameters
X – array-like, shape=(n_samples, n_columns), training data.
cols – column name(s) or index (indices).
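lags – int or list of ints, the lag(s) to apply to each col. As the examples show, a lag of k appends the value found k rows later; a negative lag takes the value from earlier rows.
drop_na – bool, remove rows that contain NA values, defaults to True.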
- Returns
pd.DataFrame | np.ndarray
the input data with the lag column(s) appended.
- Example
>>> import pandas as pd
>>> from sklego.pandas_utils import add_lags
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9]],
...                   columns=['a', 'b', 'c'],
...                   index=[1, 2, 3])
>>> add_lags(df, 'a', [1])
   a  b  c   a1
1  1  2  3  4.0
2  4  5  6  7.0
>>> add_lags(df, ['a', 'b'], 2)
   a  b  c   a2   b2
1  1  2  3  7.0  8.0
>>> import numpy as np
>>> X = np.array([[1, 2, 3],
...               [4, 5, 6],
...               [7, 8, 9]])
>>> add_lags(X, 0, [1])
array([[1, 2, 3, 4],
       [4, 5, 6, 7]])
>>> add_lags(X, 1, [-1, 1])
array([[4, 5, 6, 2, 8]])
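The lag direction in the DataFrame examples above can be reproduced with plain pandas. The following is only a sketch, not sklego's implementation: add_lag_sketch is a hypothetical helper, and it assumes that a lag of k corresponds to Series.shift(-k) followed by dropna(), which matches the first example's output.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["a", "b", "c"],
    index=[1, 2, 3],
)


def add_lag_sketch(frame, col, lag, drop_na=True):
    """Hypothetical helper (not part of sklego): append one lag column."""
    out = frame.copy()
    # shift(-lag) moves values up, so each row sees the value `lag` rows later.
    out[f"{col}{lag}"] = frame[col].shift(-lag)
    # Rows at the edge have no value to borrow and become NaN.
    return out.dropna() if drop_na else out


lagged = add_lag_sketch(df, "a", 1)
print(lagged)
```

Passing drop_na=False keeps the edge rows and leaves NaN in the new column instead.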
- sklego.pandas_utils.log_step(func=None, *, time_taken=True, shape=True, shape_delta=False, names=False, dtypes=False, print_fn=print, display_args=True, log_error=True)
Decorates a function that transforms a pandas DataFrame so that each call is logged automatically.
- Parameters
func – callable, function to log, defaults to None
time_taken – bool, log the time it took to run a function, defaults to True
shape – bool, log the shape of the output result, defaults to True
shape_delta – bool, log the difference in shape of input and output, defaults to False
names – bool, log the names of the columns of the result, defaults to False
dtypes – bool, log the dtypes of the results, defaults to False
print_fn – callable, print function (e.g. print or logger.info), defaults to print
display_args – bool, whether or not to print the arguments given to the function, defaults to True
log_error – bool, whether to add the Exception message to the log if the function fails, defaults to True.
- Returns
the result of the function
- Example
>>> import logging
>>> from sklego.pandas_utils import log_step
>>> @log_step
... def remove_outliers(df, min_obs=5):
...     pass
>>> @log_step(print_fn=logging.info, shape_delta=True)
... def remove_outliers(df, min_obs=5):
...     pass
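Both decorator forms shown above (bare, and with keyword arguments) follow from the func=None signature. The sketch below illustrates that pattern with a hypothetical stand-in, log_step_sketch. It is not sklego's code: it supports only a subset of the options, and its message format is an assumption made for illustration.

```python
import functools
import time

import pandas as pd


def log_step_sketch(func=None, *, shape=True, time_taken=True, print_fn=print):
    """Hypothetical stand-in for log_step; the log format is made up."""
    if func is None:
        # Used as @log_step_sketch(...): return a decorator that waits for func.
        return functools.partial(
            log_step_sketch, shape=shape, time_taken=time_taken, print_fn=print_fn
        )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        parts = [f"[{func.__name__}]"]
        if shape:
            parts.append(f"n_obs={result.shape[0]} n_col={result.shape[1]}")
        if time_taken:
            parts.append(f"time={time.perf_counter() - start:.6f}s")
        print_fn(" ".join(parts))
        return result

    return wrapper


messages = []  # collect log lines instead of printing them


@log_step_sketch(time_taken=False, print_fn=messages.append)
def drop_missing(df):
    return df.dropna()


@log_step_sketch
def identity(df):
    return df


clean = drop_missing(pd.DataFrame({"x": [1.0, None, 3.0]}))
print(messages)
```

Any callable that accepts a string, such as logging.info or a list's append method, can serve as print_fn in this sketch.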
- sklego.pandas_utils.log_step_extra(*log_functions, print_fn=print, **log_func_kwargs)
Decorates a function that transforms a pandas DataFrame so that each call logs the output of the given log functions.
- Parameters
*log_functions – callable(s), functions that take the output of the decorated function and turn it into a log. Note that the output of each log function is cast to a string using str()
print_fn – callable, print function (e.g. print or logger.info), defaults to print
**log_func_kwargs – keyword arguments to be passed to log_functions
- Returns
the result of the function
- Example
>>> @log_step_extra(lambda d: d["some_column"].value_counts())
... def remove_outliers(df, min_obs=5):
...     pass
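To illustrate how the log functions and their keyword arguments could fit together, here is a minimal sketch. log_step_extra_sketch is a hypothetical stand-in, not sklego's implementation. It assumes every log function receives the decorated function's result plus the same keyword arguments, and that each output is passed through str() as described above. The helper count_rows_above and the message format are invented for the example.

```python
import functools

import pandas as pd


def log_step_extra_sketch(*log_functions, print_fn=print, **log_func_kwargs):
    """Hypothetical stand-in for log_step_extra; the log format is made up."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Each log function sees the result; its output is cast with str().
            extra = " ".join(
                str(log_fn(result, **log_func_kwargs)) for log_fn in log_functions
            )
            print_fn(f"[{func.__name__}] {extra}")
            return result

        return wrapper

    return decorator


def count_rows_above(df, threshold=0):
    """Hypothetical log function: count rows whose 'value' exceeds threshold."""
    return f"rows_above_{threshold}={(df['value'] > threshold).sum()}"


messages = []  # collect log lines instead of printing them


@log_step_extra_sketch(count_rows_above, print_fn=messages.append, threshold=2)
def keep_positive(df):
    return df[df["value"] > 0]


result = keep_positive(pd.DataFrame({"value": [-1, 1, 3, 5]}))
print(messages)
```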