Skip to content

Smoothing

The smoothing module offers three practical methods for data smoothing: bin, window, and savgol_filter. The bin method groups data into equal-width bins and computes the mean for each bin, ideal for reducing noise in large datasets. The window method applies a rolling window smoothing technique, allowing users to specify the window size and type for flexible smoothing. The savgol_filter method uses the Savitzky-Golay filter to smooth data while preserving features like peaks, making it suitable for signal processing tasks.

bin(dfs, column, minimum, maximum, width)

Bins the data in the specified column of the DataFrame(s) into equal-width bins and computes the mean of each bin.

Parameters:

Name Type Description Default
dfs pandas.DataFrame or list of pandas.DataFrame

The input DataFrame(s) containing the data to be binned.

required
column str

The name of the column in the DataFrame(s) to be binned.

required
minimum float

The minimum value of the range to be binned.

required
maximum float

The maximum value of the range to be binned.

required
width float

The width of each bin.

required

Returns:

Type Description
DataFrame

pandas.DataFrame: A DataFrame containing the mean values of each bin, with the bin midpoints as the values in the specified column.

Examples:

>>> import quantalyze as qz
>>> binned_df = qz.bin(df, 'column_to_bin', minimum=0, maximum=100, width=0.5)
Source code in src/quantalyze/core/smoothing.py
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def bin(dfs, column, minimum, maximum, width) -> pd.DataFrame:
    """
    Bins the data in the specified column of the DataFrame(s) into equal-width bins and computes the mean of each bin.

    Args:
        dfs (pandas.DataFrame or list of pandas.DataFrame): The input DataFrame(s) containing the data to be binned.
        column (str): The name of the column in the DataFrame(s) to be binned.
        minimum (float): The minimum value of the range to be binned.
        maximum (float): The maximum value of the range to be binned.
        width (float): The width of each bin.

    Returns:
        pandas.DataFrame: A DataFrame containing the mean values of each bin, with the bin midpoints as the values in the specified column.

    Examples:
        >>> import quantalyze as qz
        >>> binned_df = qz.bin(df, 'column_to_bin', minimum=0, maximum=100, width=0.5)
    """
    def bin_single_df(df):
        bin_edges = np.arange(minimum - width / 2, maximum + width / 2 + width, width)
        bin_midpoints = (bin_edges[1:] + bin_edges[:-1]) / 2
        df['bin'] = pd.cut(df[column], bins=bin_edges, include_lowest=True)
        binned = df.groupby('bin').agg('mean').reset_index()
        binned[column] = bin_midpoints
        binned = binned.drop('bin', axis=1)
        return binned

    if isinstance(dfs, list):
        combined_df = pd.concat(dfs, ignore_index=True)
        return bin_single_df(combined_df)
    else:
        return bin_single_df(dfs)

window(df, column, window_size=5, window_type='triang')

Applies a rolling window smoothing to a specified column in a DataFrame.

Parameters:

Name Type Description Default
df DataFrame

The input DataFrame containing the data.

required
column str

The name of the column to which the rolling window smoothing will be applied.

required
window_size int

The size of the rolling window. Default is 5.

5
window_type str

The type of window to use. Default is 'triang'.

'triang'

Returns:

Type Description
Series

pandas.Series: A Series with the smoothed values.

Examples:

>>> import quantalyze as qz
>>> df['smoothed_column'] = qz.window(df, 'column_to_be_smoothed', window_size=7)
Source code in src/quantalyze/core/smoothing.py
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
def window(df, column, window_size=5, window_type='triang') -> pd.Series:
    """
    Applies a rolling window smoothing to a specified column in a DataFrame.

    Args:
        df (pandas.DataFrame): The input DataFrame containing the data.
        column (str): The name of the column to which the rolling window smoothing will be applied.
        window_size (int, optional): The size of the rolling window. Default is 5.
        window_type (str, optional): The type of window to use. Default is 'triang'.

    Returns:
        pandas.Series: A Series with the smoothed values.

    Examples:
        >>> import quantalyze as qz
        >>> df['smoothed_column'] = qz.window(df, 'column_to_be_smoothed', window_size=7)
    """
    return df[column].rolling(window=window_size, win_type=window_type, center=True).mean()

savgol_filter(df, column, window_size=5, order=2)

Applies a Savitzky-Golay filter to a specified column in a DataFrame.

Parameters:

Name Type Description Default
df DataFrame

The input DataFrame containing the data.

required
column str

The name of the column to which the Savitzky-Golay filter will be applied.

required
window_size int

The size of the window. Default is 5.

5
order int

The order of the polynomial used to fit the samples. Default is 2.

2

Returns:

Type Description
Series

pandas.Series: A Series with the smoothed values.

Examples:

>>> import quantalyze as qz
>>> df['smoothed_column'] = qz.savgol_filter(df, 'column_to_be_smoothed', window_size=7, order=2)
Source code in src/quantalyze/core/smoothing.py
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
def savgol_filter(df, column, window_size=5, order=2) -> pd.Series:
    """
    Applies a Savitzky-Golay filter to a specified column in a DataFrame.

    Args:
        df (pandas.DataFrame): The input DataFrame containing the data.
        column (str): The name of the column to which the Savitzky-Golay filter will be applied.
        window_size (int, optional): The size of the window. Default is 5.
        order (int, optional): The order of the polynomial used to fit the samples. Default is 2.

    Returns:
        pandas.Series: A Series with the smoothed values.

    Examples:
        >>> import quantalyze as qz
        >>> df['smoothed_column'] = qz.savgol_filter(df, 'column_to_be_smoothed', window_size=7, order=2)
    """
    return pd.Series(scipy_savgol_filter(df[column], window_size, order), index=df.index)