ProMeteo functions

core

core.import_data(path: str) → DataFrame

Imports a CSV file containing data collected from a sonic anemometer.

The file must contain the columns in the order: [“Time”, “u”, “v”, “w”, “T_s”], where “Time” is a column with valid timestamps and the other columns contain float values.

Parameters:

path (str or Path) – Path to the CSV file to be read.

Returns:

data – DataFrame with the timestamp as the index (from the “Time” column) and columns [“u”, “v”, “w”, “T_s”].

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If the file does not exist at the specified path.
ValueError – If the columns in the file do not match the expected order, if any timestamp is invalid, or if the “Time” column contains values that cannot be converted to datetime.
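The following is a minimal sketch of the behaviour described above, written with pandas for illustration. `import_data_sketch` is a hypothetical name and its validation logic is an assumption, not the ProMeteo source. It accepts any path or file-like object that `pd.read_csv` accepts, which makes it easy to try on an in-memory CSV.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["Time", "u", "v", "w", "T_s"]


def import_data_sketch(source) -> pd.DataFrame:
    """Hypothetical stand-in for core.import_data (illustration only)."""
    # pd.read_csv raises FileNotFoundError for a missing path
    data = pd.read_csv(source)
    if list(data.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"Expected columns {EXPECTED_COLUMNS}, got {list(data.columns)}")
    try:
        # errors="raise" turns unparseable timestamps into an exception instead of NaT
        data["Time"] = pd.to_datetime(data["Time"], errors="raise")
    except (ValueError, TypeError) as exc:
        raise ValueError(f"Invalid timestamp in 'Time' column: {exc}") from exc
    # Timestamps become the index; the wind and temperature columns are cast to float
    return data.set_index("Time").astype(float)


# Usage with an in-memory CSV (two samples at 20 Hz)
csv_text = (
    "Time,u,v,w,T_s\n"
    "2024-01-01 00:00:00.00,1.0,0.5,0.1,290.1\n"
    "2024-01-01 00:00:00.05,1.1,0.4,0.0,290.2\n"
)
data = import_data_sketch(io.StringIO(csv_text))
print(data)
```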

core.load_config(path: str) → dict

Load and validate parameters from a configuration file.

This function reads a configuration file in INI format (e.g., config.txt), extracts and validates the necessary parameters, and returns them in a dictionary.

Parameters:

path (str) – Path to the configuration file (e.g., ‘config.txt’).

Returns:

Dictionary containing validated parameters required by main.py.

Return type:

dict

Raises:

FileNotFoundError – If the configuration file cannot be found or read.
configparser.NoSectionError – If a required section is missing from the configuration.
configparser.NoOptionError – If a required option is missing from a section.
ValueError – If a parameter has an invalid value or cannot be converted to its expected type.
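A minimal sketch of this load-and-validate pattern with the standard-library `configparser` is shown below. The section names (`acquisition`, `despiking`) and option names are hypothetical, since the required parameters are not listed here, and `load_config_sketch` is not the ProMeteo function. Note that `ConfigParser.read()` silently skips missing files, so the sketch checks its return value to raise `FileNotFoundError`.

```python
import configparser
import os
import tempfile


def load_config_sketch(path: str) -> dict:
    """Hypothetical stand-in for core.load_config; section and option names are invented."""
    parser = configparser.ConfigParser()
    # read() returns the list of files successfully parsed (empty if none)
    if not parser.read(path):
        raise FileNotFoundError(f"Cannot find or read configuration file: {path}")
    # getint()/getfloat() raise NoSectionError, NoOptionError or ValueError on bad input
    params = {
        "sampling_freq": parser.getint("acquisition", "sampling_freq"),
        "window_length": parser.getint("despiking", "window_length"),
        "c": parser.getfloat("despiking", "c"),
    }
    if params["sampling_freq"] <= 0 or params["window_length"] <= 0 or params["c"] <= 0:
        raise ValueError("sampling_freq, window_length and c must be positive")
    return params


# Usage: write a small example file to a temporary location and load it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("[acquisition]\nsampling_freq = 20\n\n[despiking]\nwindow_length = 6001\nc = 3.5\n")
    config_path = fh.name
params = load_config_sketch(config_path)
os.remove(config_path)
print(params)
```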

core.min_to_points(minutes: int, sampling_freq: int) → int

Computes the number of data points in a signal of known sampling frequency, given its duration in minutes. The result is always adjusted to be an odd number, except when it is 0, in which case 0 is returned.

Parameters:

minutes (int) – Duration of the signal in minutes.
sampling_freq (int) – Sampling frequency in Hz (samples per second).

Returns:

n_points – Total number of data points in the signal for the given duration, adjusted to be odd if necessary. Returns 0 if the result is zero.

Return type:

int
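The conversion is minutes × 60 × sampling frequency. The sketch below assumes that an even result is made odd by adding one point; the documentation does not say whether the actual function rounds up or down, so treat that step as an assumption. `min_to_points_sketch` is a hypothetical name.

```python
def min_to_points_sketch(minutes: int, sampling_freq: int) -> int:
    """Hypothetical stand-in for core.min_to_points."""
    n_points = minutes * 60 * sampling_freq  # minutes -> seconds -> samples
    if n_points == 0:
        return 0
    # Assumption: an even count is made odd by adding one sample
    if n_points % 2 == 0:
        n_points += 1
    return n_points


print(min_to_points_sketch(5, 20))  # 5 min at 20 Hz = 6000 samples -> 6001
```

An odd count matters because the running-statistics functions expect an odd window_length, so the window can be centered on each point.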

core.running_stats(array: ndarray, window_length: int) → Tuple[ndarray, ndarray]

Compute the running (moving) mean and standard deviation of a 1D array using a sliding window.

This function uses a centered sliding window of fixed length to calculate the mean and standard deviation at each point in the input array. The array is padded at the edges to allow computation at the boundaries.

Parameters:

array (np.ndarray) – Input 1D array of numerical values.
window_length (int) – Number of points within the sliding window. Must be a positive integer less than or equal to the array length; an odd value is recommended, since an even value triggers a UserWarning.

Returns:

running_mean (np.ndarray) – Array of the same length as the input, containing the running mean.
running_std (np.ndarray) – Array of the same length as the input, containing the running standard deviation.

Raises:

ValueError – If window_length is not a positive integer.
ValueError – If window_length is greater than the length of the input array.

Warns:

UserWarning – If window_length is even, a warning is issued that using an even-length window may result in asymmetric behavior.

Notes

The function pads the input array with constant values equal to the edge values of the array.
If the input contains NaNs, they are ignored in the mean and std computation using np.nanmean and np.nanstd.
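A compact way to express a centered window with edge padding in NumPy is `np.pad(..., mode="edge")` followed by `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+). The sketch below follows the description above under that assumption; `running_stats_sketch` is a hypothetical name, and the population standard deviation (`ddof=0`, the `np.nanstd` default) is assumed.

```python
import warnings

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_stats_sketch(array, window_length):
    """Hypothetical stand-in for core.running_stats: centered window, edge padding, NaN-aware."""
    array = np.asarray(array, dtype=float)
    if not isinstance(window_length, (int, np.integer)) or window_length <= 0:
        raise ValueError("window_length must be a positive integer")
    if window_length > array.size:
        raise ValueError("window_length must not exceed the array length")
    if window_length % 2 == 0:
        warnings.warn("Even window_length may result in asymmetric behavior", UserWarning)
    half = window_length // 2
    # Repeat the first/last value so that every point gets a full window
    padded = np.pad(array, (half, window_length - 1 - half), mode="edge")
    windows = sliding_window_view(padded, window_length)  # shape (array.size, window_length)
    return np.nanmean(windows, axis=1), np.nanstd(windows, axis=1)


running_mean, running_std = running_stats_sketch([1.0, 2.0, 3.0, 4.0, 5.0], 3)
print(running_mean)  # edge padding makes the first window [1, 1, 2]
```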

core.running_stats_robust(array: ndarray, window_length: int) → Tuple[ndarray, ndarray]

Compute the running (moving) median and robust standard deviation of a 1D array using a sliding window.

This function uses a centered sliding window of fixed length to calculate the median and a robust estimate of the standard deviation at each point in the input array. The robust standard deviation is computed as half the difference between the 84th and 16th percentiles. The array is padded at the edges to allow computation at the boundaries.

Parameters:

array (np.ndarray) – Input 1D array of numerical values.
window_length (int) – Number of points within the sliding window. Must be a positive integer less than or equal to the array length; an odd value is recommended, since an even value triggers a UserWarning.

Returns:

running_median (np.ndarray) – Array of the same length as the input, containing the running median.
running_std_robust (np.ndarray) – Array of the same length as the input, containing the robust running standard deviation computed as (84th percentile - 16th percentile) / 2.

Raises:

ValueError – If window_length is not a positive integer.
ValueError – If window_length is greater than the length of the input array.

Warns:

UserWarning – If window_length is even, a warning is issued that using an even-length window may result in asymmetric behavior.

Notes

The function pads the input array using constant values equal to the edge values of the array.
If the input contains NaNs, they are ignored in the percentile and median computations using np.nanpercentile and np.nanmedian.
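The robust variant differs from the previous sketch only in the statistics applied to each window. This illustration reuses the same padding approach (an assumption), omits input validation for brevity, and uses the hypothetical name `running_stats_robust_sketch`. The example shows that a single outlier leaves the running median unchanged.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_stats_robust_sketch(array, window_length):
    """Hypothetical stand-in for core.running_stats_robust (validation omitted)."""
    array = np.asarray(array, dtype=float)
    half = window_length // 2
    padded = np.pad(array, (half, window_length - 1 - half), mode="edge")
    windows = sliding_window_view(padded, window_length)
    running_median = np.nanmedian(windows, axis=1)
    # Robust spread: half the distance between the 84th and 16th percentiles
    p84, p16 = np.nanpercentile(windows, [84, 16], axis=1)
    return running_median, (p84 - p16) / 2


signal = np.array([0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0])
median, robust_std = running_stats_robust_sketch(signal, 5)
print(median[3])  # the single outlier does not move the median
```

For a Gaussian signal the 16th and 84th percentiles lie about one standard deviation either side of the mean, which is why half their difference serves as a standard-deviation estimate.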

pre_processing

pre_processing.despiking_VM97(array_to_despike: ndarray, c: float, window_length: int, max_consecutive_spikes: int, max_iterations: int, logger: Logger | None = None) → ndarray

Applies the despiking algorithm based on Vickers and Mahrt (1997) to remove spikes from a time series.

This method identifies and replaces spikes in the input array by comparing values against a running mean and standard deviation computed over a moving window. Points that deviate from the local mean by more than c times the local standard deviation are considered spikes. A run of consecutive spike points is replaced by linear interpolation only if its length does not exceed max_consecutive_spikes.

The c factor is incrementally increased after each iteration. The process stops when no more spikes are detected or when max_iterations is reached.

The function calls the pre_processing.identify_interp_spikes() and core.running_stats() functions.

Parameters:

array_to_despike (np.ndarray) – The input 1D array containing the signal to be despiked.
c (float) – Initial threshold multiplier for the standard deviation used to detect spikes.
window_length (int) – Length of the moving window used to compute running statistics.
max_consecutive_spikes (int) – Maximum number of consecutive spike points allowed for interpolation.
max_iterations (int) – Maximum number of iterations to perform if spikes continue to be found.
logger (Optional[logging.Logger], default=None) – A logger instance following the logging.Logger interface. If provided, the function uses it to log messages during the despiking procedure. If None, the function operates silently without producing any log output.

Returns:

array_despiked – The despiked version of the input array.

Return type:

np.ndarray

Raises:

ValueError – If c is not a positive number.
ValueError – If window_length is not a positive integer.
ValueError – If max_consecutive_spikes is not a positive integer.
ValueError – If max_iterations is not a positive integer.
ValueError – If logger is not a logging.Logger instance or None.

References

Vickers, D., & Mahrt, L. (1997). Quality control and flux sampling problems for tower and aircraft data. Journal of Atmospheric and Oceanic Technology, 14(3), 512–526. https://doi.org/10.1175/1520-0426(1997)014<0512:QCAFSP>2.0.CO;2
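The iteration described above can be sketched as follows. This is an illustration, not the ProMeteo code: the helper names are hypothetical, logging and parameter validation are omitted, the increment of c per pass (`c_step=0.1`) is an assumed default, and the loop is assumed to stop as soon as a pass replaces no points.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _running_mean_std(x, window_length):
    """Centered running mean/std with edge padding (as described for core.running_stats)."""
    half = window_length // 2
    padded = np.pad(x, (half, window_length - 1 - half), mode="edge")
    windows = sliding_window_view(padded, window_length)
    return np.nanmean(windows, axis=1), np.nanstd(windows, axis=1)


def _interp_short_runs(x, mask, max_length):
    """Linearly interpolate runs of True in mask that are short enough and not at the edges."""
    x = x.copy()
    # +1 marks the start of a True run, -1 the position just after its end
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    n_replaced = 0
    for start, end in zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)):
        if end - start > max_length or start == 0 or end == x.size:
            continue
        x[start:end] = np.linspace(x[start - 1], x[end], end - start + 2)[1:-1]
        n_replaced += end - start
    return x, n_replaced


def despiking_vm97_sketch(x, c=3.5, window_length=11, max_consecutive_spikes=3,
                          max_iterations=20, c_step=0.1):
    """Iterative despiking in the spirit of Vickers and Mahrt (1997); illustrative only."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iterations):
        mean, std = _running_mean_std(x, window_length)
        mask = np.abs(x - mean) > c * std
        x, n_replaced = _interp_short_runs(x, mask, max_consecutive_spikes)
        if n_replaced == 0:  # assumption: stop once a pass changes nothing
            break
        c += c_step  # assumption: threshold widened by a fixed step per pass
    return x


clean = np.sin(np.linspace(0, 4 * np.pi, 200))
spiky = clean.copy()
spiky[100] += 10.0
despiked = despiking_vm97_sketch(spiky, window_length=21)
```

Widening the threshold on each pass keeps later iterations from eroding genuine signal once the gross spikes have been removed.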

pre_processing.despiking_robust(array_to_despike: ndarray, c: float, window_length: int) → Tuple[ndarray, int]

Applies a non-iterative despiking algorithm using robust statistics to remove spikes from a time series.

This method detects spikes by comparing each value in the input array against a local running median and a robust estimate of the local variability, both computed over a moving window. A point is classified as a spike if it lies outside the band running median ± c × robust standard deviation. The robust standard deviation is defined as half the range between the 84th and 16th percentiles within the moving window.

Detected spikes are replaced with the corresponding value of the running median. This procedure is applied in a single pass and does not perform iterative refinement.

The function calls core.running_stats_robust() function.

Parameters:

array_to_despike (np.ndarray) – The input 1D array containing the signal to be despiked.
c (float) – Threshold multiplier for the robust standard deviation used to detect spikes.
window_length (int) – Length of the moving window used to compute the running median and robust statistics.

Returns:

array_despiked (np.ndarray) – The despiked version of the input array, with spikes replaced by the running median.
count_spike (int) – The total number of spikes detected and replaced.

Raises:

ValueError – If c is not positive.
ValueError – If window_length is not a positive integer.
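A single-pass version of this procedure fits in a few lines of NumPy. The sketch below assumes the same edge padding as core.running_stats_robust, omits input validation, and uses the hypothetical name `despiking_robust_sketch`. On a linear ramp with one injected spike, only that point is flagged and it takes the running-median value.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def despiking_robust_sketch(array_to_despike, c, window_length):
    """Single-pass despiking against running median ± c * robust std (illustrative only)."""
    x = np.asarray(array_to_despike, dtype=float)
    half = window_length // 2
    padded = np.pad(x, (half, window_length - 1 - half), mode="edge")
    windows = sliding_window_view(padded, window_length)
    running_median = np.nanmedian(windows, axis=1)
    p84, p16 = np.nanpercentile(windows, [84, 16], axis=1)
    robust_std = (p84 - p16) / 2
    is_spike = np.abs(x - running_median) > c * robust_std
    # Spikes take the running-median value; everything else is left untouched
    return np.where(is_spike, running_median, x), int(is_spike.sum())


ramp = np.linspace(0.0, 1.0, 101)
spiky = ramp.copy()
spiky[50] += 5.0
despiked, count_spike = despiking_robust_sketch(spiky, c=3.0, window_length=11)
print(count_spike, despiked[50])
```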

pre_processing.fill_missing_timestamps(data: DataFrame, freq: float) → DataFrame

Returns the input DataFrame data with a complete datetime index: all timestamps between the first and last entry are included based on the specified frequency, and missing timestamps are filled with rows containing NaN values.

Parameters:

data (pd.DataFrame) – DataFrame with a datetime index.
freq (float) – Sampling frequency in Hertz (Hz).

Returns:

complete_data – DataFrame reindexed to include all expected timestamps.

Return type:

pd.DataFrame

Raises:

ValueError – If freq is negative.
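With pandas, this amounts to building a regular `DatetimeIndex` with `pd.date_range` and calling `DataFrame.reindex`. The sketch below is a hypothetical stand-in (`fill_missing_timestamps_sketch`), not the ProMeteo code; unlike the documented function it also rejects freq = 0, which would give an infinite sampling period.

```python
import pandas as pd


def fill_missing_timestamps_sketch(data: pd.DataFrame, freq: float) -> pd.DataFrame:
    """Hypothetical stand-in for pre_processing.fill_missing_timestamps."""
    if freq <= 0:  # this sketch also rejects zero
        raise ValueError("freq must be positive")
    # Sampling period in whole nanoseconds (e.g. 20 Hz -> 50 ms)
    period = pd.Timedelta(round(1e9 / freq), unit="ns")
    full_index = pd.date_range(start=data.index[0], end=data.index[-1], freq=period)
    # reindex inserts NaN rows for every timestamp absent from the original index
    return data.reindex(full_index)


index = pd.to_datetime(["2024-01-01 00:00:00.0", "2024-01-01 00:00:00.1", "2024-01-01 00:00:00.3"])
data = pd.DataFrame({"u": [1.0, 2.0, 3.0]}, index=index)
complete_data = fill_missing_timestamps_sketch(data, freq=10)
print(complete_data)  # the 00:00:00.2 row is added and filled with NaN
```

Because `reindex` keeps only labels on the new grid, any original timestamp that does not fall exactly on the regular grid would be dropped; the sketch assumes an already aligned index.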

pre_processing.identify_interp_spikes(array: ndarray, mask: ndarray, max_length_spike: int) → tuple[ndarray, int]

Identifies and interpolates spikes in the provided array based on the given mask.

A spike is defined as a run of consecutive True values in the mask whose length is less than or equal to max_length_spike. Spike values are replaced by linear interpolation between the neighboring non-spike values, computed with pre_processing.linear_interp(). If the spike is at the boundary of the array (at the start or at the end), interpolation is not performed.

Parameters:

array (np.ndarray) – The 1D array containing the data to be processed.
mask (np.ndarray) – A boolean mask where True values indicate potential spikes.
max_length_spike (int) – The maximum length of consecutive True values in the mask to be considered a spike.

Returns:

array (np.ndarray) – The modified array with interpolated spike values.
count_spike (int) – The total count of detected spikes.

Raises:

ValueError – If array and mask do not have the same length.
ValueError – If mask is not a boolean array.
ValueError – If max_length_spike is not a positive integer.

Notes

If either the left or right neighbor is missing (i.e., the spike is at the boundary), the spike is not interpolated.
Only runs of True values no longer than max_length_spike are considered spikes.
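Runs of True values can be located without an explicit state machine by differencing the zero-padded mask: +1 marks where a run starts and -1 the position just after it ends. The sketch below uses that trick; `identify_interp_spikes_sketch` is a hypothetical name, and it assumes that count_spike counts interpolated points (the documentation does not say whether points or runs are counted).

```python
import numpy as np


def identify_interp_spikes_sketch(array, mask, max_length_spike):
    """Hypothetical stand-in for pre_processing.identify_interp_spikes."""
    array = np.asarray(array, dtype=float).copy()
    mask = np.asarray(mask)
    if array.shape != mask.shape:
        raise ValueError("array and mask must have the same length")
    if mask.dtype != bool:
        raise ValueError("mask must be a boolean array")
    if not isinstance(max_length_spike, (int, np.integer)) or max_length_spike <= 0:
        raise ValueError("max_length_spike must be a positive integer")
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    count_spike = 0
    for start, end in zip(starts, ends):
        length = end - start
        # Skip runs that are too long or touch either boundary (no neighbour on one side)
        if length > max_length_spike or start == 0 or end == array.size:
            continue
        # Interior points of a straight line from the left to the right neighbour
        array[start:end] = np.linspace(array[start - 1], array[end], length + 2)[1:-1]
        count_spike += length
    return array, count_spike


values = np.array([0.0, 1.0, 10.0, 3.0, 4.0, 50.0, 60.0, 70.0, 8.0])
mask = np.array([False, False, True, False, False, True, True, True, False])
fixed, n_fixed = identify_interp_spikes_sketch(values, mask, max_length_spike=2)
print(fixed)  # index 2 becomes 2.0; the three-point run at indices 5-7 is kept
```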

pre_processing.interp_nan(array: ndarray) → Tuple[ndarray, int]

Interpolates NaN values in the input array using linear interpolation. NaNs are replaced with values computed by the pre_processing.linear_interp() function, using the closest non-NaN values to the left and right as reference points. If NaNs are at the edges of the array (i.e., without valid neighbors on both sides), they are left unchanged.

Parameters:

array (np.ndarray) – The input array containing NaN values to interpolate.

Returns:

array_interp (np.ndarray) – A copy of the input array with NaN values replaced by interpolated values, where possible.
count_interp (int) – Number of NaN values that were successfully interpolated.
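The same result can be sketched with `np.interp`, which interpolates linearly between the nearest valid samples on each side. This swaps in `np.interp` for pre_processing.linear_interp(), so it illustrates the behaviour rather than the actual implementation; `interp_nan_sketch` is a hypothetical name. Because `np.interp` would otherwise extend edge values outward, the sketch explicitly restricts itself to NaNs between the first and last valid samples.

```python
import numpy as np


def interp_nan_sketch(array):
    """Hypothetical stand-in for pre_processing.interp_nan, using np.interp."""
    array_interp = np.asarray(array, dtype=float).copy()
    is_nan = np.isnan(array_interp)
    valid_idx = np.flatnonzero(~is_nan)
    if valid_idx.size < 2:
        return array_interp, 0  # nothing can be bracketed by two valid neighbours
    # Only NaNs strictly between the first and last valid samples have neighbours on both sides
    inner = is_nan.copy()
    inner[: valid_idx[0]] = False
    inner[valid_idx[-1] + 1 :] = False
    target_idx = np.flatnonzero(inner)
    array_interp[target_idx] = np.interp(target_idx, valid_idx, array_interp[valid_idx])
    return array_interp, int(target_idx.size)


values = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
array_interp, count_interp = interp_nan_sketch(values)
print(array_interp, count_interp)  # the two inner NaNs become 2.0 and 3.0; edge NaNs stay
```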

pre_processing.linear_interp(left_value: float, right_value: float, length: int) → ndarray

Performs linear interpolation between left_value and right_value, returning a NumPy array whose number of elements is given by the length parameter and whose values are evenly spaced between the two endpoints.

Parameters:

left_value (float) – The reference value on the left border.
right_value (float) – The reference value on the right border.
length (int) – The length of the array to perform the interpolation over.

Returns:

interpolated_values – An array containing the interpolated values.

Return type:

np.ndarray

Raises:

ValueError – If length is not a positive integer.
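Since this function fills gaps between two valid neighbours, a natural reading is that the two reference values themselves are excluded from the output. The sketch below makes that assumption explicit by generating length + 2 points with `np.linspace` and dropping both ends; `linear_interp_sketch` is a hypothetical name.

```python
import numpy as np


def linear_interp_sketch(left_value: float, right_value: float, length: int) -> np.ndarray:
    """Hypothetical stand-in for pre_processing.linear_interp."""
    if not isinstance(length, (int, np.integer)) or length <= 0:
        raise ValueError("length must be a positive integer")
    # Assumption: the reference values are excluded, so `length` interior points are returned
    return np.linspace(left_value, right_value, length + 2)[1:-1]


print(linear_interp_sketch(0.0, 3.0, 2))  # [1. 2.]
```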

pre_processing.remove_beyond_threshold(array: ndarray, threshold: float) → Tuple[ndarray, int]

Replaces with NaN all values in the input array whose absolute value exceeds a given threshold.

Parameters:

array (np.ndarray) – The 1D array of numerical values to be cleaned.
threshold (float) – Absolute threshold. All values with absolute magnitude greater than this will be set to NaN.

Returns:

array_clean (np.ndarray) – A copy of the input array with values exceeding the threshold replaced by NaN.
count_beyond (int) – The number of elements that were beyond the threshold and replaced.

Raises:

ValueError – If threshold is negative.
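A minimal NumPy sketch of this cleaning step is shown below. `remove_beyond_threshold_sketch` is a hypothetical name, and the threshold of 50 in the example is only an illustrative bound. NaN compares as False, so values that are already NaN are neither modified nor counted.

```python
import numpy as np


def remove_beyond_threshold_sketch(array, threshold):
    """Hypothetical stand-in for pre_processing.remove_beyond_threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    array_clean = np.asarray(array, dtype=float).copy()
    beyond = np.abs(array_clean) > threshold  # existing NaNs evaluate to False
    array_clean[beyond] = np.nan
    return array_clean, int(beyond.sum())


array_clean, count_beyond = remove_beyond_threshold_sketch([1.0, -60.0, 5.0, 70.0], threshold=50.0)
print(array_clean, count_beyond)
```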

frame

frame.rotation_to_LEC_reference(wind: ndarray, azimuth: float, model: str) → ndarray

Rotate the wind vector from the anemometer reference system to the Local Earth Coordinate (LEC) system, given the azimuth of the anemometer head with respect to North.

Parameters:

wind (np.ndarray) – A 3xN array (shape (3, N)), where each column is a wind vector at a different time instant. The three rows correspond to the three velocity components.
azimuth (float) – The azimuth angle in degrees, measured clockwise from North, describing the orientation of the anemometer head.
model (str) – The anemometer model used for the measurement. Only two models are supported: “RM_YOUNG_81000” and “CAMPBELL_CSAT3”.

Returns:

wind_rotated – A 3xN array (shape (3, N)) of wind vectors rotated into the LEC reference frame, with the y-axis aligned to the geographic North.

Return type:

np.ndarray

Raises:

ValueError – If ‘wind’ does not have shape (3, N).
ValueError – If the azimuth is outside the range [0, 360].
ValueError – If the anemometer model is not recognized (i.e., not “RM_YOUNG_81000” or “CAMPBELL_CSAT3”).

Notes

The function applies two sequential rotations:
1. A model-dependent transformation that maintains the Cartesian reference frame while ensuring that the u and v wind components are positive when aligned with the x- and y-axes, respectively.
2. A rotation aligning the y-axis to North, according to the specified azimuth.
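Only the second, azimuth-dependent step is sketched here, because the model-specific axis conventions of the first step are not documented. The sketch assumes a right-handed frame with x pointing East, y pointing North and z pointing up, and assumes that azimuth is the bearing of the anemometer's y-axis; `rotate_to_north_sketch` is a hypothetical name. The rotation matrix's columns are the anemometer axes expressed in East-North-Up coordinates, so the product rebuilds each vector in the Earth frame.

```python
import numpy as np


def rotate_to_north_sketch(wind, azimuth):
    """Rotate (3, N) wind vectors about the vertical axis so that y points to geographic North."""
    wind = np.asarray(wind, dtype=float)
    if wind.ndim != 2 or wind.shape[0] != 3:
        raise ValueError("wind must have shape (3, N)")
    if not 0 <= azimuth <= 360:
        raise ValueError("azimuth must lie in [0, 360]")
    a = np.deg2rad(azimuth)
    # Columns: anemometer x-axis, y-axis and z-axis expressed in (East, North, Up)
    rotation = np.array([
        [np.cos(a), np.sin(a), 0.0],
        [-np.sin(a), np.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ])
    return rotation @ wind


# Anemometer y-axis pointing East (azimuth 90): wind along +y becomes an eastward wind
wind = np.array([[0.0], [2.0], [0.3]])
print(rotate_to_north_sketch(wind, 90.0))
```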

frame.rotation_to_streamline_reference(wind: ndarray, wind_averaged: ndarray) → ndarray

Rotate wind velocity components into the streamline coordinate system, using the double rotation method described in Kaimal and Finnigan (1979).

This technique aligns the coordinate system with the average wind direction, such that:
- the streamwise component (ũ) approximates the total wind speed;
- the crosswise (ṽ) and vertical (w̃) components are minimized.

The rotation is defined at each instant using the average wind vector, removing the mean crosswind and vertical components and aligning the flow with the x-axis of the new reference frame.

Parameters:

wind (np.ndarray) – Instantaneous wind velocity components of shape (3, N), where the three rows hold the u, v and w components.
wind_averaged (np.ndarray) – Averaged (mean) wind velocity components of shape (3, N), used to define the streamline reference frame at each instant.

Returns:

wind_rotated – Wind velocity components rotated into the streamline coordinate system, of shape (3, N).

Return type:

np.ndarray

Raises:

ValueError – If ‘wind’ or ‘wind_averaged’ do not have shape (3, N).
ValueError – If ‘wind’ and ‘wind_averaged’ do not have the same number of columns (N).

Notes

This method is most appropriate for stationary signals, where the mean wind vector is well defined and stable over time.
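The double rotation can be written with two angles derived from the averaged wind: a yaw angle θ = atan2(v̄, ū) that cancels the mean crosswind, followed by a pitch angle φ = atan2(w̄, √(ū² + v̄²)) that cancels the mean vertical component. The sketch below applies these textbook formulas column by column; `rotation_to_streamline_sketch` and the angle names are illustrative, not taken from the ProMeteo source. Applied to the averaged wind itself, each column maps to (speed, 0, 0).

```python
import numpy as np


def rotation_to_streamline_sketch(wind, wind_averaged):
    """Double rotation: yaw about z to cancel mean v, then pitch about the new y to cancel mean w."""
    wind = np.asarray(wind, dtype=float)
    wind_averaged = np.asarray(wind_averaged, dtype=float)
    if wind.shape != wind_averaged.shape or wind.ndim != 2 or wind.shape[0] != 3:
        raise ValueError("wind and wind_averaged must both have shape (3, N)")
    u, v, w = wind
    u_bar, v_bar, w_bar = wind_averaged
    theta = np.arctan2(v_bar, u_bar)                  # first rotation: mean horizontal direction
    phi = np.arctan2(w_bar, np.hypot(u_bar, v_bar))   # second rotation: mean flow inclination
    u_h = u * np.cos(theta) + v * np.sin(theta)       # component along the mean horizontal wind
    u_rot = u_h * np.cos(phi) + w * np.sin(phi)
    v_rot = -u * np.sin(theta) + v * np.cos(theta)
    w_rot = -u_h * np.sin(phi) + w * np.cos(phi)
    return np.vstack((u_rot, v_rot, w_rot))


mean_wind = np.array([[1.0, 3.0], [1.0, 4.0], [np.sqrt(2.0), 0.0]])
rotated_mean = rotation_to_streamline_sketch(mean_wind, mean_wind)
print(np.round(rotated_mean, 6))  # each column becomes (speed, 0, 0), here speeds 2 and 5
```

Both steps are pure rotations, so the magnitude of every instantaneous wind vector is preserved; only its decomposition into components changes.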