Test data (test_data.csv)

Within the data/ directory, the script generate_data.py creates a synthetic .csv file that can be used as input for a run of main.py in the ProMeteo library. This dataset is designed to simulate a set of raw data sampled from an anemometer of unspecified model. The wind components are assumed to be defined in the proprietary sonic coordinate system of the instrument.

Only the u component of the wind is intentionally altered by introducing anomalies, with the goal of testing the capabilities of the pre_processing module. An example of how to use these features and the resulting effects on the raw data is provided at the page: Pre-processing Module – Usage Example.

Variables

All variables are generated from a normal distribution:

  • u, v: mean = 2, std = 1

  • w: mean = 0.01, std = 1

  • T_s: mean = 20, std = 1

Timestamps

  • Time range: 2012-09-28 02:00:00 to 2012-09-28 03:00:00

  • Frequency: 20 Hz (every 50 milliseconds)

  • Total rows: 72,001

Data Anomalies Introduced

1. Missing Timestamps

  • Gap introduced from 02:05:00.000 to 02:05:10.000

  • Duration: 10 seconds

  • Missing samples: 201

2. Extreme Values

  • u set to 100 m/s (unrealistically high) from 02:30:00.000 to 02:30:00.100

  • Duration: 100 milliseconds

  • Affected samples: 3

3. Spikes in ``u``

  • Single spike at 02:15:00.000: +20 m/s

  • Single spike at 02:16:00.000: –20 m/s

  • Three consecutive spikes from 02:25:00.000 to 02:25:00.100: +20 m/s

  • Four consecutive spikes from 02:35:00.000 to 02:35:00.150: +20 m/s

4. Missing Values

  • u set to NaN from 02:40:00.000 to 02:40:05.000

  • Duration: 5 seconds

  • Missing samples: 101

Output

  • File: test_data.csv

  • Format: CSV with timestamp index

  • Missing values are saved as "NaN"