Test data (test_data.csv)
Within the data/ directory, the script generate_data.py creates a synthetic .csv file
that can be used as input for a run of main.py in the ProMeteo library.
This dataset is designed to simulate a set of raw data sampled from an anemometer
of unspecified model. The wind components are assumed to be defined in the proprietary
sonic coordinate system of the instrument.
Only the u component of the wind is intentionally altered by introducing anomalies,
with the goal of testing the capabilities of the pre_processing module.
An example of how to use these features, and of their effect on the raw data,
is provided on the page: Pre-processing Module – Usage Example.
Variables
All variables are generated from a normal distribution:
u, v: mean = 2, std = 1
w: mean = 0.01, std = 1
T_s: mean = 20, std = 1
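As an illustration of the distributions listed above, the following minimal sketch draws the four variables with NumPy. It is not the code of generate_data.py: the seed and the variable names are assumptions made for this example, and the sample count is taken from the Timestamps section.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is an assumption).
rng = np.random.default_rng(seed=0)
n = 72_001  # one hour at 20 Hz, both endpoints included

u = rng.normal(loc=2.0, scale=1.0, size=n)     # u: mean 2, std 1
v = rng.normal(loc=2.0, scale=1.0, size=n)     # v: mean 2, std 1
w = rng.normal(loc=0.01, scale=1.0, size=n)    # w: mean 0.01, std 1
T_s = rng.normal(loc=20.0, scale=1.0, size=n)  # T_s: mean 20, std 1
```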
Timestamps
Time range: 2012-09-28 02:00:00 to 2012-09-28 03:00:00
Frequency: 20 Hz (every 50 milliseconds)
Total rows: 72,001
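The row count follows from the range and the sampling frequency: 3600 s × 20 samples/s, plus one because both endpoints are included. A short pandas sketch of such an index (the variable name is chosen for this example only):

```python
import pandas as pd

# 20 Hz sampling corresponds to one sample every 50 ms.
index = pd.date_range(
    start="2012-09-28 02:00:00",
    end="2012-09-28 03:00:00",
    freq="50ms",
)

n_rows = len(index)  # 3600 * 20 + 1
```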
Data Anomalies Introduced
1. Missing Timestamps
Gap introduced from 02:05:00.000 to 02:05:10.000
Duration: 10 seconds
Missing samples: 201
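One way such a gap could be produced is by dropping every row whose timestamp falls inside the interval, endpoints included. The sketch below assumes that approach (the actual generate_data.py may do it differently) and uses a placeholder u column:

```python
import pandas as pd

index = pd.date_range("2012-09-28 02:00:00", "2012-09-28 03:00:00", freq="50ms")
df = pd.DataFrame({"u": 0.0}, index=index)  # placeholder values

gap_start = pd.Timestamp("2012-09-28 02:05:00.000")
gap_end = pd.Timestamp("2012-09-28 02:05:10.000")

# Boolean mask of the rows inside the gap (both endpoints included).
in_gap = (df.index >= gap_start) & (df.index <= gap_end)
n_removed = int(in_gap.sum())

# Keep only the rows outside the gap.
df_gapped = df[~in_gap]
```

Because both endpoints are removed, a 10-second interval at 20 Hz covers 10 × 20 + 1 = 201 samples.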
2. Extreme Values
u set to 100 m/s (unrealistically high) from 02:30:00.000 to 02:30:00.100
Duration: 100 milliseconds
Affected samples: 3
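A sketch of how the extreme values could be written into the series with a label-based slice, which pandas treats as inclusive at both ends. The seed and the names are assumptions for this example:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2012-09-28 02:00:00", "2012-09-28 03:00:00", freq="50ms")
rng = np.random.default_rng(0)  # arbitrary seed (assumption)
df = pd.DataFrame({"u": rng.normal(2.0, 1.0, len(index))}, index=index)

start = pd.Timestamp("2012-09-28 02:30:00.000")
end = pd.Timestamp("2012-09-28 02:30:00.100")

# Overwrite u with an unrealistically high wind speed over the window.
df.loc[start:end, "u"] = 100.0

n_extreme = int((df["u"] == 100.0).sum())
```

The window contains the samples at .000, .050 and .100, hence three affected rows.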
3. Spikes in u
Single spike at 02:15:00.000: +20 m/s
Single spike at 02:16:00.000: -20 m/s
Three consecutive spikes from 02:25:00.000 to 02:25:00.100: +20 m/s
Four consecutive spikes from 02:35:00.000 to 02:35:00.150: +20 m/s
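The spike list can be sketched as a small table of windows and offsets. This example assumes that "+20 m/s" and "-20 m/s" are offsets added to the underlying value of u (rather than absolute values); the seed and the names are likewise assumptions:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2012-09-28 02:00:00", "2012-09-28 03:00:00", freq="50ms")
rng = np.random.default_rng(0)  # arbitrary seed (assumption)
df = pd.DataFrame({"u": rng.normal(2.0, 1.0, len(index))}, index=index)
baseline = df["u"].copy()  # untouched copy, kept for comparison

day = "2012-09-28 "
# (start, end, offset in m/s); start == end means a single-sample spike.
spikes = [
    ("02:15:00.000", "02:15:00.000", +20.0),
    ("02:16:00.000", "02:16:00.000", -20.0),
    ("02:25:00.000", "02:25:00.100", +20.0),
    ("02:35:00.000", "02:35:00.150", +20.0),
]

for start, end, offset in spikes:
    window = slice(pd.Timestamp(day + start), pd.Timestamp(day + end))
    df.loc[window, "u"] += offset  # label slices include both endpoints

n_spiked = int((df["u"] != baseline).sum())
```

At 50 ms spacing the four windows contain 1, 1, 3 and 4 samples respectively.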
4. Missing Values
u set to NaN from 02:40:00.000 to 02:40:05.000
Duration: 5 seconds
Missing samples: 101
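Unlike the missing timestamps in item 1, here the rows are kept and only the value of u is blanked. A minimal sketch under that reading, with a placeholder u column:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2012-09-28 02:00:00", "2012-09-28 03:00:00", freq="50ms")
df = pd.DataFrame({"u": 2.0}, index=index)  # placeholder values

start = pd.Timestamp("2012-09-28 02:40:00.000")
end = pd.Timestamp("2012-09-28 02:40:05.000")

# The rows stay in place; only the u values become NaN.
df.loc[start:end, "u"] = np.nan

n_nan = int(df["u"].isna().sum())  # 5 s * 20 Hz + 1
```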
Output
File: test_data.csv
Format: CSV with timestamp index
Missing values are saved as "NaN"
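A sketch of writing and re-reading a file in this format with pandas. The index label "timestamp" and the tiny five-row frame are assumptions for illustration; an in-memory buffer stands in for test_data.csv so the example has no side effects.

```python
import io

import numpy as np
import pandas as pd

index = pd.date_range("2012-09-28 02:00:00", periods=5, freq="50ms")
df = pd.DataFrame({"u": [2.1, np.nan, 1.9, 2.0, np.nan]}, index=index)

# na_rep controls how missing values are written; "timestamp" is a
# hypothetical name for the index column.
buffer = io.StringIO()
df.to_csv(buffer, na_rep="NaN", index_label="timestamp")
csv_text = buffer.getvalue()

# Reading back: "NaN" is parsed as a missing value by default.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="timestamp", parse_dates=True)
```

To write to disk instead, a file path can be passed to to_csv in place of the buffer.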