Methodology for XAI4HEAT solution

The project pipeline consists of the following main groups of activities: data collection, development of AI models and knowledge-based models (including XAI), and validation.

Data Collection

For the demand forecasting model, data will be collected from an existing district heating system (DHS): the District Heating System of the Faculty of Mechanical Engineering in Niš (FMEDH). The demand/supply forecasting and consumption prediction models are based on data that will be acquired in the first 24 months of the project. This longer period is required because of seasonality effects: during validation, the LSTM model will be trained on 50% of the data (one year), while the remaining data will be used for testing. The sampling interval is 15 min (equal to the FMEDH system inertia under ideal conditions).
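The split described above is chronological rather than random, so that the test year stays strictly later than the training year. The following is a minimal sketch of that idea, assuming each record is a (timestamp, value) pair; the function names and the two-day sample data are hypothetical and only illustrate the mechanics.

```python
from datetime import datetime, timedelta

SAMPLING_INTERVAL = timedelta(minutes=15)  # sampling interval stated in the text


def samples_per_day(interval=SAMPLING_INTERVAL):
    """Number of samples recorded per day at the given interval."""
    return int(timedelta(days=1) / interval)


def chronological_split(records, train_fraction=0.5):
    """Split (timestamp, value) records by time, without shuffling."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Illustrative data only: two days of 15-minute timestamps with dummy values.
start = datetime(2024, 1, 1)
records = [
    (start + i * SAMPLING_INTERVAL, float(i))
    for i in range(2 * samples_per_day())
]
train, test = chronological_split(records)
```

With 24 months of data and `train_fraction=0.5`, the same call would place the first year in `train` and the second year in `test`.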

FMEDH has 12 consumers (with their respective heating substations) of differing demand, namely secondary education, higher education and research organizations, a student dormitory, a restaurant and a residential block, with a total heated area of nearly 120,000 m². Natural gas is used as the primary fuel. FMEDH is equipped with a SCADA system that has continuously acquired data over the last 10 years (water temperature, fluid flow and pressure in the primary and secondary supply and return lines, in the plant and in the substations).

Both historical and forecast weather data will be acquired from the weather stations of the Republic Hydrometeorological Service of Serbia (RHSS). To improve the accuracy of the prediction model, a regression model will be implemented to produce localized weather forecasts. Wind direction and speed data for that model will be collected by an anemometric station to be installed within the FMEDH plant coverage area. For predicting the local ambient temperature, historical data already collected by the sensor in each substation will be used. Finally, to collect data on individual consumers, a representative number of smart indoor air quality monitors (measuring humidity, air quality, noise and temperature) will be placed in selected apartments. Data acquisition will also include the analysis and mitigation of possible risks from sensor faults and inaccurate readings. In addition, the effect of hydraulic imbalance within individual consumers (buildings), which leads to differences in hot water temperature between lower and higher flats, will be considered in the consumer data acquisition strategy.
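The form of the localization regression is not specified here. As one possible illustration, the sketch below fits a single-variable ordinary least squares model that maps the station temperature to the temperature measured at a substation, and then applies it to a station forecast. The function names and all numbers are made up for the example; a real model would likely also use wind speed and direction.

```python
def fit_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def localize(station_forecast, intercept, slope):
    """Map station-level forecast values to the local site."""
    return [intercept + slope * t for t in station_forecast]


# Illustrative values only (degrees Celsius), not measured data.
station_temp = [-5.0, 0.0, 5.0, 10.0]        # historical station readings
substation_temp = [-3.0, 1.5, 6.0, 10.5]     # historical local sensor readings
intercept, slope = fit_linear(station_temp, substation_temp)
local_forecast = localize([2.0, 4.0], intercept, slope)
```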

Predictive and problem detection models development

The development of the demand/supply forecasting and end-user models will follow the typical machine learning pipeline: data preparation (including cleaning, transformation, missing data imputation, outlier detection and treatment), feature engineering (including correlation analysis), prediction model architecture design, hyper-parameter optimization, training and validation. The forecasting model will be developed in stages. The initial heat load forecasting model will be trained and tested on historical data from the existing FMEDH SCADA system and on meteorological data, both available from the beginning of the project. This model will use historical heat loads of the plant and substations and weather data (unfiltered, raw data from the RHSS weather station Niš) as features. The final demand/supply forecasting model will include additional features, localized weather data and more recent data from the SCADA system.
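Two of the data preparation steps named above, outlier treatment and missing data imputation, can be sketched as follows. This is only one common choice (interquartile-range masking followed by linear interpolation), not necessarily the method the project will adopt; the function names, the factor `k=1.5` and the sample readings are assumptions for illustration.

```python
import statistics


def mask_outliers_iqr(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_missing(values):
    """Fill None gaps linearly; edge gaps take the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Illustrative readings: one missing sample and one implausible spike.
raw = [50.0, 52.0, None, 56.0, 500.0, 60.0, 62.0, 64.0]
cleaned = interpolate_missing(mask_outliers_iqr(raw))
```

Masking outliers before interpolating means the spike is treated like a gap and replaced by a value consistent with its neighbours.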

The design of the deep neural network (DNN) architecture is a particular methodological challenge, and different choices will be investigated. Many studies have shown that LSTM performs best at exploiting long-term dependencies in time series data, especially for carrying trend and seasonality effects into the forecasts. Noise is treated as a non-linear and non-stationary random signal. Different approaches to its treatment will be investigated, including Empirical Mode Decomposition (EMD), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Adaptive Noise Reducer (ANR) methods. Non-linearities and irregularities can also be addressed by an ensemble learning approach, such as AdaBoost-LSTM. Hybrid and/or ensemble learning approaches have high potential for multivariate time series forecasting, where LSTM is responsible for extracting temporal features while some other method is used to generalize the remaining ones. A Convolutional Neural Network is often used as this other method. Multiple LSTMs are also sometimes combined.
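Whichever recurrent architecture is chosen, the time series first has to be cut into fixed-length input windows with a target value for each. The sketch below shows that windowing step only, in plain Python; `make_windows`, `n_lags` and `horizon` are hypothetical names, and the load values are made up.

```python
def make_windows(series, n_lags, horizon=1):
    """Build (inputs, targets) pairs from a 1-D series.

    Each input holds n_lags consecutive values; its target is the value
    `horizon` steps after the last value of the window.
    """
    inputs, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[end - n_lags:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Illustrative heat load values only.
load = [10, 11, 12, 13, 14, 15]
X, y = make_windows(load, n_lags=3, horizon=1)
```

For a multivariate model, each element of a window would itself be a feature vector, giving the usual (samples, time steps, features) layout expected by LSTM layers.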

Optimization of hyper-parameters is also time-consuming, because complex DL architectures are highly sensitive to them. Different tactics will be applied here, such as Bayesian optimization, a genetic algorithm (GA) to find the optimal time lags and number of layers, or dimension-reducing symbolic representation.
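To make the time-lag search concrete, the sketch below scores a set of candidate lags and keeps the best one. It is deliberately simplified: the search is exhaustive rather than Bayesian or genetic, and the scored model is a lag-based naive forecaster standing in for a trained LSTM. All names and the synthetic series are assumptions for illustration.

```python
def seasonal_naive_mae(series, lag):
    """Mean absolute error of predicting series[t] with series[t - lag]."""
    errors = [abs(series[t] - series[t - lag]) for t in range(lag, len(series))]
    return sum(errors) / len(errors)


def best_lag(series, candidate_lags):
    """Return the candidate lag with the lowest error, plus all scores."""
    scores = {lag: seasonal_naive_mae(series, lag) for lag in candidate_lags}
    return min(scores, key=scores.get), scores


# Synthetic validation series with a repeating pattern of period 4.
validation = [1.0, 3.0, 5.0, 3.0] * 6
lag, scores = best_lag(validation, candidate_lags=[1, 2, 3, 4, 5])
```

In a real setting each score would require training and validating a network, which is why sample-efficient strategies such as Bayesian optimization or a GA are preferred over trying every candidate.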

The unsupervised anomaly detection model will be developed on existing historical data. An algorithm that matches data patterns indicating unwanted but expected behavior will be implemented, both for incoming data streams and for bulk historical data (sliding window approach).
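The detection rule itself is not specified here. As a minimal sketch of the sliding window approach, the example below flags a reading when it lies more than a chosen number of standard deviations from the mean of the most recent window. The same detector serves a stream (one `update` call per reading) and bulk history (`scan`). The class name, window size, threshold and temperature values are all assumptions.

```python
from collections import deque
import statistics


class SlidingWindowDetector:
    """Flags readings that deviate strongly from the recent window."""

    def __init__(self, window=8, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value):
        """Return True if value is anomalous relative to the full window."""
        is_anomaly = False
        if len(self.history) == self.history.maxlen:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        if not is_anomaly:
            # Flagged readings are kept out of the window so that they do
            # not distort the statistics used for later readings.
            self.history.append(value)
        return is_anomaly


def scan(series, window=8, threshold=3.0):
    """Bulk mode: return the indices of anomalous readings in a series."""
    detector = SlidingWindowDetector(window, threshold)
    return [i for i, v in enumerate(series) if detector.update(v)]


# Illustrative supply temperatures (degrees Celsius) with one spike.
supply_temp = [70.0, 70.5, 69.5, 70.2, 69.8, 70.1, 69.9, 70.3,
               70.0, 85.0, 70.1, 69.7]
anomalies = scan(supply_temp)
```

Because flagged readings never enter the window, a lasting level shift would keep being reported; a production detector would need a rule for accepting a new regime.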

Validation

Validation is already embedded in the forecasting model development pipeline: data collected in the second year of data acquisition will be used for testing and optimizing the forecasting models. As for the overall approach, a 3-month trial period is planned, during which the operation parameters recommended by the empirical model will be compared with the heating strategies actually implemented (by the plant operator and SCADA). The key indicators for comparative analysis are fuel consumption, CO2 emissions and the consistency of the operation parameters predicted by the empirical and the knowledge-based model (indicating accuracy). During the trial period, a group of volunteers (from the apartments hosting the smart indoor air quality devices) will be invited to use a prototype of the XAI4HEAT mobile app with minimal key functionality. After the trial period, their feedback, including satisfaction rate, will be collected. A survey of all end users, measuring thermal comfort satisfaction, will also be carried out.
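The comparative indicators can be reduced to two simple calculations: a relative change (for example in fuel consumption) and an average gap between recommended and implemented operation parameters. The sketch below shows both under stated assumptions; the function names are hypothetical and every number is a placeholder, not a project result.

```python
def relative_change(baseline, candidate):
    """Signed change of candidate relative to baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


def mean_absolute_deviation(recommended, actual):
    """Average absolute gap between recommended and implemented values."""
    if len(recommended) != len(actual):
        raise ValueError("series must have equal length")
    return sum(abs(r - a) for r, a in zip(recommended, actual)) / len(actual)


# Placeholder values for illustration only.
actual_gas_m3 = 1000.0        # consumption under the implemented strategy
recommended_gas_m3 = 950.0    # estimated consumption under the recommendation
fuel_change_pct = relative_change(actual_gas_m3, recommended_gas_m3)

recommended_setpoints = [70.0, 72.0, 68.0]   # supply temperature, model
implemented_setpoints = [71.0, 72.0, 70.0]   # supply temperature, operator
setpoint_gap = mean_absolute_deviation(recommended_setpoints,
                                       implemented_setpoints)
```

A negative `fuel_change_pct` indicates that the recommended strategy would have used less fuel than the implemented one.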