Background and the research problem

In a nutshell, DHS plant operation means automated or semi-automated control of the water temperature in the primary (plant-level) and secondary (substation-level) supply lines, and of the water flow in the primary supply line, based on the overall DHS demand and current weather conditions.

Overall DHS demand (or heat load) corresponds to the difference between the supply and return water temperatures measured in the plant, corrected for distribution losses. Conventional DHSs are today controlled by a Supervisory Control And Data Acquisition (SCADA) system consisting of various sensors, control mechanisms and integrated algorithms that automatically change the operation parameters based on the sensor readings. DHS control is fully automated: both at the boiler plant and at the district heating substation level (the primary and secondary sides of the DHS), a hot water reset control (outdoor air reset, or control curve) is implemented, often referred to as a regulation curve (see the figure below for an example regulation curve with multiple control set points).
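For illustration only, the sketch below shows how such a regulation curve can be evaluated as linear interpolation between operator-defined control set points; the set point values and function name are invented for the example and do not correspond to any actual plant configuration.

```python
# Illustrative sketch (not the plant's actual control logic): an outdoor air
# reset ("regulation") curve implemented as linear interpolation between
# operator-chosen set points. The set point values below are invented examples.
import numpy as np

# (outdoor air temperature [degC], supply water temperature set point [degC])
CONTROL_SET_POINTS = [(-20.0, 90.0), (-10.0, 80.0), (0.0, 68.0), (10.0, 55.0), (15.0, 45.0)]

def supply_temperature_setpoint(outdoor_temp_c: float) -> float:
    """Return the supply line water temperature set point for the measured
    outdoor air temperature, by interpolating along the regulation curve."""
    xs, ys = zip(*CONTROL_SET_POINTS)
    # np.interp clamps to the first/last set point outside the curve range
    return float(np.interp(outdoor_temp_c, xs, ys))

if __name__ == "__main__":
    for t_out in (-25, -5, 3, 12):
        print(f"outdoor {t_out:+} degC -> supply set point "
              f"{supply_temperature_setpoint(t_out):.1f} degC")
```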

Although automated, current DHS control is reactive: it does not consider the weather forecast (ambient temperature, wind speed and direction, humidity, solar irradiance, etc.) or other endogenous and exogenous features, such as assumed average resident behavior (presence or absence; active living, resting, sleeping, etc.). Furthermore, the control is based on the chosen control set points; although these are updated based on the plant operator's experience, the updates occur rarely and therefore cannot reflect the fine dynamics of the demand (for example, weekly seasonality). For small systems with low inertia (the time needed to deliver the produced heat to consumers), this is not necessarily a problem. However, typical urban DHSs are quite complex and system inertia may reach 1-2 hours; in those cases, forecasting capability is very important for efficient plant operation.

The capability to forecast future heat load is critical for optimal and proactive DHS plant operation. Accurate forecasts facilitate decisions related to controlling the supply line water temperature and flow; their appropriateness and timeliness have a large impact on residents' comfort and satisfaction, on the optimal use of resources and on the environment. For example, short-term forecasts of daily or even hourly heat demand can be used for optimal heat production and distribution, while long-term forecasts can be used for gas supply planning, analysis of different operation strategies and DHS maintenance scheduling.

Conventional heat load forecasting models use a rule-based approach in which the rules rely mostly on the weather forecast (typically on ambient temperature forecasts only), namely a weather compensation control strategy. Such an approach produces decisions that are considered "good enough" in some cases, but certainly not optimal. Over-heating occurs frequently and can be diagnosed by high supply line water temperatures combined with decreased heat load. While this does not negatively affect residents' comfort, it increases fuel consumption and has associated negative effects on the environment.

More complex knowledge-based models rely on an analytical or numerical view of the system (so-called 'white-box' models), and their forecasts are sometimes assisted by simulation scenarios. The accuracy and complexity of such models increase with the level of detail of the modeling approach; however, white-box models cannot take end users' behavior into account and do not support anomaly detection. Even disregarding those limitations, the modeling effort can become enormous, and for complex DHSs (large distribution networks) this approach is not considered feasible.

Data-driven forecasting

Data-driven models are capable of addressing the system complexity with a more reasonable investment. Traditional multivariate time-series forecasting methods combine historical heat production and weather data with the weather forecast to predict short-term (hourly, daily, weekly) demand. Traditional methods, such as auto-regressive models and exponential smoothing, are not purely data-driven, as their implementation requires some domain expertise to introduce trends and seasonality. That is not the case with modern machine learning methods, such as neural networks, which can learn temporal dynamics in a purely data-driven manner. To address non-linear and non-stationary behavior in the data, advanced machine learning algorithms based on Deep Neural Networks (DNN) were introduced, with convincing evidence of significantly improved forecasting performance.

Most importantly, modern DNN methods implement network memory, enabling them to learn long-term dependencies. Such methods are based on Recurrent Neural Networks (RNN) and the more advanced Long Short-Term Memory (LSTM) Deep Learning (DL) architectures. The main feature of RNNs is that they allow information to persist by introducing recurrent connections that add a memory state to the network, so a classification or regression decision at moment t can be informed by observations (or decisions) at moments t-1, t-2, ..., t-n. To address the so-called vanishing gradient problem, namely the decay or exponential blow-up of the network output as it cycles around the recurrent connections, LSTM models introduce recurrently connected blocks known as memory blocks. Each block contains one or more recurrently connected memory cells and three multiplicative units: the input, output and forget gates. Each gate is associated with a sigmoid activation function that controls whether the gate is triggered.

While stacked LSTM architectures have shown good performance in single-step forecasting, Encoder-Decoder LSTM approaches have been proposed for variable-length sequence prediction. This approach uses two RNN/LSTM networks: an encoder that maps a variable-length source sequence to a fixed-length vector (context vector), and a decoder that maps that vector to a variable-length target sequence. A convolutional neural network (CNN) can also act as the encoder in an encoder-decoder architecture. The vanishing gradient problem re-appears in encoder-decoder architectures, in which input sequences are encoded into fixed-length (short) representations (context vectors), possibly losing information from very long input sequences. This problem is today addressed by so-called attention mechanisms. Attention extends the encoder-decoder architecture so that the context vector is calculated by also considering the previous hidden states of the decoder and all hidden states of the encoder.

In addition to the benefits mentioned above, and in contrast to traditional time-series forecasting approaches, the accuracy of DNN models scales continuously with increasing training data volumes, and they are agnostic to feature complexity. Still, their use introduces some weaknesses, one being the computational cost of training (periodically required for rolling forecasts). Also, decisions based on the forecasts are not easily interpretable (so-called black-box models). The latter is an especially important issue for DHSs, which are typically managed by local authorities whose key priority is to act sustainably and transparently.
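To make the encoder-decoder idea concrete, the following is a minimal Keras sketch of a multi-step heat load forecaster, assuming hourly data, a one-week input window and a 24-hour forecast horizon; the layer sizes, feature set and horizons are illustrative assumptions, not the project's actual model.

```python
# Minimal sketch of an encoder-decoder LSTM for multi-step heat load
# forecasting with Keras; sizes, horizons and features are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_PAST = 168      # encoder input: one week of hourly observations
N_FUTURE = 24     # decoder output: next 24 hourly heat load values
N_FEATURES = 4    # e.g. heat load, outdoor temperature, wind speed, humidity

def build_encoder_decoder() -> tf.keras.Model:
    inputs = layers.Input(shape=(N_PAST, N_FEATURES))
    # Encoder: compress the input window into a fixed-length context vector
    context = layers.LSTM(64)(inputs)
    # Decoder: repeat the context for each forecast step and unroll an LSTM
    repeated = layers.RepeatVector(N_FUTURE)(context)
    decoded = layers.LSTM(64, return_sequences=True)(repeated)
    # One heat load value per future hour
    outputs = layers.TimeDistributed(layers.Dense(1))(decoded)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    model = build_encoder_decoder()
    # Synthetic data only to demonstrate the expected tensor shapes
    X = np.random.rand(32, N_PAST, N_FEATURES).astype("float32")
    y = np.random.rand(32, N_FUTURE, 1).astype("float32")
    model.fit(X, y, epochs=1, batch_size=8, verbose=0)
    print(model.predict(X[:1]).shape)  # (1, 24, 1)
```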

Explainable AI

The model interpretability problem is often addressed by the Explainable AI (XAI) stack of methods and approaches. XAI helps to highlight evidence that a prediction model is trustworthy (acting as intended when facing a given problem), that it is not biased, and that it is compliant with regulation. XAI-driven explanations of local model forecasts help to solidify user trust in the model by revealing potential issues in its stability and robustness, such as the negative effects of small perturbations that might occur in the real world (for example, sensor drift) and biased observations.

Local or post-hoc explainability refers to facilitating the understanding of how input data is used by an already developed model to make a forecast for a single instance (for example, an individual hourly forecast of heat load). Methods for post-hoc explainability can be model-agnostic or model-specific, and different explanation families can be used for different purposes, such as importance scores (saliency heatmaps), decision rules, decision trees, dependency plots and others. Some of the most commonly used model-agnostic approaches to local explainability are Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Global or ante-hoc explainability assumes that explainability is already incorporated in the trained model, which is capable of revealing feature importances. Basic techniques for global explainability include tree construction measures, model feature scores (permutations) and globalized local methods (LIME or SHAP).

There are several XAI approaches relevant for RNN/LSTM architectures, mostly based on attention. Attention mechanisms already assign values corresponding to the importance of different parts of the time series according to the model, so they are considered ante-hoc explainability methods. In another approach, a CNN is used as a feature extractor and an LSTM model learns the temporal dependencies; the hidden and output states of this architecture are fed into a simple neural network whose weights are then interpreted as the importances of the different timesteps. Finally, some methods specific to time-series forecasting that can be used for XAI are Symbolic Aggregate approXimation (SAX) and fuzzy logic. Explainability of time-series forecasting has not gained as much attention from the research community as computer vision or natural language processing. To the best of our knowledge, there have been no attempts to use XAI approaches in heat load forecasting.
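As a small illustration of model-agnostic local explainability, the sketch below uses SHAP's KernelExplainer to attribute a single (hourly) forecast to its input features; the stand-in regressor, feature names and synthetic data are placeholders assumed only for the example, not part of the proposed system.

```python
# Sketch of a model-agnostic local explanation with SHAP's KernelExplainer
# for one hourly heat load forecast; model, features and data are stand-ins.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

FEATURES = ["outdoor_temp", "wind_speed", "humidity", "hour_of_day"]

# Stand-in forecasting model trained on synthetic data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, len(FEATURES)))
y_train = 10 - 3 * X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor().fit(X_train, y_train)

# KernelExplainer treats the model as a black box: only predict() is needed
background = shap.sample(X_train, 50)           # background sample for the expectation
explainer = shap.KernelExplainer(model.predict, background)

x_instance = X_train[:1]                        # one hourly forecast to explain
shap_values = explainer.shap_values(x_instance)

for name, value in zip(FEATURES, np.ravel(shap_values)):
    print(f"{name:>12}: {value:+.3f}")          # contribution of each feature to this forecast
```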

Anomaly detection

Anomaly detection is the unsupervised machine learning problem of highlighting unexpected items or events (those deviating from a norm) in datasets or data streams. It is often applied to time-series data, especially in industry, for fault detection. There are many different approaches to anomaly detection, from conventional ones (such as predictive confidence levels, statistical profiling and clustering-based approaches) to advanced ones based on DL architectures (instance-based anomaly detection, generative adversarial networks, LSTM-based encoder-decoder architectures and others). Although the latter are much more effective, the black-box nature of DL networks is considered a significant drawback for application in industrial systems, where the explanation of a detected anomaly is crucial for root-cause analysis and for justifying high-impact shop-floor decisions (such as the revision of operation parameters, equipment maintenance, etc.).
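For reference, a minimal sketch of a conventional, residual-based flag (in the spirit of the predictive confidence level approach mentioned above) is shown below; the threshold, synthetic data and injected fault are illustrative assumptions only.

```python
# Sketch of a simple residual-based anomaly flag on hourly heat load: hours
# whose forecast error falls outside a confidence band derived from past
# residuals are marked anomalous. Threshold and data are illustrative.
import numpy as np

def detect_anomalies(observed: np.ndarray, forecast: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Return a boolean mask of hours whose absolute forecast residual
    exceeds k standard deviations of the residual distribution."""
    residuals = observed - forecast
    threshold = k * np.std(residuals)
    return np.abs(residuals - np.mean(residuals)) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    forecast = 50 + 10 * np.sin(np.linspace(0, 6 * np.pi, 240))  # synthetic daily pattern
    observed = forecast + rng.normal(scale=1.0, size=240)
    observed[100] += 15.0                                        # injected fault, e.g. sensor drift
    flags = detect_anomalies(observed, forecast)
    print("anomalous hours:", np.where(flags)[0])
```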

Improving forecasting model accuracy and explainability, enabling effective detection and interpretation of anomalous operation, decreasing computational cost and facilitating model interpretability are the key challenges that will be addressed by the proposed project activities. All of these challenges are directly related to the main desired output of this research: the capability to recommend a plant and substation operation control strategy that minimizes the use of resources while maintaining individual consumer satisfaction and thermal comfort.