Attention Makes HVAC Control More Efficient
Conference: paper with proceedings at an international conference
Heating, ventilation, and air-conditioning (HVAC) systems account for around 16.4% of global final energy consumption and about 14% of global operational CO2 emissions. Controlling them is a partially observable, sequential decision problem: relying solely on instantaneous sensor readings overlooks the full sequence of past conditions that shapes future dynamics. To address this challenge and fully exploit the available temporal context, this study integrates a transformer encoder–decoder architecture into a Double Deep Q-Network (DDQN), forming a "Q-Transformer" capable of processing 24-hour sequences of observations. This approach is benchmarked against a conventional DDQN using a multilayer perceptron (MLP) and a sequence-aware Bi-LSTM within an EnergyPlus model of a 400 m² university amphitheater. In a weekly adaptation test, the Q-Transformer converged rapidly, reducing energy consumption by up to 48% and 66% (over occupied and unoccupied periods) compared to the Bi-LSTM and MLP, respectively, while maintaining thermal and air-quality comfort. After one year of simulated training on weather and occupancy data from a reference site (Luxembourg), the agent was evaluated across four different climatic locations, yielding energy reductions of 40% and 74%, and comfort-violation reductions of 11% and 29%, compared to the Bi-LSTM and MLP, respectively. These results demonstrate the generalization and control capabilities of transformer-based reinforcement learning for adaptive HVAC management.
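The Double DQN component mentioned above decouples action selection from action evaluation to reduce value overestimation: the online network picks the next action, while the target network scores it. The following is a minimal NumPy sketch of that target computation only (not the authors' implementation, and independent of the transformer or MLP backbone that produces the Q-values); the function and argument names are illustrative assumptions.

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN bootstrap targets for a batch of transitions.

    rewards       : (B,) immediate rewards
    dones         : (B,) 1.0 if the episode terminated, else 0.0
    q_online_next : (B, A) online-network Q-values at the next state
    q_target_next : (B, A) target-network Q-values at the next state
    """
    # Online network selects the greedy next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...target network evaluates that action (the "double" decoupling).
    next_q = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions bootstrap nothing.
    return rewards + gamma * (1.0 - dones) * next_q

# Toy batch of two transitions, two discrete actions:
y = ddqn_targets(
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
    q_online_next=np.array([[1.0, 2.0], [3.0, 0.0]]),
    q_target_next=np.array([[5.0, 7.0], [2.0, 9.0]]),
    gamma=0.9,
)
# First transition: online argmax is action 1, target evaluates it as 7.0,
# so y = 1.0 + 0.9 * 7.0; second is terminal, so y = 0.0.
```

In the paper's setting, the Q-value tensors would come from the respective backbone (MLP, Bi-LSTM, or the Q-Transformer over 24-hour observation windows); the target rule itself is backbone-agnostic.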