Time Series Forecasting with Machine Learning Models

Time series forecasting — predicting future values of a sequence based on its historical values — is among the most practically important problems in data science. Energy companies forecast electricity demand to optimise generation schedules. Retailers forecast product demand to manage inventory. Banks forecast credit losses to determine capital requirements. Each requires modelling the complex temporal patterns — trend, seasonality, cycles, and irregular fluctuations — that characterise real-world time series.

Classical Methods: ARIMA and Exponential Smoothing

The workhorse of classical time series forecasting is ARIMA (AutoRegressive Integrated Moving Average). An ARIMA model combines three components: the autoregressive part (the series is regressed on its own past values), the integrated part (the series is differenced one or more times to achieve stationarity), and the moving average part (the current value depends on a weighted sum of past forecast errors). Seasonal variants (SARIMA) add autoregressive, differencing, and moving average terms at the seasonal lag to handle seasonal patterns. ARIMA models are transparent, statistically well understood, and competitive on many short-horizon forecasting tasks.
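To make the autoregressive and integrated parts concrete, here is a minimal sketch in plain Python. It differences a series once, fits an AR(1) model to the differences by least squares, and undoes the differencing to produce a one-step forecast. It deliberately omits the moving average part, which requires iterative estimation, so it is an illustration of the idea rather than a full ARIMA implementation. The function names are hypothetical.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    # A constant input has zero variance; fall back to a pure-intercept model.
    phi = cov / var if var > 0 else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_next(series):
    """One-step forecast: AR(1) on first differences, then undo the differencing."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return series[-1] + next_diff


if __name__ == "__main__":
    y = [10.0, 11.0, 12.5, 14.25, 16.125, 18.0625]
    print(forecast_next(y))
```

In practice a library implementation would also estimate the moving average terms and choose the differencing order; the sketch only shows how the autoregression and the differencing fit together.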

Exponential smoothing methods, particularly Holt-Winters (triple exponential smoothing), provide a simple, robust alternative. They model level, trend, and seasonal components as weighted averages of past observations, with weights that decay exponentially as observations get older. Despite their simplicity, exponential smoothing methods have repeatedly matched or beaten more complex approaches on real-world datasets, as the M forecasting competitions have shown.
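The weighting scheme is easiest to see in simple exponential smoothing, the single-component case. The sketch below is a minimal illustration with hypothetical function names; it assumes the level is initialised with the first observation, which is one common choice among several. Holt-Winters adds analogous recursions for the trend and seasonal components.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level[t] = alpha * y[t] + (1 - alpha) * level[t-1], initialised with y[0].
    The final level is the forecast for the next period.
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def observation_weights(n, alpha):
    """Weights placed on the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


if __name__ == "__main__":
    print(simple_exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5))
    print(observation_weights(3, alpha=0.5))
```

A larger alpha makes the forecast react faster to recent changes; a smaller alpha averages over a longer history.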

Facebook Prophet

Facebook's Prophet, released as an open-source library in 2017, made high-quality time series forecasting accessible to non-specialists. Prophet decomposes the time series into trend, seasonality, and holiday components in an additive model, which it fits using the Stan probabilistic programming library. The trend component is piecewise linear or saturating logistic growth, with changepoints (abrupt changes in the growth rate) selected automatically. Seasonality is modelled using Fourier series, which capture smooth periodic patterns. Prophet is particularly well suited to business time series with strong seasonal patterns and holiday effects, and its interpretable components make it popular for communicating forecasts to non-technical stakeholders.
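The Fourier representation of seasonality can be sketched in a few lines. The function below is not Prophet's code; it is a hypothetical helper that builds the sine and cosine terms a linear seasonal component would be fitted on. The period and order values in the usage example are illustrative assumptions.

```python
import math


def fourier_features(t, period, order):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..order.

    A seasonal component is then a linear combination of these features;
    a higher order allows a more wiggly seasonal shape.
    """
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


if __name__ == "__main__":
    # Weekly seasonality on daily data: period 7, three harmonics (assumed values).
    for day in range(3):
        print(day, fourier_features(day, period=7, order=3))
```

Because the features repeat exactly every period, any pattern fitted on them is periodic by construction.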

Gradient Boosted Trees for Forecasting

Machine learning models designed for tabular data — XGBoost, LightGBM — can be applied to time series forecasting by engineering appropriate lag features, rolling statistics, and calendar features (day of week, month, holiday indicators). This approach requires careful feature engineering but can capture complex non-linear patterns and interactions that ARIMA and exponential smoothing miss.
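The feature engineering described above can be sketched as follows. This is a minimal illustration in plain Python with hypothetical names and an assumed choice of lags and window; the important detail is that every feature for time t is computed only from values before t, so no information from the target leaks into its own inputs.

```python
from datetime import date, timedelta


def make_features(dates, values, lags=(1, 7), window=7):
    """Build one feature row per time step, using only past values."""
    rows = []
    start = max(max(lags), window)  # first index with a full history
    for t in range(start, len(values)):
        row = {f"lag_{k}": values[t - k] for k in lags}
        past = values[t - window:t]  # excludes values[t] itself
        row["rolling_mean"] = sum(past) / window
        row["day_of_week"] = dates[t].weekday()  # Monday is 0
        row["month"] = dates[t].month
        row["target"] = values[t]
        rows.append(row)
    return rows


if __name__ == "__main__":
    dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
    values = [float(i) for i in range(10)]
    for row in make_features(dates, values):
        print(row)
```

The resulting rows can be passed to any tabular learner, with "target" as the label and the remaining keys as inputs.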

A key advantage is the ability to incorporate external features (weather, promotional activity, macroeconomic indicators) directly as inputs, without the structural constraints of ARIMA extensions. Global models — single XGBoost models trained across many related time series simultaneously — can share statistical strength across series, improving performance particularly for series with limited history.
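One way to prepare data for a global model is to stack lagged windows from every series into a single training table. The sketch below is illustrative only: the function name is hypothetical, and dividing each series by its mean absolute value is an assumption made here so that series of different magnitudes are comparable, not something the approach requires.

```python
def pooled_training_rows(series_by_id, n_lags=3):
    """Stack lagged windows from many series into one training table."""
    rows = []
    for series_id, values in series_by_id.items():
        # Per-series scale (assumed choice): mean absolute value, or 1.0 if all zeros.
        scale = sum(abs(v) for v in values) / len(values) or 1.0
        scaled = [v / scale for v in values]
        for t in range(n_lags, len(scaled)):
            rows.append({
                "series_id": series_id,
                "lags": scaled[t - n_lags:t],
                "target": scaled[t],
                "scale": scale,  # kept so predictions can be rescaled
            })
    return rows


if __name__ == "__main__":
    data = {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0, 40.0]}
    for row in pooled_training_rows(data):
        print(row)
```

A short series contributes few rows of its own but still benefits from patterns the model learns on the longer ones.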

Deep Learning Architectures for Forecasting

LSTMs and related sequence models have been widely applied to time series forecasting, but their advantage over simpler methods is less consistent than their success in NLP would suggest. The Temporal Fusion Transformer (TFT), developed by researchers at Google, is one of the strongest published architectures for multi-horizon probabilistic forecasting. TFT combines variable selection (learned weights that identify the most predictive features), LSTM encoders for processing the historical sequence, and self-attention for capturing long-range dependencies. It outputs quantile forecasts, which yield prediction intervals rather than point forecasts alone.
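Quantile forecasts of this kind are commonly trained and scored with the quantile (pinball) loss. The sketch below shows that loss in plain Python; the function names are hypothetical and the example numbers are made up for illustration.

```python
def pinball_loss(actual, predicted, quantile):
    """Quantile loss for a single prediction.

    Under-prediction is penalised by `quantile`, over-prediction by
    `1 - quantile`, so minimising it pushes `predicted` towards the
    requested quantile of the outcome distribution.
    """
    diff = actual - predicted
    return max(quantile * diff, (quantile - 1) * diff)


def mean_pinball_loss(actuals, predictions, quantile):
    """Average pinball loss over a sequence of forecasts."""
    losses = [pinball_loss(a, p, quantile) for a, p in zip(actuals, predictions)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # For the 0.9 quantile, predicting too low costs more than predicting too high.
    print(pinball_loss(10.0, 8.0, 0.9))
    print(pinball_loss(10.0, 12.0, 0.9))
```

Training one output per quantile (for example 0.1, 0.5, and 0.9) gives a median forecast together with an 80% prediction interval.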

N-BEATS and N-HiTS are purely feedforward neural architectures that have demonstrated strong performance at lower computational cost than recurrent or attention-based models. N-BEATS offers an interpretable configuration that decomposes the forecast into trend and seasonality, and N-HiTS extends the design with multi-rate sampling and hierarchical interpolation for long horizons. Amazon's DeepAR produces probabilistic forecasts across many related time series by training a single global autoregressive LSTM model, sharing information across series while accounting for each series' scale.

Model Selection and Evaluation

The M4 and M5 forecasting competitions provide some of the most rigorous empirical comparisons of forecasting methods. M4 covered 100,000 series from business, economic, financial, and demographic domains, and M5 used hierarchical retail sales data from Walmart. A consistent finding is that combinations of forecasts from multiple models (classical methods, ML models, and neural networks) typically outperform any single approach. The winning entry in the M5 accuracy competition, for example, was an equally weighted combination of several LightGBM models, with feature engineering tailored to the retail demand forecasting problem.
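Combining forecasts can be as simple as averaging them step by step. The sketch below uses hypothetical model names and made-up numbers; it shows an equal-weight mean and a median combination, the latter being less sensitive to one badly wrong model.

```python
from statistics import mean, median


def combine_forecasts(forecasts_by_model, how="mean"):
    """Combine per-model forecasts horizon step by horizon step."""
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    combiner = mean if how == "mean" else median
    # zip(*...) groups the forecasts of all models for each horizon step.
    return [combiner(step) for step in zip(*forecasts_by_model.values())]


if __name__ == "__main__":
    forecasts = {
        "ets": [10.0, 20.0],
        "gbm": [14.0, 22.0],
        "nn": [12.0, 30.0],
    }
    print(combine_forecasts(forecasts, how="mean"))
    print(combine_forecasts(forecasts, how="median"))
```

Weighted combinations are also possible, but the weights should be chosen on validation data that precedes the test period.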

Proper evaluation requires time-series-specific cross-validation that respects temporal order: a model must never be trained on data that comes after the observations it is tested on. Metrics such as MASE (Mean Absolute Scaled Error), which divides the forecast's mean absolute error by the in-sample mean absolute error of a naive (or seasonal naive) forecast, allow comparison across series with different scales and seasonal frequencies. A MASE below 1 means the forecast beats that naive baseline on average.
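Both ideas can be sketched briefly. The code below is a minimal illustration with hypothetical function names: an expanding-window (rolling-origin) splitter in which every test index follows every training index, and a MASE function scaled by the in-sample error of a naive forecast at lag `season`.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) pairs in temporal order."""
    origin = initial_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


def mase(train, actual, forecast, season=1):
    """Mean Absolute Scaled Error.

    The scale is the in-sample mean absolute error of the naive forecast
    y[t] = y[t - season]; season=1 gives the plain naive forecast.
    """
    naive_errors = [abs(train[t] - train[t - season])
                    for t in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    for train_idx, test_idx in rolling_origin_splits(6, initial_train=3, horizon=2):
        print(train_idx, test_idx)
    print(mase(train=[1.0, 3.0, 2.0, 5.0], actual=[6.0, 7.0], forecast=[5.0, 9.0]))
```

Averaging the metric over all splits gives a more stable estimate than a single train/test cut, at the cost of refitting the model once per split.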