Volatility, the statistical dispersion of returns, is one of the most fundamental quantities in financial markets. It determines the price of options, drives risk management decisions, and signals market uncertainty. Accurate volatility forecasts are valuable because they translate directly into better-hedged positions, better-priced derivatives, and better-managed risk books.

The Classical Approach: GARCH

For over three decades, the dominant family of volatility models has been GARCH (Generalised Autoregressive Conditional Heteroskedasticity), introduced by Tim Bollerslev in 1986 as an extension of Robert Engle's 1982 ARCH framework. In its most common form, GARCH(1,1), the model expresses the current conditional variance as a weighted combination of the long-run average variance, the previous period's conditional variance, and the squared return shock from the previous period. This captures the two most important empirical features of financial volatility: clustering (high-volatility periods tend to be followed by further high volatility) and mean reversion (volatility eventually returns to its long-run average).
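The recursion described above can be written as sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1], where omega equals the long-run variance multiplied by (1 - alpha - beta). The following is a minimal Python sketch of that recursion. The function name, the return series, and the parameter values are hypothetical and chosen only for illustration; in practice the parameters would be estimated by maximum likelihood.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Return the conditional variance path of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    long_run = omega / (1.0 - alpha - beta)  # unconditional (long-run) variance
    sigma2 = [long_run]                      # start the recursion at the long-run level
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical daily returns and parameters, for illustration only.
returns = [0.001, -0.020, 0.015, -0.003, 0.002]
path = garch11_variance(returns, omega=2e-6, alpha=0.08, beta=0.90)
```

Here the variance rises after the 2% move on the second day and then decays back towards the long-run level, which is the clustering and mean-reversion behaviour in miniature.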

Extensions such as EGARCH and GJR-GARCH capture the asymmetric leverage effect — the empirical observation that negative return shocks increase volatility more than positive shocks of equal magnitude. Despite their age, GARCH-family models remain competitive benchmarks on many forecasting tasks.
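The asymmetry can be illustrated with a one-step GJR-GARCH update. This sketch assumes the common (1,1) form, in which an extra coefficient (called gamma here) multiplies the squared shock only when the previous return was negative. The function name and all parameter values are hypothetical.

```python
def gjr_garch_step(prev_return, prev_variance, omega, alpha, gamma, beta):
    """One-step GJR-GARCH(1,1) variance update.

    The gamma term is added only when the previous return was negative.
    """
    shock2 = prev_return ** 2
    leverage = gamma * shock2 if prev_return < 0 else 0.0
    return omega + alpha * shock2 + leverage + beta * prev_variance


# Hypothetical parameters; two shocks of equal size and opposite sign.
params = dict(omega=2e-6, alpha=0.03, gamma=0.10, beta=0.90)
var_after_drop = gjr_garch_step(-0.02, 1e-4, **params)
var_after_rise = gjr_garch_step(+0.02, 1e-4, **params)
```

With a positive gamma, the forecast variance after the 2% fall is higher than after the 2% rise, even though the squared shock is identical.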

Why LSTMs for Volatility?

Long Short-Term Memory (LSTM) networks are a natural fit for volatility forecasting because volatility is a sequential, path-dependent process. Unlike simpler recurrent networks, LSTMs can selectively retain information over hundreds of time steps through their gating mechanism, which decides what to remember, what to forget, and what to output at each step. This allows them to capture long-range volatility dependencies that GARCH models may miss: a GARCH model is usually specified with only a few lags, so the influence of a past shock on its forecasts decays geometrically.
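To make the gating mechanism concrete, here is a sketch of a single LSTM step for the simplest possible case: one input and one hidden unit, written with the standard library only. The weights are hand-picked and hypothetical (a real model learns them), and are chosen so that the forget gate stays close to 1 and the cell holds on to an early input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar (one input, one hidden unit) LSTM cell.

    `w` maps each gate name to an (input weight, recurrent weight, bias) triple.
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    g = math.tanh(pre("candidate"))   # the new information itself
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical, hand-picked weights: a large forget bias keeps the memory alive.
w = {
    "forget": (0.0, 0.0, 5.0),
    "input": (4.0, 0.0, -2.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}

h, c = lstm_step(1.0, 0.0, 0.0, w)   # a single shock at the first step
c_first = c
for _ in range(50):                  # followed by 50 quiet steps
    h, c = lstm_step(0.0, h, c, w)
c_final = c
```

With these weights the cell state decays only slowly over the 50 quiet steps, so most of the information about the initial shock is still available. A trained network adjusts the gates from the data rather than having them fixed in advance.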

LSTMs can also naturally incorporate multiple input sequences simultaneously — past returns, realised volatility measures, the VIX index, options implied volatility, bid-ask spreads, and any other relevant time series — without requiring the explicit specification of interactions that parametric models demand.

Building an LSTM Volatility Forecasting Model

A typical LSTM volatility model is constructed as follows. The target variable is usually the realised volatility over the next day or week — computed from intraday returns or daily returns, possibly normalised by a long-run average. Input features are a window of recent observations: daily returns, five-day and twenty-day realised volatility, VIX levels and changes, and potentially sentiment scores.
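The construction of features and targets can be sketched as follows. This example assumes daily returns only and defines realised volatility over a window as the square root of the sum of squared returns; the function names, feature names, and window lengths are hypothetical. The key point is the alignment: features use data up to and including day t, and the target uses only days after t.

```python
import math


def realised_vol(returns):
    """Realised volatility of a window: square root of the sum of squared returns."""
    return math.sqrt(sum(r * r for r in returns))


def build_samples(returns, short=5, long=20, horizon=5):
    """Pair features known at the end of day t with the realised
    volatility of days t+1 .. t+horizon."""
    samples = []
    for t in range(long - 1, len(returns) - horizon):
        past = returns[t - long + 1 : t + 1]        # up to and including day t
        future = returns[t + 1 : t + 1 + horizon]   # strictly after day t
        features = {
            "ret_1d": returns[t],
            "rv_short": realised_vol(past[-short:]),
            "rv_long": realised_vol(past),
        }
        samples.append((features, realised_vol(future)))
    return samples


# Hypothetical return series alternating between +1% and -1%.
returns = [0.01 * ((-1) ** k) for k in range(30)]
samples = build_samples(returns)
```

An LSTM input would then stack these per-day feature vectors over a window of consecutive days, with additional columns such as VIX levels added in the same way.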

The sequence is passed through one or more LSTM layers, followed by a dense output layer that produces the volatility forecast. The model is trained by minimising a loss function such as mean squared error (MSE) or a Gaussian quasi-likelihood loss (often called QLIKE) of the kind used to estimate GARCH models. Walk-forward validation, in which the model is trained on a rolling historical window and tested on the period that immediately follows, is essential to avoid look-ahead bias and to evaluate out-of-sample performance honestly.
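The evaluation scheme and the quasi-likelihood loss can be sketched in a few lines. The function names and window sizes below are hypothetical. The loss is written in its common QLIKE form, log(h) + rv / h, where h is the forecast variance and rv the realised variance; it assumes strictly positive forecasts.

```python
import math


def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) for a rolling-window walk-forward evaluation."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block


def qlike(realised_var, forecast_var):
    """Gaussian quasi-likelihood loss averaged over observations: log(h) + rv / h."""
    terms = [math.log(h) + rv / h for rv, h in zip(realised_var, forecast_var)]
    return sum(terms) / len(terms)


# Hypothetical sizes: 1000 observations, 500-day training window, 100-day test blocks.
splits = list(walk_forward_splits(n_obs=1000, train_size=500, test_size=100))
```

Every test block lies strictly after its training window, so no future information reaches the model during fitting. Any feature scaling should likewise be fitted on the training window only.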

Empirical Findings

Academic and industry studies often report that LSTMs outperform classical GARCH models on longer-horizon forecasts and during turbulent market regimes, while GARCH remains competitive at short horizons under normal conditions. The size of the gap depends on the asset, the sample period, and the loss function used. Hybrid models combine GARCH's theoretical structure with the LSTM's sequence-learning capability, for example by using an LSTM to model GARCH residuals or to predict GARCH parameters, and often perform better than either component on its own.

The Heterogeneous Autoregressive model of Realised Volatility (HAR-RV), introduced by Fulvio Corsi, regresses future realised volatility on its most recent daily value and its weekly and monthly averages, and provides a particularly strong baseline. Many sophisticated ML models struggle to consistently outperform HAR-RV out of sample, underscoring the importance of feature engineering and the use of realised volatility measures as both inputs and targets.
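The HAR-RV regressors are simple to construct, which is part of the model's appeal. The sketch below builds them from a realised-volatility series, assuming the conventional 5-day week and 22-day month; the function name and the input series are hypothetical. The model itself is then an ordinary least-squares regression of the next-day value on these three columns plus an intercept.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build HAR-RV regressors and next-day targets from a realised-volatility series.

    Each row is (daily value, weekly average, monthly average), all known at day t;
    the matching target is rv[t + 1].
    """
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week_avg = sum(rv[t - weekly + 1 : t + 1]) / weekly
        month_avg = sum(rv[t - monthly + 1 : t + 1]) / monthly
        rows.append((daily, week_avg, month_avg))
        targets.append(rv[t + 1])
    return rows, targets


# Hypothetical realised-volatility series: 0.01, 0.02, ..., 0.30.
rv = [0.01 * (k + 1) for k in range(30)]
X, y = har_features(rv)
```

The same three columns are also a reasonable starting set of inputs for an LSTM, which makes a like-for-like comparison with the baseline straightforward.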

Practical Considerations

Deploying an LSTM volatility model in production requires careful attention to computational efficiency (LSTM inference can be a bottleneck in time-sensitive applications), model monitoring (detecting when the model's calibration has degraded), and integration with existing risk systems. Ensemble approaches — averaging predictions from multiple models including GARCH, HAR-RV, and LSTM variants — typically provide more robust forecasts than any single model, and are favoured in production risk management environments where forecast failures can have significant financial consequences.
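A forecast combination of this kind can be as simple as a weighted average. The sketch below assumes each model has already produced a forecast for the same horizon; the function name, model names, forecast values, and weights are all hypothetical.

```python
def ensemble_forecast(forecasts, weights=None):
    """Combine per-model volatility forecasts into one number.

    `forecasts` maps a model name to its forecast; `weights` maps the same
    names to non-negative weights (equal weights if omitted).
    """
    if weights is None:
        weights = {name: 1.0 for name in forecasts}
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * value for name, value in forecasts.items()) / total


# Hypothetical next-day volatility forecasts from three models.
forecasts = {"garch": 0.012, "har_rv": 0.010, "lstm": 0.014}
combined = ensemble_forecast(forecasts)
tilted = ensemble_forecast(forecasts, {"garch": 1.0, "har_rv": 2.0, "lstm": 1.0})
```

Equal weights are a common default. Weights based on recent out-of-sample errors are one alternative, at the cost of having to estimate and monitor them.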