Why is backtesting expected shortfall (ES) so much harder than backtesting VaR?

AcadiFi · Accepted Answer

The shift from VaR to expected shortfall (ES) in the Fundamental Review of the Trading Book (FRTB) introduced a significant practical challenge: ES is much harder to backtest than VaR. This tension is a key topic in FRM Part II.

## Why VaR Is Easy to Backtest

VaR backtesting is straightforward because it only checks a **binary outcome**: did the actual loss exceed the VaR estimate? With 99% VaR over 250 trading days, you expect about 2.5 exceptions (250 × 1%). You count the exceptions and compare the count against a binomial distribution, which is the basis of the Basel traffic light system.
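
As a rough illustration of the exception count check, here is a minimal Python sketch. The function names are hypothetical, and the zone boundaries assume the standard traffic light setup for 250 days of 99% VaR (green 0-4, yellow 5-9, red 10 or more exceptions).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def count_exceptions(losses, var_forecasts):
    """Number of days on which the realized loss exceeded the VaR forecast."""
    return sum(1 for loss, var in zip(losses, var_forecasts) if loss > var)

def traffic_light(exceptions):
    """Zone for 250 days of 99% VaR (assumed boundaries: 0-4, 5-9, 10+)."""
    if exceptions <= 4:
        return "green"
    if exceptions <= 9:
        return "yellow"
    return "red"

n_days, p_exceed = 250, 0.01
expected_exceptions = n_days * p_exceed
# Chance of leaving the green zone even though the model is correct
p_five_or_more = 1 - binom_cdf(4, n_days, p_exceed)

print(expected_exceptions, round(p_five_or_more, 3), traffic_light(3))
```

The whole test needs nothing beyond a count and a binomial probability, which is why VaR backtesting is considered easy.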

## Why ES Backtesting Is Hard

**Problem 1 — ES is about the magnitude of tail losses, not just the count**
ES = E[Loss | Loss > VaR]. You need to verify not just that losses exceed VaR the right number of times, but that the *average* size of the losses beyond VaR matches the ES prediction. With only 2-3 exceptions per year at the 99% level, you have a tiny sample from which to estimate this average.
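
To make the definition concrete, here is a minimal sketch that computes an empirical VaR and ES from a loss sample. The function name and the quantile convention (smallest observation with at least `level` of the sample at or below it) are illustrative assumptions, not a prescribed method.

```python
import math

def empirical_var_es(losses, level=0.975):
    """Return (VaR, ES) from a sample where larger values mean larger losses."""
    ordered = sorted(losses)
    # Empirical quantile: smallest value with at least `level` of the sample at or below it
    idx = max(0, math.ceil(level * len(ordered)) - 1)
    var = ordered[idx]
    # ES is the average of the losses strictly beyond VaR
    tail = [x for x in ordered if x > var]
    es = sum(tail) / len(tail) if tail else var
    return var, es

sample = list(range(1, 101))  # hypothetical losses 1, 2, ..., 100
var, es = empirical_var_es(sample, level=0.975)
print(var, es)  # 98 99.5
```

Note that the ES here rests on just two observations out of 100, which previews the sample size problem.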

**Problem 2 — Small sample bias**
If you have 250 daily observations and your VaR is at the 97.5% level, you expect only about 6 exceptions (250 × 2.5% ≈ 6.25). A mean computed from about 6 observations is too noisy to validate a forecast reliably: the confidence interval around it is wide, and it widens further when losses are heavy-tailed.
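
The sampling noise can be explored with a small simulation. This sketch assumes standard normal daily losses and a known 97.5% VaR of 1.96, both purely illustrative; real P&L is typically heavier-tailed, which would make the spread larger.

```python
import random
import statistics

def tail_mean_estimates(n_days=250, threshold=1.96, n_trials=1000, seed=0):
    """Simulate many one-year samples and return the tail mean from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        losses = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        tail = [x for x in losses if x > threshold]
        if tail:  # skip the rare years with no exceedance at all
            estimates.append(sum(tail) / len(tail))
    return estimates

estimates = tail_mean_estimates()
center = statistics.mean(estimates)
spread = statistics.stdev(estimates)
print(f"mean of yearly tail means: {center:.3f}, standard deviation: {spread:.3f}")
```

Comparing `spread` with `center` gives a feel for how much a single year's realized tail average can deviate from the true conditional mean even when the model is exactly right.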

**Problem 3 — ES is not elicitable (in the traditional sense)**
A risk measure is "elicitable" if there exists a scoring function whose expected value is uniquely minimized by the correct forecast. Elicitability matters because it lets you rank competing forecasts by their average realized score. VaR is elicitable; ES on its own is not. However, the pair (VaR, ES) is jointly elicitable, which opens up some backtesting approaches.
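
For VaR, the scoring function in question is the quantile ("pinball") loss. The sketch below uses a hypothetical loss sample and a grid search, with the sample average standing in for the expected score, to show that the score is minimized at the quantile. No analogous scoring function exists for ES alone.

```python
def pinball_score(forecast, outcome, alpha):
    """Quantile scoring function: (1{outcome <= forecast} - alpha) * (forecast - outcome)."""
    indicator = 1.0 if outcome <= forecast else 0.0
    return (indicator - alpha) * (forecast - outcome)

def average_score(forecast, outcomes, alpha):
    """Sample average of the score, a stand-in for its expected value."""
    return sum(pinball_score(forecast, y, alpha) for y in outcomes) / len(outcomes)

outcomes = list(range(1, 101))  # hypothetical losses 1, 2, ..., 100
alpha = 0.975
candidates = range(1, 101)
best = min(candidates, key=lambda x: average_score(x, outcomes, alpha))
print(best)  # 98, the empirical 97.5% quantile of this sample
```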

**Problem 4 — Regime changes**
Tail events are by definition rare. A model calibrated to quiet markets will fail in a crisis, but you won't know it's failing until the crisis is well underway.

## Proposed Solutions

1. **Basel's approach: Backtest VaR, calibrate ES**
FRTB backtests VaR at both the 97.5% and 99% levels. If VaR passes backtesting, the ES estimate (derived from the same model) is assumed to be reasonable. The ES-based capital charge is then scaled by a multiplier that rises with the number of backtesting exceptions.

2. **Acerbi-Szekely test**
Uses a test statistic based on the average of realized losses beyond VaR, standardized by the ES estimate. Requires fewer observations than direct estimation.

3. **Multinomial approach**
Divide the tail into multiple bins (e.g., 97.5%-99%, 99%-99.5%, 99.5%+) and test whether the observed distribution across bins matches the predicted distribution.

4. **Ridge backtesting**
Combines VaR and ES information into a joint test, exploiting their joint elicitability.
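
To give a flavor of the Acerbi-Szekely idea from the list above, here is a minimal sketch of a statistic in that spirit: one minus the average ratio of realized loss to forecast ES over the exceedance days. The function name and the loss sign convention (positive numbers are losses) are assumptions for illustration. In practice the significance of the statistic would have to be judged against its distribution under the model, for example by simulation, which is not shown here.

```python
def conditional_es_statistic(losses, var_forecasts, es_forecasts):
    """1 - mean(loss / ES) over days where loss > VaR; None if no exceedances.

    Values near zero are consistent with the ES forecasts; clearly negative
    values suggest that realized tail losses were larger than forecast.
    """
    ratios = [
        loss / es
        for loss, var, es in zip(losses, var_forecasts, es_forecasts)
        if loss > var
    ]
    if not ratios:
        return None
    return 1.0 - sum(ratios) / len(ratios)

# Hypothetical data: constant VaR of 2.0 and ES of 2.5, two exceedance days
losses = [0.5, 2.5, 1.0, 3.0]
z = conditional_es_statistic(losses, [2.0] * 4, [2.5] * 4)
print(z)  # about -0.1: exceedances averaged 10% above the ES forecast
```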

```mermaid
flowchart TD
    VAR_BT["VaR Backtesting<br/>(Easy: Count exceptions)"] --> BINARY{"Binary outcome:<br/>Exceed or not?"}
    ES_BT["ES Backtesting<br/>(Hard: Magnitude of tail)"] --> ISSUES["Challenges"]
    ISSUES --> SMALL["Tiny sample<br/>(2-6 exceptions/year)"]
    ISSUES --> ELICIT["Not directly<br/>elicitable"]
    ISSUES --> REGIME["Regime changes<br/>in tail behavior"]
    SMALL --> SOLUTIONS{"Solutions"}
    ELICIT --> SOLUTIONS
    REGIME --> SOLUTIONS
    SOLUTIONS --> S1["Basel: Backtest VaR,<br/>scale to ES"]
    SOLUTIONS --> S2["Acerbi-Szekely<br/>test statistic"]
    SOLUTIONS --> S3["Multinomial<br/>bin approach"]
```
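
The multinomial bin approach listed under Proposed Solutions can be sketched in a few lines. This is an illustrative sketch only: the function names, the constant thresholds, and the choice of a Pearson-type statistic are assumptions, and with expected counts this small the usual chi-square approximation is unreliable.

```python
LEVELS = [0.975, 0.99, 0.995]  # bin edges taken from the example in the text

def bin_probabilities(levels):
    """Predicted probability of each bin: below the first level, between levels, above the last."""
    edges = [0.0] + list(levels) + [1.0]
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

def observed_counts(losses, thresholds_per_day):
    """Bin index for each day = number of VaR thresholds the loss exceeded."""
    counts = [0] * (len(LEVELS) + 1)
    for loss, thresholds in zip(losses, thresholds_per_day):
        counts[sum(1 for t in thresholds if loss > t)] += 1
    return counts

def pearson_statistic(counts, probs):
    """Sum of (observed - expected)^2 / expected across bins."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, probs))

# Hypothetical data: the same three VaR forecasts every day
thresholds = (1.96, 2.33, 2.58)
losses = [0.1, 2.0, 2.4, 3.0, -1.0, 0.5, 2.1, 1.0]
counts = observed_counts(losses, [thresholds] * len(losses))
probs = bin_probabilities(LEVELS)
print(counts, pearson_statistic(counts, probs))
```

Because each bin looks at a different depth of the tail, a mismatch concentrated in the outer bins points to a misjudged tail shape, which is the information an ES backtest is after.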

**FRM exam tip:** Know that the FRTB relies on VaR backtesting as a proxy for ES validation, and understand why direct ES backtesting remains an open research problem.

For more on FRTB and ES methodology, check our FRM Part II course.
