Building a Volatility Model

Why

The goal is to make volatility context more explicit before choosing a strategy. I want a model that helps answer whether premium is worth selling, whether realized movement is likely to expand, and how implied volatility aligns with realized volatility.

The model will output an estimated at-the-money (ATM) implied volatility (IV) and an estimated ATM straddle price. We then compare each output with its current market counterpart, which gives four possible outcomes:

Model IV > Market IV and Model Straddle Price > Market Straddle Price
Model IV > Market IV and Model Straddle Price < Market Straddle Price
Model IV < Market IV and Model Straddle Price > Market Straddle Price
Model IV < Market IV and Model Straddle Price < Market Straddle Price
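The four outcomes above can be sketched as a small classifier. This is a minimal illustration, not part of the model itself: the function name `classify_signal`, the label strings, and the sample numbers are all hypothetical, and ties are arbitrarily treated as "below".

```python
def classify_signal(
    model_iv: float, market_iv: float, model_straddle: float, market_straddle: float
) -> str:
    """Label which of the four model-vs-market outcomes applies."""
    # Ties fall on the "below" side; that is an arbitrary choice for this sketch
    iv_side = "model_iv_above" if model_iv > market_iv else "model_iv_below"
    straddle_side = (
        "model_straddle_above"
        if model_straddle > market_straddle
        else "model_straddle_below"
    )
    return f"{iv_side}/{straddle_side}"


# Hypothetical inputs: the model sees higher IV but a cheaper straddle than the market
label = classify_signal(
    model_iv=0.18, market_iv=0.15, model_straddle=32.0, market_straddle=35.0
)
```

How each outcome maps to a trade is a separate decision and is left open here.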

How

We will build a model, in stages, using Python. First, we gather the required data and build a baseline model. We will measure the model’s performance against the day’s realized volatility and straddle price (avoiding any lookahead bias). At a later stage, we will perform a backtest using the model to get a sense for its economic performance.
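One common way to avoid lookahead bias is to make sure that any input used for day t was observable before day t opened. The sketch below shows the idea with a one-session shift in pandas. The column names and values are hypothetical, and the model's actual features may differ.

```python
import pandas as pd

# Hypothetical daily realized-volatility values, indexed by session date
rv = pd.Series(
    [12.0, 15.0, 11.0, 18.0],
    index=pd.to_datetime(["2025-01-06", "2025-01-07", "2025-01-08", "2025-01-09"]),
    name="realized_vol",
)

frame = rv.to_frame()
# The feature for day t holds the value from day t-1, so it is known at the open
frame["prev_realized_vol"] = frame["realized_vol"].shift(1)
# The first session has no prior value, so drop it
frame = frame.dropna()
```

Each remaining row pairs the day's outcome (`realized_vol`) with an input that was already known when the session started.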

Key Decisions

It is important to outline some key decisions up front before building the model.

The model’s output is intended only for options expiring today (0 DTE). With a horizon that short, the choice of realized-volatility estimator is the most consequential decision, because every later stage builds on it.

Below is a brief list of common estimators.

Estimator	Authors	Year	Best utilized when…
Close-to-Close	Standard sample variance of log close-to-close returns (widely used in early empirical finance; not tied to a single named paper)	Pre‑1980 (classical “historical volatility” measure)	You only have daily closing prices, need a simple benchmark or quick sanity‑check volatility for options or risk models; also useful when overnight moves are economically relevant and you do not have intraday data.
Parkinson (High–Low)	Michael Parkinson	1980	You have high/low data and want a more efficient estimator under a diffusion with negligible drift and no jumps, especially for intraday or execution‑algorithm volatility where overnight gaps are less important.
Garman–Klass (OHLC, zero drift)	Mark B. Garman and Michael J. Klass	1980	You have full OHLC bars and want a very efficient estimator in markets where the zero‑drift assumption is reasonable and opening jumps are modest, such as liquid futures or FX; often used when modeling continuous intraday volatility.
Rogers–Satchell (RS)	L. C. G. Rogers and S. E. Satchell	1991	You need an OHLC estimator that remains unbiased under non‑zero drift (trending markets), making it suitable for equities or assets with persistent drift where classic range estimators like Parkinson and Garman–Klass become biased.
Yang–Zhang	Dennis Yang and Qiang Zhang	2000	You want a “default” realized volatility for daily OHLC bars in equities: it combines overnight, open‑to‑close, and RS components, is drift‑independent, handles opening gaps, and is far more statistically efficient than Close‑to‑Close, making it well‑suited for equity options, VRP work, and short windows.

Because we trade 0-DTE options only, we care about intraday volatility and do not need to measure volatility across multiple sessions, so we will use the Rogers–Satchell estimator. For longer tenors, the Yang–Zhang estimator would be the more robust choice because it also accounts for overnight gaps.
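To make the drift point from the table concrete, the sketch below computes the single-bar variance of three of the range estimators on a hypothetical bar that opens on its low and closes on its high. These are the standard single-bar forms (the published estimators average them over many bars); the function names and the sample bar are illustrative only.

```python
import math


def parkinson_var(o: float, h: float, l: float, c: float) -> float:
    # Parkinson: ln(H/L)^2 / (4 ln 2)
    return math.log(h / l) ** 2 / (4 * math.log(2))


def garman_klass_var(o: float, h: float, l: float, c: float) -> float:
    # Garman-Klass: 0.5 ln(H/L)^2 - (2 ln 2 - 1) ln(C/O)^2
    return 0.5 * math.log(h / l) ** 2 - (2 * math.log(2) - 1) * math.log(c / o) ** 2


def rogers_satchell_var(o: float, h: float, l: float, c: float) -> float:
    # Rogers-Satchell: ln(H/C) ln(H/O) + ln(L/C) ln(L/O)
    return math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)


# Hypothetical one-way bar: open = low = 100, high = close = 102
trend_bar = (100.0, 102.0, 100.0, 102.0)

pk = parkinson_var(*trend_bar)
gk = garman_klass_var(*trend_bar)
rs = rogers_satchell_var(*trend_bar)
```

On this bar Rogers–Satchell returns zero, because it attributes the whole move to drift, while Parkinson and Garman–Klass both report positive variance.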

This model assumes we are trading SPX options only; however, it could accept data for any symbol that aligns with the timeframes used below.

Fetching the required data

To start, we only need the Daily OHLC (Open, High, Low, Close) data for SPX and the Daily OHLC price data for the ATM Straddle at market open. We will use the SPX open price to determine the ATM strike price.

This walkthrough will use Massive as the data source. We only need historic data at this point. There is no need for a “live” subscription to build the model.

Define Data Helpers

import pandas as pd
from datetime import datetime, timedelta
from pathlib import Path
from massive import RESTClient

def get_previous_bar(ticker: str) -> pd.DataFrame:
    # `MASSIVE_API_KEY` must be set in the environment
    client = RESTClient()

    # Returns the most recent completed session's bar for the ticker
    agg = client.get_previous_close_agg(ticker)

    df = pd.DataFrame(agg)

    # convert 'timestamp' to date
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms").dt.normalize()
    # don't need the following columns
    df.drop(columns=["timestamp", "volume", "vwap"], inplace=True)
    df.set_index("date", inplace=True)
    return df

def get_strike(price: float) -> int:
    """Round price to the nearest 5."""
    return int(round(price / 5) * 5)

def option_ticker(date: datetime, strike: int, option_type: str) -> str:
    """Format: O:SPXWYYMMDD[C/P]00000000 (strike x 1000, zero-padded to 8 digits)."""
    datestr = date.strftime('%y%m%d')
    # Option symbols encode the strike in thousandths of a dollar
    strike_str = f"{strike * 1000:08d}"
    return f"O:SPXW{datestr}{option_type}{strike_str}"

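def get_daily_bars(
    ticker: str, start_date: datetime, end_date: datetime | None = None
) -> pd.DataFrame:
    # `MASSIVE_API_KEY` must be set in the environment
    client = RESTClient()

    # Resolve the default at call time; a `datetime.now()` default in the
    # signature would be evaluated only once, when the function is defined
    if end_date is None:
        end_date = datetime.now() - timedelta(days=1)

    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")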

    aggs = client.list_aggs(
        ticker,
        1,
        "day",
        start_date_str,
        end_date_str,
        sort="asc",
    )

    df = pd.DataFrame(aggs)

    df = df.sort_values(by="timestamp")

    # convert 'timestamp' to date
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms").dt.normalize()

    ohlc = ["open", "high", "low", "close"]
    # Copy the selection so the dtype assignment below does not act on a view
    df = df[["date"] + ohlc].copy()
    df[ohlc] = df[ohlc].astype(float)

    df.set_index("date", inplace=True)

    return df

Fetch the Required Data

# Load data file from previous execution
# Delete the existing file to re-fetch all data
file_path = Path("inputs.csv")

if file_path.exists():
    # Read the file if it exists, parsing the index back into dates
    df = pd.read_csv(file_path, index_col="date", parse_dates=True)
    # Improvement: check the latest date in the file and fetch new dates only
else:
    # Fetch historic data (based on your Massive Options plan)
    start_date = datetime.now() - timedelta(weeks=52 * 2)
    df = get_daily_bars("I:SPX", start_date=start_date)

    straddle = pd.DataFrame(index=df.index, columns=[
        'straddle_open', 'straddle_high', 'straddle_low', 'straddle_close'
    ], dtype=float)

    # fetch 0-DTE straddle data
    for date in df.index:
        strike = get_strike(df.loc[date, "open"])
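        # Fetch the call and put bars for this session. Historical contracts
        # need a dated query; the previous-close endpoint only returns the
        # most recent session.
        call = get_daily_bars(option_ticker(date, strike, 'C'), start_date=date, end_date=date)
        put = get_daily_bars(option_ticker(date, strike, 'P'), start_date=date, end_date=date)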

        # Use distinct names to avoid shadowing the built-in `open`
        s_open = call['open'].iloc[0] + put['open'].iloc[0]
        s_close = call['close'].iloc[0] + put['close'].iloc[0]
        # Approximate the straddle's range: the legs' highs and lows rarely
        # coincide, so pair one leg's high with the other leg's low
        candidates = [
            call['high'].iloc[0] + put['low'].iloc[0],
            call['low'].iloc[0] + put['high'].iloc[0],
            s_open,
            s_close,
        ]

        straddle.loc[date] = [s_open, max(candidates), min(candidates), s_close]

    df = df.join(straddle)
    # Persist the file
    df.to_csv(file_path)

df.tail()
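The "fetch new dates only" improvement noted in the code comment could start from a helper like the one below. It is a sketch under assumptions: the name `next_fetch_start` is hypothetical, and it works in calendar days, so weekends and holidays would simply return no new bars.

```python
from datetime import datetime, timedelta
from typing import Optional

import pandas as pd


def next_fetch_start(cached: pd.DataFrame, today: datetime) -> Optional[datetime]:
    """Return the first calendar date missing from the cache, or None if it is current."""
    last_cached = pd.to_datetime(cached.index).max()
    candidate = (last_cached + timedelta(days=1)).to_pydatetime()
    # Only completed sessions are fetched, so the latest useful date is yesterday
    yesterday = today - timedelta(days=1)
    return candidate if candidate.date() <= yesterday.date() else None


# Hypothetical cache whose last row is 2025-01-07
cached = pd.DataFrame(
    {"close": [5975.0, 5909.0]},
    index=pd.Index(["2025-01-06", "2025-01-07"], name="date"),
)
start = next_fetch_start(cached, today=datetime(2025, 1, 10))
```

When the helper returns a date, it can be passed as `start_date` to `get_daily_bars` and the new rows appended to the cached frame before saving.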

Define the Estimator

import numpy as np

def rogers_satchell_estimator(
    open: pd.Series, high: pd.Series, low: pd.Series, close: pd.Series
) -> pd.Series:
    """
    Calculates the Rogers-Satchell volatility estimator for each bar,
    annualized and expressed in percent.
    """
    h_c = np.log(high / close)
    h_o = np.log(high / open)
    l_c = np.log(low / close)
    l_o = np.log(low / open)

    # Per-bar variance: ln(H/C) * ln(H/O) + ln(L/C) * ln(L/O)
    var = (h_c * h_o) + (l_c * l_o)
    # Annualize the standard deviation (252 trading days) and convert to percent
    vol = np.sqrt(var) * np.sqrt(252) * 100

    return pd.Series(data=vol, name="vol")
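A single day's Rogers–Satchell value can be noisy. One common smoothing, which is an assumption here and not part of the walkthrough so far, is to average the per-bar variance over a short window before taking the square root. The function name `rolling_rs_vol`, the window length, and the sample bars below are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_rs_vol(
    ohlc: pd.DataFrame, window: int = 5, periods_per_year: int = 252
) -> pd.Series:
    """Annualized Rogers-Satchell volatility (percent) over a rolling window."""
    # Per-bar Rogers-Satchell variance
    rs_var = (
        np.log(ohlc["high"] / ohlc["close"]) * np.log(ohlc["high"] / ohlc["open"])
        + np.log(ohlc["low"] / ohlc["close"]) * np.log(ohlc["low"] / ohlc["open"])
    )
    # Average the variance first, then annualize and take the square root
    mean_var = rs_var.rolling(window).mean()
    return (np.sqrt(mean_var * periods_per_year) * 100).rename("rs_vol")


# Hypothetical bars: three identical sessions
bars = pd.DataFrame(
    {
        "open": [100.0, 100.0, 100.0],
        "high": [101.0, 101.0, 101.0],
        "low": [99.0, 99.0, 99.0],
        "close": [100.0, 100.0, 100.0],
    },
    index=pd.to_datetime(["2025-01-06", "2025-01-07", "2025-01-08"]),
)
vol = rolling_rs_vol(bars, window=2)
```

The first `window - 1` rows are NaN because the window is not yet full. Averaging variances instead of volatilities keeps the result consistent with how the estimator is defined over multiple bars.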

To be continued …