Building a Volatility Model
A complete walkthrough of building a volatility model to improve strategy decisions.
Why
The goal is to make volatility context more explicit before choosing a strategy. I want a model that helps answer whether premium is worth selling, whether realized movement is likely to expand, and how implied volatility aligns with realized volatility.
The model we build will output an estimated At-The-Money (ATM) Implied Volatility and an ATM Straddle Price. These outputs are then compared with the current market ATM Implied Volatility and the current market ATM Straddle Price, which gives us four possible outcomes:
- Model IV > Market IV and Model Straddle Price > Market Straddle Price
- Model IV > Market IV and Model Straddle Price < Market Straddle Price
- Model IV < Market IV and Model Straddle Price > Market Straddle Price
- Model IV < Market IV and Model Straddle Price < Market Straddle Price
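The four outcomes above can be sketched as a small classifier. This is only an illustration: `Comparison` and `compare_to_market` are hypothetical names, the numbers are made up, and treating ties as "below" is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    iv_signal: str        # "above" if model IV > market IV, else "below"
    straddle_signal: str  # "above" if model straddle > market straddle, else "below"


def compare_to_market(
    model_iv: float, market_iv: float, model_straddle: float, market_straddle: float
) -> Comparison:
    """Place a model estimate into one of the four outcomes.

    Ties are treated as "below" purely for simplicity (an assumption).
    """
    iv_signal = "above" if model_iv > market_iv else "below"
    straddle_signal = "above" if model_straddle > market_straddle else "below"
    return Comparison(iv_signal, straddle_signal)


# Made-up values: the model sees more volatility than the market implies,
# but prices the straddle lower than the market does.
example = compare_to_market(
    model_iv=14.2, market_iv=12.9, model_straddle=31.50, market_straddle=33.10
)
print(example)
```

Keeping the two signals separate, rather than collapsing them into one label, makes it easy to study later which of the four combinations actually carries information.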
How
We will build a model, in stages, using Python. First, we gather the required data and build a baseline model. We will measure the model’s performance against the day’s realized volatility and straddle price (avoiding any lookahead bias). At a later stage, we will perform a backtest using the model to get a sense for its economic performance.
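One common way to avoid lookahead bias is to lag every input by one session, so the estimate for a given day only uses values known before that day's open. The sketch below assumes a hypothetical realized-volatility series named `rv` with made-up values. It is not the model itself, only the lagging pattern.

```python
import pandas as pd

# Hypothetical realized-volatility series, one value per session.
rv = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 13.0],
    index=pd.date_range("2024-01-02", periods=5, freq="B"),
    name="rv",
)

frame = rv.to_frame()
# Feature for day t: mean RV over the three sessions *before* t.
# shift(1) removes day t from the window, so nothing that is only known
# after the close leaks into an estimate made at the open.
frame["rv_prev3_mean"] = rv.shift(1).rolling(3).mean()
# The target is the same-day realized volatility, observed after the close.
frame["target"] = rv

print(frame)
```

The first three rows of the feature are empty because there are not yet three prior sessions to average.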
Key Decisions
It is important to outline some key decisions up front before building the model.
The model’s output is only intended to be used for today’s option expiration (0 DTE tenor). Because of this, the choice of estimator for realized volatility is the most consequential decision and lays the foundation for the model.
Below is a brief list of common estimators.
| Estimator | Authors | Year | Best utilized when… |
|---|---|---|---|
| Close-to-Close | Standard sample variance of log close-to-close returns (widely used in early empirical finance; not tied to a single named paper) | Pre‑1980 (classical “historical volatility” measure) | You only have daily closing prices, need a simple benchmark or quick sanity‑check volatility for options or risk models; also useful when overnight moves are economically relevant and you do not have intraday data. |
| Parkinson (High–Low) | Michael Parkinson | 1980 | You have high/low data and want a more efficient estimator under a diffusion with negligible drift and no jumps, especially for intraday or execution‑algorithm volatility where overnight gaps are less important. |
| Garman–Klass (OHLC, zero drift) | Mark B. Garman and Michael J. Klass | 1980 | You have full OHLC bars and want a very efficient estimator in markets where the zero‑drift assumption is reasonable and opening jumps are modest, such as liquid futures or FX; often used when modeling continuous intraday volatility. |
| Rogers–Satchell (RS) | L. C. G. Rogers and S. E. Satchell | 1991 | You need an OHLC estimator that remains unbiased under non‑zero drift (trending markets), making it suitable for equities or assets with persistent drift where classic range estimators like Parkinson and Garman–Klass become biased. |
| Yang–Zhang | Dennis Yang and Qiang Zhang | 2000 | You want a “default” realized volatility for daily OHLC bars in equities: it combines overnight, open‑to‑close, and RS components, is drift‑independent, handles opening gaps, and is far more statistically efficient than Close‑to‑Close, making it well‑suited for equity options, VRP work, and short windows. |
Since we are trading 0-DTE only, we care about intraday volatility and do not need to measure volatility across multiple sessions, so we are going to use the Rogers-Satchell estimator. If we wanted to trade longer tenors, the Yang-Zhang estimator would give us a more robust estimate because it also accounts for overnight moves and opening gaps.
This model will assume we are only trading SPX options; however, the model could accept data for any symbol that aligns with the timeframes used below.
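To make the single-bar estimators from the table concrete, here is a minimal sketch of the commonly quoted per-bar variance formulas for Parkinson, Garman-Klass, and Rogers-Satchell. The function names and the sample bar are made up for illustration. Close-to-Close and Yang-Zhang are left out because they need more than one session.

```python
import math


def parkinson_var(high: float, low: float) -> float:
    """Parkinson (1980): uses only the high-low range."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))


def garman_klass_var(open_: float, high: float, low: float, close: float) -> float:
    """Garman-Klass (1980), in its commonly quoted zero-drift form."""
    range_term = 0.5 * math.log(high / low) ** 2
    body_term = (2.0 * math.log(2.0) - 1.0) * math.log(close / open_) ** 2
    return range_term - body_term


def rogers_satchell_var(open_: float, high: float, low: float, close: float) -> float:
    """Rogers-Satchell (1991): remains valid when the bar has drift."""
    return (math.log(high / close) * math.log(high / open_)
            + math.log(low / close) * math.log(low / open_))


def annualized_pct(variance: float, periods: int = 252) -> float:
    """Convert a per-bar variance to annualized volatility in percent."""
    return math.sqrt(variance * periods) * 100.0


# Made-up daily bar: open, high, low, close
o, h, l, c = 5000.0, 5040.0, 4980.0, 5030.0
estimates = {
    "Parkinson": parkinson_var(h, l),
    "Garman-Klass": garman_klass_var(o, h, l, c),
    "Rogers-Satchell": rogers_satchell_var(o, h, l, c),
}
for name, variance in estimates.items():
    print(f"{name}: {annualized_pct(variance):.2f}%")
```

A bar that opens on its low and closes on its high makes the drift handling visible: Rogers-Satchell assigns it zero variance, while Parkinson treats the whole range as volatility.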
Fetching the required data
To start, we only need the daily OHLC (Open, High, Low, Close) data for SPX and the daily OHLC price data for the straddle that is ATM at the market open. We will use the SPX open price to determine the ATM strike price.
This walkthrough will use Massive as the data source. We only need historic data at this point. There is no need for a “live” subscription to build the model.
Define Data Helpers
import pandas as pd
from datetime import datetime, timedelta
from pathlib import Path
from massive import RESTClient
def get_previous_bar(ticker: str) -> pd.DataFrame:
    """Return the previous-close daily bar for `ticker`, indexed by date."""
    # `MASSIVE_API_KEY` must be set in the environment
    client = RESTClient()
    agg = client.get_previous_close_agg(ticker)
    df = pd.DataFrame(agg)
    # Convert the millisecond 'timestamp' to a date
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms").dt.normalize()
    # Drop the columns we do not need
    df.drop(columns=["timestamp", "volume", "vwap"], inplace=True)
    df.set_index("date", inplace=True)
    return df
def get_strike(price: float) -> int:
    """Round price to the nearest 5."""
    return int(round(price / 5) * 5)

def option_ticker(date: datetime, strike: int, option_type: str) -> str:
    """Format: O:SPXWYYMMDD[C/P]00000000 (strike * 1000, zero-padded to 8 digits)"""
    datestr = date.strftime("%y%m%d")
    # Option strikes are usually multiplied by 1000 and padded to 8 digits
    strike_str = f"{strike * 1000:08d}"
    return f"O:SPXW{datestr}{option_type}{strike_str}"
def get_daily_bars(
    ticker: str, start_date: datetime, end_date: datetime = None
) -> pd.DataFrame:
    # Resolve the default at call time: a `datetime.now()` default in the
    # signature would be evaluated only once, when the function is defined
    if end_date is None:
        end_date = datetime.now() - timedelta(days=1)
    # `MASSIVE_API_KEY` must be set in the environment
    client = RESTClient()
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")
    aggs = client.list_aggs(
        ticker,
        1,
        "day",
        start_date_str,
        end_date_str,
        sort="asc",
    )
    df = pd.DataFrame(aggs)
    df = df.sort_values(by="timestamp")
    # Convert the millisecond 'timestamp' to a date
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms").dt.normalize()
    ohlc = ["open", "high", "low", "close"]
    df = df[["date"] + ohlc]
    df[ohlc] = df[ohlc].astype(float)
    df.set_index("date", inplace=True)
    return df
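As a quick sanity check of the two pure helpers, the sketch below builds a call and put ticker for a made-up session and parses one back. `get_strike` and `option_ticker` are repeated in compact form so the snippet runs on its own; `parse_option_ticker` is a hypothetical inverse added only for the check, and the price and date are invented.

```python
from datetime import datetime


def get_strike(price: float) -> int:
    """Round price to the nearest 5."""
    return int(round(price / 5) * 5)


def option_ticker(date: datetime, strike: int, option_type: str) -> str:
    """O:SPXW + YYMMDD + C/P + strike * 1000 padded to 8 digits."""
    return f"O:SPXW{date:%y%m%d}{option_type}{strike * 1000:08d}"


def parse_option_ticker(ticker: str) -> tuple[datetime, str, float]:
    """Hypothetical inverse of option_ticker, used only for sanity checks."""
    body = ticker.removeprefix("O:SPXW")
    expiry = datetime.strptime(body[:6], "%y%m%d")
    option_type = body[6]
    strike = int(body[7:]) / 1000
    return expiry, option_type, strike


session = datetime(2025, 1, 17)   # made-up session date
strike = get_strike(5948.71)      # made-up SPX open
call = option_ticker(session, strike, "C")
put = option_ticker(session, strike, "P")
print(call, put)
print(parse_option_ticker(call))
```

One detail worth knowing: Python's `round` sends exact halves to the nearest even integer, so an open of 5952.5 maps to 5950 while 5957.5 maps to 5960. For SPX this rarely matters, but it explains an occasional strike that looks one step off.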
Fetch the Required Data
# Load the data file from a previous execution
# Delete the existing file to re-fetch all data
file_path = Path("inputs.csv")
if file_path.exists():
    # Read the file if it exists, restoring the datetime index
    df = pd.read_csv(file_path, index_col="date", parse_dates=True)
    # Improvement: check the latest date in the file and fetch new dates only
else:
    # Fetch historic data (based on your Massive Options plan)
    start_date = datetime.now() - timedelta(weeks=52 * 2)
    df = get_daily_bars("I:SPX", start_date=start_date)
    straddle = pd.DataFrame(index=df.index, columns=[
        "straddle_open", "straddle_high", "straddle_low", "straddle_close"
    ], dtype=float)
    # Fetch 0-DTE straddle data
    for date in df.index:
        strike = get_strike(df.loc[date, "open"])
        # Fetch the call and the put.
        # Assumption: for an expired 0-DTE contract, the previous-close bar
        # is the contract's expiration-day bar.
        call = get_previous_bar(option_ticker(date, strike, "C")).iloc[0]
        put = get_previous_bar(option_ticker(date, strike, "P")).iloc[0]
        # Named straddle_* to avoid shadowing the built-in `open`
        straddle_open = call["open"] + put["open"]
        straddle_close = call["close"] + put["close"]
        # Daily bars do not show when each leg reached its extreme, so the
        # straddle high and low are approximated from these four combinations
        candidates = [
            call["high"] + put["low"],
            call["low"] + put["high"],
            straddle_open,
            straddle_close,
        ]
        straddle.loc[date] = [
            straddle_open, max(candidates), min(candidates), straddle_close
        ]
    df = df.join(straddle)
    # Persist the file
    df.to_csv(file_path)
df.tail()
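The "fetch new dates only" improvement mentioned in the code comment could look roughly like the sketch below. `refresh_cache` and its `fetch` argument are hypothetical: `fetch(start)` stands in for whatever combination of the helpers above returns a date-indexed frame of rows on or after `start`. The demo uses a fake fetcher and made-up values so that it runs without an API key.

```python
from typing import Callable

import pandas as pd


def refresh_cache(
    cached: pd.DataFrame,
    fetch: Callable[[pd.Timestamp], pd.DataFrame],
) -> pd.DataFrame:
    """Append only the rows that are newer than the last cached date."""
    start = cached.index.max() + pd.Timedelta(days=1)
    fresh = fetch(start)
    combined = pd.concat([cached, fresh])
    # Keep the cached copy if a date somehow appears twice.
    combined = combined[~combined.index.duplicated(keep="first")]
    return combined.sort_index()


# Fake "remote" data standing in for the API (made-up values).
full = pd.DataFrame(
    {"close": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2024-01-02", periods=4, freq="B").rename("date"),
)
cached = full.iloc[:2]
requested = []


def fake_fetch(start: pd.Timestamp) -> pd.DataFrame:
    requested.append(start)
    return full.loc[start:]


updated = refresh_cache(cached, fake_fetch)
print(updated)
```

Requesting from the day after the last cached date keeps the number of API calls proportional to the missing sessions rather than to the whole history.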
Define the Estimator
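import numpy as np

def rogers_satchell_estimator(
    open: pd.Series, high: pd.Series, low: pd.Series, close: pd.Series
) -> pd.Series:
    """
    Calculates the Rogers-Satchell Volatility Estimator, one value per bar.

    Returns the natural log of the annualized volatility in percent.
    """
    h_c = np.log(high / close)
    h_o = np.log(high / open)
    l_c = np.log(low / close)
    l_o = np.log(low / open)
    # Per-bar Rogers-Satchell variance
    var = (h_c * h_o) + (l_c * l_o)
    # Annualize with 252 trading days, scale to percent, then take the log
    vol = np.log(np.sqrt(var) * np.sqrt(252) * 100)
    return pd.Series(data=vol, name="vol")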
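For comparison, here is a hedged sketch of the same estimator expressed as a plain volatility level (no log), with an optional rolling window that averages per-bar variances before taking the square root. `rs_volatility`, its parameters, and the sample bars are made up for illustration and are not part of the model.

```python
import numpy as np
import pandas as pd


def rs_volatility(bars: pd.DataFrame, window: int = 1, periods: int = 252) -> pd.Series:
    """Annualized Rogers-Satchell volatility in percent (a level, not a log).

    `bars` needs open/high/low/close columns. With window > 1 the per-bar
    variances are averaged over the window before the square root.
    """
    o, h, l, c = (bars[k] for k in ("open", "high", "low", "close"))
    var = np.log(h / c) * np.log(h / o) + np.log(l / c) * np.log(l / o)
    mean_var = var.rolling(window).mean()
    # Floor at zero: inconsistent bars (e.g. a high below the close) or
    # floating-point residue could otherwise give a small negative variance.
    mean_var = mean_var.clip(lower=0.0)
    return (np.sqrt(mean_var * periods) * 100.0).rename("rs_vol")


# Made-up bars: a flat bar, a pure trend bar, and an ordinary bar.
bars = pd.DataFrame(
    {
        "open":  [100.0, 100.0, 100.0],
        "high":  [100.0, 110.0, 103.0],
        "low":   [100.0, 100.0, 98.0],
        "close": [100.0, 110.0, 101.0],
    }
)
vol = rs_volatility(bars)
print(vol)
```

The flat bar and the pure trend bar both produce a volatility of zero. The log of zero is negative infinity, so any log-transformed version of the estimator needs such sessions filtered or floored before modeling.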
To be continued …