In today’s tutorial we investigate how you can use ThetaData’s API to retreive 10 years of historical options data for comparing Implied Volatility to Historical Volatility.

Q: What is the difference between Actual and Implied Volatility?

Historical/Actual/Realized Volatility (rv)

Realized volatility (rv) is the actual stock price variability due to randomness of the underlying Brownian motion the stock price. Using the stock return model (the solution of Geometric Brownian Motion SDE), the realized volatility is the coefficient of the Browanian Motion process (Wiener process). We will explain this more using some ito calculus in the next video!

Stock Returns Model

$$\large ln \frac{S_t}{S_0} = (\mu – \frac{\sigma^2}{2})t + \sigma^\mathbb{P} W_t$$

$$\large r_t = ln \frac{S_t}{S_0}$$

Realized Volatility (over the entire period of time)

$$\large rv^{(N)}_t = \sigma^\mathbb{P} \approx \sqrt{\sum^{N} {r_t}^2}$$

Realized Volatility (for each period of time – unbiased std dev.)

$$\large rv_t = \sigma^\mathbb{P} \approx \sqrt{\sum^{N} \frac{{r_t}^2}{N-1}}$$

Difference between realized volatility as the sqaure root of sum of squared returns or standard deviation of returns? The only difference is frequency. You would think you should use average log return over period. Look at papers Andersen, Bollerslev, Diebold and Labys 2001 and Barndorff-Nielsen and
Shephard 2001, 2002. Refer to this link

Implied Volatility (iv)

Implied volatility is how the market is pricing the option currently. To calculate implied volatility you use the market price of the option (as well as the contract terms) and a theoretical pricing model depending on the type of option being priced. For example using the Black-Scholes model, we get the price of a European call option is given by:

$$\large C = S_t \Phi{(d_1)} – Ke^{-r\tau} \Phi{(d_2)}$$

where:

$$\large d_1 = \frac{ln\frac{S_t}{K} + (r + \frac{\sigma^2}{2}\tau)}{\sigma \sqrt{\tau}}$$ and

$$\large d_2 = d_1 – \sigma \sqrt{\tau}$$

$\large S_t$ todays stock price
$\large K$ strike price of option
$\large \sigma$ volatility
$\large r$ interest rate
$\large \tau = T – t$ Difference between Maturity time T and todays date t

The implied volatility is the volatility that results in the observed market price of the option.
For a particular option with strike $\large K$, and time to expiry $\large \tau$, there will be an observable market price.

$$\large C_{obs}(K, \tau) = C(\sigma^{\mathbb{Q}}, K, \tau)$$

This can be obtained by a root finding methods like Newton, bi-section, secant or brent method. Specifically for the case of Black-Scholes model, there is also a more rational way to compute implied volatility and we will explore this in our series.

Realized vs Implied Volatility

Since the market does not have perfect knowledge about the future these two numbers can and will be different.

Therein, lies the risk management problem / business or trading opportunity.

import os
import time
import pickle

import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from datetime import timedelta, datetime, date
from thetadata import ThetaClient, OptionReqType, OptionRight, DateRange, DataType, StockReqType

Get all Expirations for MSFT Options

First thing we need is all the expiry dates of all contracts on MSFT that ThetaData has available.

your_username = ''
your_password = ''

def get_expirations(root_ticker) -> pd.DataFrame:
    """Request expirations from a particular options root"""
    # Create a ThetaClient
    client = ThetaClient(username=your_username, passwd=your_password, jvm_mem=4, timeout=15)

    # Connect to the Terminal
    with client.connect():

        # Make the request
        data = client.get_expirations(
            root=root_ticker,
        )

    return data

Making requests to API for all Contracts by Expiry Dates

root_ticker = 'MSFT'
expirations = get_expirations(root_ticker)
expirations

Get all Strikes for each MSFT Option Expiry

We will need these later, so I will build up a dictionary and pickle this data for future use.

def get_strikes(root_ticker, expiration_dates) -> pd.DataFrame:
    """Request strikes from a particular option contract"""
    # Create a ThetaClient
    client = ThetaClient(username=your_username, passwd=your_password, jvm_mem=4, timeout=15)
    
    all_strikes = {}

    # Connect to the Terminal
    with client.connect():
        
        for exp_date in expiration_dates:
        
            # Make the request
            data = client.get_strikes(
                root=root_ticker,
                exp=exp_date
            )
            
            all_strikes[exp_date] = pd.to_numeric(data)
            

    return all_strikes

Making requests to API for Strikes

root_ticker = 'MSFT'

all_strikes = get_strikes(root_ticker, expirations)

with open('MSFT_strikes.pkl', 'wb') as f:
    pickle.dump(all_strikes, f)

with open('MSFT_strikes.pkl', 'rb') as f:
    all_strikes = pickle.load(f)
    
all_strikes[expirations[360]]

MSFT Underlying ThetaData Request

We will be leveraging the ability to aggregate time periods throughout the day using the API, by defining a interval_size. We will then compare the historical volatility to the implied volatility for every trading day for quotes that were made in the underlying and options of ATM options in the afternoon (14:00).

def get_hist_stock(root_ticker, trading_days, interval_size) -> pd.DataFrame:
    """Request historical data for an underlying"""
    # Create a ThetaClient
    client = ThetaClient(username=your_username, passwd=your_password, jvm_mem=4, timeout=15)
    
    underlying = {}

    # Connect to the Terminal
    with client.connect():
        # Make the request
        
        for tdate in trading_days:
            
            try:
                data = client.get_hist_stock(
                    req=StockReqType.QUOTE,
                    root=root_ticker,
                    date_range=DateRange(tdate, tdate),
                    interval_size=interval_size
                )
                
                data = data.apply(weighted_mid_price, axis=1)
                
                underlying[tdate] = data[4]
                
            except:
                underlying[tdate] = np.nan

    return underlying

Calculate Weighted Mid Price (Micro-Price)

Calculate the weighted mid price (micro-price) for each row within our quotes dataframe.

def weighted_mid_price(row):
    try:
        V_mid = row[DataType.ASK_SIZE] + row[DataType.BID_SIZE]
        x_a = row[DataType.ASK_SIZE]/V_mid
        x_b = 1 - x_a
        return row[DataType.ASK]*x_a + row[DataType.BID]*x_b
    except:
        return np.nan

Making requests to API for Underlying

root_ticker = 'MSFT'
trading_days = pd.date_range(start=datetime(2012,6,1),end=datetime(2022,11,14),freq='B')
interval_size = 60*60000

underlying = get_hist_stock(root_ticker, trading_days, interval_size)

with open('underlying.pkl', 'wb') as f:
    pickle.dump(underlying, f)

Volatility over 30d window (~21 trading days)

Now I want to compute the realized volatility over a number of days, and we can calculate this by applying the standard deviation over a rolling window. H=30d but this is equivalent to approximately 21 trading days (or points of data).

$$\large rv^{H}t \approx \sqrt{\sum^{M-1}{j=0} {r^2_{t-j\Delta}}}$$

where:

$\large \Delta = \frac{H}{M}$ and
$\large r_{t-j\Delta} = ln \frac{S_{t-j\Delta}}{S_{t-(j+1)\Delta}}$

with open('underlying.pkl', 'rb') as f:
    underlying = pickle.load(f)

spot = pd.DataFrame(underlying.items(), columns=['trade_date', 'price'])
spot.set_index('trade_date', inplace=True)
spot = spot.dropna()

log_returns = np.log(spot/spot.shift(1)).dropna()

TRADING_DAYS = 21
spot['vol'] = log_returns.rolling(window=TRADING_DAYS).std()*np.sqrt(252)
spot.tail()

fig,ax = plt.subplots(figsize=(12,4))
ax.plot(spot['price'], color='tab:blue')
ax2=ax.twinx()
ax2.plot(spot['vol']*100, color='tab:red')

# set x-axis label
ax.set_xlabel("year", fontsize = 14)
# set y-axis label
ax.set_ylabel("Stock Price (USD $)",
              color="tab:blue",
              fontsize=14)

ax2.set_ylabel("Volatility (%)",color="tab:red",fontsize=14)
plt.show()

Market-Makers are not forced to show Quotes on all options!

There are rules listed for each Exchange that market makers must abide by. For Example on the NASDAQ where MSFT trades here are the rules

Specifically there is a large difference between the obligation of a Competitive Market Maker and the Primary Market Makers for a particular options series. This is notable in whether they need to present two-sided quotes on Non-standard options like weekly or quarterly expiry options and adjusted options.

To be safe here, we will only want to return option contracts with ‘standard’ option expires. These expire on the Saturday following the third Friday of the month, and some have the expiry date as the Third friday of the month, but in the past were recorded as the Saturday. Therefore we need to find the intersection of all the expiries that Thetadata has options data for and the 3rd Fridays and the following Saturday dates for every month since Jun-2021.

trading_days = pd.date_range(start=datetime(2012,6,1),end=datetime(2022,11,14),freq='B')
# The third friday in every month
contracts1 = pd.date_range(start=datetime(2012,6,1),end=datetime(2024,12,31),freq='WOM-3FRI')
# Saturday following the third friday in every month
contracts2 = pd.date_range(start=datetime(2012,6,1),end=datetime(2022,12,31),freq='WOM-3FRI')+timedelta(days=1)
# Combine these contracts into a total pandas index list
contracts = contracts1.append(contracts2)
# Find contract expiries that match with ThetaData expiries 
mth_expirations = [exp for exp in expirations if exp in contracts]
# Convert from python list to pandas datetime
mth_expirations = pd.to_datetime(pd.Series(mth_expirations))

print('Number of possible monthly contracts', len(contracts), 'compared to total avail',len(mth_expirations), 
      'compared to total no. options avail (incl. quarterly + weekly)', len(expirations))

Days to Expiry (DTE)

Find the contracts that are closest 1mth, 2mth, 3mth and 4mth to expiry

trading_days = pd.date_range(start=datetime(2012,6,1),end=datetime(2022,11,14),freq='B')

contracts = {}
DTE = [30,60,90,120]
for trade_date in trading_days:
    days = [delta.days for delta in mth_expirations - trade_date]
    index_contracts = [min({(abs(day-dte),i) for i,day in enumerate(days)})[1] for dte in DTE]
    contracts[trade_date] = index_contracts

Implied volatility requests

ThetaData uses the quotes information on a particular options chain for a specific trading day to compute the implied volatility. For every trading day we will look forward to the closest monthly trading option contracts and get the closest strike to the underlying (~ATM), and retrieve the implied volatility for the contracts ~30d (1mth), ~60d (2mth), ~90d (3mth) and ~120d (4mth) to expiry.

Make the request

# Make the request
def implied_vol(root_ticker, trading_days, interval_size=0, opt_type=OptionRight.CALL) -> pd.DataFrame:
    """Request quotes both bid/ask options data"""
    # Create a ThetaClient
    client = ThetaClient(username=your_username, passwd=your_password, jvm_mem=4, timeout=15)
    
    # Store all iv in datas dictionary
    datas = {}
    DTE = ['1mth','2mth','3mth','4mth']
    total_days = len(trading_days)

    # Connect to the Terminal
    with client.connect():
        
        for ind, trade_date in enumerate(trading_days):
            
            print('*'*100, '\nSTART:' ,trade_date, ind+1, '/', total_days ,'\n','*'*100)
            
            # Get the expiry dates for specific contracts on particular trade date
            exp_dates = mth_expirations[contracts[trade_date]]
            datas[trade_date] = {}
            
            # For each expiry we want to get closest ATM iv
            for exp_ind, exp_date in enumerate(exp_dates):

                # determine closest ATM strike - iterate through all strikes of expiry date.
                diff_strike = [delta for delta in all_strikes[exp_date] - underlying[trade_date]]
                # Min. difference between particular DTE interested, return index
                index_strike = min({(abs(Kdiff),i) for i,Kdiff in enumerate(diff_strike)})[1]
                # Return closest ATM strike
                strike = all_strikes[exp_date][index_strike]

                try:
                    # Attempt to request historical options implied volatility
                    data = client.get_hist_option(
                        req=OptionReqType.IMPLIED_VOLATILITY,
                        root=root_ticker,
                        exp=exp_date,
                        strike=strike,
                        right=opt_type,
                        date_range=DateRange(trade_date, trade_date),
                        progress_bar=False,
                        interval_size=interval_size
                    )
                    
                    # Store data in dictionary
                    datas[trade_date][DTE[exp_ind]] = data.loc[4,DataType.IMPLIED_VOL]

                except:
                    # If unavailable, store np.nan
                    datas[trade_date][DTE[exp_ind]] =  np.nan 

    return datas

Making requests to API for IV

start_all = time.time()

datas_call = implied_vol(root_ticker, trading_days, interval_size = 60*60000, opt_type=OptionRight.CALL)

with open('datas_mth_calls.pkl', 'wb') as f:
    pickle.dump(datas_call, f)

datas_put = implied_vol(root_ticker, trading_days, interval_size = 60*60000, opt_type=OptionRight.PUT)

with open('datas_mth_puts.pkl', 'wb') as f:
    pickle.dump(datas_put, f)

end_all = time.time()
print('*'*100,'  TOTAL time taken {:.2f} s'.format(end_all-start_all),'*'*100)

To demonstrate what that looks like

trading_days = pd.date_range(start=datetime(2022,11,7),end=datetime(2022,11,11),freq='B')

start_all = time.time()

datas = implied_vol(root_ticker, trading_days, interval_size = 60*60000, opt_type=OptionRight.CALL)

end_all = time.time()
print('*'*100,'  TOTAL time taken {:.2f} s'.format(end_all-start_all),'*'*100)

df = pd.DataFrame(datas.items(), columns=['trade_date', 'price']) 
N = len(df)
calls = np.empty([N, 4])
for ind, (date, data) in enumerate(datas.items()):
    calls[ind, 0] = data['1mth']
    calls[ind, 1] = data['2mth']
    calls[ind, 2] = data['3mth']
    calls[ind, 3] = data['4mth']

df = pd.DataFrame(data=calls, index=df.trade_date, columns=['1mth','2mth','3mth','4mth'])
df

Visualise IV from Calls

with open('datas_mth_calls.pkl', 'rb') as f:
    datas_call = pickle.load(f)

df_calls = pd.DataFrame(datas_call.items(), columns=['trade_date', 'price']) 

N = len(datas_call)
calls = np.empty([N, 4])
for ind, (date, data) in enumerate(datas_call.items()):
    calls[ind, 0] = data['1mth']
    calls[ind, 1] = data['2mth']
    calls[ind, 2] = data['3mth']
    calls[ind, 3] = data['4mth']

df_calls = pd.DataFrame(data=calls, index=df_calls.trade_date, columns=['1mth','2mth','3mth','4mth'])
print('Data available', len(df_calls.dropna(how='all')), 'out of', len(df_calls))

df_calls = df_calls.dropna(how='all')
df_calls.tail()

fig,ax = plt.subplots(figsize=(12,4))
ax.plot(df_calls['1mth'])
ax.plot(df_calls['2mth'])
ax.plot(df_calls['3mth'])
ax.plot(df_calls['4mth'])
plt.show()

Whats happening to 1mth+ options series?

Why does it appear there were no quotations on some options (ATM options I remind you), prior to 2020?

I encourage you to read the NASDAQ exchange rules and you will notice in Section 5. Market Maker Quotations, this was Adopted Dec 6, 2019 and ammended a few times since then.

fig,ax = plt.subplots(figsize=(12,4))
ax.plot(spot['price'], color='tab:blue', label='Stock')
ax2=ax.twinx()
ax2.plot(spot['vol']*100, color='tab:red', label='rv_30')
ax2.plot(df_calls['1mth']*100, color='tab:green', label='iv_30')
# set x-axis label
ax.set_xlabel("year", fontsize = 14)
# set y-axis label
ax.set_ylabel("Stock Price (USD $)",
              color="tab:blue",
              fontsize=14)

ax2.set_ylabel("Volatility (%)",color="tab:red",fontsize=14)

fig.legend()
plt.show()

Is this a fair comparison?

Let’s consider time scales:

Realized Volatility (rv) is backwards looking over H time periods. $\large rv(t – H \rightarrow t)$
Implied Volatility (iv) is forwards looking over H time periods. $\large iv(t \rightarrow t+H)$

So to compare these visually we could shift rv backwards by the period H so that we can compare $\large rv_{t-H}(t \rightarrow t+H)$ to $\large iv(t \rightarrow t+H)$

spot['vol_shift'] = spot['vol'].shift(-21)
fig,ax = plt.subplots(figsize=(12,4))
ax.plot(spot['price'], color='tab:blue', label='Stock')
ax2=ax.twinx()
plt.title('IV Calls vs RV shifted')
ax2.plot(spot['vol_shift']*100, color='tab:red', label='rv_30')
ax2.plot(df_calls['1mth']*100, color='tab:green', label='iv_30')
# set x-axis label
ax.set_xlabel("year", fontsize = 14)
# set y-axis label
ax.set_ylabel("Stock Price (USD $)",
              color="tab:blue",
              fontsize=14)

ax2.set_ylabel("Volatility (%)",color="tab:red",fontsize=14)

fig.legend()
plt.show()

Now let’s check the puts data

with open('datas_mth_puts.pkl', 'rb') as f:
    datas_put = pickle.load(f)

df_puts = pd.DataFrame(datas_put.items(), columns=['trade_date', 'price']) 

N = len(datas_put)
puts = np.empty([N, 4])
for ind, (date, data) in enumerate(datas_put.items()):
    puts[ind, 0] = data['1mth']
    puts[ind, 1] = data['2mth']
    puts[ind, 2] = data['3mth']
    puts[ind, 3] = data['4mth']
# df.set_index('trade_date', inplace=True)
# df = df.dropna()

df_puts = pd.DataFrame(data=puts, index=df_puts.trade_date, columns=['1mth','2mth','3mth','4mth'])
print('Data available', len(df_puts.dropna(how='all')), 'out of', len(df_puts))

df_puts = df_puts.dropna(how='all')
df_puts

fig,ax = plt.subplots(figsize=(12,4))
ax.plot(spot['price'], color='tab:blue')
ax2=ax.twinx()
plt.title('IV Puts vs IV Calls')
ax2.plot(df_puts['1mth'], color='tab:red', label='IV Puts')
ax2.plot(df_calls['1mth'], color='tab:orange', label='IV Calls')
# set x-axis label
ax.set_xlabel("year", fontsize = 14)
# set y-axis label
ax.set_ylabel("Stock Price (USD $)",
              color="tab:blue",
              fontsize=14)

ax2.set_ylabel("Volatility",color="tab:red",fontsize=14)
fig.legend()
plt.show()