Why most trading strategies are fake

There is an extremely high false discovery rate, in both academia and the financial industry, among trading strategies that “produce” alpha. In fact, most of these strategies are false discoveries. They result from research bias, from multiple testing, and from the very low (<< 1%) true probability of finding a new investment strategy in a highly competitive market.

As Marcos Lopez de Prado points out, suppose the true probability that a backtested strategy is profitable is 1% and our test has 80% power (the rate at which true strategies are identified). Testing 1,000 trading strategies at the standard 5% significance level then implies at least 86% false discoveries!
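The arithmetic behind this figure can be checked in a few lines. The sketch below uses a hypothetical helper, `false_discovery_rate`, and assumes the simple two-group setup described above. A fraction `prior_true` of the tested strategies are genuinely profitable, and the test detects them with probability `power`. Every other strategy passes the test by chance with probability `alpha`.

```python
def false_discovery_rate(n_tests: int, prior_true: float, power: float, alpha: float) -> float:
    """Expected fraction of 'significant' strategies that are false positives."""
    n_true = n_tests * prior_true      # strategies with a genuine edge
    n_false = n_tests - n_true         # strategies with no edge at all
    true_positives = power * n_true    # genuine strategies the test detects
    false_positives = alpha * n_false  # skill-less strategies that pass by luck
    return false_positives / (false_positives + true_positives)


fdr = false_discovery_rate(n_tests=1000, prior_true=0.01, power=0.80, alpha=0.05)
print(f"expected false discovery rate: {fdr:.1%}")  # expected false discovery rate: 86.1%
```

With 10 true strategies out of 1,000, about 8 are detected. Meanwhile, 5% of the 990 skill-less strategies (49.5 on average) pass by chance, so 49.5 / 57.5 ≈ 86% of the "discoveries" are false.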

Today we investigate multiple testing and the false discovery of profitable trading strategies. We develop a momentum-based trading strategy on Apple stock and show the problems that can arise from unknowingly performing multiple tests on the same dataset.

Papers discussed:

Evaluating Trading Strategies: https://www.stat.berkeley.edu/~aldous/157/Papers/harvey.pdf

The Pitfalls of Econometric Analysis (Marcos Lopez de Prado): https://www.quantresearch.org/Lectures.htm

Scientific method: Statistical errors: https://www.nature.com/articles/506150a

Moving to a World Beyond “p<0.05”: https://www.tandfonline.com/doi/pdf/10.1080/00031305.2019.1583913

Why your trading strategy doesn’t work

The Perils of Multiple Testing – p-hacking during backtesting.

Here we use a classic Simple Moving Average (SMA) crossover strategy as our example, implemented with Backtrader in Python. https://www.backtrader.com/home/helloalgotrading/

import datetime
import time
import math
import numpy as np
import pandas as pd
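import scipy as sc
import scipy.stats  # ensures sc.stats is loaded (older SciPy versions do not import submodules automatically)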
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr
import backtrader as bt
import quantstats
import concurrent.futures as cf
from itertools import product

%matplotlib inline
# import data
def get_data(stocks, start, end):
    stockData = pdr.get_data_yahoo(stocks, start, end)
    return stockData

stockList = ['AAPL']
endDate = datetime.datetime.now()
startDate = endDate - datetime.timedelta(days=2000)

stockData = get_data(stockList[0], startDate, endDate)
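stockData = stockData.sort_index()  # ensure rows are in chronological order

# chronological split: first 75% in-sample (IS), remaining 25% out-of-sample (OS)
split = int(len(stockData) * 0.75)
stockData_IS = stockData.iloc[:split]
stockData_OS = stockData.iloc[split:]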

print(len(stockData), len(stockData_IS), len(stockData_OS))

actualStart = stockData.index[0]

data = bt.feeds.PandasData(dataname=stockData_IS)

print('IS DATA: starting ', stockData_IS.index[0], ' finishing ', stockData_IS.index[-1])
print('OS DATA: starting ', stockData_OS.index[0], ' finishing ', stockData_OS.index[-1])

Define the trading strategy as a class in Python.

# Create a subclass of Strategy to define the indicators and logic

class MAcrossover(bt.Strategy):
    # list of parameters which are configurable for the strategy
    params = dict(
        pfast=10,  # period for the fast moving average
        pslow=20   # period for the slow moving average
    )

    def log(self, txt, dt=None):
        dt = dt or self.datas[0].datetime.date(0)
        # print(f'{dt.isoformat()} {txt}')  # uncomment for a trade log; keep commented when running optimisation

    def __init__(self):
        sma1 = bt.ind.SMA(period=self.p.pfast)  # fast moving average
        sma2 = bt.ind.SMA(period=self.p.pslow)  # slow moving average
        self.crossover = bt.ind.CrossOver(sma1, sma2)  # crossover signal

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # An active Buy/Sell order has been submitted/accepted - nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(f'BUY EXECUTED, {order.executed.price:.2f}')
            elif order.issell():
                self.log(f'SELL EXECUTED, {order.executed.price:.2f}')
            self.bar_executed = len(self)

        elif order.status in [order.Canceled, order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')

        # Reset orders
        self.order = None

    def next(self):
        if not self.position:  # not in the market
            if self.crossover > 0:  # if fast crosses slow to the upside
                self.buy()  # enter long

        elif self.crossover < 0:  # in the market & cross to the downside
            self.close()  # close long position
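To make the signal logic concrete outside Backtrader, the sketch below computes similar crossover events with plain pandas. `sma_crossover_signals` is a hypothetical helper, not part of Backtrader. It only approximates `bt.ind.CrossOver`, and the two may differ on bars where the two averages are exactly equal.

```python
import pandas as pd


def sma_crossover_signals(close: pd.Series, pfast: int = 10, pslow: int = 20) -> pd.Series:
    """Return +1 on bars where the fast SMA crosses above the slow SMA,
    -1 where it crosses below, and 0 otherwise (hypothetical helper)."""
    fast = close.rolling(pfast).mean()   # fast moving average
    slow = close.rolling(pslow).mean()   # slow moving average
    above = (fast > slow).astype(int)    # 1 while fast is above slow
    # only count a cross once both the current and previous slow SMA exist,
    # so the end of the warm-up period is not mistaken for a crossover
    valid = slow.notna() & slow.notna().shift(1, fill_value=False)
    return above.diff().where(valid, 0).astype(int)


# toy price path: flat, jump up, then fall back
close = pd.Series([1, 1, 1, 1, 5, 5, 5, 1, 1, 1, 1, 1], dtype=float)
signals = sma_crossover_signals(close, pfast=2, pslow=4)
print(signals[signals != 0])  # +1 (enter long) at bar 4, -1 (close) at bar 7
```

In `next()` above, a +1 opens a long position only when flat, and a -1 closes an open position.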

Define a commission scheme: a fixed fee of $10 per order.

class FixedCommisionScheme(bt.CommInfoBase):
    # flat fee charged on every order, regardless of size or price
    params = (
        ('commission', 10),
        ('stocklike', True),
        ('commtype', bt.CommInfoBase.COMM_FIXED),
    )

    def _getcommission(self, size, price, pseudoexec):
        return self.p.commission

Also create a sizer to set the size of each trade. It takes a risk parameter, prisk, which is the fraction of available cash committed to each trade.

class maxRiskSizer(bt.Sizer):
    '''
    Returns the number of shares, rounded down, that can be purchased for the
    maximum risk tolerance (prisk = fraction of available cash to commit).
    '''
    # list of parameters which are configurable for the sizer
    params = dict(
        prisk=1.0,  # illustrative default; run() always passes prisk explicitly
    )

    def __init__(self):
        if self.p.prisk > 1 or self.p.prisk < 0:
            raise ValueError('The risk parameter is a percentage which must be '
                             'entered as a float, e.g. 0.5')

    def _getsizing(self, comminfo, cash, data, isbuy):
        if isbuy:
            size = math.floor((cash * self.p.prisk) / data[0])
        else:
            size = math.floor((cash * self.p.prisk) / data[0]) * -1
        return size
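The sizing rule is plain arithmetic: commit a fraction prisk of the available cash and round the share count down. The hypothetical helper below reproduces the `_getsizing` calculation without Backtrader. It makes it easy to check what a given prisk implies at a given price; the prices used are illustrative only.

```python
import math


def max_risk_size(cash: float, price: float, prisk: float, isbuy: bool = True) -> int:
    """Shares affordable with a fraction `prisk` of `cash`, rounded down
    (negative for sells, mirroring maxRiskSizer._getsizing)."""
    if not 0 <= prisk <= 1:
        raise ValueError("prisk must be a fraction between 0 and 1, e.g. 0.5")
    size = math.floor(cash * prisk / price)
    return size if isbuy else -size


print(max_risk_size(10_000, 150.0, 0.5))  # 33 shares (5,000 / 150 = 33.3)
print(max_risk_size(10_000, 150.0, 1.0))  # 66 shares, costing 9,900
```

With prisk = 1.0, almost all the cash is committed. In this example only 100 of the 10,000 is left over to cover the fixed per-order commission.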

Now we can perform a number of runs, ‘tuning’ the hyperparameters for the best in-sample performance. We then test the best combination out-of-sample on the test set, ranking runs by in-sample Sharpe ratio with PnL as a tie-breaker.

optimized_runs = {}

def run(data, params, graph=False, benchmark=False):
    # params = (pfast, pslow, prisk); benchmark is not used in this version
    cerebro = bt.Cerebro()

    # Add data
    cerebro.adddata(data)

    # Analyzers
    cerebro.addanalyzer(bt.analyzers.SharpeRatio_A, _name='sharpe_ratio')
    cerebro.addanalyzer(bt.analyzers.PyFolio, _name='PyFolio')

    # Broker information: coc=True fills orders at the close of the signal bar;
    # BackBroker starts with 10,000 in cash by default
    broker_args = dict(coc=True)
    cerebro.broker = bt.brokers.BackBroker(**broker_args)
    comminfo = FixedCommisionScheme()
    cerebro.broker.addcommissioninfo(comminfo)

    # Add strategy
    cerebro.addstrategy(MAcrossover, pfast=params[0], pslow=params[1])

    # Position size: fraction prisk of available cash per trade
    cerebro.addsizer(maxRiskSizer, prisk=params[2])

    strats = cerebro.run()
    if graph:
        cerebro.plot(iplot=False, style='candlestick')
    return strats


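# Illustrative, hypothetical grids: 6 x 6 x 6 = 216 combinations
pfast = range(5, 35, 5)                  # 5, 10, ..., 30
pslow = range(50, 110, 10)               # 50, 60, ..., 100
prisk = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]   # fraction of cash per trade
params = list(product(pfast, pslow, prisk))
print(len(params))  # 216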

start = time.time()
for param in params:
    optimized_runs[param] = run(data, param)
end = time.time()
print('  time taken {:.2f} s'.format(end-start))

Now collect the results of every run and rank them. This ranking is the in-sample ‘optimisation’.

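final_results_list = []
for runs in optimized_runs:
    for strategy in optimized_runs[runs]:
        PnL = round(strategy.broker.get_value() - 10000, 2)
        sharpe = strategy.analyzers.sharpe_ratio.get_analysis()
        sr = sharpe['sharperatio']  # may be None, e.g. if returns have zero variance
        sr = round(sr, 2) if sr is not None else float('-inf')
        # [pfast, pslow, prisk, PnL, Sharpe ratio]
        final_results_list.append([strategy.params.pfast, strategy.params.pslow,
                                   round(runs[2], 1), PnL, sr])

# sort by PnL, then (stable sort) by Sharpe ratio: Sharpe is the primary key, PnL breaks ties
sort_by_sharpe = sorted(final_results_list, key=lambda x: x[3], reverse=True)
sort_by_sharpe = sorted(sort_by_sharpe, key=lambda x: x[4], reverse=True)
for line in sort_by_sharpe[:]:
    print(line)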

Now we can test out-of-sample and see how we’ve done. This shows why we need to adjust the Sharpe ratio t-statistic to account for multiple testing!

data_OS = bt.feeds.PandasData(dataname=stockData_OS)

strats = run(data_OS, (5, 50, 1), graph=True)  # best in-sample parameters, evaluated out-of-sample
strat_0 = strats[0]
portfolio_stats = strat_0.analyzers.getbyname('PyFolio')
returns, positions, transactions, gross_lev = portfolio_stats.get_pf_items()
vol = np.std(returns)*np.sqrt(252)
returns.index = returns.index.tz_convert(None)
PnL = round(strat_0.broker.get_value() - 10000,2)

sharpe = strat_0.analyzers.sharpe_ratio.get_analysis()

print('PnL $                  : ',round(PnL,2))
print('Sharpe Ratio           : ',round(sharpe['sharperatio'],2))
print('Annualised Volatility %: ', round(vol*100,2))

Is this statistically significant?

Here we ask whether this result is statistically different from that of a portfolio with a Sharpe ratio of 0.

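n_obs = len(returns)  # number of daily returns from the run above
t_statistic = sharpe['sharperatio'] * np.sqrt(n_obs / 252)  # annualised SR x sqrt(years)
print('Our T-statistic: ', round(t_statistic, 2))
print('p_value (one-sided)', round(sc.stats.t.sf(t_statistic, n_obs - 1), 3))  # upper tail, not the density
print('T_crit at 5% significance (one-sided)', round(sc.stats.t.ppf(1 - 0.05, n_obs - 1), 3))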
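A common approximation, used by Harvey and Liu in the paper listed above, is that the t-statistic of an annualised Sharpe ratio is t = SR × √T, where T is the sample length in years. The null hypothesis is a true Sharpe ratio of 0. The hypothetical helper `sharpe_t_test` below computes this t-statistic and its one-sided p-value from the upper tail of Student's t distribution (`scipy.stats.t.sf`). It assumes independent, identically distributed daily returns and 252 trading days per year; the input values are illustrative.

```python
import numpy as np
from scipy import stats


def sharpe_t_test(sr_annual: float, n_obs: int, periods_per_year: int = 252):
    """t-statistic and one-sided p-value for H0: true Sharpe ratio = 0.

    Assumes i.i.d. returns, so t = SR_annual * sqrt(years of data).
    """
    years = n_obs / periods_per_year
    t_stat = sr_annual * np.sqrt(years)
    # upper-tail probability P(T > t_stat), not the density at t_stat
    p_value = stats.t.sf(t_stat, df=n_obs - 1)
    return t_stat, p_value


t_stat, p_value = sharpe_t_test(sr_annual=2.0, n_obs=252)  # one year of data, SR = 2
t_crit = stats.t.ppf(1 - 0.05, df=252 - 1)                 # one-sided 5% critical value
print(round(t_stat, 2), round(p_value, 3), round(t_crit, 3))
```

Note that a Sharpe ratio of 2 sustained for one year gives t = 2. A Sharpe ratio of 1 would need about four years of data to reach the same t-statistic.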

Based on this we would conclude that our result is significant, BUT we need to account for multiple testing. When we ‘tuned’ the model’s hyperparameters, we were effectively performing multiple tests on the same dataset. This is BAD research practice, and one that Dr Marcos Lopez de Prado repeatedly warns against, because it is seen again and again in the finance industry.
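To see why tuning amounts to multiple testing, the sketch below simulates 216 strategies whose daily returns are pure noise with zero true mean. It evaluates each strategy over roughly three years of data and counts how many clear the unadjusted one-sided 5% bar. The simulation parameters (seed, volatility, sample length) are assumptions chosen for illustration. The sketch also treats the 216 tests as independent. The moving-average configurations above are likely strongly correlated with one another, so their effective number of independent tests is smaller.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_strategies, n_days = 216, 756  # assumed: 216 configurations, ~3 years of daily returns

# zero-mean daily returns: by construction, no strategy has any edge
sim_returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# annualised Sharpe ratio of each strategy, and t = SR * sqrt(years)
sr_annual = sim_returns.mean(axis=1) / sim_returns.std(axis=1, ddof=1) * np.sqrt(252)
t_stats = sr_annual * np.sqrt(n_days / 252)

n_passing = int((t_stats > 1.645).sum())  # one-sided 5% critical value (normal approximation)
print(f"best in-sample Sharpe of a skill-less strategy: {sr_annual.max():.2f}")
print(f"'significant' strategies at 5%: {n_passing} of {n_strategies}")
```

On average about 5% of the strategies (roughly 11 of 216) pass by luck alone. Picking the best one is exactly what the hyperparameter search above does.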

Below we apply the Bonferroni adjustment to see whether our result is still significant. We divide the 5% significance level by 216, the number of tests (combinations of hyperparameters) we ran on the same dataset.

n_tests = 216  # hyperparameter combinations evaluated on the same dataset
print('T_crit with Bonferroni adjustment', round(sc.stats.t.ppf(1 - 0.05/n_tests, len(returns) - 1), 3))
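Bonferroni is the simplest and most conservative correction. The Harvey paper listed above also discusses less conservative alternatives such as Holm's step-down method. The sketch below computes adjusted p-values for both with NumPy. The helpers `bonferroni` and `holm` are hypothetical names, and the input p-values are made up for illustration. If you prefer a library implementation, statsmodels' `multipletests` function offers the same corrections (methods `'bonferroni'` and `'holm'`).

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1 (m = number of tests)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate).

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k), and a
    running maximum keeps the adjusted values monotone.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj_sorted = np.maximum.accumulate((m - np.arange(m)) * p[order])
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))
print(holm(pvals))
```

For these four p-values, Bonferroni gives 0.04, 0.16, 0.12 and 0.02, while Holm gives 0.03, 0.06, 0.06 and 0.02. At the 5% level, Holm still rejects two of the four null hypotheses, the same number as Bonferroni. Holm's adjusted p-values are never larger than Bonferroni's, so it can reject more when evidence is strong.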

Please remember that this is an example of what not to do! Don’t run multiple tests on the same dataset without adjusting the significance threshold (or the p-values) appropriately.