Minimum Backtest Length: Avoid Overfitting Traps

Every quant has been there. You've built a strategy, run the backtest, and the equity curve looks like it was drawn by someone who already knew the answer. Sharpe ratio of 2.4, maximum drawdown of 8%, CAGR that makes your super fund look embarrassing. The problem? That beautiful curve might be pure fiction — a statistical ghost conjured by too many optimisation attempts on too little data.

This is the overfitting problem, and it's nastier than most traders realise. It's not just about curve-fitting a moving average crossover. Every time you test a hypothesis, tweak a parameter, or discard a strategy that didn't work and try another, you're effectively running multiple comparisons. The more attempts you make, the higher the probability that one result looks great purely by chance. Statisticians call this the multiple comparisons problem. Traders call it "finding" a strategy. The market calls it a donation.
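A quick way to see this is to backtest strategies that have no edge at all. The sketch below is an illustration under stated assumptions, not a model of any real strategy: daily returns are drawn as independent Gaussian noise with zero mean and 1% volatility, and the function name `best_sharpe_by_chance` and its parameters are hypothetical. It reports the best annualised Sharpe ratio found among a given number of pure-noise trials.

```python
import random
import statistics

def best_sharpe_by_chance(n_trials, n_years=5, days_per_year=252, seed=0):
    """Best annualised Sharpe ratio among n_trials pure-noise strategies."""
    rng = random.Random(seed)
    n_days = n_years * days_per_year
    best = float("-inf")
    for _ in range(n_trials):
        # Zero true mean: by construction, none of these strategies has an edge.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpe = (statistics.mean(returns) / statistics.stdev(returns)
                  * days_per_year ** 0.5)
        best = max(best, sharpe)
    return best

# The "winning" Sharpe can only rise as more noise strategies are tried.
for n in (1, 10, 100):
    print(n, round(best_sharpe_by_chance(n), 2))
```

Because the seed is fixed, each larger run contains the smaller runs' strategies, so the best Sharpe never falls as the trial count rises. Nothing about the strategies improved; only the number of attempts did.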

Bailey, Borwein, Lopez de Prado and Zhu formalised this intuition in their 2014 paper with a concept called the Minimum Backtest Length, or MinBTL. The core idea is elegant. Given a target Sharpe ratio, a number of independent strategy trials, and the statistical properties of your returns, you can calculate the minimum data length required for that Sharpe ratio to be credible rather than coincidental. Think of it like a courtroom standard of evidence: the more suspects you've already ruled out, the more proof you need before convicting the next one.

The practical approximation from that paper says MinBTL is roughly the square of a ratio: the best annualised Sharpe ratio you would expect from skill-less trials over one year of data, divided by the Sharpe ratio you are targeting. That expected maximum scales with the square root of the logarithm of the number of trials, so the required length grows roughly with the logarithm of the trial count and with the inverse square of the target. For a target annualised Sharpe ratio of 1, 20 independent variants call for roughly 3.6 years of data and 200 variants for roughly 7.7 years. Halve the target to 0.5 and those figures quadruple to about 14 and 31 years. Most strategies don't have 30 years of relevant, regime-consistent data. That's the trap. For deeper grounding in the statistics, Investopedia's overview of overfitting captures the concept cleanly, and the underlying mechanics draw heavily on the multiple comparisons problem, a cornerstone of applied statistics. Bailey and Lopez de Prado's broader framework for evaluating strategy performance is also grounded in a more rigorous interpretation of the Sharpe ratio than most practitioners apply.
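The calculation can be sketched in a few lines of Python. This is a minimal version of the published approximation as I understand it, assuming independent trials, normally distributed returns and a true Sharpe ratio of zero for every trial. The function names `expected_max_sharpe` and `min_backtest_length` are my own, not from the paper.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials):
    """Approximate expected maximum of n_trials independent standard normal
    draws, i.e. the best one-year Sharpe expected from skill-less trials."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least 2 trials")
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
            + EULER_GAMMA * z(1 - 1 / (n_trials * math.e)))

def min_backtest_length(n_trials, target_sharpe):
    """Approximate MinBTL in years for an annualised target Sharpe ratio."""
    if target_sharpe <= 0:
        raise ValueError("target_sharpe must be positive")
    return (expected_max_sharpe(n_trials) / target_sharpe) ** 2

# Required years of data for a few trial counts at a target Sharpe of 1.0.
for n in (20, 45, 200):
    print(n, round(min_backtest_length(n, target_sharpe=1.0), 1))
```

Treat the result as a floor rather than a guarantee. Correlated trials, fat-tailed returns and regime changes all sit outside these assumptions.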

The takeaway you can use today: keep a trial log. Every strategy you test, every parameter set you run, every variant you discard: record it. That count is a direct input to your credibility calculation, and ignoring it doesn't make the problem disappear. It just makes your overfit curve look prettier.
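A trial log does not need to be elaborate. As one possible sketch, the helpers below append each backtest run to a CSV file and count the rows. The file layout, field names and function names (`log_trial`, `count_trials`) are all hypothetical choices for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_FIELDS = ["timestamp", "strategy", "params", "sharpe"]

def log_trial(path, strategy, params, sharpe):
    """Append one backtest run to a CSV trial log, creating the file if needed."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()  # header only on first write
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "strategy": strategy,
            "params": repr(params),
            "sharpe": sharpe,
        })

def count_trials(path):
    """Number of logged runs: the trial count to carry into a MinBTL estimate."""
    with Path(path).open(newline="") as f:
        return sum(1 for _ in csv.DictReader(f))
```

Calling `log_trial("trials.csv", "ma_cross", {"fast": 10, "slow": 50}, 1.3)` after every run, including the ones you throw away, keeps the count honest. Logged runs are rarely fully independent, so the raw count probably overstates the number of independent trials. That errs on the cautious side.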

A backtest without a trial count isn't evidence — it's a story you told yourself until the numbers agreed.

This content is for educational purposes only and does not constitute financial product advice. Past performance is not indicative of future results. Profit Logic Ltd (ACN 688 669 936) accepts no responsibility for errors or omissions in this content or anywhere on this website. Always seek advice from a licensed financial adviser before making investment decisions.
