Anyone who has built a quantitative model on ASX small-caps knows the particular joy of opening a return matrix and finding it looks like Swiss cheese. Thin trading, halts, suspensions, data vendor gaps — the missing observations pile up fast. And here is the dirty secret: how you handle those gaps matters enormously, often more than the fancy factor model you built around them.
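The naive approaches both introduce serious bias, whether you drop stocks with incomplete histories or forward-fill the last known price. Dropping stocks creates survivorship distortion, because the names with the patchiest histories (the illiquid, the suspended, the eventually delisted) are exactly the ones that vanish from the sample. Forward-filling manufactures runs of zero returns followed by a lumped catch-up move on the next trade, which distorts autocorrelation estimates and corrupts any day-to-day volatility signal you were hoping to detect. Traders relying on these shortcuts are essentially building on sand and wondering why the backtest does not translate to live performance.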
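To make the forward-fill problem concrete, here is a small simulation sketch. The trading probability, daily volatility and seed are made-up illustration values, and returns are assumed to be log returns so that they add across days. It is not a model of any particular ASX stock.

```python
import random
import statistics


def forward_filled_returns(latent, traded):
    """Returns implied by forward-filling prices.

    Non-trading days show a return of exactly zero; the next trading day
    shows the accumulated latent return since the previous trade.
    """
    observed, carry = [], 0.0
    for r, t in zip(latent, traded):
        carry += r
        if t:
            observed.append(carry)
            carry = 0.0
        else:
            observed.append(0.0)
    return observed


rng = random.Random(0)  # fixed seed, illustration only
n_days, trade_prob, daily_vol = 5000, 0.5, 0.02  # hypothetical values

latent = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
traded = [rng.random() < trade_prob for _ in range(n_days)]
observed = forward_filled_returns(latent, traded)

# Share of days where the forward-filled series reports a zero return.
zero_share = sum(1 for r in observed if r == 0.0) / n_days

# Volatility measured only on days with a trade (lumped multi-day moves).
trade_day_vol = statistics.pstdev(
    [r for r, t in zip(observed, traded) if t]
)

print(f"zero-return share: {zero_share:.2f}")
print(f"latent daily vol: {daily_vol:.4f}, trade-day vol: {trade_day_vol:.4f}")
```

In this toy setup roughly half the observed returns are exact zeros, and the returns on trading days are noticeably larger than the true daily moves because each one bundles several days together. A rolling volatility estimate built on such a series swings between "dead calm" and "spike" for reasons that have nothing to do with the underlying process.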
The Kalman filter, introduced by Rudolf Kálmán in 1960, found one of its earliest and best-known uses in aerospace navigation: estimating the position of a spacecraft from noisy sensor readings that sometimes drop out. The analogy to small-cap equities is uncomfortably perfect. You have a true latent return process ticking along each day; some days a transaction occurs and you observe it noisily, other days the stock simply does not trade and you observe nothing. The filter propagates your best estimate of that latent state forward through measurement gaps, widening its uncertainty band with each missing day and tightening it again when a trade arrives. No invented observations, just calibrated uncertainty.
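The predict-through-the-gap behaviour can be sketched with a scalar filter. This is a minimal illustration, not a production model: it assumes a single latent state following x_t = phi * x_{t-1} + noise, observed with noise on trading days, and the function name, parameter names and sample numbers are all hypothetical.

```python
def kalman_filter_with_gaps(observations, q, r, x0=0.0, p0=1.0, phi=1.0):
    """Scalar Kalman filter where `None` marks a day with no observation.

    q   : variance of the state (process) noise
    r   : variance of the observation noise
    x0  : prior mean of the state, p0 its prior variance
    phi : state persistence (1.0 gives a random-walk state)

    Returns (means, variances) of the filtered state for each day.
    """
    means, variances = [], []
    x, p = x0, p0
    for y in observations:
        # Predict step: carry the state forward; uncertainty grows by q.
        x = phi * x
        p = phi * phi * p + q
        if y is not None:
            # Update step: weight the new observation by the Kalman gain.
            gain = p / (p + r)
            x = x + gain * (y - x)
            p = (1.0 - gain) * p
        # On a missing day the update is skipped, so only the prediction
        # (with its larger variance) is recorded.
        means.append(x)
        variances.append(p)
    return means, variances


# Hypothetical series with a three-day trading gap in the middle.
obs = [0.10, 0.12, None, None, None, 0.20, 0.18]
means, variances = kalman_filter_with_gaps(obs, q=0.01, r=0.04)

for day, (m, v) in enumerate(zip(means, variances)):
    print(f"day {day}: mean={m:.4f} variance={v:.4f}")
```

With phi set to 1.0 the estimated mean stays flat across the gap while the variance climbs each day, then drops as soon as the next observation arrives. Nothing is imputed as if it had been observed; the gap shows up only as wider uncertainty.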
Where the Kalman filter shines in real-time sequential estimation, the Expectation-Maximisation (EM) algorithm takes a different approach that suits batch model fitting. EM treats missing returns as latent variables and iterates between two steps: the E-step computes the expected values of the missing returns (and of their squares and cross-products) given current parameter estimates, and the M-step re-estimates the model parameters from those expected quantities. For a factor model applied to a large sparse ASX small-cap universe, EM can jointly estimate factor loadings and fill gaps without ever needing a complete return matrix. The two methods are genuinely complementary: Kalman for online state tracking, EM for offline parameter estimation. They also combine naturally in a state-space framework, where a Kalman smoother supplies the E-step and EM estimates the model parameters. For deeper grounding, the Kalman filter article on Wikipedia has a surprisingly readable derivation, the maximum likelihood estimation explainer on Investopedia contextualises the EM optimisation objective neatly, and the EM algorithm article on Wikipedia covers convergence properties worth understanding before trusting your imputed covariance matrix.
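The E-step and M-step loop is easiest to see in a deliberately tiny case. The sketch below assumes just two series that are jointly normal: a fully observed factor (called `market` here) and one stock with gaps. That is far simpler than a multi-stock model with latent factors, and all names and numbers are hypothetical. The detail worth noticing is that the E-step adds the conditional variance to the squared imputed value; filling gaps with conditional means alone would understate the stock's variance.

```python
import random


def em_bivariate(x, y, n_iter=200, tol=1e-14):
    """EM for a bivariate normal where some y values are None.

    x is fully observed. Returns (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    seen = [yi for yi in y if yi is not None]
    # Start from the observed y values and zero covariance.
    mu_y = sum(seen) / len(seen)
    s_yy = sum((yi - mu_y) ** 2 for yi in seen) / len(seen)
    s_xy = 0.0
    for _ in range(n_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy  # Var(y | x) under current estimates
        # E-step: expected y, y^2 and x*y for every row.
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                eyy = ey * ey + resid_var  # second moment, not ey**2 alone
            else:
                ey, eyy = yi, yi * yi
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey
        # M-step: re-estimate moments from the expected statistics.
        new_mu_y = sum_y / n
        new_s_yy = sum_yy / n - new_mu_y ** 2
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        change = (abs(new_mu_y - mu_y) + abs(new_s_yy - s_yy)
                  + abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy


# Simulated data: true loading 1.5, about 40% of stock returns missing.
rng = random.Random(1)
market = [rng.gauss(0.0, 0.01) for _ in range(2000)]
stock = [1.5 * m + rng.gauss(0.0, 0.01) for m in market]
stock_gappy = [s if rng.random() > 0.4 else None for s in stock]

mu_x, mu_y, s_xx, s_xy, s_yy = em_bivariate(market, stock_gappy)
loading = s_xy / s_xx
print(f"estimated loading: {loading:.3f}")
```

In this two-series case with gaps that occur at random, the estimated loading coincides with an ordinary regression on the days where both series are observed, so EM buys nothing extra here. Its value shows up in wider panels, where every stock has a different gap pattern and no common set of complete days exists.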
The practical takeaway is simple: before you run any backtest or risk model on ASX small-caps, audit your missing data rate per stock. Anything above 20% non-trading days deserves proper state-space treatment, not a spreadsheet shortcut.
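A per-stock audit of this kind takes only a few lines. The sketch below assumes returns are held as a mapping from ticker to a list of daily values, with `None` on days without a trade, and that each list covers only the days the stock was actually listed. The tickers and numbers are invented, and the 20% default simply mirrors the rule of thumb above.

```python
def missing_rate_audit(returns_by_ticker, threshold=0.20):
    """Compute each ticker's share of missing days and flag the worst.

    Returns (rates, flagged): a {ticker: missing_rate} dict and a sorted
    list of tickers whose missing rate exceeds `threshold`.
    """
    rates = {
        ticker: sum(r is None for r in series) / len(series)
        for ticker, series in returns_by_ticker.items()
        if series  # skip empty histories rather than divide by zero
    }
    flagged = sorted(t for t, rate in rates.items() if rate > threshold)
    return rates, flagged


# Hypothetical tickers and returns, for illustration only.
sample = {
    "AAA": [0.01, 0.00, None, 0.02, -0.01, 0.00, 0.01, 0.03, -0.02, 0.01],
    "BBB": [None, None, 0.03, None, 0.01],
}
rates, flagged = missing_rate_audit(sample)

for ticker, rate in sorted(rates.items()):
    print(f"{ticker}: {rate:.0%} missing")
print("needs state-space treatment:", flagged)
```

If the return matrix already lives in a pandas DataFrame with one column per stock, `returns.isna().mean()` gives the same per-column rates.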
Your model is only as honest as the data you feed it: garbage imputation in, garbage alpha out.
This content is for educational purposes only and does not constitute financial product advice. Past performance is not indicative of future results. Profit Logic Ltd (ACN 688 669 936) accepts no responsibility for errors or omissions in this content or anywhere on this website. Always seek advice from a licensed financial adviser before making investment decisions.