Forecasting the NASDAQ-100

Introduction

This spring, I managed a quantitative research pod for Paragon Global Investments, which is, I suppose, the premier intercollegiate quantitative finance group (not that we have much competition). I had my pod work on a model that forecasts a probability distribution of the $\text{NASDAQ-100}$ ( $\text{NDX}$ ) closing level. In particular, our model runs on days with scheduled, market-moving events, e.g. Fed rate decisions and Mag 7 quarterly earnings. Our alpha is found on the Kalshi $\text{KXNASDAQ100}$ series, a stratified market with really wide spreads ( $\approx 15$ ¢), which indicates a great deal of uncertainty on the part of market participants. We trade on the series by using our aforementioned distribution to generate per-bucket trading signals with bootstrap-derived confidence intervals. Special thanks to Tommy S. and David H. for their excellent technical work on this, as well as Forrest G., my boss/mentor.

Alpha Thesis

To motivate the project a bit, it’s worth noting that forecasting some point estimate for the $\text{NDX}$ closing level and then trading on that estimate itself is very silly. Your decision-making space is quite constrained: you can short if your prediction is lower, long if higher, overleverage as much as you want, run $\text{0DTE}$ options, et cetera, but you are fundamentally just taking directional bets on the assumption that the market is mispriced. If you are really beating an index consistently, RenTech is always hiring! In practice, this is probably just coin-flipping with slight negative EV, because you are not nearly that good at modelling.

Okay, so if forecasting a point estimate is unrealistic, why is forecasting a distribution much better? It turns out that forecasting distributions captures a lot of additional information. Consider the following toy example. Suppose that the entire $\text{NDX}$ depends on a single company $c$ (necessarily it can’t because of the whole $100$ thing… but suppose the other $99$ comprise some negligible $\epsilon$ ). Suppose, further, that the company has an earnings call today and we have information regarding that call which no one else has: if the call goes well, with $p=0.75$ , then $\Delta \text{price}(c)=+10$ , and if it goes poorly, with $p=0.25$ , then $\Delta \text{price}(c)=-30$ . Then $\mathbb{E}[\Delta \text{price}(c)] = 0.75 \cdot 10 + 0.25 \cdot (-30) = 0$ , and if our point estimate minimises e.g. $\text{MSE}$ (very reasonable), then $\widehat{\text{price}(c)}=\text{price}(c)_{t=0}$ . This is very unfortunate for us, because despite having information which isn’t priced into the market’s distribution (by hypothesis), our point forecast is identical to the current price, so it gives us nothing to trade on. Notice how e.g. $\text{Var}[\Delta\text{price}(c)] \;=\; 0.75 \cdot 10^{2} + 0.25 \cdot 30^{2} - 0^{2} =300$ is unique information which trading on a point doesn’t incorporate.
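The arithmetic of the toy example can be checked in a few lines. This is only a sketch of the two-outcome distribution described above; the variable names are illustrative.

```python
# Two-outcome toy distribution for the price change of company c.
outcomes = [(+10.0, 0.75), (-30.0, 0.25)]  # (price change, probability)

# First moment: the MSE-optimal point forecast of the price change.
mean = sum(p * x for x, p in outcomes)

# Second central moment: E[X^2] - E[X]^2.
variance = sum(p * x**2 for x, p in outcomes) - mean**2

print(mean)      # 0.0   -> the point forecast says "no change"
print(variance)  # 300.0 -> the spread that a point forecast throws away
```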

Of course, in this case there are clever things we can do with options, but we would prefer less informed counterparties. That is why we turn to the $\text{KXNASDAQ100}$ series. The spreads on that series are generally quite high, owing to uncertainty. And, moreover, amateurish market makers often quote with an unsophisticated Gaussian distribution about the current price. Therefore, we aim to forecast the closing level distribution influenced by these market-moving events, hoping that the fat tails unaccounted for by makers can generate us profit.

Importantly, there are good reasons to think kurtosis is high enough for alpha. For starters, a few companies comprise much of the $\text{NDX}$ ; the index is ridiculously concentrated (Nvidia comprises $14\%$ ), with the only salient comparison being the dot-com bubble. Thus, if these highly weighted companies are capricious and volatile, they will have a meaningful impact on the index. And they very much are! Trailing $\text{P/E}$ is at an all-time high, $\text{VVIX}$ and $\text{SKEW}$ are close to or at all-time highs, and so on.

Okay, so we have a strong pretext: certain events have outsized impacts on companies which have outsized impacts on our index price. What are these events?

Forecasting

We curate a $\text{JSON}$ event file where we track events of interest starting in 2019—a year chosen because that’s when a lot of the aforementioned theses became particularly true. The events we care about include $444$ macro events (FOMC, CPI, NFP, PCE, Jackson Hole, etc.), $309$ mega-cap earnings prints, and $108$ AI lab events (Nvidia GTC keynotes, OAI/Anthropic/GDM model releases, Apple WWDC, etc.).

We then have an audit document that maps each entry to primary sources, and we had subagents run multiple passes to be thorough. In particular, we had to be very careful with which market dates we attributed to which events. For instance, the 2025 government shutdown delayed several CPI and NFP releases (and of course Trump decided to just not do an October 2025 CPI), and Good Friday occasionally (but not always!) messes with the NFP; the list goes on. Across the board, events occurring outside market hours had to be mapped to the next trading day. These overnight moves must additionally be accounted for because the $\text{KXNASDAQ100}$ series runs 24/7.

Then, for each $\textbf{(event\_type, ticker)}$ pair we computed marginal statistics (the mean, standard deviation, skewness, and excess kurtosis) over the historical reaction-day returns for that bucket. We end up with $1{,}530$ ticker-bucket marginal records. We then computed a Pearson correlation matrix across tickers for each event type from the same panel, so that every ticker has a full vector of event-conditional correlations with the others. This matrix turns up some fascinating stuff.
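A minimal sketch of the per-bucket marginals and the cross-ticker Pearson matrix, assuming the reaction-day returns for one event type are already arranged as a dates-by-tickers array. The panel here is synthetic and `marginal_stats` is a hypothetical helper, not the project's own code; skewness and excess kurtosis use the plain (biased) moment estimators.

```python
import numpy as np

def marginal_stats(returns):
    """Mean, sample std, skewness and excess kurtosis of a 1-D return series."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    centred = r - mu
    m2 = np.mean(centred**2)
    m3 = np.mean(centred**3)
    m4 = np.mean(centred**4)
    return {
        "mean": mu,
        "std": r.std(ddof=1),
        "skew": m3 / m2**1.5,
        "excess_kurtosis": m4 / m2**2 - 3.0,  # 0 for a Gaussian
    }

# Synthetic panel: rows are reaction dates for one event type, columns are tickers.
rng = np.random.default_rng(0)
panel = rng.standard_t(df=5, size=(40, 3)) * 0.01

stats = [marginal_stats(panel[:, j]) for j in range(panel.shape[1])]
corr = np.corrcoef(panel, rowvar=False)  # 3 x 3 Pearson matrix across tickers
```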

Consider for instance $\text{AMGN}$ (Amgen) and $\text{TMUS}$ (T-Mobile). With $\text{COST}$ (Costco) earnings, they have a correlation of $\rho=+0.77$ . Yet with $\text{META}$ (Meta) earnings, $\rho=-0.36$ . For those curious about the statistics, $n>30$ for both, and it’s still $+0.74,-0.26$ under leave-one-out ( $\text{LOO}$ ). Such behavior is intuitive. Costco prints are treated as a consumer-spending/staples signal, so defensive yield megacaps (which Amgen and T-Mobile are, as both pay solid dividends, have low beta w/r/t the cyclical tape, etc.) move in lockstep. However, Meta’s print is taken as a signal of how digital advertisements and consumer engagement are performing; advertising was literally $98\%$ of their revenue in 2025. Think of how many T-Mobile and Amgen ads you have seen: for me, probably hundreds versus zero. So, the behavior aligns with our heuristics. Note how important the event-conditional Pearson correlation matrix is: if we had simply calculated a single unconditional correlation per pair, hoping to run some mean-reversion strategy like every other unsophisticated algotrading project, we would lose this nuance entirely!

The universe for a given event type is then the intersection of tickers with the cached returns for every reaction date, which keeps the series equal-length so we can do correlation estimation. Otherwise, a covariance matrix assembled from unequal-length pairwise estimates is no longer guaranteed to be positive semidefinite ( $\text{PSD}$ ).
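To make the point concrete, the sketch below first takes the intersection over a hypothetical date-to-tickers cache, then shows a hand-built matrix of the kind pairwise estimation can produce: every entry is a valid correlation on its own, yet the matrix as a whole is not PSD and Cholesky rejects it. The cache contents and the matrix values are made up for illustration.

```python
import numpy as np

# Hypothetical cache: reaction date -> tickers with a cached return on that date.
cached = {
    "2024-01-31": {"AAPL", "MSFT", "NVDA", "TMUS"},
    "2024-03-20": {"AAPL", "MSFT", "NVDA"},
    "2024-05-01": {"AAPL", "MSFT", "NVDA", "AMGN"},
}
# Keep only tickers present on every reaction date, so all series are equal-length.
universe = sorted(set.intersection(*cached.values()))

# Symmetric, unit diagonal, every off-diagonal entry in [-1, 1],
# but jointly inconsistent: the eigenvalues are 1.9, 1.9 and -0.8.
pairwise = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
min_eig = np.linalg.eigvalsh(pairwise).min()  # negative, so not PSD

try:
    np.linalg.cholesky(pairwise)
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False  # factorisation fails on the non-PSD matrix
```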

To forecast on dates with multiple events, we treat each event’s shock as an independent additive contribution. Obviously, these events are not actually uncorrelated, but we don’t have sufficient sample size to estimate their cross-correlations, so it is what it is (this affects $23.1\%$ of event-days). Adding them works out neatly: under independence the covariance of a sum is the sum of the covariances, so the per-event $\mu$ vectors and $\Sigma$ matrices both simply add, and the summed $\Sigma$ stays $\text{PSD}$ (the sum of $\text{PSD}$ matrices remains $\text{PSD}$ ). We care about the $\text{PSD}$ property because it means that Cholesky decomposition stays well-defined. We use Cholesky decomposition since it considerably speeds up our computation; it factorises in roughly $\tfrac{1}{3} n^3$ flops, cf. roughly $9 n^3$ for a symmetric eigendecomposition.
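A minimal sketch of the additive combination, assuming two events with made-up $\mu$ vectors and covariance matrices. The draws here are plain Gaussian, which is a simplification for illustration: how the project injects skewness and kurtosis into the joint draws is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_cov(n, rng):
    """Synthetic covariance matrix, positive definite by construction."""
    a = rng.normal(size=(n, n + 2))
    return a @ a.T / (n + 2)

n = 4  # number of tickers in this toy universe
mu_a, mu_b = rng.normal(scale=0.01, size=n), rng.normal(scale=0.01, size=n)
sigma_a, sigma_b = random_cov(n, rng), random_cov(n, rng)

# Independent additive shocks: means add and covariances add.
mu = mu_a + mu_b
sigma = sigma_a + sigma_b

# Lower-triangular factor with L @ L.T == sigma.
L = np.linalg.cholesky(sigma)

# Correlated draws: each row of z @ L.T has covariance sigma.
z = rng.standard_normal(size=(100_000, n))
draws = mu + z @ L.T
```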

Finally, for the actual Monte Carlo simulation, we draw $n=100{,}000$ joint component-return vectors, then aggregate each draw into the actual $\text{NDX}$ using $\text{NNLS}$ -estimated index weights for each company, then apply this to the current price. The result is a Monte Carlo distribution over the closing level.
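The aggregation step can be sketched as a single matrix-vector product. The weights, the current level, and the component draws below are all hypothetical, and the sketch assumes simple (not log) returns so that the index return is the weighted sum of component returns.

```python
import numpy as np

rng = np.random.default_rng(2)

weights = np.array([0.5, 0.3, 0.2])  # hypothetical index weights, sum to 1
current_level = 20_000.0             # hypothetical current NDX level

# Stand-in for the joint component-return draws (one row per simulation).
component_returns = rng.normal(0.0, 0.02, size=(100_000, 3))

# Collapse each draw into an index return, then into a closing level.
index_returns = component_returns @ weights
closing_levels = current_level * (1.0 + index_returns)

# A few summary quantiles of the simulated closing distribution.
q05, q50, q95 = np.percentile(closing_levels, [5, 50, 95])
```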

A brief aside: component companies about which we are agnostic we simulate with Geometric Brownian Motion (GBM), as is standard practice. The GBM is calibrated so that $\sigma$ is the annualised realised volatility of $\text{NDX}$ log-returns over the $504$ trailing trading days, again, as is standard practice. We are content with this simplistic model because these components make up a sliver of market cap ( $13.15\%$ ). The equation we use is as follows, with $\sigma$ computed over that trailing window:

dS_t \;=\; \mu\, S_t\, dt \,+\, \sigma\, S_t\, dW_t \quad\Longrightarrow\quad S_T \;=\; S_0 \exp\!\Bigl(\bigl(\mu - \tfrac{1}{2}\sigma^{2}\bigr)\,T + \sigma\sqrt{T}\, Z\Bigr), \quad Z \sim \mathcal{N}(0, 1).
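The closed-form solution above translates directly into code. The sketch below is an assumption-laden illustration: the function names are hypothetical, and annualising with 252 trading days per year is a common convention rather than something stated in this write-up.

```python
import numpy as np

def annualised_vol(closes, window=504, periods_per_year=252):
    """Annualised realised volatility of log-returns over the trailing window."""
    log_returns = np.diff(np.log(np.asarray(closes, dtype=float)))[-window:]
    return log_returns.std(ddof=1) * np.sqrt(periods_per_year)

def gbm_terminal(s0, mu, sigma, horizon_years, n_draws, rng):
    """Draw S_T = S_0 * exp((mu - sigma^2 / 2) T + sigma sqrt(T) Z), Z ~ N(0, 1)."""
    z = rng.standard_normal(n_draws)
    drift = (mu - 0.5 * sigma**2) * horizon_years
    diffusion = sigma * np.sqrt(horizon_years) * z
    return s0 * np.exp(drift + diffusion)

rng = np.random.default_rng(3)
# One trading day ahead with zero drift and 20% annualised volatility.
samples = gbm_terminal(100.0, mu=0.0, sigma=0.2, horizon_years=1 / 252,
                       n_draws=200_000, rng=rng)
```

With zero drift the simulated mean stays at the starting level, which is a quick sanity check on the $-\tfrac{1}{2}\sigma^{2}$ correction term.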

Backtest

The model has two particularly relevant hyperparameters which we tune in the backtest: $\textbf{mean\_shrinkage}$ (how aggressively to shrink the estimated means towards zero, which we do for bias-variance trade-off reasons) and $\textbf{stdev\_shrink\_weight}$ (how much to shrink per-ticker $\sigma$ ’s). There are some more but they are relatively immaterial.

We score a forecast with Continuous Ranked Probability Score (CRPS). The scoring rule penalises a predictive CDF $F$ for being far (in $L^2$ ) from the point mass at the true realised outcome $y$ :

\text{CRPS}(F, y) \;=\; \int_{-\infty}^{\infty} \bigl(F(x) - \mathbf{1}\{x \geq y\}\bigr)^{2}\, dx.

In particular, for a sample-based forecast $\{X^{(k)}\}_{k=1}^{N}$ (which is what our Monte Carlo simulation is), the empirical estimator becomes:

\widehat{\text{CRPS}}\bigl(\{X^{(k)}\},\, y\bigr) \;=\; \frac{1}{N} \sum_{k=1}^{N} \bigl|X^{(k)} - y\bigr| \;-\; \frac{1}{2 N^{2}} \sum_{j=1}^{N}\sum_{k=1}^{N} \bigl|X^{(j)} - X^{(k)}\bigr|.

A lower CRPS is better. The first term rewards accuracy (it is small when the draws sit close to $y$ ), while the second term accounts for the forecast’s own spread, so that the score as a whole favours forecasts that are sharp as well as accurate. When $N=1$ , the second term vanishes and CRPS reduces to the absolute error, as one would expect. It is reasonably conceptualised as the generalisation of a point-forecast $\text{MAE}$ to a full distributional forecast. We also use confidence intervals because they are easy to use and easy to understand.
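The double sum in the sample estimator costs $O(N^2)$ if evaluated literally, which is heavy at $N = 100{,}000$ . A minimal sketch of an equivalent $O(N \log N)$ evaluation follows; it relies on the identity $\sum_{j,k} |x_j - x_k| = 2 \sum_{i} (2i - N - 1)\, x_{(i)}$ for the sorted sample $x_{(1)} \le \dots \le x_{(N)}$ . The function name is hypothetical and this is not necessarily how the project computes it.

```python
import numpy as np

def crps_sample(draws, y):
    """Empirical CRPS of a sample-based forecast against the realised value y."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    # First term: mean absolute distance of the draws from the outcome.
    accuracy = np.mean(np.abs(x - y))
    # Second term: (1 / (2 n^2)) * sum_jk |x_j - x_k|, via the sorted-sample identity.
    ranks = np.arange(1, n + 1)
    spread = np.sum((2 * ranks - n - 1) * x) / n**2
    return accuracy - spread

print(crps_sample([3.0], 5.0))       # 2.0: a single draw reduces to absolute error
print(crps_sample([0.0, 2.0], 1.0))  # 0.5
```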

For backtesting, we use three periods. The first is 2019-01-01 through 2021, the second is 2022-01-01 through 2023, and the third is 2024-01-01 to now. We use the first to calculate training statistics (the per-event marginals and the Pearson correlation matrices), so the model conditions on strictly historical data. The next window is validation, on which we test those aforementioned hyperparameters: $\textbf{mean\_shrinkage} \in [0, 1]$ and $\textbf{stdev\_shrink\_weight} \in [0, 0.5]$ . We first sweep a $200$ -cell grid ( $20 \times 10$ ) to find a coarse optimum, then refine via Nelder-Mead, seeded from the best grid cell with tolerances $x_{\text{atol}} = 0.003,\; f_{\text{atol}} = 0.05$ . It’s worth noting (I didn’t notice this initially) that we cannot use a single window to both calculate the statistics and the hyperparameters, since the hyperparameters correct the overfitting of the training statistics, and would both fine-tune to $0$ tautologically if we had run this on a single window. The refined configuration is $\textbf{mean\_shrinkage} = 0.9869$ , $\textbf{stdev\_shrink\_weight} = 0.0009$ .
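The coarse-then-refine search can be sketched with SciPy's Nelder-Mead implementation, whose `xatol` and `fatol` options correspond to the tolerances quoted above. The objective here is a made-up quadratic standing in for mean validation-window CRPS (the real one would rerun the model), and clipping the parameters to their ranges inside the objective is one simple way to respect the bounds, chosen for this sketch only.

```python
import numpy as np
from scipy.optimize import minimize

LOWER = np.array([0.0, 0.0])  # lower bounds: mean_shrinkage, stdev_shrink_weight
UPPER = np.array([1.0, 0.5])  # upper bounds

def validation_loss(params):
    """Stand-in for mean validation CRPS; has a known minimum at (0.8, 0.1)."""
    mean_shrinkage, stdev_shrink_weight = np.clip(params, LOWER, UPPER)
    return (mean_shrinkage - 0.8) ** 2 + 4.0 * (stdev_shrink_weight - 0.1) ** 2

# Coarse 20 x 10 sweep over the two hyperparameter ranges.
grid = [(m, s) for m in np.linspace(0.0, 1.0, 20) for s in np.linspace(0.0, 0.5, 10)]
best_cell = min(grid, key=validation_loss)

# Local refinement seeded from the best grid cell.
result = minimize(
    validation_loss,
    x0=np.array(best_cell),
    method="Nelder-Mead",
    options={"xatol": 0.003, "fatol": 0.05},
)
refined = np.clip(result.x, LOWER, UPPER)
```

Seeding from the grid optimum matters because Nelder-Mead is a local method: the sweep guards against starting in a poor basin, and the simplex only has to polish.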

On the held-out test window, the tuned model improves on the untuned default by $6.93\%$ in mean CRPS (a change of $-6.93\%$ ), with a $95\%$ percentile bootstrap confidence interval of $[-11.66\%, -2.48\%]$ . The CI excludes zero, so our tuning lift is statistically significant! Unfortunately, against the GBM baseline on the same window, our mean CRPS is $1.46\%$ higher ( $+1.46\%$ ), with a $95\%$ CI of $[-0.83\%, +3.69\%]$ , which is a statistical tie, since zero is in the CI. To be fair, this isn’t any serious indictment; if we beat GBM outright, that would probably mean we could trade the $\text{NDX}$ directly. We are certainly good enough to trade on Kalshi! Moreover, upon some further analysis, our consideration of kurtosis means we predict days with fat tails well: significantly better than GBM, in fact.
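A percentile bootstrap for a relative CRPS difference can be sketched as follows. The resampling scheme (paired resampling of days, then the ratio of mean scores) is an assumption made for illustration, as is the helper name; the write-up does not spell out the exact procedure.

```python
import numpy as np

def bootstrap_relative_diff(model_scores, baseline_scores,
                            n_boot=10_000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for mean(model) / mean(baseline) - 1."""
    model = np.asarray(model_scores, dtype=float)
    base = np.asarray(baseline_scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample days with replacement, keeping each day's two scores paired.
    idx = rng.integers(0, model.size, size=(n_boot, model.size))
    rel = model[idx].mean(axis=1) / base[idx].mean(axis=1) - 1.0
    point = model.mean() / base.mean() - 1.0
    lo, hi = np.quantile(rel, [alpha / 2, 1 - alpha / 2])
    return point, lo, hi

# Synthetic per-day scores where the model is exactly 10% better every day.
base_scores = np.linspace(100.0, 300.0, 50)
point, lo, hi = bootstrap_relative_diff(0.9 * base_scores, base_scores, n_boot=2_000)
```

A negative interval that excludes zero corresponds to a statistically significant improvement, since lower CRPS is better.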

Tail Diagnostics

An aggregate CRPS tie could mean a lot of things. Perhaps both models are interchangeable at every part of the distribution (in which case we have reinvented the wheel), or, hopefully, we have fatter tails that only pay off on the days when the realised closing level lands in them. Since our entire alpha thesis hinges on the latter, it’d be really nice if the latter is true. And it is! If we restrict CRPS analysis to the days where the realised $\text{NDX}$ move was large, we get some interesting data.

$\Delta \text{NDX}$ at least	$n$	our CRPS	GBM CRPS	$\Delta$ CRPS	$95\%$ Interval
$0.0\%$ (all days)	$156$	$174.42$	$171.91$	$+1.5\%$	$[-0.8\%, +3.7\%]$
$1.0\%$	$67$	$279.64$	$286.43$	$-2.4\%$	$[-4.3\%, -0.6\%]$
$1.5\%$	$39$	$364.61$	$376.45$	$-3.1\%$	$[-5.5\%, -0.9\%]$
$2.0\%$	$22$	$480.00$	$497.19$	$-3.5\%$	$[-6.3\%, -0.6\%]$
$2.5\%$	$11$	$644.78$	$671.35$	$-4.0\%$	$[-6.9\%, -0.7\%]$

The improvement grows monotonically with how extreme the $\text{NDX}$ move was, and, in particular, the $95\%$ CI excludes zero at every threshold of $1.0\%$ and above (only the all-days row straddles zero), indicating statistical significance, though the sample thins to $n=11$ at the top threshold. In plain English: on the days we care about (those where $\text{NDX}$ moved substantially in response to an event) our model beat GBM by a statistically significant margin, and that margin strictly increases as $\text{NDX}$ movement increases.
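The conditioning behind the table can be sketched as a filter over per-day scores. The sketch assumes "move" means the absolute daily return and that per-day CRPS values for both models are already available as arrays; `tail_table` and the sample numbers are hypothetical.

```python
import numpy as np

def tail_table(abs_moves, model_crps, gbm_crps,
               thresholds=(0.0, 0.01, 0.015, 0.02, 0.025)):
    """Rows of (threshold, n, mean model CRPS, mean GBM CRPS, relative difference)."""
    abs_moves = np.asarray(abs_moves, dtype=float)
    model_crps = np.asarray(model_crps, dtype=float)
    gbm_crps = np.asarray(gbm_crps, dtype=float)
    rows = []
    for t in thresholds:
        mask = abs_moves >= t  # keep days whose realised move was at least t
        if not mask.any():
            continue
        m, g = model_crps[mask].mean(), gbm_crps[mask].mean()
        rows.append((t, int(mask.sum()), m, g, m / g - 1.0))
    return rows

# Three made-up days: a quiet one, a medium one and a large one.
rows = tail_table([0.005, 0.012, 0.03], [100.0, 200.0, 400.0],
                  [90.0, 210.0, 440.0], thresholds=(0.0, 0.01, 0.025))
```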

Implications

In summation, aggregate CRPS on event days is approximately a tie because GBM's narrower distribution beats us on quiet days, but we win substantially on days where realised moves are large—precisely the regime in which we hope to profit on Kalshi.

Materialising our Edge

Kalshi sells daily range contracts on the $\text{NDX}$ close. They pay a dollar if it settles in the bucket and nothing otherwise. The full set of buckets (which include catch-alls on both ends, e.g. “ $18999.99$ or below”) partition the price axis, so they collectively define a market-implied probability distribution, though we need to strip vig (subtract half-spread) and normalise because of how poorly the market is priced. By bucketing the probability distribution implied by our Monte Carlo simulation, we can compare the two. If our model's bucket probability sits meaningfully above the ask on a $\text{YES}$ contract (enough to clear fees, which are low at around a percent), buying $\text{YES}$ has positive EV; if it sits meaningfully below the bid, buying $\text{NO}$ does. Doing so is fairly straightforward. We bin our $100{,}000$ Monte Carlo draws into the bucket boundaries pulled from Kalshi’s $\text{REST API}$ , then we compare per-bucket model probability against the current top-of-book bids and asks. If our edge is sufficiently high, we buy (where edge threshold depends on risk appetite).
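The comparison can be sketched without touching the exchange API at all. Everything below is illustrative: the bucket edges, quotes, and the `threshold` value are made up, prices are expressed in dollars per one-dollar contract, and buying $\text{NO}$ is approximated as paying one minus the $\text{YES}$ bid.

```python
import numpy as np

def bucket_probabilities(draws, edges):
    """Model probability per bucket; edges are interior boundaries, so the
    first and last buckets are the open-ended catch-alls."""
    idx = np.searchsorted(edges, draws, side="right")  # bucket i covers [edges[i-1], edges[i])
    counts = np.bincount(idx, minlength=len(edges) + 1)
    return counts / counts.sum()

def implied_probabilities(bids, asks):
    """Strip the vig by taking mids (ask minus half-spread), then normalise."""
    mids = (np.asarray(bids, dtype=float) + np.asarray(asks, dtype=float)) / 2.0
    return mids / mids.sum()

def signals(model_p, bids, asks, threshold=0.03):
    """Per-bucket action given a required edge (threshold is a risk-appetite knob)."""
    out = []
    for p, bid, ask in zip(model_p, bids, asks):
        if p - ask > threshold:
            out.append("BUY_YES")   # EV of YES at the ask is p - ask
        elif bid - p > threshold:
            out.append("BUY_NO")    # EV of NO at (1 - bid) is bid - p
        else:
            out.append("PASS")
    return out

edges = np.array([19000.0, 19100.0])
draws = np.array([18000.0, 19000.0, 19050.0, 19100.0, 25000.0])
model_p = bucket_probabilities(draws, edges)
bids, asks = [0.25, 0.30, 0.38], [0.30, 0.35, 0.42]
actions = signals(model_p, bids, asks)
market_p = implied_probabilities(bids, asks)
```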

Execution and Infrastructure

Our codebase revolves around a few load-bearing pieces.

Our data layer caches $162$ of the $192$ $\text{NDX}$ constituents from our time scope—about $84\%$ of tickers and $\approx95\%$ of index value—as daily $\text{OHLCV}$ $\text{CSV}$ s. We verified historical membership per-date with the $\textbf{n100tickers}$ package. Then their index weights are estimated by running a non-negative least squares regression of the $\text{NDX}$ daily return on component daily returns, subject to $w_{i} \geq 0$ (weights are nonnegative) and normalised so they sum to $1$ . Our result has an $R^2=0.998$ , so our basket faithfully reproduces $\text{NDX}$ . This process was necessary because historical weights are paywalled—and our recreation is excellent.
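The weight recovery can be sketched with `scipy.optimize.nnls`, which solves the least-squares problem under the constraint $w_i \geq 0$ and returns the solution together with the residual norm. The data below are synthetic with known weights, and computing $R^2$ on the normalised weights is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)

# Synthetic data: three components with known weights plus a little noise.
true_w = np.array([0.5, 0.3, 0.2])
component_returns = rng.normal(0.0, 0.02, size=(500, 3))
index_returns = component_returns @ true_w + rng.normal(0.0, 1e-4, size=500)

# Non-negative least squares, then normalise so the weights sum to 1.
raw_w, _residual_norm = nnls(component_returns, index_returns)
weights = raw_w / raw_w.sum()

# Goodness of fit of the reconstructed basket.
fitted = component_returns @ weights
ss_res = np.sum((index_returns - fitted) ** 2)
ss_tot = np.sum((index_returns - index_returns.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```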

Our event-statistics pipeline has three sequential scripts. The first, $\textbf{build\_event\_returns.py}$ , computes the intersection of tickers with the cached returns for every reaction date (keeping series equal-length). Then $\textbf{build\_event\_marginals.py}$ computes the marginal statistics (mean, standard deviation, skewness, and excess kurtosis). Finally, $\textbf{correlation.py}$ ingests the outputs of the prior steps, building a Pearson correlation matrix per event type.

Then live signal generation ( $\textbf{main.py signals}$ ) pulls the current $\text{NDX}$ from yfinance, runs our model on that price, then fetches the Kalshi orderbook to find per-bucket signals.