Why test multiple horizons in an event study?

The same pattern can behave completely differently across time horizons. An oversold signal might show a strong 3-bar bounce (mean reversion kicking in), flat 10-bar returns (the bounce dissipated), and negative 20-bar returns (the oversold condition was signaling a larger trend breakdown). Testing a single horizon gives you a single data point. Testing five horizons (1, 3, 5, 10, 20 bars) gives you the full return profile: when the edge appears, when it peaks, and when it fades. That profile shapes every decision about how you trade the pattern, from hold period and target placement to whether to trade it at all.

What is ATR normalization in event studies?

ATR (Average True Range) normalization expresses forward returns in units of volatility rather than raw percentage points. A 2% move in a low-volatility environment means something very different from a 2% move during a crisis. By dividing each forward price change by the ATR at the time of the signal, event studies can fairly compare events that occurred in different volatility regimes. This also makes results comparable across assets: a 1-ATR move in natural gas and a 1-ATR move in Treasury bonds represent similar risk-adjusted magnitudes, even though their raw percentage moves are very different.

What are the four edge verdicts in VARRD's event study?

VARRD classifies event study results into four categories based on two statistical tests — significance versus zero (does the pattern predict non-zero returns?) and significance versus market baseline (does it beat the market's average return?). STRONG EDGE means the pattern passes both tests: returns are significantly different from zero AND significantly better than the market baseline. MARGINAL means significant versus zero only — there is a real signal, but it does not clearly beat passive exposure. PINNED means significant versus market only — flat absolute returns but behaving differently from the market. NO EDGE means neither test passed — the pattern does not predict returns.

Can I run an event study on the same pattern across multiple markets?

Yes. Multi-market event studies test the same pattern formula across 2 to 10 markets in parallel. This is one of the most powerful validation techniques available because a pattern that works across correlated and uncorrelated markets is far more likely to represent a real phenomenon than one that only works on a single instrument. If an RSI oversold reversal pattern shows a strong edge on ES, NQ, and YM but fails on CL and GC, that tells you the pattern is equity-specific. VARRD runs multi-market event studies from a single command and ranks markets by strength.

Why can't I just measure forward returns in a spreadsheet?

You can calculate raw forward returns in a spreadsheet, but you will miss critical statistical safeguards. Spreadsheet analysis typically ignores overlapping signals (events that fire while a previous event is still in its measurement window), does not correct for multiple comparisons (testing many patterns inflates false positives), lacks proper significance testing (you need t-tests or bootstrap methods, not just averages), and cannot normalize by volatility regime. Most critically, a spreadsheet gives you a number — '2.3% average 5-day return' — without telling you whether that number is statistically distinguishable from random chance. An event study with proper inference tells you both the magnitude and the confidence level.

Event Studies in Trading

Q: What is an event study in trading?

An event study in trading is a statistical method that measures what happens to asset returns after a specific condition or pattern occurs. You define the 'event' (e.g., RSI drops below 30 while price is above the 200-day moving average), identify every historical instance where it fired, then measure the forward returns at multiple horizons (1, 3, 5, 10, 20 bars). The method originated in academic finance for measuring the impact of corporate events like earnings announcements and mergers, but has been adapted for systematic trading to answer: does this pattern actually predict returns?

Q: How is an event study different from a backtest?

An event study answers 'does this pattern predict returns?' while a backtest answers 'does this strategy make money with specific risk management?' An event study measures raw forward returns after each signal — no stops, no targets, just what the market does. A backtest simulates an actual trading strategy with entry rules, stop losses, take profits, and position sizing. They are complementary: the event study tells you if the underlying pattern has predictive power, and the backtest tells you if that power can be captured with real-world execution constraints.

Forward Return Analysis: The Rigorous Way to Know If a Pattern Actually Predicts Anything

Last updated: May 2026

TL;DR: An event study measures what actually happens after a trading pattern fires. You define a condition, find every historical instance, and measure forward returns at multiple horizons (1, 3, 5, 10, 20 bars) with statistical significance testing. Unlike backtesting, which simulates a strategy with stops and targets, an event study answers a more fundamental question: does this pattern predict returns at all? If forward returns after the signal are not statistically distinguishable from random chance, no amount of stop-loss optimization will create an edge. Event studies originated in academic finance for measuring the impact of corporate events. VARRD adapts the methodology for systematic trading research, automating the entire process from natural language idea to statistically validated result.

What Is an Event Study?

An event study is a statistical method that isolates a specific condition in market data and measures its impact on future returns. The idea is straightforward: if a pattern has predictive power, the returns that follow it should be measurably different from returns on an average day.

The methodology was developed in academic finance during the 1960s and 1970s, originally to study the impact of corporate events on stock prices. Researchers wanted to know: when a company announces a merger, what happens to the stock over the following days and weeks? When earnings beat expectations, how much of the move happens on day one versus day five versus day twenty?

The foundational insight was that you could not just look at raw price changes. You needed to measure abnormal returns — the difference between what actually happened and what would have happened if the event had not occurred. This requires a baseline, a statistical test, and a framework for handling multiple events that might overlap or cluster.
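In the academic formulation, each event's abnormal return is its realized return minus a benchmark, and the statistic tested is the average abnormal return. A minimal statement, assuming the simplest market-adjusted benchmark (the unconditional average h-bar return over the sample) rather than a fitted market model, with N events and forward horizon h:

```latex
\begin{aligned}
AR_{i,h} &= R_{i,h} - \bar{R}_h \\
\overline{AR}_h &= \frac{1}{N} \sum_{i=1}^{N} AR_{i,h} \\
t_h &= \frac{\overline{AR}_h}{s_h / \sqrt{N}}
\end{aligned}
```

Here R_{i,h} is the h-bar forward return after event i, \bar{R}_h is the baseline, and s_h is the sample standard deviation of the AR_{i,h}. The t-statistic treats events as independent and the baseline as known; overlapping events violate the first assumption.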

Systematic traders adapted this methodology for a different purpose. Instead of studying corporate announcements, they study technical and quantitative conditions: RSI dropping below a threshold, a moving average crossover, a volatility squeeze, a volume spike on a gap down. The question is the same. After this condition occurs, are future returns statistically different from normal?

How an Event Study Works

The process has four steps, each of which matters:

1. Define the Event

The "event" is a boolean condition — something that is either true or false on any given bar. It could be simple (RSI(14) < 30) or complex (a multi-condition formula involving price relative to moving averages, volume ratios, and volatility measures). What matters is that the condition is unambiguous and can be evaluated on historical data without lookahead bias.

The quality of the event definition determines everything downstream. A vague event ("price looks oversold") is untestable. A precise event (RSI(14) < 30 AND close > SMA(200) AND ATR(14)/close < 0.02) is a concrete, falsifiable hypothesis.
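The precise event above can be evaluated bar by bar. A minimal stdlib-only sketch, assuming Wilder's smoothing for RSI and ATR (other smoothing choices exist and shift the numbers slightly) and parallel lists of highs, lows, and closes. Every function name here is illustrative, not a VARRD API. Each value at bar i uses only bars 0 through i, so the mask carries no lookahead:

```python
from typing import List, Optional, Sequence

def sma(values: Sequence[float], n: int) -> List[Optional[float]]:
    """Simple moving average; None until n bars are available."""
    out: List[Optional[float]] = [None] * len(values)
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= n:
            total -= values[i - n]  # drop the bar that left the window
        if i >= n - 1:
            out[i] = total / n
    return out

def wilder_rsi(closes: Sequence[float], n: int = 14) -> List[Optional[float]]:
    """RSI with Wilder smoothing; None until n price changes are available."""
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= n:
        return out
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    avg_gain = sum(max(c, 0.0) for c in changes[:n]) / n
    avg_loss = sum(max(-c, 0.0) for c in changes[:n]) / n
    for i in range(n, len(closes)):
        if i > n:  # changes[i - 1] is the move into bar i
            c = changes[i - 1]
            avg_gain = (avg_gain * (n - 1) + max(c, 0.0)) / n
            avg_loss = (avg_loss * (n - 1) + max(-c, 0.0)) / n
        out[i] = 100.0 if avg_loss == 0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def wilder_atr(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float], n: int = 14) -> List[Optional[float]]:
    """Average True Range with Wilder smoothing; None until n bars are available."""
    tr = [highs[0] - lows[0]] + [
        max(highs[i] - lows[i], abs(highs[i] - closes[i - 1]), abs(lows[i] - closes[i - 1]))
        for i in range(1, len(closes))
    ]
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) < n:
        return out
    atr = sum(tr[:n]) / n
    out[n - 1] = atr
    for i in range(n, len(closes)):
        atr = (atr * (n - 1) + tr[i]) / n
        out[i] = atr
    return out

def event_mask(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float]) -> List[bool]:
    """RSI(14) < 30 AND close > SMA(200) AND ATR(14)/close < 0.02, per bar."""
    rsi, ma, atr = wilder_rsi(closes), sma(closes, 200), wilder_atr(highs, lows, closes)
    return [
        rsi[i] is not None and ma[i] is not None and atr[i] is not None
        and rsi[i] < 30 and c > ma[i] and atr[i] / c < 0.02
        for i, c in enumerate(closes)
    ]
```

Fed a slow uptrend followed by a short, orderly selloff, this mask stays false through the uptrend and turns true a few bars into the selloff, once RSI has dropped below 30 while price is still above its 200-bar average.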

2. Identify Every Historical Instance

Once the condition is defined, you scan the historical data and flag every bar where it evaluated to true. These are your "event dates." The number of events matters enormously. Five events prove nothing — the sample is too small for statistical inference. Fifty events start to become interesting. Two hundred events give you a dataset with real statistical power.
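A rough way to see why the counts matter, assuming independent events and a normal approximation to the test statistic: detecting an average forward return of d standard deviations (the mean divided by the standard deviation of the per-event returns) with a two-sided test at level alpha and a given power takes roughly ((z_{1-alpha/2} + z_{power}) / d)^2 events. A stdlib sketch with an illustrative function name:

```python
import math
from statistics import NormalDist

def required_events(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate independent events needed to detect mean/sd = effect_size
    with a two-sided one-sample test (normal approximation)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {required_events(d)} events")
```

With the default 5% level and 80% power, an effect of 0.2 standard deviations needs about 197 independent events and an effect of 0.5 needs about 32. Clustered, overlapping signals count for less than their raw number.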

This is also where you confront practical questions. If the condition fires on three consecutive bars, is that one event or three? How you handle clustered signals affects your sample size and the independence assumption underlying your statistical tests.
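One common answer to the clustering question is a minimum gap rule: keep the first bar of a cluster and ignore any signal that fires within min_gap bars of the last kept event. A minimal sketch (the function and parameter names are hypothetical). Setting min_gap to the longest horizon tested keeps the forward return windows from overlapping:

```python
from typing import List, Sequence

def decluster(mask: Sequence[bool], min_gap: int) -> List[int]:
    """Return indices of kept events: the first signal, then only signals
    at least `min_gap` bars after the previously kept event."""
    kept: List[int] = []
    for i, fired in enumerate(mask):
        if fired and (not kept or i - kept[-1] >= min_gap):
            kept.append(i)
    return kept

mask = [False, True, True, True, False, False, False, True, False, True]
print(decluster(mask, min_gap=5))  # bars 2, 3, and 9 fall inside a gap
```

The trade-off is explicit: a larger gap buys independence at the cost of sample size, which is exactly the tension described above.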

3. Measure Forward Returns

For each event date, you measure what happened after the signal. This is forward return analysis. You do not look at what the market was doing before the signal or during the signal. You look exclusively at what happened next.

The critical design choice is measuring at multiple horizons. A standard set might be 1, 3, 5, 10, and 20 bars forward. For each event instance, you record the return at each horizon. This gives you a distribution of returns at each time scale.
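A minimal sketch of the measurement step, assuming entry at the close of the event bar and simple close-to-close returns (names are illustrative). Events too close to the end of the data for a given horizon are dropped at that horizon rather than padded, so longer horizons can have slightly fewer observations:

```python
from typing import Dict, List, Sequence

HORIZONS = (1, 3, 5, 10, 20)

def forward_returns(closes: Sequence[float], events: Sequence[int],
                    horizons: Sequence[int] = HORIZONS) -> Dict[int, List[float]]:
    """For each horizon h, collect close[i + h] / close[i] - 1 for every event bar i."""
    out: Dict[int, List[float]] = {h: [] for h in horizons}
    for i in events:
        for h in horizons:
            if i + h < len(closes):  # skip windows that run past the data
                out[h].append(closes[i + h] / closes[i] - 1.0)
    return out

closes = [100.0, 102.0, 99.0, 105.0]
print(forward_returns(closes, events=[0, 2], horizons=(1, 3)))
```

Each key of the result is one horizon's return distribution, ready for the significance tests described below.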

Why multiple horizons? Because the same pattern can have radically different behavior at different time scales:

A momentum signal might show strong 1-bar continuation, flat 5-bar returns (profit-taking), and resumed trend at 20 bars.
An oversold reversal might show no edge at 1 bar (the selling continues), a strong 3-5 bar bounce, and negative 20-bar returns (the oversold condition was a symptom of a larger breakdown).
A volatility contraction pattern might show nothing at 1-3 bars, then a significant directional move at 10-20 bars as the squeeze resolves.

Testing a single horizon gives you one data point. Testing five horizons gives you the complete return profile of the pattern — when the edge appears, when it peaks, and when it fades. This shapes every downstream decision: how long to hold, where to place targets, and whether the pattern is worth trading at all.

4. Statistical Significance Testing

The final step is the one that separates event studies from napkin math. You have an average forward return at each horizon. The question is not whether the average is positive or negative. The question is whether it is statistically distinguishable from zero.

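An average 5-bar return of +0.8% means nothing by itself. If the standard deviation of those returns is 4%, the mean sits only 0.2 standard deviations from zero, and the t-statistic (the mean divided by its standard error, sd/√n) is 0.2 × √n: about 1.0 with 25 events, indistinguishable from noise, and still only about 2.0 with 100. If the standard deviation is 0.5%, the same average gives a t-statistic of about 8 with just 25 events, which is highly significant. The dispersion matters as much as the mean.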
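The t-statistic for an average of n event returns is the mean divided by its standard error, sd/√n, so dispersion and sample size enter together. A quick check of the two dispersion scenarios for a +0.8% mean, assuming independent events:

```python
import math

def t_stat(mean: float, sd: float, n: int) -> float:
    """One-sample t-statistic against zero: mean / (sd / sqrt(n))."""
    return mean / (sd / math.sqrt(n))

# Mean 5-bar return of 0.8 (percent units) under two dispersion scenarios.
for n in (25, 100, 200):
    print(f"n={n:3d}  sd=4%: t={t_stat(0.8, 4.0, n):5.2f}  sd=0.5%: t={t_stat(0.8, 0.5, n):6.2f}")
```

With a 4% standard deviation, t grows only as 0.2 × √n, reaching about 2.0 at 100 events; with 0.5% it is already about 8 at 25 events.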

Proper event studies run two significance tests at each horizon:

Versus zero: Are the average forward returns statistically different from zero? This tells you whether the pattern predicts any directional move at all.
Versus market baseline: Are the returns statistically different from the market's average return over the same horizon? A pattern that produces +0.5% average 5-bar returns sounds good until you realize the market averages +0.4% across all 5-bar windows. The signal is barely beating passive exposure.

Both tests matter. A signal that passes both has genuine predictive power. A signal that passes only the first test might just be capturing the market's background drift.
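Both tests can be sketched with the standard library: a one-sample t-test of the event returns against zero, and a Welch two-sample t-test of the event returns against the returns over every window of the same horizon (the baseline). This is an illustrative sketch, not VARRD's implementation. The p-values use a normal approximation to the t distribution, which is reasonable with dozens of events but slightly too optimistic for small samples; scipy.stats.ttest_1samp and scipy.stats.ttest_ind with equal_var=False give exact t-distribution p-values.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple

def _two_sided_p(t: float) -> float:
    # Normal approximation to the t distribution (large samples only).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def t_test_vs_zero(event_rets: Sequence[float]) -> Tuple[float, float]:
    """One-sample t-test: is the mean event return different from zero?"""
    n = len(event_rets)
    t = mean(event_rets) / (stdev(event_rets) / math.sqrt(n))
    return t, _two_sided_p(t)

def t_test_vs_baseline(event_rets: Sequence[float],
                       all_rets: Sequence[float]) -> Tuple[float, float]:
    """Welch t-test: do event returns differ from returns over all windows
    of the same horizon (the market baseline)?"""
    se2 = stdev(event_rets) ** 2 / len(event_rets) + stdev(all_rets) ** 2 / len(all_rets)
    t = (mean(event_rets) - mean(all_rets)) / math.sqrt(se2)
    return t, _two_sided_p(t)
```

The baseline here includes the event windows themselves, a common simplification; excluding them changes little when events are a small fraction of all bars.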

ATR Normalization: Comparing Across Volatility Regimes

Raw percentage returns have a fundamental problem: they are not comparable across different market environments. A 1% daily move in the S&P 500 during a quiet summer week is a significant event. A 1% move during a VIX spike is unremarkable. If your event study spans five years of data, some signals fired during calm markets and some fired during crises. Averaging raw returns across these regimes obscures what is actually happening.

ATR (Average True Range) normalization solves this by expressing returns in units of contemporaneous volatility. Instead of saying "the average forward return was +1.2%," you say "the average forward return was +0.7 ATR." This means the market moved 0.7 times its recent daily range in the expected direction — regardless of whether that daily range was 0.3% or 3%.
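A minimal sketch of ATR normalization, assuming Wilder's ATR(14) and entry at the signal bar's close: the h-bar forward price change is divided by the ATR known at that close, so the result reads directly in ATR units. Names are illustrative:

```python
from typing import List, Optional, Sequence

def wilder_atr(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float], n: int = 14) -> List[Optional[float]]:
    """Average True Range with Wilder smoothing; None until n bars are available."""
    tr = [highs[0] - lows[0]] + [
        max(highs[i] - lows[i], abs(highs[i] - closes[i - 1]), abs(lows[i] - closes[i - 1]))
        for i in range(1, len(closes))
    ]
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) < n:
        return out
    atr = sum(tr[:n]) / n
    out[n - 1] = atr
    for i in range(n, len(closes)):
        atr = (atr * (n - 1) + tr[i]) / n
        out[i] = atr
    return out

def atr_normalized_forward(closes: Sequence[float], atr: Sequence[Optional[float]],
                           events: Sequence[int], h: int) -> List[float]:
    """(close[i + h] - close[i]) / ATR[i] for each event bar i with enough data."""
    return [
        (closes[i + h] - closes[i]) / atr[i]
        for i in events
        if atr[i] is not None and i + h < len(closes)
    ]
```

Events that fire before the ATR has warmed up, or too close to the end of the data, are simply skipped.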

The benefits are substantial:

Fair comparison within a single market: Events from 2017 and 2020 can be meaningfully averaged.
Fair comparison across markets: A 0.5 ATR move in natural gas and a 0.5 ATR move in Treasury bonds represent comparable risk-adjusted magnitudes, even though their raw percentage moves differ by an order of magnitude.
Better risk management: ATR-normalized results translate directly into stop-loss and target distances. If the event study shows a 1.5 ATR average profit at the 10-bar horizon, you know where to place your target relative to current volatility.
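Going the other way, from an ATR-denominated result to price levels, is one line of arithmetic. A hypothetical helper, assuming the stop is also expressed in ATR multiples (the 1.0 ATR stop in the example is an arbitrary illustration, not a recommendation):

```python
from typing import Tuple

def atr_levels(entry: float, atr_now: float, target_atr: float,
               stop_atr: float, direction: int = 1) -> Tuple[float, float]:
    """Turn ATR multiples into (target, stop) prices; direction +1 = long, -1 = short."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (long) or -1 (short)")
    target = entry + direction * target_atr * atr_now
    stop = entry - direction * stop_atr * atr_now
    return target, stop

print(atr_levels(5000.0, 60.0, target_atr=1.5, stop_atr=1.0))
```

With ES at 5,000 and a current ATR of 60 points, a 1.5 ATR target sits 90 points away at 5,090.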

Event Study vs. Backtest: Different Questions, Complementary Answers

Event studies and backtests are not competing methodologies. They answer different questions, and the strongest research uses both.

Dimension	Event Study	Backtest
Core question	Does this pattern predict returns?	Does this strategy make money with real execution?
Risk management	None (raw forward returns)	Stop loss, take profit, position sizing
What you learn	Predictive power of the signal itself	P&L, drawdown, Sharpe of a tradeable strategy
Horizons	Multiple (1, 3, 5, 10, 20 bars)	Variable (trade-by-trade, exit-dependent)
Overfitting risk	Lower (fewer free parameters)	Higher (SL, TP, hold time are all tunable)
Best for	Initial validation: is there anything here?	Strategy design: can I capture it?

The natural workflow is to run the event study first. If the pattern shows no statistically significant forward returns at any horizon, there is nothing to capture — no stop-loss optimization, no entry timing adjustment, and no clever position sizing will create an edge from a pattern that does not predict returns. The event study is the gatekeeper.

If the event study does show an edge, the backtest determines whether that edge survives the friction of real trading. A pattern with a beautiful 5-bar forward return profile might still lose money after accounting for slippage, the cost of getting stopped out on false signals, and the opportunity cost of holding through drawdowns.

An event study tells you if the fish are in the lake. A backtest tells you if your rod, line, and technique can actually catch them.

Why You Cannot Do This in a Spreadsheet

Forward returns are easy to calculate. With closing prices in column B, =OFFSET(B2,5,0)/B2-1 filled down the sheet gives you a 5-bar forward return. So why not just do this in Excel?

You can compute the numbers. What you will miss is everything that determines whether those numbers mean anything:

Overlapping signals: If your pattern fires on Monday and again on Wednesday, the 5-bar forward return windows overlap. The returns are not independent observations. Treating them as independent inflates your sample size and makes insignificant results look significant. Proper event studies handle this with clustering corrections or minimum gap requirements between signals.
Statistical inference: A spreadsheet gives you an average. It does not give you a p-value, a confidence interval, or a t-statistic. An average forward return of +0.6% is meaningless without knowing whether that number could easily have occurred by chance. You need proper hypothesis testing: t-tests, bootstrap confidence intervals, or non-parametric tests.
Multiple comparisons: If you test 20 patterns that have no real edge, you should expect about one of them to appear significant at the 5% level purely by chance. This is the multiple comparisons problem, and ignoring it is one of the largest sources of false discoveries in trading research. Proper event study frameworks apply corrections, such as Bonferroni, Holm-Bonferroni, or false discovery rate control, to account for the number of hypotheses tested.
Volatility normalization: A spreadsheet calculates raw returns. It does not automatically adjust for the volatility regime at the time of each signal. Events during calm and volatile periods get equal weight, distorting the distribution.
Lookahead bias: It is surprisingly easy to introduce lookahead in a spreadsheet. If a trade enters at today's open, normalizing it by today's ATR creates a subtle forward leak, because today's ATR depends on today's high, low, and close. Timing an entry with a moving average that includes a close not yet known at the entry time is another. These errors are invisible in a spreadsheet but systematically corrupt results.
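The two most common corrections named above are short enough to write out. A stdlib sketch with illustrative function names: Holm-Bonferroni controls the chance of any false discovery across the family of tests, while Benjamini-Hochberg controls the expected fraction of false discoveries among the results you accept, so it is less strict. The statsmodels function multipletests offers tested versions of both.

```python
from typing import List, Sequence

def holm(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm-Bonferroni step-down: controls the family-wise error rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return reject

def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up: controls the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # largest rank that passes its threshold
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

p = [0.001, 0.01, 0.02, 0.035, 0.30]
print(holm(p), benjamini_hochberg(p))
```

On these five p-values, four pass an uncorrected 5% threshold, Benjamini-Hochberg keeps four, and Holm keeps only two.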

None of these problems are exotic. They are fundamental to any honest forward return analysis. A spreadsheet computes numbers. An event study framework computes answers.

The Four Edge Verdicts

Not all statistically significant results are equally useful. The two significance tests (versus zero, versus market baseline) create a natural four-category classification:

STRONG EDGE: Significant versus zero AND versus market baseline. The pattern predicts returns and those returns beat the market. This is the gold standard — the pattern has genuine directional predictive power above and beyond passive exposure.
MARGINAL: Significant versus zero only. The pattern predicts non-zero returns, but does not clearly beat the market's average drift. There is a real signal, but it may not justify the complexity and risk of an active strategy when you could simply hold the asset.
PINNED: Significant versus market only. Absolute returns are near zero, but the behavior is statistically different from the market baseline. This can indicate a volatility pattern (the market becomes unusually flat or unusually wild after the signal) or a timing effect worth investigating further.
NO EDGE: Neither test is significant. The pattern does not predict returns. This is a complete and valuable result — you now know not to trade this idea, saving you from losses and wasted time optimizing a strategy built on noise.
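The classification itself is a two-by-two lookup. A hypothetical sketch for a single horizon, assuming the p-values are already corrected for multiple comparisons and tested in the direction of interest, and using a 5% threshold that VARRD's actual rules may not share:

```python
def edge_verdict(p_vs_zero: float, p_vs_market: float, alpha: float = 0.05) -> str:
    """Map the two significance tests onto the four verdicts."""
    sig_zero = p_vs_zero < alpha
    sig_market = p_vs_market < alpha
    if sig_zero and sig_market:
        return "STRONG EDGE"
    if sig_zero:
        return "MARGINAL"
    if sig_market:
        return "PINNED"
    return "NO EDGE"

print(edge_verdict(0.01, 0.20))
```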

"No edge" is not a failure. It is the most common honest answer in quantitative research, and it is the answer that prevents the most damage. A trader who discovers that their favorite pattern does not predict returns has learned something genuinely useful. Most traders never run this test at all — they go straight to backtesting, optimize until the equity curve looks good, and then discover the hard way with real money that the pattern was noise.

Multi-Market Event Studies

One of the strongest forms of validation in event study research is testing the same pattern across multiple markets. The logic is straightforward: a pattern that captures a real market phenomenon should work on more than one instrument. An RSI oversold reversal that works on the S&P 500 but fails on the Nasdaq, the Dow, and the Russell is probably not capturing a universal oversold dynamic — it is capturing something idiosyncratic to the S&P's specific return distribution over the test period.

Conversely, a pattern that shows a strong edge on five equity index futures and no edge on gold and crude oil tells you something valuable about the pattern's scope. It works in equities specifically. That is useful information for portfolio construction and risk management.

Multi-market testing also functions as a partial guard against overfitting. A pattern that was accidentally curve-fit to one market's noise is unlikely to simultaneously fit the noise of four other uncorrelated markets. If it shows significance across all of them, you have much stronger evidence of a real phenomenon.

VARRD runs multi-market event studies from a single command, testing the same pattern formula across 2 to 10 markets in parallel. Each market gets its own significance tests, its own forward return profile at every horizon, and its own edge verdict. The results are then ranked by strength so you can see which markets respond most powerfully to the pattern and which are indifferent.
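A sketch of the per-market ranking step, assuming each market's event returns at one horizon have already been computed, that strength is measured by the one-sample t-statistic (an assumption; the source does not specify VARRD's ranking criterion), and that markets with too few events are skipped. Names and the 20-event floor are illustrative:

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def rank_markets(event_returns: Dict[str, Sequence[float]],
                 min_events: int = 20) -> List[Tuple[str, float, int]]:
    """Return (market, t-statistic, n) sorted from strongest to weakest."""
    results: List[Tuple[str, float, int]] = []
    for market, rets in event_returns.items():
        n = len(rets)
        if n < max(min_events, 2):
            continue  # too few events for a meaningful t-statistic
        sd = stdev(rets)
        t = mean(rets) / (sd / math.sqrt(n)) if sd > 0 else 0.0
        results.append((market, t, n))
    return sorted(results, key=lambda r: r[1], reverse=True)

ranked = rank_markets({
    "ES": [0.5, 0.7, 0.6, 0.4] * 10,  # consistent positive drift after signals
    "GC": [0.5, -0.5] * 20,           # no drift at all
    "CL": [0.1] * 5,                  # too few events to rank
})
print(ranked)
```

A market whose events show no drift lands at the bottom with a t-statistic near zero, and thinly sampled markets drop out instead of ranking on noise.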

How VARRD Automates Event Studies

The traditional event study workflow requires a researcher who can write code, manipulate dataframes, implement statistical tests, handle edge cases around overlapping signals, normalize by volatility, and correct for multiple comparisons. This is not rocket science, but it is a multi-hour process for each pattern — and most people with trading domain knowledge do not have this specific skill set.

VARRD collapses this into a single natural language interaction. You describe your trading idea in plain English — "test whether RSI below 30 with price above the 200 SMA predicts a bounce on ES futures" — and the system handles everything:

Pattern definition: Translates your idea into a precise boolean formula and visualizes it on a chart so you can verify that it matches your intent before testing.
Data loading: Loads historical market data for the instrument and timeframe you specify.
Signal identification: Evaluates the formula across the full history and identifies every instance where the condition was true.
Forward return calculation: Measures ATR-normalized returns at 1, 3, 5, 10, and 20 bar horizons for every signal.
Statistical testing: Runs significance tests at each horizon (versus zero and versus market baseline), produces exact p-values, and applies multiple comparison corrections to account for the total number of hypotheses tested.
Edge verdict: Classifies the result as STRONG EDGE, MARGINAL, PINNED, or NO EDGE based on the dual significance framework.
Trade setup: If an edge is found, generates exact entry, stop-loss, and take-profit levels based on the validated statistical model and current market conditions.

The entire process — from idea to statistically validated result — takes about thirty seconds. If the event study shows an edge and you want to go further, you can run a backtest with stops, optimize stop-loss and take-profit parameters, or test the same pattern across additional markets. Each step builds on the last.

VARRD is accessible as a web application, through the Model Context Protocol (MCP) at app.varrd.com/mcp for AI agents and tools like Claude Desktop and Cursor, and as a CLI for developers (pip install varrd). The event study engine is the same regardless of how you access it.

The Honest Answer Is the Valuable One

Most trading research tools are optimized to produce a "yes." Backtesting platforms let you tweak parameters until the equity curve points up. Optimizers find the stop-loss and take-profit combination that maximizes profit factor on historical data. The implicit assumption is that finding a profitable result is the goal.

Event studies invert this. The goal is not to find a profitable pattern. The goal is to know whether a pattern is profitable. That is a fundamentally different objective, and it leads to a different relationship with "no edge" results.

When an event study returns no statistically significant forward returns, the correct response is not disappointment. It is gratitude. You just learned — in thirty seconds, with rigorous statistics — that a pattern you might have spent months trading does not predict returns. Every dollar you do not lose on a non-edge is a dollar available for a real one.

The traders who build durable, long-term profitability are not the ones who find the most patterns. They are the ones who most efficiently discard the patterns that do not work.

Frequently Asked Questions

What is an event study in trading?

An event study is a statistical method that measures what happens to asset returns after a specific condition or pattern occurs. You define the event (a boolean condition on price, volume, or indicator data), identify every historical instance, and measure forward returns at multiple horizons with statistical significance testing. The method originated in academic finance for corporate events and has been adapted for systematic trading research.

How is an event study different from a backtest?

An event study measures raw forward returns after a signal fires — no stops, no targets, no position sizing. It answers "does this pattern predict returns?" A backtest simulates an actual trading strategy with entry rules, stop losses, take profits, and execution constraints. It answers "does this strategy make money?" They are complementary: use the event study first to validate the signal, then backtest to design the strategy around it.

Why test at multiple time horizons?

The same pattern can behave very differently at different time scales. An oversold signal might show a strong 3-bar bounce but negative 20-bar returns. A volatility contraction pattern might show nothing at 1 bar but a significant move at 10-20 bars. Testing at 1, 3, 5, 10, and 20 bars reveals the full return profile (when the edge appears, when it peaks, and when it fades), which shapes hold period, target placement, and whether the pattern is tradeable at all.

What is ATR normalization and why does it matter?

ATR normalization expresses forward returns in units of recent volatility rather than raw percentage points. A 1% move during low volatility is very different from a 1% move during a crisis. Normalizing by ATR makes events from different volatility regimes and different markets directly comparable, and translates results into actionable stop and target distances.

How many signal occurrences do I need for a reliable event study?

More is better, but as a rough guideline: fewer than 20 events makes statistical inference unreliable. 30-50 events is the minimum for basic significance testing. 100+ events gives you meaningful statistical power and the ability to detect moderate effect sizes. If your pattern fires only 5 times in 10 years of data, the result — whether positive or negative — will have wide confidence intervals and limited reliability.

What does "no edge" mean in an event study?

"No edge" means the forward returns after the signal are not statistically distinguishable from zero or from the market baseline at any horizon tested. The pattern does not predict returns. This is a complete and valuable result — it tells you not to build a strategy around this pattern, saving you from losses on a non-edge. Most patterns tested honestly return "no edge," and accepting that is a core part of rigorous trading research.

Can I use VARRD's event study from my own code or AI agent?

Yes. VARRD exposes its event study engine through MCP (Model Context Protocol) at app.varrd.com/mcp, which is compatible with Claude Desktop, Cursor, and any MCP client. Developers can also use the Python CLI (pip install varrd) or the REST API. The event study runs the same statistical engine regardless of the access method.

The Edge Library: Validated Edges Running 24/7

Beyond testing your own ideas, VARRD maintains a growing library of statistically validated edges across futures, equities, and crypto — monitored against live market data around the clock. When an edge fires, you see the market, direction, entry, stop, target, hold period, and the complete audit trail of how it was discovered and validated.

What you get at each tier:

Free — see which markets have edges firing right now
$0.50 — direction, win rate, expected value, stop/target, entry date for ALL active edges
$1/edge — full methodology: formula, discovery story, per-horizon p-values, Monte Carlo, regime analysis, edge decay
$5 — everything on every edge at full depth

Every edge shows its post-discovery performance tracked separately from in-sample results — so you can see whether the edge is holding up in real time or decaying. Full transparency.

Access via MCP (varrd_edges tool), CLI (varrd edges), or web app (app.varrd.com).

See What's Firing Right Now

Describe your idea in plain English. Get forward returns, p-values, and an edge verdict.
Event study, backtest, multi-market — one platform, rigorous statistics.
$2 free credits on signup. ~$0.30 per research session.


MCP: app.varrd.com/mcp | CLI: pip install varrd

This guide is maintained by VARRD Inc. and reflects VARRD's approach to event study methodology in systematic trading research. Last updated May 2026.