How We Built a Global Market Intelligence Engine

31 Global ETFs
716 Days Analysed
62.3% Directional Accuracy
2,000 Experiments Run

The Problem: When Your Dashboard Lies

We started with a Macro Sentiment Index — five Indian market inputs with hand-picked weights. It showed a number between 0 and 100. It looked professional. Subscribers saw it on the dashboard and trusted it.

It was wrong. Not slightly wrong — structurally wrong. The weights were arbitrary. The data sources broke without warning. And most critically: it only looked at India, while Indian markets follow global flows with a one-day lag.

By the time Indian VIX, FII flows, and crude prices moved, the signal was already old. The real information was in what happened overnight in New York, London, and Tokyo.

The Approach: Let Data Pick the Weights

Instead of guessing which inputs matter, we asked the data. We loaded 716 days of daily returns from 31 global ETFs — covering every major asset class, geography, and risk factor — and measured which ones actually predict where Indian markets go the next day.
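The screen described above comes down to lining up each ETF's return on day t with India's return on day t+1, then ranking the ETFs by how strongly the two move together. Here is a minimal NumPy sketch. It assumes the returns are already aligned to a common trading calendar. The function name, the ETF names and the synthetic data are hypothetical stand-ins for the 31 real series:

```python
import numpy as np

# Illustrative sketch only: function name, ETF names and data are hypothetical.
def lagged_screen(etf_returns, india_returns, names):
    """Rank ETFs by how well their day-t return lines up with India's day-t+1 return.

    etf_returns: array of shape (T, N), one column per global ETF
    india_returns: array of shape (T,), Indian index returns on the same rows
    """
    x = etf_returns[:-1]   # global session on day t
    y = india_returns[1:]  # Indian session on day t+1
    rows = []
    for j, name in enumerate(names):
        corr = np.corrcoef(x[:, j], y)[0, 1]
        # Directional hit rate: share of days on which the two signs agree
        hit_rate = np.mean(np.sign(x[:, j]) == np.sign(y))
        rows.append((name, float(corr), float(hit_rate)))
    # Strongest predictors first, by absolute correlation
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)

# Synthetic example: India's next day is driven mostly by the first ETF
rng = np.random.default_rng(0)
etfs = rng.normal(size=(716, 3))
india = np.zeros(716)
india[1:] = 0.8 * etfs[:-1, 0] + 0.2 * rng.normal(size=715)
ranking = lagged_screen(etfs, india, ["ETF_A", "ETF_B", "ETF_C"])
```

Correlation measures the strength of the linear link. The hit rate is closer to the directional-accuracy metric reported in this article.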

Then we applied Karpathy's AutoResearch methodology: an AI agent tested 2,000 different weight combinations, keeping only the configurations that improved risk-adjusted returns. No human bias. No favourite indicators. Just data.
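The "keep only what improves" rule can be illustrated with a plain stochastic hill-climb over ETF weights. Each candidate is scored by the annualised Sharpe ratio of a long/short next-day strategy. This stands in for the idea only. It is not the AutoResearch agent itself, which edits and reruns code. The function names, scoring rule, step size and 252-day annualisation are all assumptions:

```python
import numpy as np

# Illustrative sketch only: a simple hill-climb, not the AutoResearch agent.
def sharpe(pnl):
    """Annualised Sharpe ratio of daily P&L (a 252-day year is assumed)."""
    sd = pnl.std()
    return 0.0 if sd == 0 else float(np.sqrt(252) * pnl.mean() / sd)

def strategy_pnl(weights, x, y):
    """Long India when the weighted global score is positive, short when negative."""
    return np.sign(x @ weights) * y

def hill_climb(x, y, n_trials=2000, step=0.1, seed=0):
    """Randomly perturb the weights; a candidate survives only if Sharpe improves."""
    rng = np.random.default_rng(seed)
    best_w = np.full(x.shape[1], 1.0 / x.shape[1])  # start from equal weights
    best_s = sharpe(strategy_pnl(best_w, x, y))
    for _ in range(n_trials):
        candidate = best_w + rng.normal(scale=step, size=best_w.shape)
        score = sharpe(strategy_pnl(candidate, x, y))
        if score > best_s:
            best_w, best_s = candidate, score
    return best_w, best_s

# Synthetic data: x holds day-t global returns, y the following Indian day
rng = np.random.default_rng(1)
x = rng.normal(size=(715, 4))
y = 0.3 * x[:, 1] + rng.normal(size=715)
start = sharpe(strategy_pnl(np.full(4, 0.25), x, y))
weights, best = hill_climb(x, y, n_trials=500)
```

Every candidate is judged on the same in-sample days, so a search like this will find some "improvement" even in pure noise. The null-model and walk-forward tests below exist to check for exactly that.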

What We Found

The Surprise: Credit Markets Beat Equities

The strongest predictor of Indian market direction isn't the S&P 500 or VIX; it's high-yield corporate bonds. When global credit appetite is strong (investors buying riskier bonds), India tends to follow the next day. When credit markets freeze, India sells off.

This makes sense: credit markets are where institutional money moves first. By the time equity markets react, bond traders have already priced the risk.

The Zones: Five States of the World

We defined five market regimes, calibrated on a baseline of 350 historically calm trading days. The boundaries aren't arbitrary: they're statistical thresholds, measured in standard deviations from that calm baseline.
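Zone assignment then reduces to a z-score against the calm baseline. In this sketch the cut-offs of ±1 and ±2 standard deviations are hypothetical placeholders, because the article does not publish its thresholds. The composite score is assumed to be computed already:

```python
import numpy as np

# Hypothetical cut-offs, in standard deviations from the calm baseline
ZONE_UPPER_BOUNDS = [
    (-2.0, "RISK-OFF"),
    (-1.0, "CAUTION"),
    (1.0, "NEUTRAL"),
    (2.0, "RISK-ON"),
]

def fit_calm_baseline(calm_scores):
    """Mean and standard deviation of the composite score over the calm days."""
    calm_scores = np.asarray(calm_scores, dtype=float)
    return float(calm_scores.mean()), float(calm_scores.std(ddof=1))

def classify(score, mu, sigma):
    """Map a composite score to one of the five zones."""
    z = (score - mu) / sigma
    for upper, zone in ZONE_UPPER_BOUNDS:
        if z < upper:
            return zone
    return "EUPHORIA"

mu, sigma = fit_calm_baseline([-1.0, 0.0, 1.0])  # toy stand-in for the 350 calm days
zones = [classify(s, mu, sigma) for s in (-2.5, -1.5, 0.0, 1.5, 2.5)]
```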

Zone	Share of Days	Avg Nifty Next-Day Return	Nifty Next-Day Up Rate
RISK-OFF	3%	-0.83%	15%
CAUTION	9%	-0.30%	38%
NEUTRAL	77%	+0.08%	55%
RISK-ON	10%	+0.28%	72%
EUPHORIA	2%	+0.41%	82%

The spread is 67 percentage points between RISK-OFF (15% up rate) and EUPHORIA (82% up rate). That's not noise. That's a signal.
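Each row of the zone table is a conditional summary. Group the days by the zone read before the open, then compute the share of days, the mean next-day Nifty return and the fraction of up days. Here is a pure-Python sketch. It assumes a list of zone labels already aligned with the following day's return, and the function name and toy data are hypothetical:

```python
from collections import defaultdict

# Illustrative sketch only: function name and data are hypothetical.
def zone_table(zones, next_day_returns):
    """Share of days, mean next-day return and up rate for each zone."""
    buckets = defaultdict(list)
    for zone, r in zip(zones, next_day_returns):
        buckets[zone].append(r)
    n_days = len(zones)
    return {
        zone: {
            "share_of_days": len(rs) / n_days,
            "mean_next_day": sum(rs) / len(rs),
            "up_rate": sum(r > 0 for r in rs) / len(rs),
        }
        for zone, rs in buckets.items()
    }

# Toy data; returns in percent
table = zone_table(
    ["NEUTRAL", "NEUTRAL", "RISK-ON", "RISK-OFF"],
    [0.10, -0.20, 0.30, -0.50],
)
```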

The Validation: Three Independent Tests

We didn't just test on the training data. We subjected the engine to three independent challenges:

Test	Result	Verdict
Null Model (1,000 random simulations)	100th percentile	Strong edge — not luck
Jackknife (remove top-3 inputs)	60.0% accuracy survives	Moderate — diversified but not fragile
Walk-Forward (12 rolling windows)	CI [55.3%, 60.5%]	Confirmed — edge persists across time
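The null-model test asks where the real accuracy falls among the accuracies that a signal with no timing information would achieve. The article does not say how its 1,000 simulations were generated. This sketch assumes one common approach: shuffle the predictions, which preserves their long/short mix but breaks their link to the dates. The function name and synthetic data are hypothetical:

```python
import numpy as np

# Illustrative sketch only: shuffled-prediction null, hypothetical names and data.
def null_model_percentile(pred_signs, actual_signs, n_sims=1000, seed=0):
    """Return the real directional accuracy and its percentile within a shuffled null."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred_signs)
    actual = np.asarray(actual_signs)
    real_acc = float(np.mean(pred == actual))
    null_acc = np.array(
        [np.mean(rng.permutation(pred) == actual) for _ in range(n_sims)]
    )
    # Percentage of null runs the real signal beats; 100.0 means it beat every one
    percentile = 100.0 * float(np.mean(null_acc < real_acc))
    return real_acc, percentile

# Synthetic check: a signal that is right about 65% of the time over 716 days
rng = np.random.default_rng(42)
actual = rng.choice([-1, 1], size=716)
pred = np.where(rng.random(716) < 0.65, actual, -actual)
acc, pct = null_model_percentile(pred, actual)
```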

The Reality Check: Transaction Costs

Here's where we got honest. The directional accuracy is real — 62.3%. But when we added realistic trading costs (brokerage, taxes, slippage), the edge in normal markets was too thin to trade profitably on short timeframes.

The solution: don't trade every day. The engine identifies extreme zones — RISK-OFF and RISK-ON — where the edge is wide enough to survive friction. And it uses a dynamic exit rule (hold until the regime changes) rather than a fixed calendar, which improved net returns significantly.
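The dynamic exit rule translates directly into a short loop. A position opens when the zone enters one of the tradeable extremes, stays open while the zone is unchanged and closes on the first change, with one round-trip cost charged per trade. The 0.1% round-trip cost, the additive (non-compounded) returns within a trade, the function name and the default zone sets are assumptions for illustration. The defaults use the RISK-OFF and RISK-ON zones named above:

```python
# Illustrative sketch only: cost level, zone sets and function name are hypothetical.
def regime_trades(zones, next_day_returns, cost_per_round_trip=0.001,
                  long_zones=("RISK-ON",), short_zones=("RISK-OFF",)):
    """Net return of each trade held from zone entry until the zone changes.

    zones[t] is the zone read before the open on day t+1, and
    next_day_returns[t] is the index return on that day (as a fraction).
    """
    trades = []
    open_zone, gross = None, 0.0
    for zone, r in zip(zones, next_day_returns):
        if zone != open_zone:
            if open_zone is not None:
                trades.append(gross - cost_per_round_trip)  # exit on regime change
                open_zone = None
            if zone in long_zones or zone in short_zones:
                open_zone, gross = zone, 0.0                 # enter the new regime
        if open_zone is not None:
            direction = 1 if open_zone in long_zones else -1
            gross += direction * r                           # summed, not compounded
    if open_zone is not None:
        trades.append(gross - cost_per_round_trip)          # close any trade still open
    return trades

trades = regime_trades(
    ["NEUTRAL", "RISK-ON", "RISK-ON", "NEUTRAL", "RISK-OFF", "RISK-OFF"],
    [0.005, 0.01, 0.02, -0.003, -0.02, 0.01],
)
```

Costs are charged per trade rather than per day. Holding through a persistent regime therefore spreads one round trip over several days of edge, which is the intuition behind a regime-based exit.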

The honest conclusion: This engine doesn't predict every day. It identifies the 23% of trading days where the signal is strong enough to overcome real-world costs. On those days, the edge is substantial.

The ML Layer: Teaching Machines to Detect Regime Breaks

The Fragility Model

Beyond the ETF composite, we built a machine learning model to detect correlation regime breaks — moments when the historical relationship between two stocks shifts. When Defence and IT normally move together but suddenly diverge, something structural has changed.
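One way to make a "correlation regime break" concrete is to compare a short rolling correlation between two return series with its long-run level, then flag days where the gap is large. The 20- and 250-day windows, the 0.5 threshold and the function names are hypothetical. The article does not say how its break events were labelled:

```python
import numpy as np

# Illustrative sketch only: windows, threshold and names are hypothetical.
def rolling_corr(a, b, window):
    """Correlation of a and b over each trailing window; NaN until the window fills."""
    out = np.full(len(a), np.nan)
    for end in range(window, len(a) + 1):
        out[end - 1] = np.corrcoef(a[end - window:end], b[end - window:end])[0, 1]
    return out

def flag_breaks(a, b, short=20, long=250, threshold=0.5):
    """True on days where short-window correlation departs from the long-run norm."""
    gap = rolling_corr(a, b, short) - rolling_corr(a, b, long)
    # NaN gaps (not enough history yet) compare as False
    return np.abs(gap) > threshold

# Two series that move together for 280 days, then abruptly diverge
rng = np.random.default_rng(0)
sector_a = rng.normal(size=300)
sector_b = sector_a + 0.1 * rng.normal(size=300)
sector_b[280:] = -sector_a[280:] + 0.1 * rng.normal(size=20)
flags = flag_breaks(sector_a, sector_b)
```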

The original model was an XGBoost classifier. It achieved 89.8% accuracy — a number we initially published with pride. Then we looked deeper.

The accuracy trap: With roughly 19:1 class imbalance (5,707 stable days vs 294 break events), a model that simply predicts "stable" every day gets about 95% accuracy (5,707 of 6,001 days). Our 89.8% was below that trivial baseline, and on the events that actually matter it was close to useless. Precision on break detection: 2.2%. F1 score: 3.8%.
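Two baselines follow directly from the class counts: the accuracy of a model that never flags a break, and the expected precision of one that flags days at random. Any break detector worth keeping should beat both. The arithmetic below uses only the counts quoted in the text, and the variable names are illustrative:

```python
# Baselines computed from the class counts quoted in the text
stable_days, break_days = 5_707, 294
total_days = stable_days + break_days

# A "model" that always answers "stable" is right on every stable day
always_stable_accuracy = stable_days / total_days

# Flagging days at random, independent of the truth, has expected precision
# equal to the share of days that really are breaks
random_flag_precision = break_days / total_days

print(f"imbalance: {stable_days / break_days:.1f} to 1")        # 19.4 to 1
print(f"always-stable accuracy: {always_stable_accuracy:.1%}")  # 95.1%
print(f"random-flag precision: {random_flag_precision:.1%}")    # 4.9%
```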

74 Autonomous Experiments

We applied Karpathy's AutoResearch pattern: an AI agent ran 74 experiments overnight, testing different model architectures, sampling strategies, and problem framings. Each experiment modified the training code, ran it, measured F1 score, and kept improvements.

The breakthrough was a fundamental reframing. Instead of teaching the model what a "break" looks like (hard with only 294 examples), we taught it what "stability" looks like using thousands of stable examples. Anything that doesn't look stable gets flagged.

Metric	Before (XGBoost)	After (StackingClassifier)	Relative Change
F1 Score	3.8%	32.3%	+750%
Precision	2.2%	71.4%	+3,145%
Recall	12.5%	20.8%	+67%
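The figures in the table are consistent with each other. F1 is the harmonic mean of precision and recall, and the change column is the relative change (after - before) / before. Re-deriving them from the rounded published percentages reproduces the table to within rounding, giving an F1 of 3.7% before and 32.2% after. The helper names are illustrative:

```python
# Illustrative helpers for checking the metric table
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def relative_change(before, after):
    """Change as a fraction of the starting value (7.5 means +750%)."""
    return (after - before) / before

f1_before = f1(0.022, 0.125)
f1_after = f1(0.714, 0.208)

print(f"F1 before: {f1_before:.1%}, after: {f1_after:.1%}")
print(f"F1 change: {relative_change(0.038, 0.323):+.0%}")
print(f"precision change: {relative_change(0.022, 0.714):+.0%}")
```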

The final model is a Stacking Classifier combining Random Forest, Extra Trees, and a Neural Network as base learners, with Logistic Regression as the meta-learner. It uses SMOTE oversampling, isotonic calibration, and minority-class weighting to handle the extreme imbalance.
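A comparable stack can be assembled from scikit-learn's own classes. The sketch below is an approximation under stated assumptions, not the production model. The hyperparameters and the `build_fragility_model` name are placeholders, and the data is synthetic. SMOTE is omitted because it lives in the separate imbalanced-learn package, where it would sit inside an `imblearn` pipeline so that only training folds are oversampled. Class weighting is applied to the two forests and the meta-learner only, since `MLPClassifier` has no `class_weight` option:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative sketch only: hyperparameters and names are placeholders; SMOTE omitted.
def build_fragility_model(seed=0):
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                      random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=100, class_weight="balanced",
                                    random_state=seed)),
        # The network needs scaled inputs and cannot take class weights
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                           random_state=seed))),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(class_weight="balanced"),
        cv=3,
    )
    # Isotonic calibration maps the stack's scores to probabilities
    return CalibratedClassifierCV(stack, method="isotonic", cv=3)

# Synthetic, heavily imbalanced data standing in for stable vs break days
X, y = make_classification(n_samples=1500, n_features=20, weights=[0.95],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)
model = build_fragility_model().fit(X_train, y_train)
p_break = model.predict_proba(X_test)[:, 1]
```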

The Fragility Overlay

This model runs daily as a risk overlay. It doesn't generate trade signals — it adjusts how aggressively we trade them. When fragility is detected (correlation structure shifting), position sizes halve and stops widen. On 97% of days, it says "all clear" and trading proceeds normally.
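The overlay only resizes trades that the ETF composite has already generated, so it can be written as a pure function of a trade plan and a fragility flag. The halving of position size comes from the text. The 1.5× stop-widening factor, the `TradePlan` type and the function name are hypothetical:

```python
from dataclasses import dataclass, replace

# Illustrative sketch only: TradePlan and the 1.5x stop factor are hypothetical.
@dataclass(frozen=True)
class TradePlan:
    position_size: float   # fraction of the normal allocation (1.0 = full size)
    stop_distance: float   # stop-loss distance as a fraction of the entry price

def apply_fragility_overlay(plan, fragile, stop_widen=1.5):
    """Halve size and widen the stop on fragile days; leave the trade itself alone."""
    if not fragile:
        return plan  # "all clear": trade proceeds unchanged
    return replace(plan,
                   position_size=plan.position_size * 0.5,
                   stop_distance=plan.stop_distance * stop_widen)

normal = TradePlan(position_size=1.0, stop_distance=0.02)
on_fragile_day = apply_fragility_overlay(normal, fragile=True)
```

With these placeholder numbers, the capital at risk to the stop falls to 0.5 × 1.5 = 0.75 of normal (ignoring gaps through the stop). The wider stop gives the trade more room without raising its worst-case loss.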

Backtesting this overlay across 869 historical trades: cumulative P&L improved from +199.8% to +212.8%, max drawdown reduced from -142.4% to -130.7%, and fragile-day losses were cut in half.
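A drawdown deeper than -100% suggests the backtest tallies P&L by adding per-trade percentage returns rather than compounding them. This sketch assumes that convention. It computes cumulative P&L and maximum drawdown for a trade list. It approximates the overlay by halving returns on trades flagged as fragile, and ignores the effect of wider stops, which would need the intraday price path. The trades and flags are toy values:

```python
from itertools import accumulate

# Illustrative sketch only: assumes additive (summed) per-trade returns.
def equity_stats(trade_returns_pct):
    """Final cumulative P&L and max drawdown, in percentage points."""
    equity = list(accumulate(trade_returns_pct))
    peak, max_drawdown = 0.0, 0.0
    for value in equity:
        peak = max(peak, value)                        # highest P&L reached so far
        max_drawdown = min(max_drawdown, value - peak)
    return (equity[-1] if equity else 0.0), max_drawdown

trades = [10.0, -5.0, -20.0, 30.0, -15.0]    # toy per-trade returns, in %
fragile = [False, False, True, False, True]  # hypothetical overlay flags
baseline = equity_stats(trades)
with_overlay = equity_stats([r * 0.5 if f else r for r, f in zip(trades, fragile)])
```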

What It Means for Traders

Every morning before Indian markets open, the engine reads what happened overnight across 31 global markets. It computes a single score. If that score is in the extreme zones, it fires a signal with specific trade recommendations, holding periods, and position sizes.
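The article does not spell out how the single score is built from the overnight moves. This sketch assumes a natural reading: a weighted sum of each ETF's overnight return, standardised against that ETF's own history. The weights would come from the optimisation step. The function name and the toy inputs are hypothetical:

```python
import numpy as np

# Illustrative sketch only: the standardise-then-weight recipe is an assumption.
def composite_score(overnight, history, weights):
    """Weighted sum of overnight ETF returns, each expressed as a z-score.

    overnight: shape (N,), the latest session's returns
    history:   shape (T, N), past daily returns used for standardisation
    weights:   shape (N,), per-ETF weights
    """
    history = np.asarray(history, dtype=float)
    mu = history.mean(axis=0)
    sd = history.std(axis=0, ddof=1)
    z = (np.asarray(overnight, dtype=float) - mu) / sd  # how unusual each move was
    return float(np.dot(weights, z))

score = composite_score(
    overnight=[5.0, 2.0],
    history=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    weights=[0.7, 0.3],
)
```

This score is the value that gets compared with the zone thresholds. On a NEUTRAL reading, the engine stops there.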

If the score is neutral — which is 77% of the time — it says so. No false confidence. No forced trades.

This replaces opinion with measurement. The engine doesn't care about headlines or narratives. It measures credit appetite, volatility flows, currency positioning, and institutional behaviour across the world's largest markets. Then it tells you what that means for Indian equities tomorrow.

What's Next

The engine is currently in a 4-week live shadow validation. Every day, it logs its prediction before markets open. After 20 trading days, we'll compare predictions to outcomes and publish the scorecard.
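Once the logged predictions are in, the scorecard is a hit rate over the days with a directional call. With so few observations, an exact binomial test against a 50% coin flip suits better than a normal approximation. This stdlib sketch assumes calls are logged as +1 (up), -1 (down) or 0 (neutral, not scored). The function name and toy log are hypothetical:

```python
from math import comb

# Illustrative sketch only: logging format and function name are hypothetical.
def scorecard(calls, next_day_returns):
    """Number of calls scored, hit rate, and one-sided exact binomial p-value vs 50%."""
    scored = [(c, r) for c, r in zip(calls, next_day_returns) if c != 0]  # skip NEUTRAL
    n = len(scored)
    if n == 0:
        return 0, float("nan"), 1.0
    hits = sum((c > 0) == (r > 0) for c, r in scored)
    # P(X >= hits) for X ~ Binomial(n, 0.5): the chance of doing this well by guessing
    p_value = sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n
    return n, hits / n, p_value

n_calls, hit_rate, p_value = scorecard([1, 1, -1, 0, 1], [0.5, -0.1, -0.2, 0.3, 0.4])
```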

If the live results match the backtested edge, the engine moves to production. If they don't, we'll know exactly why and what to fix.

That's the difference between a dashboard that looks good and a system that works.

Methodology: 31 global ETFs, 716 trading days, 89 engineered features, 2,000 weight optimisation experiments, null-model validation, jackknife sensitivity testing, 12-fold walk-forward cross-validation, cost-aware backtesting with realistic transaction costs.

Anka Research — askanka.com
This is research, not investment advice.