The Problem: When Your Dashboard Lies
We started with a Macro Sentiment Index — five Indian market inputs with hand-picked weights. It showed a number between 0 and 100. It looked professional. Subscribers saw it on the dashboard and trusted it.
It was wrong. Not slightly wrong — structurally wrong. The weights were arbitrary. The data sources broke without warning. And most critically: it only looked at India, while Indian markets follow global flows with a one-day lag.
The Approach: Let Data Pick the Weights
Instead of guessing which inputs matter, we asked the data. We loaded 716 days of daily returns from 31 global ETFs — covering every major asset class, geography, and risk factor — and measured which ones actually predict where Indian markets go the next day.
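A minimal sketch of that lead-lag measurement, assuming it is done with a plain Pearson correlation between each ETF's return on day t and the Nifty's return on day t+1. The series names and toy numbers are hypothetical, and the engine's actual measure may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_predictors(etf_returns, nifty_returns):
    """Rank ETFs by |correlation| between their return today and the Nifty's tomorrow.

    etf_returns: dict of name -> daily returns, date-aligned with nifty_returns.
    """
    scores = {}
    for name, series in etf_returns.items():
        # Pair the ETF return on day t with the Nifty return on day t+1.
        scores[name] = pearson(series[:-1], nifty_returns[1:])
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data (hypothetical): the Nifty echoes HIGH_YIELD with a one-day lag.
high_yield = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
unrelated = [0.01, 0.01, -0.01, -0.01, 0.01, -0.01]
nifty = [0.0] + [0.5 * r for r in high_yield[:-1]]

ranked = rank_predictors({"HIGH_YIELD": high_yield, "UNRELATED": unrelated}, nifty)
print(ranked[0][0])
```

The only part that matters is the one-day shift: today's ETF move is paired with tomorrow's Nifty move, never the same day's.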
Then we applied Karpathy's AutoResearch methodology: an AI agent tested 2,000 different weight combinations, keeping only the configurations that improved risk-adjusted returns. No human bias. No favourite indicators. Just data.
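The keep-only-improvements loop can be sketched as a simple random search. This is an illustration under stated assumptions, not the agent's actual procedure: the Sharpe-style score, the long/short trading rule, the perturbation size, and the toy inputs are all hypothetical.

```python
import random
from statistics import mean, pstdev

def sharpe(returns):
    """Mean return over its standard deviation (unannualised); 0.0 if flat."""
    sd = pstdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd

def strategy_returns(weights, signals, next_day):
    """Go long when the weighted signal is positive, short when negative, else stay flat."""
    out = []
    for row, r in zip(signals, next_day):
        score = sum(w * s for w, s in zip(weights, row))
        position = 1 if score > 0 else -1 if score < 0 else 0
        out.append(position * r)
    return out

def search_weights(signals, next_day, n_trials=2000, step=0.1, seed=0):
    """Perturb the best weights so far and keep a candidate only if its score improves."""
    rng = random.Random(seed)
    n_inputs = len(signals[0])
    best = [1.0 / n_inputs] * n_inputs
    best_score = sharpe(strategy_returns(best, signals, next_day))
    for _ in range(n_trials):
        candidate = [w + rng.gauss(0, step) for w in best]
        score = sharpe(strategy_returns(candidate, signals, next_day))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy data (hypothetical): input 0 predicts the next day, input 1 is its mirror image.
signals = [[1, -1], [-1, 1], [1, -1], [-1, 1], [1, -1], [-1, 1]]
next_day = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
weights, score = search_weights(signals, next_day, n_trials=200)
```

A loop like this will happily overfit the data it is scored on, which is why the out-of-sample tests later in the piece matter.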
What We Found
The Surprise: Credit Markets Beat Equities
The strongest predictor of Indian market direction isn't the S&P 500 or VIX; it's high-yield corporate bonds. When global credit appetite is strong (investors buying riskier bonds), India tends to follow the next day. When credit markets freeze, India sells off.
This makes sense: credit markets are where institutional money moves first. By the time equity markets react, bond traders have already priced the risk.
The Zones: Five States of the World
We defined five market regimes against a baseline of 350 historically calm days. The boundaries aren't arbitrary: they're statistical thresholds, measured in standard deviations from that calm baseline.
| Zone | Frequency (% of days) | Nifty Next-Day Return | Next-Day Up Rate |
|---|---|---|---|
| RISK-OFF | 3% | -0.83% | 15% |
| CAUTION | 9% | -0.30% | 38% |
| NEUTRAL | 77% | +0.08% | 55% |
| RISK-ON | 10% | +0.28% | 72% |
| EUPHORIA | 2% | +0.41% | 82% |
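To make the zone logic concrete, here is a minimal sketch that converts a score into a z-score against the calm baseline and maps it to a zone. The cut-offs at 1 and 2 standard deviations are an assumption for illustration (the actual thresholds are not stated here), and the sample baseline is made up.

```python
from statistics import mean, pstdev

# Hypothetical cut-offs, in standard deviations from the calm baseline.
ZONE_UPPER_BOUNDS = [
    (-2.0, "RISK-OFF"),
    (-1.0, "CAUTION"),
    (1.0, "NEUTRAL"),
    (2.0, "RISK-ON"),
]

def classify(score, calm_scores):
    """Return the zone for `score`, measured against the calm-period scores."""
    mu = mean(calm_scores)
    sigma = pstdev(calm_scores)
    if sigma == 0:
        raise ValueError("calm baseline has no variation")
    z = (score - mu) / sigma
    for upper, zone in ZONE_UPPER_BOUNDS:
        if z < upper:
            return zone
    return "EUPHORIA"

# Made-up baseline: composite scores from calm days, centred on 50.
calm = [48, 50, 52, 50, 49, 51]
for s in (47, 48.5, 50, 51.5, 53):
    print(s, classify(s, calm))
```

Because the thresholds are expressed in baseline standard deviations, re-estimating the calm period moves every boundary together rather than one at a time.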
The Validation: Three Independent Tests
We didn't just test on the training data. We subjected the engine to three independent challenges:
| Test | Result | Verdict |
|---|---|---|
| Null Model (1,000 random simulations) | 100th percentile | Strong edge, not luck |
| Jackknife (remove top-3 inputs) | 60.0% accuracy survives | Moderate: diversified, not fragile |
| Walk-Forward (12 rolling windows) | Accuracy CI [55.3%, 60.5%] | Confirmed: edge persists across time |
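The null-model idea in the first row can be sketched as a permutation test: shuffle the predictions so any link to the outcomes is destroyed, then see how often the shuffled versions score worse than the real one. Shuffling is an assumption here (the simulations could equally draw random weights), and the toy series are hypothetical.

```python
import random

def accuracy(predicted, actual):
    """Share of days where the predicted direction matched the actual direction."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def null_percentile(predicted, actual, n_sims=1000, seed=0):
    """Percentage of shuffled-prediction simulations that the real accuracy beats."""
    rng = random.Random(seed)
    real = accuracy(predicted, actual)
    shuffled = list(predicted)
    beaten = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # breaks the day-by-day link to the outcomes
        if accuracy(shuffled, actual) < real:
            beaten += 1
    return 100.0 * beaten / n_sims

# Toy data (hypothetical): +1 = up day, -1 = down day.
actual = [1, -1] * 50
perfect = list(actual)   # a model that is always right
always_up = [1] * 100    # a model with no information

print(null_percentile(perfect, actual), null_percentile(always_up, actual))
```

A constant "always up" call scores the same no matter how it is shuffled, so it never beats its own null distribution. That is the behaviour you want from the test.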
The Reality Check: Transaction Costs
Here's where we got honest. The directional accuracy is real — 62.3%. But when we added realistic trading costs (brokerage, taxes, slippage), the edge in normal markets was too thin to trade profitably on short timeframes.
The solution: don't trade every day. The engine identifies the extreme zones, RISK-OFF and RISK-ON, where the edge is wide enough to survive friction. And it uses a dynamic exit rule (hold until the regime changes) rather than a fixed holding period, which improved net returns significantly.
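A minimal sketch of the hold-until-the-regime-changes rule with costs deducted once per trade. The 0.2% round-trip cost, the long/short mapping, and the use of summed simple returns are assumptions for illustration, not the engine's actual settings.

```python
# Hypothetical mapping: long in RISK-ON, short in RISK-OFF, flat otherwise.
DIRECTION = {"RISK-ON": 1, "RISK-OFF": -1}

def regime_trades(zones, daily_returns, round_trip_cost=0.002):
    """Build one trade per unbroken run of a tradeable zone.

    zones[i] is the zone signalled before day i opens and daily_returns[i]
    is that day's index return. Gross return is the sum of daily returns
    (an approximation that ignores compounding); the cost is charged once.
    """
    trades = []
    current, gross = None, 0.0
    for zone, r in zip(zones, daily_returns):
        if zone != current:
            # Regime changed: close any open trade, then start tracking the new zone.
            if current in DIRECTION:
                trades.append((current, gross - round_trip_cost))
            current, gross = zone, 0.0
        if zone in DIRECTION:
            gross += DIRECTION[zone] * r
    if current in DIRECTION:
        trades.append((current, gross - round_trip_cost))
    return trades

zones = ["NEUTRAL", "RISK-ON", "RISK-ON", "NEUTRAL", "RISK-OFF"]
returns = [0.001, 0.004, 0.003, -0.002, -0.010]
for zone, net in regime_trades(zones, returns):
    print(zone, round(net, 4))
```

Paying the cost once per regime rather than once per day is where the saving comes from: a two-day RISK-ON run is charged a single round trip, not two.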
The ML Layer: Teaching Machines to Detect Regime Breaks
The Fragility Model
Beyond the ETF composite, we built a machine learning model to detect correlation regime breaks: moments when the historical relationship between two stocks or sectors shifts. When Defence and IT stocks that normally move together suddenly diverge, something structural has changed.
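The original model was an XGBoost classifier. It achieved 89.8% accuracy, a number we initially published with pride. Then we looked deeper: because breaks are rare, a model can score high on accuracy while missing most of them, and this one's F1 score was just 3.8%.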
74 Autonomous Experiments
We applied Karpathy's AutoResearch pattern: an AI agent ran 74 experiments overnight, testing different model architectures, sampling strategies, and problem framings. In each experiment the agent modified the training code, ran it, measured the F1 score, and kept the change only if the score improved.
The breakthrough was a fundamental reframing. Instead of teaching the model what a "break" looks like (hard with only 294 examples), we taught it what "stability" looks like using thousands of stable examples. Anything that doesn't look stable gets flagged.
| Metric | Before (XGBoost) | After (StackingClassifier) | Relative Change |
|---|---|---|---|
| F1 Score | 3.8% | 32.3% | +750% |
| Precision | 2.2% | 71.4% | +3,145% |
| Recall | 12.5% | 20.8% | +67% |
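For readers who want to check the arithmetic, F1 is the harmonic mean of precision and recall, and the last column is a relative change (after minus before, divided by before). The sketch below recomputes both from the rounded table values, so results can differ from the published figures by a fraction of a point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions between 0 and 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction (7.5 means +750%)."""
    return (after - before) / before

# Rounded values taken from the table above.
f1_before = f1_score(0.022, 0.125)
f1_after = f1_score(0.714, 0.208)

print(f"F1 before: {f1_before:.1%}, after: {f1_after:.1%}")
print(f"F1 change: {relative_change(3.8, 32.3):+.0%}")
print(f"Precision change: {relative_change(2.2, 71.4):+.0%}")
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are decent. That is why the original model's 2.2% precision dragged its F1 down to single digits despite the high headline accuracy.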
The final model is a Stacking Classifier combining Random Forest, Extra Trees, and a Neural Network as base learners, with Logistic Regression as the meta-learner. It uses SMOTE oversampling, isotonic calibration, and minority-class weighting to handle the extreme imbalance.
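A rough sketch of that architecture in scikit-learn, run on synthetic data. The hyperparameters, the synthetic dataset, and the scaling step in front of the neural network are all assumptions, and SMOTE is left out because it lives in the separate imbalanced-learn package. Treat this as the shape of the model, not the production configuration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_model(seed=0):
    """Stacked RF + Extra Trees + MLP, logistic meta-learner, isotonic calibration."""
    base_learners = [
        # class_weight="balanced" up-weights the rare "break" class.
        ("rf", RandomForestClassifier(n_estimators=50, class_weight="balanced",
                                      random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=50, class_weight="balanced",
                                    random_state=seed)),
        # MLPs train better on scaled inputs, hence the small pipeline.
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                           random_state=seed))),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
        cv=3,
    )
    # Isotonic calibration maps raw scores to better-behaved probabilities.
    return CalibratedClassifierCV(stack, method="isotonic", cv=3)

# Synthetic, imbalanced stand-in for the real feature set (about 10% positives).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
model = build_model().fit(X, y)
break_probability = model.predict_proba(X)[:, 1]
```

The meta-learner sees only the base learners' out-of-fold predictions, so it learns how much to trust each one rather than re-learning the raw features.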
The Fragility Overlay
This model runs daily as a risk overlay. It doesn't generate trade signals — it adjusts how aggressively we trade them. When fragility is detected (correlation structure shifting), position sizes halve and stops widen. On 97% of days, it says "all clear" and trading proceeds normally.
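In code, the overlay is a thin adjustment applied to an order that some other component has already produced. A minimal sketch, in which the `Order` fields and the 1.5x stop-widening factor are hypothetical (only the halving of position size comes from the description above):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Order:
    size: float      # position size, e.g. number of units
    stop_pct: float  # stop distance as a fraction of entry price

def apply_fragility_overlay(order, fragile, stop_widen=1.5):
    """Halve the size and widen the stop on fragile days; otherwise pass through."""
    if not fragile:
        return order
    return replace(order, size=order.size * 0.5, stop_pct=order.stop_pct * stop_widen)

base = Order(size=100, stop_pct=0.02)
print(apply_fragility_overlay(base, fragile=False))
print(apply_fragility_overlay(base, fragile=True))
```

Keeping the overlay separate from signal generation means it can be switched off, or backtested on its own, without touching the signals.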
In a backtest of this overlay across 869 historical trades, cumulative P&L improved from +199.8% to +212.8%, maximum drawdown narrowed from -142.4% to -130.7%, and losses on fragile days were cut in half.
What It Means for Traders
Every morning before Indian markets open, the engine reads what happened overnight across the 31 global ETFs. It computes a single score. If that score is in the extreme zones, it fires a signal with specific trade recommendations, holding periods, and position sizes.
If the score is neutral — which is 77% of the time — it says so. No false confidence. No forced trades.
This replaces opinion with measurement. The engine doesn't care about headlines or narratives. It measures credit appetite, volatility flows, currency positioning, and institutional behaviour across the world's largest markets. Then it tells you what that means for Indian equities tomorrow.
What's Next
The engine is currently in a 4-week live shadow validation. Every day, it logs its prediction before markets open. After 20 trading days, we'll compare predictions to outcomes and publish the scorecard.
If the live results match the backtested edge, the engine moves to production. If they don't, we'll know exactly why and what to fix.
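A scorecard of this kind can be as simple as a hit rate with an uncertainty band around it. The sketch below is one possible version, assuming predictions are logged as +1 (up), -1 (down), or 0 (no call) and using a normal-approximation interval, which is only rough at sample sizes this small.

```python
from math import sqrt

def scorecard(predictions, outcomes, z=1.96):
    """Hit rate of directional calls with an approximate 95% interval.

    predictions: +1 (up), -1 (down), or 0 (no call) logged before the open.
    outcomes: the realised next-day returns; a flat day counts as "not up".
    """
    scored = [(p, o) for p, o in zip(predictions, outcomes) if p != 0]
    n = len(scored)
    if n == 0:
        return {"n": 0, "hit_rate": None, "interval": None}
    hits = sum(1 for p, o in scored if (p > 0) == (o > 0))
    rate = hits / n
    half_width = z * sqrt(rate * (1 - rate) / n)
    return {
        "n": n,
        "hit_rate": rate,
        "interval": (max(0.0, rate - half_width), min(1.0, rate + half_width)),
    }

# Toy log (hypothetical): five mornings, one of them a no-call day.
card = scorecard([1, -1, 1, 0, 1], [0.01, -0.02, -0.005, 0.003, 0.004])
print(card)
```

With only 20 trading days, and fewer scored calls once neutral days are excluded, the interval will be wide, so the comparison against the backtest is a sanity check rather than proof.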
That's the difference between a dashboard that looks good and a system that works.
Methodology: 31 global ETFs, 716 trading days, 89 engineered features, 2,000 weight optimisation experiments, null-model validation, jackknife sensitivity testing, 12-fold walk-forward cross-validation, cost-aware backtesting with realistic transaction costs.
Anka Research — askanka.com
This is research, not investment advice.