For more than 15 years, I used a traditional approach to decision-making: collect information, evaluate options, and choose the “best” outcome. Sometimes it worked. More often, it resulted in weeks of overanalyzing, followed by second-guessing once the results unfolded.
That all shifted when I began to learn about how modern A.I. systems actually make decisions.
The most advanced AI systems don’t attempt to select the perfect choice. Instead, they optimize for regret minimization, a game-theoretic concept widely used in machine learning. That shift, from maximizing expected reward to minimizing regret, has completely changed how I think about everything from pricing strategy to hiring decisions.
Interestingly, this isn’t just an AI concept. In decision theory, humans naturally anticipate regret and adjust their choices to avoid it, even when those choices contradict so-called “optimal” expected-value decisions. Modern AI systems don’t ignore this reality; they formalize and optimize for it.
Maximizing Expected Reward vs. Minimizing Regret
Traditional decision-making is analogous to a common AI objective function: maximize expected reward. You gather data, build predictive models of how customers, competitors, and the broader marketplace will behave, and then pick the option with the highest expected payoff.
In machine learning parlance, this resembles supervised learning, which depends heavily on historical data. The shortcoming is clear: models are only as good as their assumptions, and real-world environments are noisy and unpredictable.
Regret minimization is a different approach, closer to reinforcement learning and game-theoretic systems. Instead of predicting one right answer, it evaluates each decision across many possible future states and reduces the gap, measured as regret, between the action taken and the best action available in hindsight.
This concept is formalized in Counterfactual Regret Minimization (CFR), where algorithms simulate repeated decisions, calculate “regret” for not choosing alternative actions, and iteratively adjust strategies. Over time, these systems converge toward a Nash equilibrium, where no participant can significantly improve their outcome by changing strategy alone, and where the resulting strategy cannot be significantly exploited by any adversary.
Unlike predictive models, regret-minimizing systems are:
- More robust to uncertainty
- Less dependent on accurate forecasting
- Harder to exploit in adversarial environments
That distinction is not theoretical; it has been proven in practice. Systems like Libratus used CFR-based approaches to defeat top human players in imperfect-information games like poker, where uncertainty and hidden information dominate.
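The core loop of CFR, regret matching, is simple enough to sketch in a few lines. Below is a minimal self-play example for rock-paper-scissors, an illustrative toy rather than anything Libratus-scale: each player accumulates regret for the actions it didn’t take, plays in proportion to positive regret, and the time-averaged strategy drifts toward the uniform Nash equilibrium.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """Payoff for playing a against b: 1 win, 0 tie, -1 loss."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to its positive regret
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # uniform fallback when no regret yet

def train(iterations, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(regrets[p]) for p in range(2)]
        for p in range(2):
            for a in range(ACTIONS):
                strat_sum[p][a] += strats[p][a]
        acts = [rng.choices(range(ACTIONS), weights=strats[p])[0] for p in range(2)]
        for p in range(2):
            opp = acts[1 - p]
            # Regret for each alternative: what we'd have gained by playing it
            for a in range(ACTIONS):
                regrets[p][a] += payoff(a, opp) - payoff(acts[p], opp)
    # Return each player's average strategy over all iterations
    return [[s / sum(strat_sum[p]) for s in strat_sum[p]] for p in range(2)]

avg = train(100_000)
```

After enough iterations, both players’ average strategies sit close to (1/3, 1/3, 1/3): no single action can be favored without handing an adversary something to exploit.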
Why This Matters in Real-World Decision Systems
Modern AI applications, whether recommendation engines or ad bidding platforms, typically operate in environments where outcomes depend on the behavior of other agents. In these situations, optimizing for a single predicted outcome is brittle.
Instead, decision systems increasingly rely on iteratively learning from mistakes and updating themselves periodically; feedback loops are used to improve decisions over time.
This maps directly onto business decisions: you do not operate in a vacuum. Your outcomes depend on how competitors, customers, and partners respond, which makes regret minimization a more effective model than attempting to forecast a single ideal result.
How I Apply This Paradigm
Pricing Strategy
A SaaS company I worked with was caught in a pricing war. Their instinct was to react to competitors by lowering prices, mirroring a reactive algorithm chasing short-term reward signals.
We reframed the problem using a regret-minimization lens:
“What pricing decision would you regret least across all competitor scenarios?”
This is conceptually similar to robust optimization in AI systems, where models are designed to perform reasonably well across a range of uncertain conditions rather than perfectly under a single assumption.
The conclusion was clear: maintain premium pricing. Their customers had high switching costs, making aggressive price cuts unnecessary. By avoiding reactive decisions, revenue stabilized within a quarter.
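That minimax-regret framing can be sketched directly. The revenue numbers below are hypothetical, invented purely for illustration: for each pricing action, compute the worst-case regret across competitor scenarios, then choose the action whose worst case is smallest.

```python
# Hypothetical quarterly revenue (in $k) for each pricing action under three
# competitor scenarios. All figures are illustrative, not real client data.
revenue = {
    "hold_premium": {"competitor_cuts": 700, "competitor_holds": 1000, "competitor_exits": 1100},
    "match_cut":    {"competitor_cuts": 850, "competitor_holds": 900,  "competitor_exits": 900},
    "undercut":     {"competitor_cuts": 800, "competitor_holds": 820,  "competitor_exits": 840},
}

def minimax_regret(payoffs):
    scenarios = next(iter(payoffs.values())).keys()
    # Regret = best achievable payoff in a scenario minus this action's payoff
    best = {s: max(p[s] for p in payoffs.values()) for s in scenarios}
    worst_regret = {a: max(best[s] - p[s] for s in p) for a, p in payoffs.items()}
    # Pick the action with the smallest worst-case regret
    return min(worst_regret, key=worst_regret.get), worst_regret

choice, regrets = minimax_regret(revenue)
```

With these illustrative numbers, holding premium pricing risks at most $150k of regret (if the competitor cuts), while matching or undercutting risks $200k or more, so “hold_premium” wins even though it is not the best answer in every scenario.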
Hiring Decisions
Hiring is a classic case of decision-making under uncertainty with limited data, similar to small-sample machine learning problems.
Previously, I tried to “optimize” hires using scoring models, spreadsheets, and structured evaluations. In hindsight, this resembled overfitting, trying to extract certainty from insufficient data.
Now, I apply regret minimization:
“Would I regret hiring this person more, or not hiring them?”
This mirrors decision-making policies in reinforcement learning, where actions are evaluated not only on expected reward but also on long-term outcome stability.
The result is faster decisions, reduced cognitive load, and fewer second-guessing loops.
Negotiation Strategy
A procurement colleague of mine followed a well-known playbook in every negotiation, no different from a deterministic algorithm that a counterparty can reverse-engineer.
He later became less predictable by introducing variability (mixed strategies) into his approach. This maps directly onto game theory, where adversarial AI systems deploy randomization to keep rivals from profitably tracing their patterns.
In the world of AI, this is analogous to stochastic policy selection as a way to improve long-term performance in adversarial settings. He achieved roughly 6% lower procurement costs on a per-deal basis: not much, but meaningful at scale.
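The value of randomization is easy to demonstrate in miniature. The sketch below is a toy matching-pennies game, not his actual negotiations: a frequency-tracking adversary faces first a deterministic policy, then a 50/50 mixed one. The deterministic policy is read and beaten almost every round, while the mixed policy holds its own no matter what the adversary does.

```python
import random

def play(policy, rounds=10_000, seed=1):
    """Matching pennies: the adversary wins a round by matching our action,
    we win by mismatching. The adversary best-responds to our observed
    action frequencies each round."""
    rng = random.Random(seed)
    counts = [1, 1]  # Laplace-smoothed counts of our past actions
    our_wins = 0
    for _ in range(rounds):
        predicted = 0 if counts[0] >= counts[1] else 1
        adversary = predicted          # adversary plays its prediction of us
        ours = policy(rng)
        counts[ours] += 1
        if ours != adversary:          # we win by mismatching
            our_wins += 1
    return our_wins / rounds

deterministic = lambda rng: 0          # always the same action: exploitable
mixed = lambda rng: rng.randrange(2)   # 50/50 randomization: unexploitable

det_rate = play(deterministic)   # loses essentially every round
mix_rate = play(mixed)           # wins about half the time
```

The deterministic player’s win rate collapses toward zero, while the mixed player stays near 0.5: exactly the guarantee a mixed-strategy equilibrium provides.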
The Compounding Effect in Iterative Systems
One of the most important lessons from AI systems is that small advantages compound over repeated decisions.
Regret-minimizing systems don’t win every time. They often achieve only marginal improvements, sometimes 53% vs. 47%. But across thousands of iterations, those small edges accumulate into dominant outcomes. This is the same principle behind:
- Reinforcement learning training loops
- Online learning systems
- Continuous optimization in ad platforms
In business, this translates to consistency over perfection. You don’t need every decision to be correct; you need a system that performs slightly better over time.
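The arithmetic of compounding edges is easy to check. Using a standard normal approximation to the binomial, a 53% per-decision edge leaves the outcome of 100 decisions genuinely in doubt, but over 10,000 decisions it is all but guaranteed to come out ahead:

```python
import math

def prob_ahead(p, n):
    """Probability of winning more than half of n independent decisions,
    each won with probability p (normal approximation to the binomial)."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    z = (n / 2 - mean) / std
    # 1 - Phi(z), with Phi expressed via the error function
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_100 = prob_ahead(0.53, 100)      # roughly 0.73: still quite uncertain
p_10k = prob_ahead(0.53, 10_000)   # effectively 1.0: the edge dominates
```

The same 6-point edge that is nearly invisible in a hundred decisions becomes a near-certainty across ten thousand, which is why iterative systems prize small consistent advantages over occasional brilliant calls.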
Information Asymmetry and Signal Tracking
The other principle borrowed from AI systems is continuous model updating based on new data.
Every move on competitor pricing, hiring trends, or product launches gives us signals. In machine learning terms, we update our model weights as new data arrives.
By tracking these signals over time, you reduce uncertainty and improve the quality of your decisions.
I keep a fairly simple quarterly log of competitor activity. Nothing complicated, just a lightweight pipeline from raw data to strategic insight that helps refine my mental model of the market.
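A log like that can be formalized as the simplest possible online update, an exponential moving average. The prices below are made-up illustrations: each new observation nudges the running estimate toward the latest signal, the same shape as a learning-rate-weighted weight update.

```python
def update(estimate, observation, learning_rate=0.3):
    """Move the estimate a fraction of the way toward the new signal."""
    return estimate + learning_rate * (observation - estimate)

# Hypothetical quarterly observations of a competitor's average price
estimate = 100.0
for observed_price in [98.0, 95.0, 96.0, 94.0]:
    estimate = update(estimate, observed_price)
```

Old information never disappears outright; it simply carries less and less weight, so the model stays anchored while still responding to trends.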
The Real Challenge: Accepting Uncertainty
The hardest part of adopting this framework is not technical; it’s psychological. AI systems operate under uncertainty by design. They don’t wait for perfect information. They iterate, adapt, and improve. Humans tend to do the opposite. We either:
- Over-analyze (seeking perfect data)
- Or rely entirely on intuition
Regret minimization offers a middle ground. It accepts uncertainty and focuses on making decisions that remain defensible across a range of outcomes.
Conclusion
Optimizing for perfection assumes that the future is predictable. That assumption rarely holds, which is exactly why modern AI systems are not built on it.
In contrast, optimizing for regret acknowledges uncertainty and builds resilience into decision-making. It matches more closely how sophisticated machine-learning systems actually work: iterative, adaptive, and resilient in the face of changing conditions.
You might not make the right decision every time. But you will make faster, more consistent, and eventually better ones over time. That’s more than a philosophical shift. It’s a systems-level advantage.