Robustness changes when systems become agents

April 2026

As machine learning systems move from passive prediction to autonomous decision-making, the notion of robustness begins to change.

A classifier can be robust in the traditional sense, yet still behave unpredictably when placed inside a system that acts, reacts, and evolves over time.

In many settings, robustness is framed as a worst-case problem.

A common formulation is:

\[ \min_{\theta} \; \mathbb{E} \left[ \max_{\delta \in \Delta} L(\pi_{\theta}(s + \delta)) \right] \]

where the model is trained to perform well even under adversarial perturbations.

This formulation works well for static systems.

But in agentic settings, the problem becomes more complex.

The model is no longer producing a single output. It is taking actions that influence future states.

Small changes in the input can propagate through time, interacting with other agents and the environment.

This creates a different type of instability.

Not just incorrect predictions, but cascading effects.

A common way to stabilize such systems is to limit sensitivity.

In practice, this is often done by enforcing a global constraint:

\[ \sup_s \|J_{\theta}(s)\| \leq \gamma \]

which bounds how much the model’s output can change with respect to its input.

This improves stability, but at a cost.

By restricting sensitivity everywhere, the model becomes less expressive.

It reacts less strongly not only to harmful perturbations, but also to meaningful changes in the environment.

This trade-off is often treated as unavoidable.

More robustness implies less flexibility.

However, in agentic systems, this assumption may be too pessimistic.

Adversarial effects do not occur in all directions equally.

They emerge along specific trajectories, driven by the interaction between the model and its environment.

Instead of constraining the system globally, it is possible to control sensitivity only along these critical directions.

That is, instead of limiting:

\[ \|J_{\theta}(s)\| \]

everywhere, we focus on:

\[ \|J_{\theta}(s + \delta_t)u_t\| \]

where \(u_t\) represents the direction in which the system is actually being pushed.

This shift is small in formulation, but significant in effect.

It allows the system to remain expressive in most directions, while still being stable where it matters.

In other words, robustness does not need to suppress behavior everywhere.

It only needs to suppress behavior where instability emerges.

This perspective changes how we think about robust training.

The goal is no longer to make the system uniformly insensitive, but to make it selectively stable.

As systems become more agentic, this distinction becomes increasingly important.

Because failures are no longer isolated.

They propagate.

This idea is explored more formally in Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization, where we show that directional constraints can improve stability without sacrificing expressivity.