AI Agents · Career · Future of Work

Managing Agents Is the Most Important Skill Nobody's Teaching

Box's CTO on the shift from doing work to managing AI that does work — and why it changes everything.

Berkeley Haas AI Conference, November 2024

The most jarring moment at the Berkeley Haas AI Conference came from Ben Kus, CTO of Box. He described how cutting-edge programmers work today, and it sounds nothing like what most people imagine.

The New Morning Routine

A programmer on the bleeding edge of AI-assisted development starts the day by kicking off 10 to 20 agents on separate tasks. Then they spend the rest of the day in a completely different mode: evaluating outputs.

"This agent did it wrong — that took thirty minutes. Nope, cancel that, throw that away, revert the branch. Oh, this one's good — continue down this path."

It's not writing code anymore. It's managing a small army of AI workers, each of which can produce code 10-100x faster than a human, but with varying quality and direction. The skill isn't coding. The skill is orchestrating.

Why This Isn't Just About Programming

Ben's argument is that this pattern will expand to every role. If AI agents can already write code, draft contracts, analyze data, create presentations, and respond to customer inquiries — then every knowledge worker will eventually need to manage agents that do their job.

"I almost think of it like management skills. At some point, probably every single person will be managing AI agents. They will have to get them to do work. They'll have to figure out if they did the right thing. They'll have to be responsible for it."

This reframes the entire conversation about AI and jobs. The question isn't "will AI replace my job?" It's "can I effectively manage AI that does my job faster than someone else can manage their AI?"

The Surprising Parallels to People Management

Here's what I found most counterintuitive: managing agents is weirdly similar to managing people.

Give an agent vague, lazy instructions and you get vague, lazy output. Give it detailed context, clear objectives, and sophisticated framing, and you get sophisticated results. Ben noted that even adding "please" measurably changes model behavior: the models pattern-match on the sophistication of the input.

"If you give it really stupid instructions, it'll give you a stupid result, because it's like — it must be the kind of thing I know how to talk to people like you. And so if you give it really sophisticated instructions, it'll give you sophisticated outputs."

The one big difference from people management: micromanagement works. Agents don't get frustrated. They don't quit. They don't gossip about you. You can be incredibly specific, check their work constantly, and demand revisions — and they just keep going. For people who've been told their entire career that micromanagement is toxic, this is a mental shift.

The Skill Gap Is Real

Ben compared the current moment to the early days of Google: "There was a world before Google was well known, and some people who knew Google really well — you'd get them to search stuff for you. Those people were very valuable."

Using AI agents effectively is currently that kind of rare, weird skill. Some people are 10x more productive with agents than others, not because of intelligence, but because of practice and technique.

His advice was blunt: "If you can't get a frontier model to do something you want, you're probably not managing it properly." GPT, Claude, Gemini — the frontier models are genuinely capable at the graduate or PhD level across many topics. If you're getting bad results, the problem is likely your instructions, not the model.

What to Do About It

The uncomfortable truth: there's no class for this yet. No MBA curriculum teaches "agent management." No certification exists. The only way to get good at it is to use agents constantly — for real work, not toy examples.

Some practical starting points:

- **Give agents real tasks.** Not "write me a poem" but "analyze this competitive landscape and identify three positioning opportunities."
- **Treat bad output as a prompt problem.** Before blaming the model, ask: did I give it enough context? Did I specify the format? Did I define what "good" looks like?
- **Manage multiple agents simultaneously.** The power move isn't one agent doing one thing — it's orchestrating many agents in parallel, like Ben described with programmers.
- **Develop evaluation skills.** The bottleneck isn't generation — it's judgment. Can you quickly assess whether an agent's output is good, fixable, or needs to be scrapped?
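The dispatch-then-triage loop Ben describes can be sketched in a few lines. This is a minimal, hypothetical illustration: `run_agent` and `is_good` are stand-ins for whatever agent call and quality judgment you actually use, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str) -> str:
    """Hypothetical stand-in for dispatching one task to one agent."""
    return f"draft for: {task}"

def is_good(output: str) -> bool:
    """Hypothetical quality check; in practice, this is your judgment
    (or an eval suite), and it's the real bottleneck."""
    return output.startswith("draft")

def orchestrate(tasks: list[str]) -> dict[str, list[str]]:
    """Fan tasks out to agents in parallel, then triage the results."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        outputs = list(pool.map(run_agent, tasks))
    # Triage: keep what's good, scrap the rest (or revise and re-dispatch).
    return {
        "kept": [o for o in outputs if is_good(o)],
        "scrapped": [o for o in outputs if not is_good(o)],
    }

results = orchestrate([
    "fix the login bug",
    "draft the Q3 summary",
    "analyze churn data",
])
```

The structure mirrors the job description: generation is cheap and parallel; the scarce resource is the evaluation step that decides what survives.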

The people who figure this out in the next 1-2 years will have the kind of career advantage that early internet adopters had in the late 90s. The people who don't will find themselves managed by people who did.