Lucky Blog · AI Report

Agents at Work

How AI Agents Actually Replace Work: Real Workflows from the Office and the Field

Klarna's AI did the work of 700 people, Harvey compressed two weeks of a lawyer's labor into a day, and a hospital's AI cut physician burnout by 13 percentage points. Yet a year later Klarna was hiring people back. Draw real corporate cases out as step-by-step workflows, and you can see exactly what an agent takes and where it stops.

Published 2026·06·12 · 12 min read · by Lucky Blog Editorial

Thesis

Not the 'Job,' but One 'Layer' of It

"AI eliminates jobs" is far too blunt a sentence. Look closely at how it actually gets deployed, and an agent rarely swallows a person's whole job. Instead, of the several layers that make up that job, it absorbs one specific layer — the repetitive, standardized processing of information. The support rep's "check the order status," the lawyer's "first draft," the developer's "simple migration," the doctor's "charting" are all that kind of layer.

So the outcome splits by role. Where standardized processing is close to the whole job (frontline support, simple document work), headcount shrinks. Where standardized processing is only part of the job (strategy, disputes, in-person care), people move up to the layer above it. This piece draws four real cases as workflows, showing at a glance which stage is handed off to the agent, which stage the human keeps, and what happens when you draw that boundary wrong.

Support · Klarna

700 people's work

2.3M chats/mo · 11 min → 2 min

Legal · Harvey

2 weeks → a day

Deposition summaries · contract review 2 days→2 hours

Engineering · Stripe

Months → days

50-million-line Ruby migration

Healthcare · Abridge

Burnout -13pp

150+ providers · automated charting

Common pattern

A layer replaced

Routine info-processing → agent

What stays

Judgment · relationships · accountability

People move up a layer

Case · 01 · Customer Support

Klarna — Replacing 700 People, Then Calling Humans Back

In February 2024 the fintech Klarna switched on an OpenAI-based AI support agent worldwide. In the first month alone it handled 2.3 million chats — work the company said was equivalent to roughly 700 full-time agents. Resolution time dropped from 11 minutes to 2 minutes, repeat inquiries fell by 25%, and customer satisfaction matched human levels. The company projected a $40 million profit improvement in 2024 alone (against a build cost of $2-3 million). Total headcount fell from about 5,000 to 3,500 (mostly through natural attrition).

Seen as a workflow, the frontline-support role was handed over to the agent wholesale.

Customer Inquiry → Resolution (Klarna)

AGENT

① Intake & intent — Chat comes in, language is detected (35 languages), and the question is classified.

AGENT

② Lookup & action — Queries the order/payment systems directly and executes actions like refunds and rescheduling. "Where's my order?" and "When's my payment due?" end right here.

AGENT

③ Reply & close — Responds and closes within 2 minutes. Roughly two-thirds of all cases wrap up at this line.

HUMAN

④ Escalation — Complex disputes, fraud claims, and hardship (financial distress) cases are handed to a person. The layer where emotion, exceptions, and accountability are on the line.
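The hand-off between steps ③ and ④ can be sketched as a simple allowlist router. This is a minimal illustration, not Klarna's implementation: the intent labels, the `Ticket` type, and the `route` function are all hypothetical, and the intent is assumed to come from the classifier in step ①.

```python
from dataclasses import dataclass

# Hypothetical intent labels for the routine layer; the article only names
# example questions such as order status and payment due dates.
AGENT_INTENTS = {"order_status", "payment_due", "refund", "reschedule"}


@dataclass
class Ticket:
    text: str
    intent: str  # assumed output of the step-1 classifier


def route(ticket: Ticket) -> str:
    """Send allowlisted routine intents to the agent, everything else to a person."""
    if ticket.intent in AGENT_INTENTS:
        return "agent"
    # Disputes, fraud, hardship, and anything unrecognized go to a human.
    return "human"


print(route(Ticket("Where's my order?", "order_status")))   # agent
print(route(Ticket("I can't afford this payment", "hardship")))  # human
```

Defaulting unknown intents to a person is a deliberate choice in this sketch: it treats under-escalation as the more expensive mistake, which is the error the rest of this case describes.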

That's the half that gets quoted everywhere. The other half matters more. In May 2025, CEO Sebastian Siemiatkowski publicly admitted the company "cut too deep" on people and reopened hiring for premium support staff. Over six months customer satisfaction had slipped: on simple inquiries the AI matched humans, but on complex disputes and hardship cases its resolution quality was noticeably lower. In the end, layer ④ of the workflow above turned out to be thicker than expected.

What Klarna proved was not "AI replaces 700 people," but a more precise proposition: "the frontline-support layer gets replaced, but the layer above it still belongs to humans." — The lesson of Case 01

Case · 02 · Legal

Harvey — Turning a Legion of Junior Lawyers' Two Weeks Into a Day

The legal AI Harvey took hold fast in law, where standardized document labor makes up a large share of the job. The asset manager Bridgewater saw more than 95% time savings on large-scale contract review, cutting vendor-contract review from an average of 2 days to 2 hours. The firm A&O Shearman rolled it out across the company to 4,000 people in 43 jurisdictions, saving 2-3 hours a week and trimming contract-review time by 30%. The most striking figure is on the litigation side. In one matter, deposition summaries and theme analysis that several junior lawyers would take two weeks to do were finished in under a day.
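The "2 days to 2 hours" figure can be turned into a percentage with one line of arithmetic. The sketch below is only a sanity check: the 48-hour and 16-hour readings of "2 days" are assumptions, since the cited figures do not say whether calendar days or working days are meant.

```python
def time_saved_pct(before_hours: float, after_hours: float) -> float:
    """Percentage of time saved when a task drops from before_hours to after_hours."""
    return 100 * (before_hours - after_hours) / before_hours


# Assumption A: "2 days" means 2 calendar days (48 hours).
calendar_reading = time_saved_pct(48, 2)
# Assumption B: "2 days" means 2 eight-hour working days (16 hours).
working_reading = time_saved_pct(16, 2)

print(f"{calendar_reading:.1f}%")  # 95.8%
print(f"{working_reading:.1f}%")   # 87.5%
```

Only the calendar-day reading lands above 95%, so the two quoted numbers agree with each other under that reading, or they describe different measurements.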

Litigation Discovery Workflow (Harvey)

AGENT

① Collect & classify — Reads thousands of pages of testimony, contracts, and emails and tags them by issue.

AGENT

② Summarize & draft — Generates deposition summaries, theme analysis, first-pass memos, and contract-review drafts. Even attaches source citations.

HUMAN

③ Verify & strategize — The lawyer checks citations, stress-tests weaknesses, and builds deposition strategy. The accountability layer that catches AI hallucinations and misquotes.

HUMAN

④ Judgment & relationships — Client counsel, negotiation, courtroom advocacy. The final layer where credentials and trust are staked.
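Part of the verification in step ③ is mechanical enough to illustrate. The sketch below is a hypothetical helper, not a Harvey feature: it flags any quoted passage in a draft that cannot be found verbatim in the source transcript, ignoring case and whitespace differences. A real review also has to judge whether a quote is used fairly, which no string match can do.

```python
def unverified_quotes(draft_quotes: list[str], source_text: str) -> list[str]:
    """Return the quotes that do not appear verbatim in the source text."""

    def normalize(s: str) -> str:
        # Collapse whitespace runs and lowercase so layout differences don't matter.
        return " ".join(s.split()).lower()

    haystack = normalize(source_text)
    return [q for q in draft_quotes if normalize(q) not in haystack]


transcript = "The witness stated that the shipment  arrived late."
quotes = ["shipment arrived late", "shipment arrived on time"]
print(unverified_quotes(quotes, transcript))  # ['shipment arrived on time']
```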

What's worth noting is the place of the junior lawyer. At one firm (Lynn Pinker), associates reported that instead of spending time on first drafts and bulk document review, they engage in case strategy earlier and more deeply — stress-testing arguments and preparing depositions. As layers ① and ② moved to the agent, people were pushed up into layers ③ and ④. The work didn't disappear; its center of gravity shifted upward.

Case · 03 · Engineering

Stripe · TELUS — Where the Developer Becomes a 'Conductor'

Coding is where agents gained ground first, because grading the answer is automatic (tests, compilation). The payments company Stripe compressed the migration of a 50-million-line Ruby codebase from the months it was scheduled to take down to days (per Anthropic). The telecom TELUS adopted agentic coding tools internally and reported shipping engineering code 30% faster, saving more than 500,000 hours cumulatively.

Large-Scale Code Migration (Stripe-style)

HUMAN

① Set goals & constraints — "Move this codebase to the new framework. Preserve behavior 100%." The engineer defines the what and the why.

AGENT

② Decompose & run in parallel — A conductor agent splits the work into hundreds of branches and distributes them to sub-agents. Each one edits files and runs tests.

AGENT

③ Self-correct — Repeats the test-fail → diagnose-error → re-fix loop without a human. The layer where pure repetitive labor evaporates entirely.

HUMAN

④ Review, merge & own it — The engineer inspects the changes, catches subtle regressions, and takes responsibility for the production rollout.
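The loop in step ③ and its hand-off to step ④ can be sketched as follows. This is a generic illustration under stated assumptions, not Stripe's tooling: `run_tests` and `attempt_fix` are hypothetical callables standing in for a test runner and an agent's edit step, and the retry budget `max_rounds` is invented for the example.

```python
from typing import Callable


def self_correct(
    run_tests: Callable[[], list[str]],
    attempt_fix: Callable[[list[str]], None],
    max_rounds: int = 5,
) -> str:
    """Repeat test -> fix until tests pass or the retry budget runs out."""
    failures = run_tests()
    rounds = 0
    while failures and rounds < max_rounds:
        attempt_fix(failures)   # the agent edits code based on the failing tests
        failures = run_tests()  # then re-runs the suite
        rounds += 1
    # Either way a person is next: to review a green branch or to take over a stuck one.
    return "escalate_to_human" if failures else "ready_for_human_review"


# Toy stand-ins: each "fix" resolves one failing test.
pending = ["test_a", "test_b"]


def toy_run_tests() -> list[str]:
    return list(pending)


def toy_attempt_fix(failures: list[str]) -> None:
    pending.pop()


result = self_correct(toy_run_tests, toy_attempt_fix)
print(result)  # ready_for_human_review
```

Note that neither return value means "merged". Passing tests only earns the branch a review, which is the point of layer ④.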

The pattern Anthropic's agentic-coding report observed is two-sided. Time spent per task fell (automation), but output per person rose by far more (amplification). It means the same headcount produces more, which feeds both the fear that "juniors disappear" and the optimism that "the engineer is promoted from code author to the conductor of an agent legion." But strip out layer ④ — review and accountability — and the agent becomes a machine for mass-producing plausible errors at speed.

Case · 04 · The Field

Abridge — In the Field, Not 'Replacement' but 'Backing Up'

Outside the office, in the field where people deal with each other face to face, the picture differs. Medicine is the prime example. The AI ambient scribe in the exam room (Abridge and others) doesn't replace the doctor. It listens to the conversation the doctor has with the patient and automatically writes the chart and visit notes. It peels off only the most draining administrative layer of the doctor's work.

Visit → Documentation Workflow (Abridge-style)

HUMAN

① See & talk — The doctor sees, listens to, and diagnoses the patient. Empathy, physical exam, clinical judgment — the irreplaceable layer.

AGENT

② Listen & record — Listens to the conversation in real time and auto-drafts the SOAP note and chart. Lets the doctor look at the patient instead of the screen.

HUMAN

③ Sign & finalize — The doctor reviews, edits, and signs the note. The accuracy of the medical record and the legal accountability stay with the human.
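The rule in step ③, that a draft never becomes a record without a clinician's signature, can be expressed as a small gate. This is a hypothetical sketch, not Abridge's data model: `DraftNote`, `sign`, and `finalize` are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftNote:
    body: str                        # text drafted by the scribe agent
    signed_by: Optional[str] = None  # stays None until a clinician signs

    def sign(self, clinician: str, edited_body: Optional[str] = None) -> None:
        """Record the clinician's sign-off, applying their edits if any."""
        if edited_body is not None:
            self.body = edited_body
        self.signed_by = clinician


def finalize(note: DraftNote) -> str:
    """Refuse to produce a final record for an unsigned draft."""
    if note.signed_by is None:
        raise PermissionError("note requires a clinician signature")
    return f"{note.body}\nsigned: {note.signed_by}"


note = DraftNote("S: cough for 3 days")
note.sign("Dr. Example", edited_body="S: dry cough for 3 days")
print(finalize(note))
```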

The numbers are clear. Abridge has signed contracts with more than 150 healthcare providers, and a study of 1,800 clinicians across 5 academic centers found savings of 16 min on documentation and 13 min on the electronic record per 8-hour clinical day. In a study of 263 physicians, burnout fell from 51.9% to 38.8% in 30 days, and at St. Luke's after-hours documentation dropped 35% while face time with patients rose 15%.
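The burnout figure is worth reading carefully, because a drop in percentage points and a relative drop are different numbers. The short calculation below uses only the two rates quoted above.

```python
before, after = 51.9, 38.8  # burnout rate (%) at baseline and after 30 days

point_drop = before - after                 # change in percentage points
relative_drop = 100 * point_drop / before   # change relative to the baseline

print(f"{point_drop:.1f} percentage points")  # 13.1 percentage points
print(f"{relative_drop:.1f}% relative drop")  # 25.2% relative drop
```

The "13 percentage points" in the headline is the first number. Relative to the baseline, roughly a quarter of the physicians who reported burnout no longer did.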

The Pattern

The Boundary of Replacement, on One Page

Put the four cases on the same table and the boundary between the layer the agent took and the layer people kept comes into sharp relief.

Role	Layer the agent took	Layer the human kept	Measured
Customer support	Lookup · simple resolution · frontline reply	Disputes · fraud · hardship · empathy	700 people's work · 11→2 min
Legal	Doc review · summary · first draft	Strategy · stress-testing · advocacy · accountability	2 weeks → a day
Engineering	Migration · repetitive fixes	Design · review · merge · accountability	Months→days · +30%
Healthcare (field)	Chart · note writing (admin)	Diagnosis · exam · empathy · signature	Burnout 51.9→38.8%

Sources: each company's announcements and studies (see References below).

The common thread is sharp. The agent takes the "standardized, repetitive, information-processing" layer and leaves the "judgment, exception, relationship, accountability" layer to the human. Which roles shed headcount comes down to how large a share of the role that routine layer makes up.

Bottom Line

So What Happens to My Job

The most accurate name for today's agents is not "replacement" but "a capable but supervision-hungry, infinitely scalable legion of juniors." That legion devours the routine-labor layer fast and pushes people up into the layer above: judgment, relationships, accountability. The good news and the bad news come from the same fact. Roles where the routine layer was most of the job lose headcount, while anyone ready to climb to the upper layer gains a lever that lets one person conduct the work of ten.

There are three practical lessons. First, drawing the boundary wrong is expensive: Klarna underestimated layer ④ and ended up calling people back. Second, the review layer is non-negotiable: Harvey, Stripe, and Abridge all leave the human's verification, merge, and signature for the end. Third, value moves upward: what disappears is the first draft, not the advocacy; the charting, not the diagnosis. The career strategy of the agent era comes down to one question: what can you do on top of the layer the agent eats?

References · Sources

Klarna, "Klarna AI assistant handles two-thirds of customer service chats in its first month" (2024.02) — klarna.com (2.3 million chats · 700 people · 11→2 min · -25% · $40M, primary)
CX Dive, "Klarna changes its AI tune and again recruits humans for customer service" (2025) — customerexperiencedive.com (rehiring · premium support)
Harvey, "How Harvey Saves Lawyers Time" / customer cases — harvey.ai (Bridgewater 2 days→2 hours · A&O Shearman 4,000 people/43 jurisdictions · deposition summaries 2 weeks→a day)
Anthropic, "Claude Fable 5 / Opus 4.8" announcement — Stripe 50-million-line migration (months→days), primary
Anthropic, "2026 Agentic Coding Trends Report" — TELUS +30% · 500,000 hours, time per task↓, output↑ — resources.anthropic.com
Clinical studies (1,800 clinicians/5 academic centers; 263 physicians/6 hospitals, burnout 51.9→38.8%) and Abridge's 150+ provider adoption — medical and industry press, compiled
STAT, "Large AI scribe study finds modest time savings, inconsistent use" (2026.04) — statnews.com (balancing data)
St. Luke's · AMA report (after-hours documentation -35% · face time +15% · roughly 15,000 hours saved) — healthcare innovation press, compiled

Disclaimer · This article is an analysis for informational purposes, compiling public announcements from the companies and institutions involved together with press and academic reporting; it does not recommend any specific adoption, investment, or hiring decision. Productivity and savings figures reflect the conditions of the company or study in question and do not generalize to every organization. Some figures are self-measured by the announcing party. The workflow diagrams are generalized reconstructions based on publicly available cases. Last updated: 2026.06.12.
