The AI Upgrade Tax

WHAT YOU'LL LEARN

  • Why frontier model updates ripple into every legal AI tool built on top of them

  • How to think about the time AI actually frees up and whether you're using it well

  • Why visibility and control matter more than capability when choosing tools for a legal function

I am sure you didn’t miss Claude’s release of legal plugins

Between the social media coverage and the webinars, it has been hard to escape. Now, I love technology and get jazzed when new features are released. But when frontier models release new features, the effects don't stop at the frontier models. They ripple into every legal tech AI tool built on top of them.

So when Claude or ChatGPT releases new functionality, it’s only a matter of time until Harvey, Legora, Wordsmith, et al. have it, too. That sounds awesome, but it can be painful in practice.

How the Upgrade Tax Has Been Applied to My Wordsmith Setup

I use Wordsmith to connect to Slack since that is where my business works. Wordsmith pulls from Google Drive and Notion for context. It sits on top of the same underlying models powering everything else. So when the frontier shifts, Wordsmith shifts too.

So what does that look like in practice and how frustrating is it? Not that this is a leading question…

Wordsmith is connected to my #legal Slack channel. Three repositories are plugged into that channel. A repository is like a project: you connect it to folders and documents and give it custom instructions. So I have (1) general (the default repo), (2) customer contracts, and (3) corporate info, all plugged into the #legal Slack channel.

Originally, you asked a question in the #legal Slack channel and then picked from a drop-down which repo should answer it. As frontier models evolved, Wordsmith introduced autorouting and removed the drop-down, so the person asking only had to ask the question. With that came one change in behavior: users now had to tag Wordsmith in their question. Overall, it’s a small change in behavior and a reduction in friction. No problem.

Then Wordsmith released skills, so I needed to figure out which skills to apply to which repositories. A skill is a static configuration: you point it at a repository, give it instructions, and it waits to be called. For example, I created a Labor and Employment Law Scanner skill and triggered it with a Slack workflow, so every Monday morning a customized regulatory update is waiting for me in the #legal Slack channel. Note that skills are transferable between Wordsmith, Claude and other AI tools.

Then Wordsmith launched agents. An agent is a decision-maker. Instead of you choosing which repository to use, the agent reads the question and decides which repository to route it to. It can also act proactively, like running my weekly Labor and Employment Law Scanner without anyone asking it to.

So now I have autorouting Slack channels with repositories triggering certain skills. Cool. It works. The business loves it.

So I decided to level up my game and try out agents. Why not turn my Labor and Employment Law Scanner into an agent and get rid of the automated Slack workflow? I connected it to my #legal Slack channel.

Then the answers in my #legal Slack channel got weird and wrong. There was no error message. No obvious failure. Just outputs that weren't quite right, routing that wasn't quite landing, custom instructions that weren't quite firing. The kind of broken that takes a minute to recognize because nothing is visibly on fire. I went back to the Wordsmith team, and we worked through it together.

That is when I noticed that the Labor and Employment Law Scanner agent was answering ALL the questions!

So the agent was like - hold my beer, I got this!

So I had to disconnect it.

This is the upgrade tax.

You are not saving time. You are reallocating it.

Every time an AI platform ships something new -- agents, plugins, updated connection types, autorouting -- you pay in attention and rebuild time. Not because you did anything wrong. Because the platform changed under you. New integrations, updated behavior. If you have something running, plan to revisit it.

I am spending more time on operations than I expected. Building. Maintaining. Diagnosing things that worked last week and don't work this week.

But I have that time. Because AI is handling so much of the substantive work, I have genuine capacity for the operational layer. The time that used to go into first drafts, initial contract reviews, and standard research is now available for architecture decisions and maintenance.

AI is changing how we work and where we spend our time.

The question worth sitting with is whether you are reallocating it well. Not all maintenance is investment. Some is just churn -- reactive fixes for decisions a vendor made that didn't account for your specific configuration. Intentional builds are worth the time. Reactive fixes are tax.

What the architecture actually looks like

Working through all of this has pushed me to think more clearly about structure.

I see my AI stack now as a tiered system. At the top: company-wide context. Who we are, key roles, the rules that apply in every setting. This layer doesn't change much. It is the foundation.

Below that, it breaks out by tool, task, and communication channel.

For me: I primarily build in Wordsmith, and the communication happens in Slack, where multiple channels are connected to Wordsmith. For example, the #legal Slack channel has its own internal structure -- repositories for customer contracts, corporate documents, and general/default, plus a dedicated skill that runs a weekly labor and employment regulation scan triggered by a Slack workflow. The general/default repository in the channel acts as the orchestrator. It routes questions to those internal repos and redirects them out to a set of dedicated channels for higher-volume or ongoing work: worker agreements, worker classification, due diligence and compliance, NDA automation, and our US employer of record channel (each an individual Slack channel connected to Wordsmith).
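For readers who think better in structures, here is a rough sketch of the #legal channel tier written out as data, with a toy router standing in for the orchestrator. It is a hypothetical model, not how Wordsmith works internally: real autorouting is model-based, while this sketch matches made-up trigger phrases, and the dedicated channel names (such as #nda-automation) are invented for illustration. The point is only the shape: redirect ongoing work out to its own channel first, then hand off to an internal repository, and fall back to the general/default repository when nothing matches.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Repository:
    name: str
    # Hypothetical trigger phrases; real autorouting is model-based, not keyword-based.
    keywords: list[str] = field(default_factory=list)


@dataclass
class Channel:
    name: str
    repositories: list[Repository]
    default: str  # the general/default repository, acting as orchestrator
    # Hypothetical phrase -> dedicated Slack channel for higher-volume work.
    redirects: dict[str, str] = field(default_factory=dict)


def mentions(question: str, phrase: str) -> bool:
    """Whole-word, case-insensitive match, so 'nda' does not match 'agenda'."""
    return re.search(rf"\b{re.escape(phrase)}\b", question, re.IGNORECASE) is not None


def route(channel: Channel, question: str) -> str:
    # 1. Ongoing or high-volume work is redirected out to its dedicated channel.
    for phrase, target in channel.redirects.items():
        if mentions(question, phrase):
            return f"redirect to {target}"
    # 2. Otherwise the orchestrator hands off to a matching internal repository.
    for repo in channel.repositories:
        if any(mentions(question, k) for k in repo.keywords):
            return f"answer from {repo.name}"
    # 3. Nothing matched: the default repository answers itself.
    return f"answer from {channel.default}"


legal = Channel(
    name="#legal",
    repositories=[
        Repository("general"),
        Repository("customer contracts", ["customer contract", "order form"]),
        Repository("corporate info", ["board", "bylaws"]),
    ],
    default="general",
    redirects={  # channel names below are hypothetical
        "worker agreement": "#worker-agreements",
        "classification": "#worker-classification",
        "due diligence": "#dd-compliance",
        "nda": "#nda-automation",
        "employer of record": "#us-eor",
    },
)

if __name__ == "__main__":
    for q in [
        "Can you check this NDA?",
        "What does the customer contract say about renewal?",
        "When is the next board meeting?",
        "Who owns the meeting agenda?",
    ]:
        print(q, "->", route(legal, q))
```

Writing the tiers down this explicitly also makes a failure like the one above easier to reason about: anything attached to the channel that claims every question will sit in front of all of these routes.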

Thinking about AI behavior

I wanted a single top-level layer. Something all my repos and agents would inherit from automatically. A true default that governed everything. Wordsmith has memories, which come close. But memories aren't always tapped into. They don't fire reliably in every context across all AI tools.

So I repeat the high-level behavioral instructions and context in every repository. For example, all the custom instructions include:

  • No tables in Slack because they don't render.

  • Don't guess.

  • Be short and direct.

  • Don't preemptively draft unless the user asks.

  • Keep answers short, concise and in plain English for non-legal team members.

These aren't substantive instructions. They're behavioral. I want consistent behavior, so each rule has to be present everywhere I want it to apply. When I want to change a behavioral rule, I update it in every repository that holds it. I assume this will soon be fixed, but until then…
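One way to make that duplication less error-prone is to keep the behavioral rules in a single place and generate each repository's full custom instructions from it. This is a minimal sketch, assuming you maintain the text outside Wordsmith and paste it into each repository by hand; nothing here calls Wordsmith. The rules are the ones listed above, while the per-repository substantive text and the function names (build_instructions, build_all) are hypothetical.

```python
# Hypothetical sketch: keep the behavioral layer in one place and compose it
# into each repository's custom instructions. Nothing here calls Wordsmith;
# the output is text you would paste into each repository yourself.

BEHAVIORAL_RULES = (
    "No tables in Slack because they don't render.",
    "Don't guess.",
    "Be short and direct.",
    "Don't preemptively draft unless the user asks.",
    "Keep answers short, concise and in plain English for non-legal team members.",
)

# Placeholder substantive instructions, one per repository (hypothetical text).
SUBSTANTIVE = {
    "general": "Answer general legal questions and route to the right repository.",
    "customer contracts": "Answer from the connected customer contract folders.",
    "corporate info": "Answer from the connected corporate documents.",
}


def build_instructions(substantive: str, rules: tuple[str, ...] = BEHAVIORAL_RULES) -> str:
    """Substantive instructions first, then the shared behavioral block."""
    behavior = "\n".join(f"- {rule}" for rule in rules)
    return f"{substantive}\n\nHow to respond:\n{behavior}"


def build_all(repos: dict[str, str]) -> dict[str, str]:
    """One full instruction text per repository, all sharing the same rules."""
    return {name: build_instructions(text) for name, text in repos.items()}


if __name__ == "__main__":
    for name, text in build_all(SUBSTANTIVE).items():
        print(f"=== {name} ===\n{text}\n")
```

Changing a behavioral rule then means editing BEHAVIORAL_RULES once and re-pasting the output, instead of hand-editing each repository and hoping none were missed.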

Visibility is the deciding factor

Claude and other tools could do much of what Wordsmith does, so I could build parts of this stack outside Wordsmith. Claude with a Slack integration. Google's ecosystem, since we have Gemini. Our CTO suggested a combination of Gems and NotebookLM. All technically viable.

But I keep coming back to Wordsmith. Not because it wins on every dimension, but because I can see what it's doing. I can see which documents are connected, what the custom instructions say, and how the agent routes a query. I can test in Slack, make a change, and test again. The whole system is visible to me end to end. It also connects to Slack automatically, without additional setup. Plus, its additional security and compliance layer gives me confidence.

With other tools, that control layer is less visible. The security settings are different. For a legal function, that gap matters more than the feature gap.

The same logic applies to vibe-coded solutions. I can build something impressive in an afternoon. But a working prototype is not a deployable system. The distance between the two is exactly where unseen problems live.

Visibility is a precondition for control. Control is a precondition for using any of this seriously in a legal function.

Two things I'd tell someone starting this now

  1. Budget rebuild time the same way you budget build time. Upgrades will come. Plan for them.

  2. Separate your behavioral instructions from your substantive instructions. The behavioral layer -- how you want the AI to respond, not just what you want it to know -- needs to live in every repository. It will not carry over automatically.

The architecture is the work. Not the prompts. Not the individual tools. The system you build around them requires your human thought; the rest, AI can do for you.

Speak soon.

The full architecture, the custom instructions template, and a breakdown of how I think about the behavioral vs. substantive split across a multi-repo Slack setup are in The Field Guide.
