Finance Got Lucky. Legal Didn't.
Field Note #3
What you’ll learn:
Why finance can verify AI outputs and legal can't
How I'm building interactive visualizations with Claude Code and Vercel
Where I am in the legal AI tool evaluation process, and what's next
AI for Everyone!
Wednesday morning, I overheard our VP of Finance talking about how he was using Claude Code and Codex to build visualization dashboards for finance. Of course I walked over and joined the conversation. I asked what works better: terminal versus Claude Code versus Codex, which model handles what. One of the amazing aspects of AI is that all it requires is curiosity and experimentation.
Then I asked the question I always ask when someone tells me they're relying on AI for something with real consequences.
How do you know it's not hallucinating?
Finance got lucky.
His answer was immediate. He knows what the numbers should be. At the end, if they match, it's right. If they don't, something's wrong.
I said: so it's binary. Like coding. Either it works or it doesn't.
Legal doesn't have that.
Finance operates inside a closed system. Numbers reconcile. If AI produces something wrong, it breaks the math — just like a bug in code breaks a build. You can ground it in truth that exists completely independently of what the AI told you. The verification mechanism is built into the domain itself.
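To make the contrast concrete, here's a minimal sketch, in TypeScript with invented names (nothing from our actual stack), of what finance's verification loop amounts to. Because the ground truth lives outside the AI, checking reduces to a single comparison.

```typescript
// Hypothetical reconciliation check. The ledger is the independent
// source of truth; the AI-built dashboard either matches it or it doesn't.
type LedgerEntry = { account: string; amount: number };

function reconciles(aiDashboardTotal: number, ledger: LedgerEntry[]): boolean {
  const ledgerTotal = ledger.reduce((sum, entry) => sum + entry.amount, 0);
  // Binary outcome, like a build passing or failing.
  return Math.abs(aiDashboardTotal - ledgerTotal) < 0.01;
}
```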
Legal works in the gray. When legal analysis IS binary — is this a contract or not, is this clause standard or not — we're usually not the ones spending time on it. We spend time on the areas that are genuinely contested. The judgment calls. Which is exactly where AI is hardest to verify.
I had a live example of this recently. I was testing two legal AI tools against each other on sanctions screening: entities operating across the US, UK, and Europe. The outright banned countries were easy; both tools handled them. But as the questions got more nuanced (is this a KYC/AML issue, a trade issue, a Cuba/USD issue, do you only need to screen individuals from those countries?), the tools started diverging. Getting more confident. Getting less reliable.
I resolved this by running both outputs against each other, having them critique one another, and applying my own judgment to what came back. That works. But it's slow, and it requires me to already know enough to evaluate what I'm reading. The bottleneck hasn't moved. It's just faster now.
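For anyone who wants to automate the mechanical part of that loop, here's a rough TypeScript sketch of the pattern. The `LegalAiTool` interface and everything behind it are hypothetical stand-ins, not any vendor's real API; the point is the shape of the workflow, and that the verdict stays human.

```typescript
// Rough sketch of the manual cross-check: pose the same question to two
// tools, have each critique the other's answer, and leave the final call
// to a human. The tool clients are hypothetical stand-ins.
interface LegalAiTool {
  ask(question: string): Promise<string>;
}

async function crossCheck(toolA: LegalAiTool, toolB: LegalAiTool, question: string) {
  const [answerA, answerB] = await Promise.all([
    toolA.ask(question),
    toolB.ask(question),
  ]);

  // Each tool reviews the other's output.
  const critiqueOfB = await toolA.ask(`Critique this analysis of "${question}":\n${answerB}`);
  const critiqueOfA = await toolB.ask(`Critique this analysis of "${question}":\n${answerA}`);

  // Deliberately no verdict field: a lawyer still has to read all four texts.
  return { answerA, answerB, critiqueOfA, critiqueOfB };
}
```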
Building with Claude Code and Vercel
Worksome has an all-hands coming up where every department will show how they're using AI, followed by training and upskilling. We're trying to move from enthusiastic individuals picking up tools to something more systematic. Function-level design. Which is exactly the conversation I've been having about legal.
In the meantime, I've been doing my own building.
Labor and employment law across multiple jurisdictions is complicated. Worker classification even more so. When I need to communicate how these things work — workflows, decision trees, layers of nuance across different legal frameworks — I used to build it in Canva. Static slides. Fine, but it doesn't scale.
This week I connected GitHub, Vercel, and Claude and started working entirely in the terminal: describing what I want in natural language and pushing updates live without leaving the environment. The output is interactive, functional, and communicates complex legal concepts in a way a static slide never could.
Because I can't share our internal information, I created an example, Find Your Legal Tech Stack: a short quiz that points you toward the tools most likely to fit your use case. Try it. Tell me what you think.
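For the curious, the logic behind a quiz like that can be very small. Here's a rough sketch of one way to score it (purely illustrative, not the actual implementation):

```typescript
// Each answer adds weight to the tools it suggests; the highest total wins.
type Answer = { tools: Record<string, number> };

function recommend(answers: Answer[]): string {
  const scores: Record<string, number> = {};
  for (const answer of answers) {
    for (const [tool, weight] of Object.entries(answer.tools)) {
      scores[tool] = (scores[tool] ?? 0) + weight;
    }
  }
  // Pick the tool with the highest accumulated score.
  return Object.entries(scores).sort((a, b) => b[1] - a[1])[0]?.[0] ?? "no match";
}
```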
Where we are on the legal AI tool evaluation
I've now spoken to Ivo, Harvey, Legora, Wordsmith, GC AI, TermScout, Luminance, and Sandstone. I'm also actively using ChatGPT and Claude as part of the evaluation.
Luminance is interesting because I inherited it. I used it about 14 months ago and didn't love the experience. I agreed to take another look because they've repositioned: they don't want to be seen as a legacy CLM; they want to be considered a full AI platform. My conclusion: I'm not moving forward with it because I can get what it offers from tools I already have.
Which is the broader point about tool evaluation. Your tech stack is the starting point. Look at where it falls short and find tools that fill those specific gaps. Don't buy a platform when you need a plugin. And given how fast this space is moving — do not sign anything for more than a year. Monthly subscriptions where possible. You need room to change your mind.
The full evaluation report is coming for these ten tools. If you want early access and the chance to ask questions directly, that's what the Field Guide is for.
Speak soon!
And enjoy the week!
If you have questions or want to follow up on anything, hit reply.
Want the Field Notes to hit your inbox each week? Subscribe here.
Want to dive deeper? Become a member of The Field Guide.