
Claude got scarily good at faking deterministic output

2025-03-12 in Posts

The other day, I accidentally got Claude 3.7 Sonnet to generate a thousand-line, Req-based Elixir API client from the OpenAPI spec for Fly Machines.

I was exploring the options for structuring such a module, more out of curiosity than any urgent need, and asked Claude on a whim if it could produce functions for all the endpoints. It did. Forty-odd functions, complete with typespecs and detailed docstrings. It hit its message-length limit and I had to ask it to carry on in a new message. It picked up and finished without incident.
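
For flavor, each function looked something like this. This is an illustrative reconstruction, not Claude's verbatim output; the endpoint path and names here are my stand-ins for what's in the Fly Machines spec:

```elixir
# Illustrative only: the rough shape of one generated endpoint function.
# The real spec defines the actual paths and parameters.
@doc """
Lists all Machines in a Fly app.
"""
@spec list_machines(Req.Request.t(), String.t()) :: {:ok, [map()]} | {:error, term()}
def list_machines(req, app_name) do
  case Req.get(req, url: "/v1/apps/:app_name/machines", path_params: [app_name: app_name]) do
    {:ok, %Req.Response{status: 200, body: body}} -> {:ok, body}
    {:ok, %Req.Response{status: status, body: body}} -> {:error, {status, body}}
    {:error, exception} -> {:error, exception}
  end
end
```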

The output looks solid. It looks consistent. Claude doesn’t seem to have so much as supplemented the request parameters with a popular one from some other API.

I was really impressed that Claude sustained such an uncanny impersonation of a dumb but reliable Python script for that long—despite the vaudeville hook dragging it offstage partway through the act.

Also: this is terrible. Now I can sneeze and a nondeterministic black box barfs out a thousand lines of independent functions and docs oozing respectability and discipline? I’m going to be so tempted to trust this!

If I’d planned an exhaustive client for this API, I’d have asked my machine buddy for help writing a boring Python script (or Mix task) to convert the OpenAPI spec directly, reproducibly, and with a lot less electricity.
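
Something in the shape of this Mix task, say. It's a hypothetical sketch, not the script I'd actually ship: the file paths and module names are made up, and it assumes every operation in the spec has an operationId, which a real generator would have to check:

```elixir
# Hypothetical sketch of a deterministic generator: read the OpenAPI spec,
# emit one Req-backed function per path/verb pair. Paths and names are made up.
defmodule Mix.Tasks.Gen.FlyClient do
  use Mix.Task

  @shortdoc "Emit one client function per endpoint in the OpenAPI spec"
  def run(_argv) do
    spec = "priv/machines_openapi.json" |> File.read!() |> Jason.decode!()

    funs =
      # (Glossing over non-verb keys a real spec can put under a path.)
      for {path, ops} <- spec["paths"], {verb, op} <- ops do
        """
        @doc "#{op["summary"]}"
        def #{Macro.underscore(op["operationId"])}(req, opts \\\\ []) do
          Req.request(req, [method: :#{verb}, url: "#{path}"] ++ opts)
        end
        """
      end

    body = "defmodule Fly.Machines do\n" <> Enum.join(funs, "\n") <> "end\n"
    File.write!("lib/fly/machines.ex", body)
  end
end
```

Run it twice, get the same file twice. That's the whole appeal.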

I should note that the docstrings I ended up with are richer than what’s in the spec. Some of the example material identifiably originated in Fly.io developer docs. So, there’s that.

Now that I have all that code

This may be the first time an LLM tool has bitten off more than I can chew without losing its own bearings. The vibes were practically telepathic.

I don’t approve of using an LLM for tasks that need repeatability and can be done in seconds on a CPU. I can’t be sure there are no surprises until I’ve checked every function, and if I ask again, there’s no guarantee any deficiencies will be the same ones.

But here’s the module, fully formed. What to do?

I do want to talk to a subset of this API from my Elixir apps, and this gives me a starting point for incorporating the schema validation I’ve been playing with.
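
To sketch the direction (the library and schema name here are placeholders, not a commitment to what I've actually been playing with): pull a response schema out of the spec and check decoded bodies against it.

```elixir
# Placeholder sketch with ex_json_schema; schema name is made up, and this
# glosses over $refs, which need the whole spec resolved to work properly.
spec = "priv/machines_openapi.json" |> File.read!() |> Jason.decode!()

schema =
  spec
  |> get_in(["components", "schemas", "Machine"])
  |> ExJsonSchema.Schema.resolve()

case ExJsonSchema.Validator.validate(schema, body) do
  :ok -> {:ok, body}
  {:error, errors} -> {:error, {:invalid_response, errors}}
end
```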

I could “ship” it and fix it if I ever find out it’s broken—the stakes couldn’t be lower. But all those untried functions, all that unexamined documentation in my module—it was stressing me out.

I took the easy way out: the reusable moving parts and a couple of endpoint functions go into a module, and the rest get banished to a slush file outside my project. I don’t have to look at them, but I can grab them as needed, if they ever are. Instant relief.

More things I think about LLMs, March 2025 edition

As always, the quality of the output of an LLM chat tool is entangled with the quality of my side of the conversation. It’s all moving targets and YMMV.
