Building beyond my skills with an LLM

2024-10-24 in Posts

I build things beyond my skills all the time. That’s how I get better at stuff. This time I added “working with LLMs” to the set of skills to improve.

I had a nebulous idea—it’s difficult to express how slippery it seemed to me at the time—for something I wanted to do in my Elixir app. I thought I’d like to be able to validate (and maybe later build) request bodies for a specific API, which can contain cascades of nested configuration options. I had access to a JSON OpenAPI spec from docs.machines.dev, and ckreiling’s lovely, tidy Req-powered client for the Fly Machines API.

I didn’t know where to start. I was missing a variety of expertise. Whole abstract patterns. Best practices. Common tools.

The one fixed parameter was that I was going to try to lean hard on an LLM for help. I started a subscription to Claude Pro, so I could ask lots of dumb questions.

Defining the task

I asked Claude some very general questions about things that I might want to do.

Here’s my biggest bang-for-buck prompt:

I've got a library that generates API requests. But it doesn't validate with any schema or type for the parameters. What kind of tool should I have for that

Claude gave me an example of an Ecto schema with a changeset and some code to check parameters against the changeset. This is a strong signal, if you trust the judgement of the clue-giver!
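Reconstructing from memory, the shape of the suggestion was something like the sketch below. The module name and fields here are my own invention for illustration, not Claude's exact output and not from the real API spec:

```elixir
defmodule MyApp.MachineConfig do
  # An Ecto embedded schema plus a changeset, used purely for
  # validation — no database involved.
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :image, :string
    field :cpus, :integer
  end

  def changeset(config, params) do
    config
    |> cast(params, [:image, :cpus])
    |> validate_required([:image])
    |> validate_number(:cpus, greater_than: 0)
  end
end

# Checking parameters against the changeset:
changeset =
  MyApp.MachineConfig.changeset(%MyApp.MachineConfig{}, %{"image" => "nginx", "cpus" => 2})

changeset.valid?
#=> true
```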

It also suggested ExJsonSchema, Vex, Dashbit’s Nimble Options, or writing a custom validation module.

In the same session, I asked about Elixir tools for working with an OpenAPI spec and got back OpenApiSpex, OpenAPI Generator (or did it mean this one?), ExOpenApi (which I couldn’t find), and Swagger Codegen.

An embarrassment of riches! The hard fact is that I had to do some reading and testing to sort out what I wanted from here. I could have asked Claude for more guidance, and I’ll explore that in the future.

Once the dust had settled, I’d decided to use Ecto schemas and changesets to hold the rules my request bodies have to obey.

Notes on some of the options for my project:

Ecto embedded schemas provide validation through changesets, using a library already present in a lot of projects. Nested schemas seem to work robustly. You do need something to generate the Ecto schemas and changeset definitions from the JSON spec. This option was the winner.

Nimble Options needs schemas to come from somewhere, too, but it does validation and handles nested options without depending on Ecto. I didn’t follow this very far, but I’m very curious to play with it more, given its light weight and its documentation possibilities.

OpenApiSpex has validation functionality and can import an existing schema file, but it didn’t immediately fill in all of the nested schemas, and I moved on in my explorations. I would look at this lib again if I were writing my own API.

For one thing, the OpenAPI Generator for Elixir doesn’t seem to support allOf, which I needed. And since I don’t need a client library, I’d probably have gone with a simpler solution in any case.

For existing schemas in the form of structs or maps, Ecto schemaless changesets look like a relatively painless way to add validation if you’re already familiar with changesets and have Ecto in the project.
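For illustration, a schemaless changeset over a bare map looks roughly like this (the field names are invented):

```elixir
import Ecto.Changeset

# Validate a plain map against a map of types, with no schema module.
types = %{name: :string, region: :string}
params = %{"name" => "my-app", "region" => "yyz"}

changeset =
  {%{}, types}
  |> cast(params, Map.keys(types))
  |> validate_required([:name])

changeset.valid?
#=> true
```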

Seeing how much Claude will build for me

The problem was now much better defined: I wanted to write a tool to read the spec.json file for the API, and spit out all the Ecto.Schema modules I needed, complete with changesets.

I still didn’t know how to do it. I didn’t properly understand Ecto changesets, so I didn’t understand how to use them without a database. I hadn’t read through the JSON spec for the API I wanted to interact with (though I’m very familiar with parts of the API as a user), and I hadn’t thought about the OpenAPI or Swagger API specification as a format.

Instead of going and figuring all those things out, I uploaded the spec.json file into the project and asked Claude to just write me the whole Elixir module. I’d test it and report back with the latest compilation error or problem with its output. That’s basically all I did until I noticed that fixing one thing would lose the previous fix. It was time to start helping Claude steer.

Taking my bearings

So I wasn’t going to reach a working product just by prompting an LLM and then reading the code to learn how it worked. :P

But now there was a buggy version of the whole tool, all roughed in.

Normally, I’d start with some minimal testable function and build out toward the bigger picture. But by returning to natural language, I’d been able to scope the project even while large areas of my mental map of the implementation were blank. An initial scope was not only suggested, but planted and taking root in my editor.

I suspect this will prove a powerful tool to prevent rabbit-holing, to which I’m susceptible when I’m uncertain which parts of the map I should fill in. Sure, this mode of work will have its own kinds of tangents to fly off on. But the scoping effect is exciting.

Drilling down

At this point it was time to consolidate my understanding and start driving what the individual functions do.

I’d been learning as I worked with Claude’s suggestions and so was better equipped to think critically about the different things the module should do and how to do them.

This was much more straightforward work. The LLM was still a great help in figuring out things I couldn’t immediately glean from documentation or from a Google search for blog posts and forum discussions.

In this phase it can be handy to keep a tab open with ChatGPT or Gemini for quick second opinions.

So…success?

Now I have a Mix task that reads the spec.json for the Fly Machines API and spits out Ecto schema modules. It’s not pretty, and I bet there are a bunch of edge cases I haven’t accounted for, but through the process I’ve learned enough to make it good, if I want to. I’m now doing a little bit of validation in another project using these schemas.
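The real code lives in the repo linked below; as a very rough sketch of the overall shape (module names invented, and glossing over the hard parts: nested schemas, allOf, and changeset generation), it does something like this:

```elixir
defmodule Mix.Tasks.GenSchemas do
  # A drastically simplified sketch of the idea, not the actual tool:
  # decode the OpenAPI spec and emit one Ecto embedded schema module
  # per entry under components.schemas. Assumes Jason for JSON decoding.
  use Mix.Task

  def run([spec_path]) do
    spec = spec_path |> File.read!() |> Jason.decode!()

    for {name, schema} <- spec["components"]["schemas"] do
      File.write!("lib/generated/#{Macro.underscore(name)}.ex", render(name, schema))
    end
  end

  defp render(name, schema) do
    fields =
      for {field, %{"type" => type}} <- schema["properties"] || %{} do
        "    field :#{field}, #{ecto_type(type)}"
      end

    """
    defmodule Generated.#{name} do
      use Ecto.Schema

      @primary_key false
      embedded_schema do
    #{Enum.join(fields, "\n")}
      end
    end
    """
  end

  defp ecto_type("integer"), do: ":integer"
  defp ecto_type("boolean"), do: ":boolean"
  defp ecto_type(_), do: ":string"
end
```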

You can read more about the end product, how it works, and its foibles at GitHub: clutterstack/openapi-json-to-ecto.

I could certainly have muddled my way to this point without an LLM, and I certainly muddled my way to this point with one too. I did all the things in a different order from usual. It’s pretty hard to compare.

I did enjoy having a wad of flawed code that gave me a shape and scope to work with.

The goalposts will keep moving, both on the side of the models and tools available and in my own brain, but I think if there’s one rule of thumb to keep in mind for “next time” it’s “as soon as you make any progress, consciously consolidate your understanding of what you’re doing.”

Which makes sense in most contexts.

Appendix 1: An algorithm I may follow “next time”:

  • Start by spending a bit of time prompting at the most general level, if there’s any uncertainty about the goal or the broadest direction.
  • Ask for a whole program to do something. Test it out for a bit, asking follow-up questions to fix unexpected results and errors.
  • When the new program runs without errors, and its behaviour is anything like the right thing, STOP and figure out everything it’s doing.
  • If there’s no obvious architectural change to make, start prompting for possible approaches to parts of the program. Instead of the whole program, start new chats with just a function or two, some input (or a description thereof), what you wanted it to do, and the error or unexpected result.

Appendix 2: Further notes about using Claude in October 2024:

  • Even when working with projects, start new chats when a significant part of the history in the current one isn’t needed in the context anymore. Claude warned me (Oct 2024) that I was sending the whole chat back and forth every time, burning my allowance. This very handy info was pushed to me through the UI, and is also easy to find on Anthropic’s website. Colour me impressed with Anthropic’s cohesive content and documentation strategy.
  • You can get a helpful, correct suggestion with one prompt, and then a weird, wrong suggestion with a very similar one. When this happened to me, I hadn’t known the good approach existed beforehand. So if I’d seen the bad approach first, I might have tried to go with that.
  • As others have observed, once a program does enough stuff, you can ask Claude to fix one behaviour and it’ll cause a regression in another one you just fixed. Then it’s probably time to work with a smaller chunk of code, not least to ensure that unrelated working code doesn’t get changed capriciously.