How I write generative AI specs
My template for shipping AI features inside a real product. Program-type taxonomy first, prompt schema second, output schema third, UI flow last. The order matters because if you start at the UI you usually end up with a chatbot bolted onto a settings page.
Most generative AI features I see in B2B SaaS start with the UI. Someone draws a magic wand icon, the team agrees it should generate something, and engineering goes off to wire up the model. Six weeks later there is a button in the corner of the dashboard that produces text nobody trusts.
I have shipped a few AI features now, most recently the AI wizard inside Recruit, and the order I work in is the opposite. I start at the data layer and work my way up to the UI. Here is the template I use.
1. Program-type taxonomy first
Before anything else, I write down the categories of work the feature has to handle. Not personas. Not user stories. The actual buckets of tasks the model has to be good at.
For Recruit's AI wizard, the categories were things like brand ambassador programs, paid affiliate campaigns, UGC casting calls, gifting programs, and content licensing partnerships. Each one needed slightly different inputs from the user, different questions on the page, and different defaults.
Writing this taxonomy down does three things. It makes the full scope of what the model has to handle explicit. It gives me a checklist for testing. And it tells me which examples to put in the prompt.
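A minimal Python sketch of the taxonomy as data, assuming it lives next to the code rather than only in a doc. The five categories are the ones named above. `ProgramType`, `ProgramSpec`, `TAXONOMY`, `missing_specs`, `eval_checklist` and every input or default value are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProgramType(Enum):
    # The five categories named in the post; the string values are hypothetical.
    BRAND_AMBASSADOR = "brand_ambassador"
    PAID_AFFILIATE = "paid_affiliate"
    UGC_CASTING_CALL = "ugc_casting_call"
    GIFTING = "gifting"
    CONTENT_LICENSING = "content_licensing"


@dataclass
class ProgramSpec:
    # Hypothetical per-category contract: inputs the page must ask for, plus defaults.
    required_inputs: tuple[str, ...]
    defaults: dict[str, str] = field(default_factory=dict)


# Illustrative inputs and defaults only; the real ones come from product research.
TAXONOMY: dict[ProgramType, ProgramSpec] = {
    ProgramType.BRAND_AMBASSADOR: ProgramSpec(("brand_name", "perks"), {"duration": "ongoing"}),
    ProgramType.PAID_AFFILIATE: ProgramSpec(("brand_name", "commission_rate")),
    ProgramType.UGC_CASTING_CALL: ProgramSpec(("brand_name", "deliverables", "deadline")),
    ProgramType.GIFTING: ProgramSpec(("brand_name", "product"), {"shipping": "brand_pays"}),
    ProgramType.CONTENT_LICENSING: ProgramSpec(("brand_name", "usage_rights")),
}


def missing_specs() -> list[ProgramType]:
    """Categories with no written spec: a sign the feature is not ready to scope."""
    return [p for p in ProgramType if p not in TAXONOMY]


def eval_checklist() -> list[str]:
    """One eval case per category, so test coverage tracks the taxonomy."""
    return [f"valid_page_for_{p.value}" for p in ProgramType]
```

Keeping the categories in an enum means adding a new program type without a spec shows up in `missing_specs()` and as a new, still-failing entry in the eval checklist.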
If I cannot write the taxonomy in plain text on one page, the feature is not ready to scope. The hand-wavy "the AI will figure it out" answer is where bad AI products come from.
2. Prompt schema second
Now I write the actual prompt template. Not in code yet. In a Google Doc with placeholders.
The prompt has to do five things, in this order:
1. Tell the model what role it is playing.
2. Give it the program type from the taxonomy.
3. Inject whatever the user typed.
4. Inject the user's brand context (colors, voice, prior pages).
5. Tell it exactly what shape to return.
I write all five sections out as a single block of text and read it back to myself. If a section is doing more than one job, I split it. If I find myself adding instructions like "be helpful and concise," I delete them. The model is helpful by default. The interesting instructions are the ones the model would not infer on its own.
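Once the Google Doc version settles, the same five-part structure can be sketched in Python with the standard library's `string.Template`. This is a sketch under assumptions: the section wording, the placeholder names, and the `FILLER` list are hypothetical. It shows two points from the paragraph above: the sections keep a fixed order with one job each, and filler instructions such as "be helpful" are easy to flag automatically.

```python
from string import Template

# One (name, text) pair per job, in the fixed order listed above.
# The wording and the placeholder names are hypothetical.
SECTIONS = [
    ("role", "You write recruitment landing pages for creator programs."),
    ("program_type", "Program type: $program_type"),
    ("user_input", "The user described the campaign as:\n$user_input"),
    ("brand_context", "Brand voice: $brand_voice\nBrand colors: $brand_colors"),
    ("output_shape", "Return only JSON that matches this schema:\n$output_schema"),
]

# Filler the model already does by default; flagged so it can be deleted.
FILLER = ("be helpful", "be concise", "be accurate")


def build_prompt(**values: str) -> str:
    """Fill every section and join them in order.

    Template.substitute raises KeyError for a missing placeholder, so a
    forgotten input fails loudly instead of shipping a half-empty prompt.
    """
    return "\n\n".join(Template(text).substitute(values) for _, text in SECTIONS)


def find_filler(prompt: str) -> list[str]:
    """Return any filler phrases that crept into the prompt."""
    lowered = prompt.lower()
    return [phrase for phrase in FILLER if phrase in lowered]
```

Using `substitute` rather than `safe_substitute` is the deliberate choice here: a missing brand context or user input should break the build, not silently reach the model.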
This is also where I decide which few-shot examples go into the prompt. The examples come from the taxonomy. One per category, ideally.
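A small sketch of choosing one few-shot example per category, assuming candidate examples are kept in a pool tagged with their program type. `Example`, `pick_few_shot`, and `render_few_shot` are hypothetical names. Because the post says one per category "ideally," the sketch reports gaps instead of failing on them.

```python
from dataclasses import dataclass


@dataclass
class Example:
    program_type: str   # a category from the taxonomy
    user_input: str
    ideal_output: str   # JSON the model should imitate


def pick_few_shot(pool: list[Example], categories: list[str]) -> tuple[list[Example], list[str]]:
    """Take the first example per category, in taxonomy order.

    Returns the chosen examples plus the categories that have none, since
    a gap here usually means a gap in the eval set as well.
    """
    first_by_type: dict[str, Example] = {}
    for ex in pool:
        first_by_type.setdefault(ex.program_type, ex)
    chosen = [first_by_type[c] for c in categories if c in first_by_type]
    missing = [c for c in categories if c not in first_by_type]
    return chosen, missing


def render_few_shot(examples: list[Example]) -> str:
    """Format the chosen examples as a block to paste into the prompt."""
    return "\n\n".join(
        f"Program type: {ex.program_type}\nInput: {ex.user_input}\nOutput: {ex.ideal_output}"
        for ex in examples
    )
```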
3. Output schema third
This is the part most generative features skip.
The model has to return data in a shape the UI can render. Free text is the worst possible output for a B2B feature because every downstream system has to parse it. So I write the output schema before any UI or backend code gets built.
For Recruit, the output schema looked something like this:
- page_title (string)
- subtitle (string)
- hero_image_suggestion (string, one of: stock_lifestyle, brand_product, custom_upload)
- form_sections (array of section objects, each with name, fields, order)
- thank_you_message (string)
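The field list above can be sketched as typed Python with a parser that rejects anything off-schema. The field names and the three `hero_image_suggestion` values come from the list. The shape of each entry in `fields` is not specified in the post, so treating it as a list of strings is an assumption, and `FormSection`, `GeneratedPage`, and `parse_page` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any

HERO_IMAGE_OPTIONS = {"stock_lifestyle", "brand_product", "custom_upload"}


@dataclass
class FormSection:
    name: str
    fields: list[str]   # assumption: field names as strings
    order: int


@dataclass
class GeneratedPage:
    page_title: str
    subtitle: str
    hero_image_suggestion: str
    form_sections: list[FormSection]
    thank_you_message: str


def parse_page(raw: dict[str, Any]) -> GeneratedPage:
    """Turn decoded model JSON into typed data, or raise ValueError."""
    for key in ("page_title", "subtitle", "thank_you_message"):
        if not isinstance(raw.get(key), str):
            raise ValueError(f"{key} must be a string")
    hero = raw.get("hero_image_suggestion")
    if not isinstance(hero, str) or hero not in HERO_IMAGE_OPTIONS:
        raise ValueError(f"hero_image_suggestion must be one of {sorted(HERO_IMAGE_OPTIONS)}")
    sections_raw = raw.get("form_sections")
    if not isinstance(sections_raw, list):
        raise ValueError("form_sections must be an array")
    sections = []
    for s in sections_raw:
        if not (isinstance(s, dict) and isinstance(s.get("name"), str)
                and isinstance(s.get("fields"), list) and isinstance(s.get("order"), int)):
            raise ValueError(f"bad section: {s!r}")
        sections.append(FormSection(s["name"], [str(f) for f in s["fields"]], s["order"]))
    sections.sort(key=lambda sec: sec.order)  # the UI renders sections by `order`
    return GeneratedPage(raw["page_title"], raw["subtitle"], hero, sections,
                         raw["thank_you_message"])
```

Because `parse_page` is the only path from model output into the app, the UI never has to handle a page that skipped validation.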
Once that schema exists, the prompt can tell the model exactly what to produce. The UI can be built against the schema, not against guessed JSON. And the eval suite can test each field independently.
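Testing each field independently could look like the sketch below, which scores every schema field on its own and aggregates pass rates across an eval run. The length limits are made-up thresholds, and `FIELD_CHECKS`, `score_fields`, and `pass_rates` are hypothetical names.

```python
from typing import Any, Callable

# Hypothetical per-field checks; the length limits are illustrative assumptions.
FIELD_CHECKS: dict[str, Callable[[Any], bool]] = {
    "page_title": lambda v: isinstance(v, str) and 0 < len(v) <= 80,
    "subtitle": lambda v: isinstance(v, str) and len(v) <= 160,
    "hero_image_suggestion": lambda v: v in ("stock_lifestyle", "brand_product", "custom_upload"),
    "form_sections": lambda v: isinstance(v, list) and len(v) > 0,
    "thank_you_message": lambda v: isinstance(v, str) and len(v) > 0,
}


def score_fields(output: dict[str, Any]) -> dict[str, bool]:
    """Pass/fail per field, so one bad field does not hide the others."""
    return {name: check(output.get(name)) for name, check in FIELD_CHECKS.items()}


def pass_rates(outputs: list[dict[str, Any]]) -> dict[str, float]:
    """Aggregate per-field pass rates across an eval run."""
    totals = {name: 0 for name in FIELD_CHECKS}
    for out in outputs:
        for name, ok in score_fields(out).items():
            totals[name] += ok
    n = len(outputs) or 1
    return {name: count / n for name, count in totals.items()}
```

Per-field rates show where to act. A low `hero_image_suggestion` rate points at the prompt or the enum, not at the whole feature.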
If the model cannot reliably produce the schema, I either simplify the schema or move to a structured output mode where the schema is enforced. The point is that the schema comes before the UI, not after.
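For the enforced route, structured output modes generally take a JSON Schema. Below is a sketch of the Recruit fields as one, kept in Python so a single definition can feed both the prompt and the enforcement. The closed objects and all-required fields are assumptions (some strict modes expect that shape). How the schema is passed to a model varies by provider and is not shown.

```python
import json

# A closed JSON Schema for the output fields listed above. The constraints
# (closed objects, every field required) are assumptions.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "page_title": {"type": "string"},
        "subtitle": {"type": "string"},
        "hero_image_suggestion": {
            "type": "string",
            "enum": ["stock_lifestyle", "brand_product", "custom_upload"],
        },
        "form_sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "fields": {"type": "array", "items": {"type": "string"}},
                    "order": {"type": "integer"},
                },
                "required": ["name", "fields", "order"],
                "additionalProperties": False,
            },
        },
        "thank_you_message": {"type": "string"},
    },
    "required": ["page_title", "subtitle", "hero_image_suggestion",
                 "form_sections", "thank_you_message"],
    "additionalProperties": False,
}


def schema_text() -> str:
    """Serialize once, so the prompt's output-shape section and the
    enforced mode share a single source of truth."""
    return json.dumps(OUTPUT_SCHEMA, indent=2)
```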
4. UI flow last
Only now do I get into the actual screens.
The UI has to do three jobs. Give the user the shortest possible way to express intent. Surface the model's output in a way the user can verify and edit. Make rejection cheap.
The last part is what most AI features get wrong. If the user has to undo a whole wizard and start over every time the model gets something wrong, they stop using the feature. The better pattern is to make every generated section editable in place and to let the user regenerate just that one section without losing everything else.
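The per-section edit and regenerate behavior can be sketched as two pure functions over an immutable list of sections. `Section`, `edit_section`, and `regenerate_section` are hypothetical names, and `generate` stands in for whatever model call the product uses. Regenerating one section replaces only that entry and keeps every other section, including user edits, exactly as it was.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Section:
    name: str
    content: str
    edited_by_user: bool = False


def edit_section(page: list[Section], name: str, new_content: str) -> list[Section]:
    """Apply an in-place edit from the UI and mark the section as user-owned."""
    return [replace(s, content=new_content, edited_by_user=True) if s.name == name else s
            for s in page]


def regenerate_section(
    page: list[Section],
    name: str,
    generate: Callable[[str, list[Section]], str],
) -> list[Section]:
    """Regenerate one section and leave every other section untouched.

    `generate` stands in for a hypothetical model call; it gets the target
    section name and the rest of the page as context.
    """
    if name not in {s.name for s in page}:
        raise KeyError(name)
    others = [s for s in page if s.name != name]
    new_content = generate(name, others)
    return [replace(s, content=new_content, edited_by_user=False) if s.name == name else s
            for s in page]
```

Passing the other sections as context lets the regenerated section stay consistent with the rest of the page without overwriting it. Returning new lists instead of mutating makes undo a matter of keeping the previous list.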
For Recruit, the wizard lets the user describe a campaign in one sentence, see the full generated page, and then accept, edit, or regenerate any section independently. The user is never trapped inside the AI flow.
Why this order works
When the spec is built bottom-up (taxonomy, prompt, schema, UI), the engineering team can parallelize. Backend works against the schema. Prompt engineering tunes against the taxonomy. Frontend builds against the schema's shape. Each layer has a contract with the layer above and below it.
When it is built top-down (UI first), every layer has to be retrofitted. The model keeps producing data the UI cannot render. Engineering keeps catching errors and patching prompts. Three months in, the team is rebuilding the schema while pretending it is a fix.
The order is not the only thing that matters. But it is the one part of the process I refuse to compromise on anymore. Get the taxonomy and the schema right and the rest of the spec writes itself.