Not Every AI Task Deserves Your Best Model
The Reflex to Use the Best
When you're building with large language models, there's a natural gravitational pull toward using the most capable model for everything. It feels safe. The reasoning is simple: why risk a worse result when the price difference per call is just a few cents?
But those cents compound. When your system makes six to ten AI calls per user interaction, and each user onboards a business that generates a batch of posts, you start to see the bill climb. We were running every single Claude call through Sonnet, our most capable model, and it was working, but we were paying for horsepower we didn't need on half the tasks.
Sorting Tasks by Intelligence Required
The breakthrough wasn't technical. It was organizational. We sat down and categorized every AI call in our pipeline by the type of thinking it required.
Some tasks are genuinely creative. Writing Instagram captions that sound natural in Bahasa Indonesia, that weave a product into an influencer's real lifestyle, that hit the right emotional tone: that's hard. The difference between a good caption and a great one is the difference between engagement and scrolling past. These tasks earn their keep on a capable model.
But other tasks are structured extraction. Pulling a color palette from CSS hex codes and a brand description? Scoring influencers against a checklist of criteria and returning a JSON array? Generating a content calendar with dates, times, and slot assignments? These are tasks with clear inputs, clear outputs, and well-defined rubrics. A smaller, faster model handles them just as well.
The Two-Tier Approach
We added a simple model selector to our AI wrapper โ every call can now specify whether it needs the full Sonnet model or the lighter Haiku model. No complex routing logic, no A/B testing framework. Just a parameter.
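As a sketch of what that parameter looks like (names, defaults, and model IDs here are illustrative, not our exact code): the wrapper resolves a tier to a concrete model ID, and each call site opts down explicitly.

```typescript
// Hypothetical model-selector sketch. Defaulting to the capable model means
// moving a call to Haiku is always a deliberate, visible choice.
type ModelTier = "sonnet" | "haiku";

// Placeholder model IDs; substitute whichever Sonnet/Haiku versions you run.
const MODEL_IDS: Record<ModelTier, string> = {
  sonnet: "claude-sonnet-latest",
  haiku: "claude-haiku-latest",
};

interface AskOptions {
  tier?: ModelTier;   // omitted => full Sonnet
  maxTokens?: number;
}

// The resolved ID would be passed as the `model` field of the actual
// Anthropic API call (omitted here).
function resolveModel(opts: AskOptions = {}): string {
  return MODEL_IDS[opts.tier ?? "sonnet"];
}
```

A call site then reads as `askClaude(prompt, { tier: "haiku" })` rather than a hardcoded model string, which is what makes the later audit-and-switch step a one-line change.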
Six calls moved to Haiku: brand color analysis, influencer-brand matching, content calendar generation, mood board style analysis, post quality scoring, and carousel slide text generation. Three calls stayed on Sonnet: business profile analysis (the core intelligence extraction), caption writing (user-facing creative work), and image prompt engineering (where prompt quality directly determines visual output).
The decision framework was straightforward. If the output is user-facing creative text, keep it on Sonnet. If it's structured data extraction, scoring, or planning, switch to Haiku.
Here's exactly how every AI call in our pipeline maps out:
| AI Task | Model | Why |
|---|---|---|
| Business profile analysis | Sonnet | Core intelligence: extracts products, keywords, trends from raw website data |
| Caption + hashtag writing | Sonnet | User-facing creative text in Bahasa Indonesia / English |
| Image prompt engineering | Sonnet | Prompt quality directly determines visual output |
| Brand color analysis | Haiku | Structured extraction from CSS hex codes |
| Influencer-brand matching | Haiku | Scoring against a defined rubric |
| Content calendar generation | Haiku | Slot-filling with dates, times, categories |
| Mood board style analysis | Haiku | Style classification with clear criteria |
| Post quality scoring | Haiku | Numerical scoring against a checklist |
| Carousel slide text | Haiku | Short overlay text, structured 6-slide format |
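Because the routing is static, the table above can live in code as a plain lookup, with no runtime logic beyond a dictionary read. The task keys below are illustrative; use whatever identifiers your call sites already have.

```typescript
type ModelTier = "sonnet" | "haiku";

// One entry per AI call in the pipeline, mirroring the routing table.
const TASK_TIER: Record<string, ModelTier> = {
  businessProfileAnalysis: "sonnet",
  captionWriting: "sonnet",
  imagePromptEngineering: "sonnet",
  brandColorAnalysis: "haiku",
  influencerMatching: "haiku",
  calendarGeneration: "haiku",
  moodBoardAnalysis: "haiku",
  postQualityScoring: "haiku",
  carouselSlideText: "haiku",
};
```

Keeping the map in one place also makes the next audit trivial: changing a task's tier is a one-word diff.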
The Full Cost Picture
Before and after optimizing, here's what every business onboarding actually costs us: every AI call, every image generation, broken down step by step.
Onboarding Cost (per business)
| Step | Model | Vision? | Before (Sonnet) | After (Haiku) |
|---|---|---|---|---|
| Business analysis | Claude Sonnet | Yes (if images) | ~$0.02–0.05 | ~$0.02–0.05 (kept on Sonnet) |
| Theme analysis | Claude Sonnet → Haiku | No | ~$0.01 | ~$0.003 |
| Mood board analysis | Claude Sonnet → Haiku | Yes (conditional) | ~$0.01–0.03 | ~$0.003–0.008 |
| Product analysis | Claude Sonnet | No | ~$0.01 | ~$0.01 (kept on Sonnet) |
| Influencer matching | Claude Sonnet → Haiku | No | ~$0.01 | ~$0.003 |
| Calendar generation | Claude Sonnet → Haiku | No | ~$0.01 | ~$0.003 |
| Vibe images (×3) | fal flux/dev | No | ~$0.075 | ~$0.075 (unchanged) |
| Total onboarding | | | ~$0.12–0.18 | ~$0.08–0.12 |
Post Generation Cost (per batch of ~6 posts)
| Step | Model | Before | After |
|---|---|---|---|
| Captions + hashtags (1 call) | Claude Sonnet | ~$0.02 | ~$0.02 (kept on Sonnet) |
| Quality scoring (1 call) | Claude Sonnet → Haiku | ~$0.01 | ~$0.003 |
| Image gen ×6 (flux/dev) | fal.ai | ~$0.15 | ~$0.15 (unchanged) |
| Face gen (flux-pulid, if UGC) | fal.ai | ~$0.04/each | ~$0.04/each (unchanged) |
| Product composite (flux-general) | fal.ai | ~$0.035/each | ~$0.035/each (unchanged) |
| Background removal (birefnet) | fal.ai | ~$0.01/each | ~$0.01/each (unchanged) |
| Total (~6 posts) | | ~$0.20–0.40 | ~$0.19–0.39 |
Carousel Slides (per carousel post)
| Step | Model | Before | After |
|---|---|---|---|
| Slide content (1 call) | Claude Sonnet → Haiku | ~$0.01 | ~$0.003 |
| Text overlay (Sharp, local) | No AI | Free | Free |
| Total per carousel | | ~$0.01 | ~$0.003 |
Post Customization (per edit)
| Step | Model | Before | After |
|---|---|---|---|
| Caption rewrite | Claude Sonnet | ~$0.01 | ~$0.01 (kept on Sonnet) |
| Image regen (if needed) | fal.ai | ~$0.025–0.04 | ~$0.025–0.04 (unchanged) |
| Total per customization | | ~$0.01–0.05 | ~$0.01–0.05 |
Summary per Business (full onboarding + first batch)
| Component | Before (All Sonnet) | After (Mixed) | Savings |
|---|---|---|---|
| Onboarding analysis | ~$0.12–0.18 | ~$0.08–0.12 | ~33% |
| First 6 posts generation | ~$0.20–0.40 | ~$0.19–0.39 | ~5% |
| Total per business | ~$0.32–0.58 | ~$0.27–0.51 | ~15–25% |
The Claude-only savings are much more dramatic, roughly fifty percent, but image generation (fal.ai) dominates total cost and stays unchanged. The pricing difference between the two model tiers tells the story:
| Model | Input Cost | Output Cost |
|---|---|---|
| Claude Sonnet | $3.00 / 1M tokens | $15.00 / 1M tokens |
| Claude Haiku | $0.80 / 1M tokens | $4.00 / 1M tokens |
At scale, those fractions compound. A thousand onboardings per month saves fifty to seventy dollars on Claude alone, and that's before factoring in the benefit of Haiku's faster responses.
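The arithmetic behind those per-call figures is just the pricing table applied to token counts. The token counts below are assumed for illustration, not measured from our pipeline.

```typescript
// Per-million-token prices from the table above.
const PRICES = {
  sonnet: { input: 3.0, output: 15.0 },
  haiku: { input: 0.8, output: 4.0 },
} as const;

// Dollar cost of one call, given input/output token counts.
function callCost(
  tier: keyof typeof PRICES,
  inTokens: number,
  outTokens: number
): number {
  const p = PRICES[tier];
  return (inTokens / 1e6) * p.input + (outTokens / 1e6) * p.output;
}

// Assumed shape of a typical structured call: 2,000 tokens in, 500 out.
const onSonnet = callCost("sonnet", 2000, 500); // ≈ $0.0135
const onHaiku = callCost("haiku", 2000, 500);   // ≈ $0.0036
```

That ratio, roughly a quarter of the Sonnet price for the same call, is where the ~$0.01 → ~$0.003 rows in the tables above come from.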
Why This Isn't Just About Cost
The savings are real, but there's a subtler benefit: speed. Haiku responds faster than Sonnet. For tasks in the onboarding pipeline that run sequentially, shaving a second off each call adds up. Users see their content calendar and influencer matches faster.
There's also a resilience argument. During peak usage or rate limit scenarios, having some calls on a different model tier distributes load. If one model is experiencing latency, the other might not be.
The Tasks That Surprised Us
Quality scoring was the call we debated longest. It evaluates whether a generated caption feels authentic: is the product placement natural or forced? Does the influencer's voice come through? You'd think this requires sophisticated judgment.
It turns out that when you give a smaller model a well-structured rubric with clear criteria and examples, it scores almost identically to the larger model. The rubric does the heavy lifting, not the model size. This is a pattern worth remembering: a clear prompt can compensate for a smaller model.
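To make "the rubric does the heavy lifting" concrete, here is a sketch of that pattern: anchored criteria in the prompt, plus strict parsing of the model's JSON reply. The criteria, weights, and field names are illustrative, not our production rubric.

```typescript
// Illustrative rubric: each criterion gets anchored score bands, so even a
// small model has little room for interpretation.
const SCORING_RUBRIC = `
Score the caption from 1-10 on each criterion. Return ONLY a JSON object:
{"naturalness": n, "voice": n, "clarity": n}

Criteria:
- naturalness: Is the product woven into the story, or does it read like an ad?
  8-10: appears as part of a real moment. 1-3: reads like a billboard.
- voice: Does it sound like this influencer, matching their past captions?
- clarity: Is the call to action obvious without being pushy?
`;

// Parse strictly: anything that isn't the expected JSON shape is rejected
// and the call can be retried.
function parseScores(
  raw: string
): { naturalness: number; voice: number; clarity: number } | null {
  try {
    const obj = JSON.parse(raw);
    const keys = ["naturalness", "voice", "clarity"] as const;
    if (keys.every((k) => typeof obj[k] === "number" && obj[k] >= 1 && obj[k] <= 10)) {
      return obj;
    }
  } catch {
    // fall through to null
  }
  return null;
}
```

The strict parser matters as much as the rubric: a smaller model on a tight output contract fails loudly instead of drifting.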
Calendar generation was the opposite surprise: we expected it to be trivial, but the first few Haiku attempts occasionally produced invalid product IDs or missed the content category distribution rules. We solved this by tightening the prompt constraints, and now it performs reliably.
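A cheap guard that pairs well with tightened prompts is validating the output before accepting it. The shape below is a hypothetical sketch of that check, not our exact schema: reject any slot whose product ID isn't real, and re-prompt with the valid IDs listed verbatim.

```typescript
// Hypothetical calendar slot shape; field names are illustrative.
interface CalendarSlot {
  date: string;      // e.g. "2024-06-03"
  productId: string;
  category: string;  // e.g. "educational", "promo", "lifestyle"
}

// Return the slots that reference unknown product IDs, so the caller can
// decide to retry with a stricter prompt.
function invalidSlots(
  slots: CalendarSlot[],
  validProductIds: Set<string>
): CalendarSlot[] {
  return slots.filter((s) => !validProductIds.has(s.productId));
}
```

If `invalidSlots` returns anything, the retry prompt enumerates the allowed IDs explicitly; with a small model, validation plus retry is usually still cheaper than running the call on the big model.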
The Principle
The lesson generalizes beyond our specific use case. If you're building AI-powered products, resist the default of routing everything through your most expensive model. Audit each call. Ask: what kind of thinking does this actually require? Creative generation, nuanced analysis, and open-ended reasoning deserve your best model. Structured extraction, scoring against criteria, and slot-filling work fine with a lighter one.
The best model for the job isn't always the most capable one. Sometimes it's the one that's fast, cheap, and exactly smart enough.