One Dollar, Two Minutes: SLA-Driven AI Development
The Invisible Tax on AI Products
Most AI products don't fail because the model is bad. They fail because nobody noticed the cost creeping up or the latency getting worse until a user complained or, worse, quietly left.
When we first launched Jiwa AI's onboarding pipeline, every feature improvement came with an unspoken question: "Did this make it slower? Did this make it more expensive?" We'd check manually, eyeball the cost logs, and move on. That worked when we had ten users. It doesn't work when you're trying to onboard hundreds of small businesses who expect results in the time it takes to make a cup of coffee.
Two Numbers That Changed How We Build
We drew two lines in the sand. Every onboarding run must cost less than one dollar. Every onboarding run must complete in under two minutes. Not guidelines: hard test failures.
These numbers aren't arbitrary. The one-dollar threshold comes from our unit economics: at our target price point, each onboarding needs to cost well under a dollar for the business to be viable at scale. The two-minute threshold comes from user behavior data: drop-off rates spike sharply after that mark. These are the boundaries where user experience and business sustainability intersect.
Why Tests, Not Dashboards
We could have built a monitoring dashboard. We could have set up alerts. But dashboards get ignored, and alerts get snoozed. What doesn't get ignored is a test that blocks your pull request.
Our end-to-end integration test now runs a real onboarding against a real website, then queries the cost database to sum every API call (every AI inference, every image generation, every WhatsApp message) for that specific business. If the total exceeds a dollar, the test fails. If the clock exceeds two minutes, the test fails. The error message tells you exactly which threshold you breached and by how much.
This means every engineer working on the pipeline gets immediate feedback. Adding a new AI call to improve caption quality? Great, but if it pushes the total cost to $1.03, you'll know before you merge. Introducing a new image processing step? If it adds 15 seconds that tips you over the two-minute mark, the test catches it.
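A minimal sketch of what such a guardrail check can look like. The thresholds are the ones from the post; the function names and the stand-in values are illustrative, not Jiwa AI's actual test code, and the real test would pull both numbers from a live onboarding run and the cost database.

```python
# Hard SLA thresholds: $1 per onboarding, 120 seconds wall clock.
MAX_COST_USD = 1.00
MAX_DURATION_SECONDS = 120.0

def check_sla(total_cost_usd: float, duration_seconds: float) -> list[str]:
    """Return a list of SLA violations, each naming the breached
    threshold and the amount by which it was exceeded."""
    violations = []
    if total_cost_usd > MAX_COST_USD:
        violations.append(
            f"cost ${total_cost_usd:.2f} exceeds ${MAX_COST_USD:.2f} "
            f"by ${total_cost_usd - MAX_COST_USD:.2f}"
        )
    if duration_seconds > MAX_DURATION_SECONDS:
        violations.append(
            f"duration {duration_seconds:.1f}s exceeds "
            f"{MAX_DURATION_SECONDS:.0f}s by "
            f"{duration_seconds - MAX_DURATION_SECONDS:.1f}s"
        )
    return violations

def test_onboarding_sla():
    # Stand-in values; the real test sums actual API-call costs
    # recorded for the onboarded business.
    total_cost = 0.40
    duration = 95.0
    violations = check_sla(total_cost, duration)
    assert not violations, "; ".join(violations)

test_onboarding_sla()
```

Because the assertion message concatenates the violation strings, a failing run reports exactly which threshold was breached and by how much, which is what makes the failure actionable in a pull request.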
The Surprising Effect on Design Decisions
Having hard SLA tests changed how we approach feature development in ways we didn't expect. Instead of asking "Should we add this?" the question became "Can we add this within the budget?"
That constraint turns out to be liberating. When a new AI analysis step costs twelve cents, you start looking at whether an existing step can be removed or combined. When a processing stage takes twenty seconds, you investigate whether it can run in parallel with something else. The SLA budget forces creative problem-solving rather than unconstrained feature accumulation.
We've found that our best architectural improvements, like parallelizing the pipeline into concurrent waves, were motivated not by abstract engineering goals but by the concrete pressure of staying under two minutes. The SLA made the right optimization obvious.
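The concurrent-wave idea can be sketched with asyncio. The step names below are illustrative placeholders, not the actual pipeline stages; the point is that independent steps in a wave run concurrently, so the wave's wall-clock time is its slowest step rather than the sum of all steps.

```python
import asyncio

async def analyze_site(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an AI analysis call
    return f"analysis:{url}"

async def generate_captions(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a caption-generation call
    return f"captions:{url}"

async def generate_images(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an image-generation call
    return f"images:{url}"

async def onboard(url: str) -> dict:
    # One wave: these steps have no mutual dependencies, so they run
    # concurrently instead of back-to-back. A later wave would await
    # this one's results before starting.
    analysis, captions, images = await asyncio.gather(
        analyze_site(url), generate_captions(url), generate_images(url)
    )
    return {"analysis": analysis, "captions": captions, "images": images}

result = asyncio.run(onboard("https://example.com"))
```

Grouping steps by their dependencies rather than their order of authorship is what turns a linear pipeline into waves, and a latency SLA makes that refactor pay off immediately.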
Cost Transparency as a Feature
Tracking cost per onboarding also gave us something unexpected: confidence in pricing. We know exactly what each onboarding costs us, broken down by service: AI inference, image generation, messaging delivery. When we set a price, we're not guessing at margins. When a customer asks why our service costs what it does, we can explain the value chain honestly.
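A per-service rollup like that is a simple aggregation over the logged API calls. The entries and amounts below are invented for illustration; the real data would come from the per-business cost database described earlier.

```python
from collections import defaultdict

# Illustrative cost-log entries for one onboarding; amounts are
# made-up placeholders, not Jiwa AI's actual costs.
cost_log = [
    {"service": "ai_inference", "cost_usd": 0.18},
    {"service": "ai_inference", "cost_usd": 0.07},
    {"service": "image_generation", "cost_usd": 0.09},
    {"service": "messaging", "cost_usd": 0.01},
]

def cost_by_service(entries):
    """Sum logged costs per service to get the pricing breakdown."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry["service"]] += entry["cost_usd"]
    return dict(totals)

breakdown = cost_by_service(cost_log)
total = sum(breakdown.values())
```

The same breakdown that feeds the SLA test doubles as the pricing explanation: each line item maps to a service a customer can understand.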
For a Southeast Asian market where small businesses are price-sensitive and trust is earned through transparency, this matters more than any technical benchmark.
What We Measure, We Improve
Since adding the SLA tests, our average onboarding cost has dropped from roughly forty cents to thirty-five cents, not because we set out to cut costs, but because every PR that touches the pipeline gets automatic feedback on its cost impact. Engineers naturally gravitate toward efficient solutions when the budget is visible.
The same applies to duration. Our average onboarding time has steadily decreased because the two-minute ceiling creates healthy pressure to optimize without anyone needing to file a performance ticket.
Beyond Onboarding
We're now extending this approach to other parts of the platform. Content regeneration, post scheduling, and WhatsApp delivery each have their own cost and latency profiles that deserve the same rigor. The pattern is simple: define the user experience threshold, define the business viability threshold, encode both as tests, and let them guide every decision.
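That pattern generalizes into a small shared structure: one budget per flow, pairing a latency ceiling (user experience) with a cost ceiling (business viability). The onboarding numbers are the ones from the post; the budgets for the other flows are placeholders, not real figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sla:
    """A budget pairing a latency ceiling (user experience)
    with a cost ceiling (business viability)."""
    max_duration_seconds: float
    max_cost_usd: float

    def violated(self, duration_seconds: float, cost_usd: float) -> bool:
        return (duration_seconds > self.max_duration_seconds
                or cost_usd > self.max_cost_usd)

# Onboarding uses the post's real thresholds; the other entries are
# hypothetical budgets for illustration only.
SLAS = {
    "onboarding": Sla(max_duration_seconds=120, max_cost_usd=1.00),
    "content_regeneration": Sla(max_duration_seconds=30, max_cost_usd=0.10),
    "whatsapp_delivery": Sla(max_duration_seconds=10, max_cost_usd=0.02),
}
```

Each flow's integration test then asserts against its own entry, so adding a new flow means defining its two thresholds up front rather than discovering them after launch.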
AI development doesn't have to be a black box of unpredictable costs and mysterious latency. Treat your AI pipeline like the production service it is, with SLAs, budgets, and tests that enforce them, and you'll build something users can rely on.