JiwaAI

Six AI Calls, One Business Profile

Jiwa AI Team

The Temptation of Twenty Calls

When you build an AI pipeline, the natural instinct is to break everything into small, focused prompts. One call to extract the business name. Another for the industry. A third for products. One more for brand voice. Before you know it, you have twenty API calls, each returning a neat JSON fragment, and your onboarding takes two minutes and costs a dollar.

We went the other direction. Our entire onboarding — from a raw URL to a publishable content calendar with AI-generated images — uses exactly six Claude calls. Not because we were optimizing prematurely, but because we discovered that fewer, richer calls produce better results than many narrow ones.

The Call Graph

The six calls form a dependency chain. Each one builds on the outputs of the previous steps, and each one extracts structured JSON in a single pass.

The first call is the heaviest. It ingests everything we know — scraped website text, image descriptions, Instagram posts, follower data — and produces a complete business profile in one shot. Business name, industry, products, brand voice, target audience, trending topics, visual mood. A single system prompt with detailed extraction instructions, a single response with a typed JSON schema.
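In code, that first call reduces to a single detailed system prompt and one typed parse of the response. A minimal Python sketch of the pattern — the schema fields, prompt wording, and function names here are illustrative, not our exact implementation:

```python
import json
from dataclasses import dataclass

# Hypothetical typed schema for the business profile (illustrative fields).
@dataclass
class BusinessProfile:
    name: str
    industry: str
    products: list
    brand_voice: str
    target_audience: str

# Illustrative system prompt; the real one carries much more detailed
# extraction instructions.
PROFILE_SYSTEM_PROMPT = """You are a business analyst. From the scraped
website text, image descriptions, and Instagram data below, extract a
complete business profile. Respond with a single JSON object with keys:
name, industry, products, brand_voice, target_audience."""

def parse_profile(raw_response: str) -> BusinessProfile:
    """Parse the model's JSON response into a typed profile object."""
    data = json.loads(raw_response)
    return BusinessProfile(
        name=data["name"],
        industry=data["industry"],
        products=data["products"],
        brand_voice=data["brand_voice"],
        target_audience=data["target_audience"],
    )
```

The typed dataclass is what makes malformed responses fail loudly at the parse step instead of propagating downstream.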

The next two calls run in parallel: color theme extraction and mood board analysis. Both take the business profile as input. The theme analyzer also ingests CSS colors and meta tags from the website. The mood board analyzer looks at Instagram engagement patterns. Neither depends on the other.

Then comes influencer matching, which needs both the profile and the theme. Then calendar generation, which needs the profile, the matched influencers, and the product list. Finally, batch caption generation — all six post captions in a single call, with quality scoring folded into the response.
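The dependency chain above can be written down as data, which also makes the parallelism explicit. A sketch, with step names chosen for illustration rather than taken from our codebase:

```python
# Illustrative dependency graph for the six calls: step -> prerequisites.
CALL_GRAPH = {
    "business_profile": [],
    "color_theme": ["business_profile"],
    "mood_board": ["business_profile"],
    "influencer_match": ["business_profile", "color_theme"],
    "calendar": ["business_profile", "influencer_match"],
    "captions": ["calendar"],
}

def execution_waves(graph):
    """Group steps into waves; steps in the same wave can run in parallel."""
    done, waves = set(), []
    while len(done) < len(graph):
        wave = [step for step, deps in graph.items()
                if step not in done and all(d in done for d in deps)]
        if not wave:
            raise ValueError("cycle in call graph")
        waves.append(wave)
        done.update(wave)
    return waves
```

Running `execution_waves(CALL_GRAPH)` groups the six steps into five waves, with the two independent analyses sharing the second wave.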

Why Batching Beats Splitting

The counterintuitive finding: Claude produces more consistent, higher-quality output when it has full context. When we split product extraction from brand voice analysis, the products came back generic. When we combined them, the AI could see that a sourdough biscuit brand with an artisan tone should have products named after their actual sourdough varieties, not generic categories like "biscuit" and "cookie."

Similarly, generating all six captions in one call means Claude can ensure diversity across the batch. Different opening words, varied sentence structures, no repeated adjectives. When we generated captions one at a time, posts three through six started sounding like variations of post one.
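Batching mostly comes down to how the prompt is assembled, plus a cheap post-hoc check that the diversity constraints held. A hedged sketch — the prompt wording and helper names are illustrative:

```python
def build_batch_caption_prompt(posts):
    """Assemble one prompt asking for all captions at once (illustrative)."""
    lines = [
        "Write one Instagram caption per post below.",
        "Vary the opening word and sentence structure across captions;",
        "do not reuse adjectives between captions.",
        "Respond as a JSON array of strings, one caption per post.",
        "",
    ]
    for i, post in enumerate(posts, 1):
        lines.append(f"Post {i}: {post['topic']}")
    return "\n".join(lines)

def openers_are_distinct(captions):
    """Cheap batch-level check: no two captions share an opening word."""
    openers = [caption.split()[0].lower() for caption in captions]
    return len(set(openers)) == len(openers)
```

The opening-word check is only possible because all six captions arrive in one response; with one call per caption there is no batch to compare against.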

The Cost Math

Six calls at roughly fifteen thousand tokens total cost about ten cents. The image generation — six to eighteen calls to our image API depending on content types — costs another twelve to twenty-four cents. Total onboarding cost: twenty-two to thirty-four cents per business.

At that price point, a small business owner in Jakarta paying the equivalent of three dollars a month gets content that would cost hundreds from a human agency. The six-call architecture is what makes that math work. Twenty calls would triple the LLM cost and double the latency.

Where We Parallelized

Not every call needs to wait for the previous one. Theme analysis and mood board analysis run simultaneously because they share the same input but produce independent outputs. Image generation for all six posts runs in parallel because each post's visual specification is already fully determined by the calendar.
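The two independent analyses map naturally onto concurrent tasks. A minimal `asyncio` sketch, where the analysis functions are stand-ins for the real Claude calls and the return values are placeholders:

```python
import asyncio

async def analyze_theme(profile, css_colors):
    # Stand-in for the color theme extraction call.
    await asyncio.sleep(0.01)
    return {"palette": ["#8b5e3c", "#f3e9dc"]}

async def analyze_mood_board(profile, instagram_posts):
    # Stand-in for the mood board analysis call.
    await asyncio.sleep(0.01)
    return {"mood": "warm, artisanal"}

async def run_parallel_analyses(profile, css_colors, instagram_posts):
    # Both calls take the profile as input but are independent of each
    # other, so they run concurrently rather than back to back.
    return await asyncio.gather(
        analyze_theme(profile, css_colors),
        analyze_mood_board(profile, instagram_posts),
    )
```

With real API calls, the wall-clock cost of this wave is the slower of the two calls rather than their sum.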

The bottleneck is the sequential dependency chain: scrape, analyze, match, calendar, generate. Each step genuinely needs the output of the previous one. We experimented with speculative execution — starting influencer matching before analysis completes, using the URL's industry as a rough signal — but the quality degradation wasn't worth the two seconds saved.

The Structured JSON Pattern

Every call uses the same wrapper. System prompt defines the role and output schema. User message contains the data payload. Response comes back as JSON, sometimes wrapped in markdown fences that we strip. Type-safe parsing catches malformed responses early.
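Stripping the optional fences before parsing is only a few lines. A sketch of the kind of helper we mean, not our exact code:

```python
import json
import re

def parse_model_json(text: str):
    """Parse a model response as JSON, stripping optional markdown fences."""
    cleaned = text.strip()
    # Models sometimes wrap JSON in ```json ... ``` fences; strip them.
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", cleaned, re.DOTALL)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)
```

Because every call goes through the same helper, a schema mismatch surfaces as a parse error at the call site rather than as a subtle bug three steps later.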

This uniformity means adding a new intelligence layer — say, competitor analysis or seasonal trend detection — is a matter of writing one system prompt and one type definition. The infrastructure is the same six-call pattern, potentially expanded to seven.

We'll probably stay at six for a while. Each additional call adds latency that users feel. And honestly, six calls that each do something meaningful beats twelve calls that each do something trivial.