
How We Made Onboarding 30% Faster Without Spending a Cent More

Jiwa AI Team

The Problem: A Very Expensive Queue

When a business onboards with Jiwa AI, a lot happens behind the scenes. We scrape their website, analyze their brand identity, extract product visuals, match influencers, plan a content calendar, generate captions, create images, score quality, and deliver everything to WhatsApp. That's over a dozen AI calls: Claude for intelligence, Fal AI for images, Haiku Vision for analysis.

The original pipeline ran every step sequentially. Step 1 finishes, step 2 starts. Step 2 finishes, step 3 starts. Twelve steps in a single-file line, each waiting politely for the one before it.

The total cost was fine, around thirty-five cents per onboarding. But the wall-clock time was approaching a hundred seconds. For a user sitting on WhatsApp waiting for their content, that felt like an eternity.

The Insight: Most Steps Don't Actually Need Each Other

We drew the dependency graph. Which step genuinely needs the output of which other step?

The answer was revealing. After the core brand analysis, which everything depends on, there were three completely independent enrichment calls running in series: theme analysis, product positioning, and influencer matching. Each takes about five seconds. None needs the other's output. But we were running them one after another, wasting fifteen seconds on what should take five.
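That analysis can be done mechanically. Here's a small sketch (not our production code; step names are illustrative) of turning a step-to-dependencies map into waves, where each wave contains every step whose dependencies have all completed:

```typescript
// Sketch: compute parallel "waves" from a step -> dependencies map.
// Step names are illustrative, not the actual pipeline identifiers.
type DepGraph = Record<string, string[]>;

function computeWaves(deps: DepGraph): string[][] {
  const waves: string[][] = [];
  const done = new Set<string>();
  const pending = new Set(Object.keys(deps));

  while (pending.size > 0) {
    // A step is ready once every dependency finished in an earlier wave.
    const ready = [...pending].filter((step) =>
      deps[step].every((d) => done.has(d))
    );
    if (ready.length === 0) throw new Error("Cycle in dependency graph");
    waves.push(ready);
    for (const s of ready) {
      done.add(s);
      pending.delete(s);
    }
  }
  return waves;
}

const waves = computeWaves({
  scrape: [],
  profile: ["scrape"],
  theme: ["scrape", "profile"],
  products: ["profile"],
  influencers: ["profile"],
});
// waves: [["scrape"], ["profile"], ["theme", "products", "influencers"]]
```

The third wave falls out automatically: three steps that each need the profile but not each other.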

The same pattern repeated deeper in the pipeline. Product visual analysis, positioning guard saves, and calendar generation were sequential, but the calendar doesn't need product visuals. It only needs product names and influencer matches, both of which were ready waves earlier.

The Architecture: Nine Waves

We restructured the entire pipeline into nine waves, where each wave runs as many operations in parallel as possible.

Wave 1: Data Acquisition

scrapeWithSubpages(url)   ──┐
                            ├──→ both ready
fetchInstagramData(token) ──┘

The website scrape and Instagram data fetch are completely independent. Previously the IG fetch waited for scraping to finish. Now they run simultaneously. Saves about two seconds on a good day.
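In code, the change is just creating both promises before awaiting either. A minimal sketch (the function bodies here are stand-ins, not our real implementations):

```typescript
// Sketch of Wave 1: both independent fetches start together.
// Bodies are stubs; the real versions crawl the site and call the IG API.
async function scrapeWithSubpages(url: string): Promise<string> {
  return `scraped:${url}`;
}
async function fetchInstagramData(token: string): Promise<string> {
  return `ig:${token}`;
}

async function wave1(url: string, token: string) {
  // Both calls are in flight before the first await, so they overlap.
  const [scraped, instagram] = await Promise.all([
    scrapeWithSubpages(url),
    fetchInstagramData(token),
  ]);
  return { scraped, instagram };
}
```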

Wave 2: Core Brand Analysis

analyzeBusiness(scraped, instagram) ──→ profile

This is the one true bottleneck. Everything downstream needs the business profile: products, brand tone, target audience, keywords. This wave runs alone by design. There's no way to parallelize it without compromising the quality of the first try.

Wave 3: Three Parallel Enrichments

This is where the biggest win lives.

analyzeTheme(scraped, profile)  ──┐
                                  │
analyzeProducts(profile)        ──┼──→ all three ready
                                  │
matchInfluencers(profile)       ──┘

Three independent Claude calls that each need the profile but not each other. The old code ran them sequentially in fifteen seconds. Now they complete in five: the duration of the slowest single call.
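The "slowest call wins" property is easy to demonstrate. This toy sketch stands in each Claude call with a timed delay (scaled down from seconds to milliseconds); total elapsed time tracks the maximum, not the sum:

```typescript
// Toy model of Wave 3: three delays standing in for the ~5s Claude calls.
// Run in parallel, elapsed time is ~the slowest call, not the sum.
const delay = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function enrich(name: string, ms: number): Promise<string> {
  await delay(ms);
  return name;
}

async function wave3(profile: string) {
  const start = Date.now();
  const [theme, products, influencers] = await Promise.all([
    enrich(`theme(${profile})`, 50),
    enrich(`products(${profile})`, 30),
    enrich(`influencers(${profile})`, 40),
  ]);
  return { theme, products, influencers, elapsed: Date.now() - start };
}
```

Sequentially the three delays would sum to 120ms; in parallel the wave finishes in roughly 50ms, the longest single delay.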

One trade-off: influencer matching previously waited for theme analysis to finish so it could use brand colors for visual alignment scoring. But that visual signal is only ten percent of the composite score, and the check itself is simplistic: it confirms color data exists, not that the colors actually harmonize. Not worth blocking a full Claude call for.

Wave 4: Mood Board + Database + Influencer Lookup

analyzeMoodBoard(profile, theme)   ──┐
                                     │
prisma.business.create(...)        ──┼──→ all three ready
                                     │
prisma.influencer.findMany(slugs)  ──┘

The mood board analysis needs the brand theme from Wave 3. The business record save needs everything from Wave 3. The influencer database lookup just needs the slugs. All three are independent of each other.

We also replaced N sequential findUnique calls with a single findMany query. If three influencers matched, that's one database round-trip instead of three.
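The batching change looks like this in miniature. (This sketch uses an in-memory stand-in for the database so the round-trip counting is visible; the real code uses Prisma's findMany with an `in` filter on the slugs.)

```typescript
// Sketch of batching: N lookups collapse into one query.
// `rows` and the counter stand in for the real database.
type Influencer = { slug: string; name: string };

const rows: Influencer[] = [
  { slug: "aisha", name: "Aisha" },
  { slug: "budi", name: "Budi" },
  { slug: "citra", name: "Citra" },
];

let roundTrips = 0;

async function findUnique(slug: string): Promise<Influencer | undefined> {
  roundTrips++; // one trip per slug
  return rows.find((r) => r.slug === slug);
}

async function findMany(slugs: string[]): Promise<Influencer[]> {
  roundTrips++; // one trip, however many slugs
  return rows.filter((r) => slugs.includes(r.slug));
}

async function demo() {
  roundTrips = 0;
  for (const s of ["aisha", "budi", "citra"]) await findUnique(s);
  const sequential = roundTrips; // 3

  roundTrips = 0;
  await findMany(["aisha", "budi", "citra"]);
  return { sequential, batched: roundTrips }; // { sequential: 3, batched: 1 }
}
```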

After this wave completes, we apply the mood board and influencer matches to the business record in a single update call: one round-trip instead of the original two.

Wave 5: Product Records

createProductRecords(business, products, images)

This wave is sequential because it needs the business ID from Wave 4. But the IG media association building โ€” matching Instagram posts to products by keyword overlap โ€” is pure CPU with no I/O, so it runs instantly before the database write.
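Keyword-overlap matching is the kind of step that costs essentially nothing. A sketch of the idea (tokenization and scoring here are illustrative, not the production logic):

```typescript
// Sketch of the pure-CPU association step: match Instagram captions to
// products by keyword overlap. Tokenizer and threshold are illustrative.
function tokens(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/\W+/).filter((w) => w.length > 2)
  );
}

function overlap(a: Set<string>, b: Set<string>): number {
  let n = 0;
  for (const w of a) if (b.has(w)) n++;
  return n;
}

// For each product name, keep the captions that share at least one keyword.
function associateMedia(
  products: string[],
  captions: string[]
): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const p of products) {
    const pTok = tokens(p);
    out.set(p, captions.filter((c) => overlap(pTok, tokens(c)) > 0));
  }
  return out;
}
```

No awaits anywhere, so it adds nothing to wall-clock time before the write.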

Wave 6: Three More Parallel Operations

analyzeProductVisuals(images)    ──┐
                                   │
savePositioningGuards(products)  ──┼──→ all three ready
                                   │
generateCalendar(profile, ...)   ──┘

This was the other major sequential bottleneck. Product visual analysis (Haiku Vision calls for each product), positioning guard saves (database writes), and calendar generation (a Claude call) all ran one after another. But the calendar doesn't need product visuals; it only needs product names and influencer matches. Running them in parallel saves about eight seconds.

Each product's visual analysis also catches errors independently now. Previously, a single .catch() on the outer Promise.all meant one failing product could swallow errors from all the others.
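The fix is to attach the .catch() per item, before Promise.all sees the promise. A sketch of the pattern (types and names are illustrative):

```typescript
// Sketch: per-product error capture. One failing product records its own
// error instead of rejecting the whole Promise.all and masking the rest.
type VisualResult = { product: string; ok: boolean; detail: string };

async function analyzeVisual(product: string): Promise<string> {
  if (product === "broken") throw new Error("vision call failed");
  return `analysis of ${product}`;
}

async function analyzeAll(products: string[]): Promise<VisualResult[]> {
  return Promise.all(
    products.map((p) =>
      analyzeVisual(p)
        .then((detail) => ({ product: p, ok: true, detail }))
        .catch((err: Error) => ({ product: p, ok: false, detail: err.message }))
    )
  );
}
```

Because every mapped promise resolves (never rejects), the outer Promise.all always yields one result per product, successes and failures alike.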

Wave 7: Build Post Specifications

prisma.product.findMany(ids) ──→ build PostSpec[]

Another batch query replacing N sequential reads. We load all product visual data in one findMany call, build the lookup maps, and assemble the post specifications that tell the image and caption generators exactly what to create.
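The lookup-map step is simple but worth showing, since it's what makes the single batch read usable: index the rows once, then assemble specs with O(1) lookups. (Types and fields here are illustrative, not the production PostSpec shape.)

```typescript
// Sketch of Wave 7: one batch read, then a Map for O(1) lookups per slot.
// Field names are illustrative, not the real schema.
type Product = { id: string; name: string; visualNotes: string };
type PostSpec = { productId: string; caption: string; visualNotes: string };

function buildSpecs(
  slots: { productId: string; caption: string }[],
  products: Product[] // result of the single findMany
): PostSpec[] {
  // Index the batch-loaded rows once instead of scanning per slot.
  const byId = new Map(products.map((p) => [p.id, p]));
  return slots.map((slot) => ({
    productId: slot.productId,
    caption: slot.caption,
    visualNotes: byId.get(slot.productId)?.visualNotes ?? "",
  }));
}
```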

Wave 8: Generate Posts

generateAllPosts(profile, specs, language)

This is the longest wave, at about thirty seconds, but it's already internally parallelized. Captions are batched into a single Claude call. Images generate in parallel across all posts via Fal AI. Quality scoring runs in batch. There wasn't much to optimize here without changing the generation strategy itself.

Wave 9: Save Posts in Parallel

Promise.all(calendar.map(slot => prisma.post.create(...)))

The original code saved posts in a sequential for loop. Six database writes, one after another. Now they run concurrently.

One subtle bug we caught during review: the original parallel implementation used Array.push() inside Promise.all callbacks. Since promises resolve in non-deterministic order, the posts ended up in random order instead of calendar order. WhatsApp messages would arrive jumbled. We fixed this by pre-allocating an array with index-based assignment, so each post writes to its correct slot regardless of completion order.
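A sketch of the fix (the timed delay stands in for a database write of varying latency): the map index, captured per callback, decides where each result lands, so completion order no longer matters.

```typescript
// Sketch of the ordering fix: write each result to its pre-allocated index
// instead of push(), so calendar order survives out-of-order completion.
const wait = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function savePost(slot: string, ms: number): Promise<string> {
  await wait(ms); // stands in for a database write of varying latency
  return slot;
}

async function saveInOrder(
  slots: { name: string; ms: number }[]
): Promise<string[]> {
  const saved = new Array<string>(slots.length);
  await Promise.all(
    slots.map(async (slot, i) => {
      saved[i] = await savePost(slot.name, slot.ms); // index, not push
    })
  );
  return saved;
}
```

With push(), this input would come back as day2, day3, day1 (fastest first); with index assignment it always comes back in calendar order.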

The Numbers

Metric                 Before   After    Change
Wall-clock time        ~96s     ~68s     -30%
Cost per onboarding    ~$0.35   ~$0.35   No change
Claude API calls       ~12      ~12      Same count
Fal AI calls           7-10     7-10     Same count
Database round-trips   ~25      ~18      -28%

The cost is identical because we make the same AI calls; we just don't wait for one to finish before starting the next. The database round-trips dropped because batch queries replaced sequential lookups.

What We Didn't Do

We didn't add retry logic. Retries are a band-aid for bad inputs. If a Claude call returns garbage, the problem is upstream: truncated content, unclear prompts, or missing context. The right fix is making each call's input airtight on the first try, not papering over failures with exponential backoff.

We didn't change the cost structure. The same models run the same operations at the same prices. Parallelism is free performance: you're just using time that was previously wasted on waiting.

We didn't add caching between waves. Each wave's output is consumed exactly once by downstream waves. Caching would add complexity without reducing either cost or latency.

How We're Testing It

Cloud Run's traffic splitting makes this easy to validate safely. We deployed the parallel version as a tagged revision that receives zero production traffic:

Production users ──→ [old revision]   ──→ 100% traffic
Our test URL     ──→ [parallel-test]  ──→ 0% traffic, same DB

Both revisions hit the same Supabase database and the same AI APIs. The only difference is how the orchestration is structured. We run the end-to-end integration test against the tagged URL, compare cost and duration against the baseline, and gradually shift traffic once we're confident.

If anything goes wrong, rollback is instant: one command to route all traffic back to the previous revision. No redeploy, no downtime.

The Takeaway

Performance optimization doesn't always mean algorithmic breakthroughs or infrastructure upgrades. Sometimes it means drawing the dependency graph and noticing that three things you're doing sequentially have no reason to wait for each other. The code changes were mechanical (wrapping independent calls in Promise.all), but the analysis of what's actually independent required understanding the entire pipeline's data flow.

The best part: this kind of optimization is invisible to users. They don't know their content generates thirty percent faster. They just know it feels snappy. And that's the point.