The Complete Cost Anatomy of a Jiwa AI Onboarding

What It Actually Costs to Generate a Month of Content

When a business owner submits their website URL to Jiwa AI, they trigger a nine-wave pipeline (plus an optional tenth wave for a TikTok reel) that runs brand analysis, influencer matching, image generation, caption writing, quality scoring, and WhatsApp delivery, all within three to four minutes. The question we get asked most often is: how much does all of that actually cost?

The answer is roughly $1.86 to $3.20 per onboarding, depending on post count, content type mix, product-visibility retries, and whether a video reel is generated. That range is not a guess: it comes directly from the cost log every onboarding writes to the database, tagged by wave, operation, and service.
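
To make the tagging concrete, here is a minimal sketch of what one log row and a per-wave rollup might look like. The `CostLogEntry` fields, the operation and service labels, and the sample amounts are hypothetical (the article names the real table `ApiCostLog` but does not show its schema). `Decimal` is used so that currency sums do not drift.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CostLogEntry:
    """One billable call. Field names are hypothetical stand-ins for the real log."""
    wave: int
    operation: str
    service: str
    usd: Decimal


def cost_by_wave(entries: list[CostLogEntry]) -> dict[int, Decimal]:
    """Sum logged spend per wave, the kind of view an admin cost dashboard shows."""
    totals: dict[int, Decimal] = defaultdict(Decimal)
    for entry in entries:
        totals[entry.wave] += entry.usd
    return dict(totals)


# Illustrative rows only; amounts are loosely derived from figures in this article.
sample = [
    CostLogEntry(2, "brand_analysis", "premium_llm", Decimal("0.040")),
    CostLogEntry(3, "theme_analysis", "fast_llm", Decimal("0.005")),
    CostLogEntry(8, "step1_multi_reference", "image_model", Decimal("0.072")),
    CostLogEntry(8, "step2_naturalise", "image_model", Decimal("0.040")),
    CostLogEntry(9, "whatsapp_send", "fonnte", Decimal("0.001")),
]

print(cost_by_wave(sample))
# {2: Decimal('0.040'), 3: Decimal('0.005'), 8: Decimal('0.112'), 9: Decimal('0.001')}
```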

Here's where every cent goes.

The Nine-Wave Pipeline

The onboarding pipeline is structured as sequential waves, with parallelism within each wave wherever possible. Some waves are intelligence-heavy (text AI, brand analysis), others are generation-heavy (images, video). The cost profile shifts dramatically as the pipeline progresses.

flowchart TD
    A[URL Submitted] --> W1

    W1["⚡ Wave 1 — Scrape + Social Fetch\n(Parallel)\nWebsite scraper · IG media fetch · TikTok data\nCost: $0.00 — HTTP only"]
    W1 --> W2

    W2["🧠 Wave 2 — Core Brand Analysis\nDeep brand profile: voice, audience, tone, themes\nCost: ~$0.040 — Premium AI call"]
    W2 --> W3

    W3["⚡ Wave 3 — Three Enrichments (Parallel)\nTheme analysis · Product DNA · Influencer matching\nCost: ~$0.015 — Fast AI calls"]
    W3 --> W4

    W4["⚡ Wave 4 — Mood Board + Business Save (Parallel)\nVisual style analysis · DB record created\nCost: ~$0.003 — Fast AI call"]
    W4 --> W5

    W5["💾 Wave 5 — Product Records\nCreates Product DB rows\nCost: $0.00 — DB writes only"]
    W5 --> W6

    W6["⚡ Wave 6 — Brand DNA + Calendar (Parallel)\nContent strategy · 30-day calendar\nCost: ~$0.005 — Fast AI calls"]
    W6 --> W7

    W7["📋 Wave 7 — Post Specifications\nMaps calendar to influencer + product pairings\nCost: $0.00 — Uses Wave 6 output"]
    W7 --> W8

    W8["🖼️ Wave 8 — Generate Posts (Images + Captions)\nCaption batch · Image pipeline · Quality gate\nCost: ~$1.76 for 6 posts — Dominant cost centre"]
    W8 --> W9

    W9["📲 Wave 9: Save + Deliver\nDB save · WhatsApp · IG publish\nCost: ~$0.006 (Fonnte messaging)"]
    W9 --> W10

    W10["🎬 Wave 10 — TikTok Reel (Optional)\nIf TikTok connected + products exist\nCost: $1.13 — Per 15-second reel"]
    W10 --> DONE["✅ Delivered"]

    style W2 fill:#fef3c7,stroke:#d97706
    style W8 fill:#fee2e2,stroke:#dc2626
    style W10 fill:#ede9fe,stroke:#7c3aed

The Intelligence Budget: Waves 2–7

Text AI calls — brand analysis, enrichment tasks, calendar generation, captions — account for roughly $0.065 per onboarding. It feels like a small number, but the architecture decisions that keep it small are deliberate.

Wave 2: The One Premium Call

Wave 2 is the most expensive intelligence step, and intentionally so. This is the full brand analysis: the pipeline reads every scraped page, processes Instagram media if available, and produces a brand profile that every downstream wave depends on. The quality of this single call determines caption tone, influencer selection criteria, calendar theme distribution, and visual style direction.

Using a premium-tier reasoning model here is not indulgent; it's architectural. A weaker analysis at Wave 2 propagates errors through every wave that follows. We pay more for this call because fixing downstream errors would cost far more in retries, quality gate failures, and regeneration cycles.

Wave 2 cost: ~$0.040

Waves 3–6: The Parallel Fast-AI Layer

Every other intelligence call in the pipeline uses a smaller, faster, cheaper model. Theme extraction, product positioning analysis, influencer matching, mood board interpretation, brand DNA synthesis, and calendar construction are all focused, structured tasks — they need speed and efficiency, not deep reasoning.

Running them on a fast model instead of the premium model reduces per-call cost by roughly 80%. Running them in parallel (Waves 3, 4, and 6 each fire their tasks simultaneously) means these calls complete in three serial hops, one per wave, instead of one hop per call, so each wave takes about as long as its slowest call. Both savings compound: cheaper calls, fewer serial hops.
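
The latency argument can be made concrete with a small asyncio sketch: waves run one after another, and the calls inside a wave are awaited together with `asyncio.gather`, so a wave costs only its slowest call. The task names and latencies are hypothetical placeholders, not measurements from the pipeline, and `fake_ai_call` stands in for a real model request.

```python
import asyncio

# Hypothetical per-call latencies (seconds) for the fast-model waves.
WAVES: dict[int, dict[str, float]] = {
    3: {"theme_analysis": 2.0, "product_dna": 2.5, "influencer_matching": 3.0},
    4: {"mood_board": 2.0},
    6: {"brand_dna": 2.5, "calendar": 3.0},
}


def fully_serial_latency(waves: dict[int, dict[str, float]]) -> float:
    """Every call waits for the previous one."""
    return sum(sum(calls.values()) for calls in waves.values())


def wave_parallel_latency(waves: dict[int, dict[str, float]]) -> float:
    """Waves are sequential; each wave costs only its slowest call."""
    return sum(max(calls.values()) for calls in waves.values())


async def fake_ai_call(name: str, seconds: float, scale: float) -> str:
    # Stand-in for a model call: sleeps for a scaled-down latency.
    await asyncio.sleep(seconds * scale)
    return name


async def run_waves(
    waves: dict[int, dict[str, float]], scale: float = 0.01
) -> list[tuple[int, list[str]]]:
    completed = []
    for wave_no in sorted(waves):
        calls = waves[wave_no]
        # All calls in this wave run concurrently; gather preserves input order.
        results = await asyncio.gather(
            *(fake_ai_call(name, secs, scale) for name, secs in calls.items())
        )
        completed.append((wave_no, list(results)))
    return completed


print(fully_serial_latency(WAVES), wave_parallel_latency(WAVES))  # 15.0 8.0
print(asyncio.run(run_waves(WAVES)))
```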

pie title Intelligence Cost Distribution per Onboarding
    "Wave 2 — Brand Analysis (premium)" : 40
    "Wave 3 — Enrichments (theme, product, matching)" : 15
    "Wave 4 — Mood board analysis" : 3
    "Wave 6 — Brand DNA + calendar" : 5
    "Wave 8 — Captions (batched)" : 3
    "Vision quality checks" : 7

Waves 3–8 intelligence cost: ~$0.025 combined

The caption generation step is worth highlighting: instead of a separate API call for each of the six captions, the pipeline sends all six requests in a single batched call. One call, one context payload, six outputs. This eliminates five redundant transmissions of the brand context and cuts caption latency by roughly 6x.
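
A minimal sketch of the batching idea follows. It assumes the model is asked to return a JSON array with one caption per post. The function names, payload keys, and brand description are hypothetical, and no particular model API is implied; the reply is faked so the example runs offline.

```python
import json


def build_batched_caption_request(brand_context: str, post_specs: list[dict]) -> str:
    """Pack every caption request into one payload so the brand context is sent once."""
    payload = {
        "brand_context": brand_context,  # shared context, transmitted a single time
        "instruction": (
            "Write one caption per post. "
            "Return a JSON array of strings in the same order as 'posts'."
        ),
        "posts": post_specs,
    }
    return json.dumps(payload, ensure_ascii=False)


def parse_batched_captions(raw_response: str, expected: int) -> list[str]:
    """Reject responses that do not contain exactly one caption per post."""
    captions = json.loads(raw_response)
    if not isinstance(captions, list) or len(captions) != expected:
        raise ValueError(f"expected {expected} captions, got {captions!r}")
    return [str(caption) for caption in captions]


kinds = ["ugc", "ugc", "product", "product", "carousel", "carousel"]
specs = [{"index": i, "type": kind} for i, kind in enumerate(kinds)]
request = build_batched_caption_request("Warm, playful neighbourhood coffee brand", specs)

# Stand-in for the model's reply; a real call would send `request` to an LLM API.
fake_reply = json.dumps([f"Caption for post {i}" for i in range(len(specs))])
print(parse_batched_captions(fake_reply, expected=len(specs))[0])  # Caption for post 0
```

Validating the count on the way back matters more in the batched design: a single malformed reply affects every post in the run, so it is cheaper to detect and retry it once than to discover a missing caption at delivery.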

The Image Generation Budget: Wave 8

This is where the money is. Image generation accounts for approximately $1.76 of a typical six-post onboarding, roughly 95% of the compute spend when no reel is generated. Understanding why requires understanding how each image is produced.

The Two-Step Image Pipeline

Every new post image goes through two consecutive inference passes.

Step 1 — Multi-Reference Generation ($0.072 per image)

The pipeline assembles up to ten reference images in priority order: the influencer photograph anchors slot zero (when available), product images fill the next slots, and mood board images — sorted by engagement — fill remaining capacity. A full brand-aware prompt is submitted alongside these references, and the generation model synthesises them into a new scene.
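
The slot ordering reads as a simple priority fill. This sketch assumes images are passed as URLs and that each mood board entry carries an engagement count. The function and parameter names are hypothetical.

```python
from typing import Optional

MAX_REFERENCES = 10  # upper bound described in the text


def assemble_references(
    influencer_photo: Optional[str],
    product_images: list[str],
    mood_board: list[tuple[str, int]],  # (image_url, engagement_count)
    limit: int = MAX_REFERENCES,
) -> list[str]:
    """Fill reference slots in priority order: influencer, products, mood board."""
    refs: list[str] = []
    if influencer_photo:
        refs.append(influencer_photo)  # slot zero, only when a photo exists
    refs.extend(product_images)
    # Mood board images fill remaining capacity, most-engaged first.
    ranked = sorted(mood_board, key=lambda item: item[1], reverse=True)
    refs.extend(url for url, _ in ranked)
    return refs[:limit]


print(assemble_references("influencer.jpg", ["serum.jpg"], [("mb_a.jpg", 120), ("mb_b.jpg", 940)]))
# ['influencer.jpg', 'serum.jpg', 'mb_b.jpg', 'mb_a.jpg']
```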

The pricing scales with output resolution. The first megapixel costs a flat $0.070, with a small additive charge for each additional megapixel. A 1024×1024 image is about 1.05 megapixels, slightly over the first megapixel, which is why it comes to roughly $0.072 rather than $0.070. Portrait and landscape crops cost marginally more.
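
Here is a rough estimator for this kind of resolution-based pricing. The $0.070 first-megapixel price comes from the text. Three further choices are assumptions: pro-rating the extra charge, the $0.04 rate per additional megapixel, and decimal megapixels (1,000,000 pixels). They were picked only because together they reproduce the quoted ~$0.072 for 1024×1024.

```python
def step1_cost_usd(
    width: int,
    height: int,
    first_mp_usd: float = 0.070,  # quoted first-megapixel price
    extra_mp_usd: float = 0.04,   # ASSUMPTION: illustrative rate per additional megapixel
) -> float:
    """Estimate Step 1 cost: flat first megapixel plus a pro-rated charge beyond it."""
    megapixels = width * height / 1_000_000  # assumes decimal megapixels
    # Images at or below one megapixel pay the flat first-megapixel price (assumed).
    extra = max(0.0, megapixels - 1.0)
    return first_mp_usd + extra * extra_mp_usd


print(round(step1_cost_usd(1024, 1024), 3))  # 0.072
print(step1_cost_usd(1080, 1350) > step1_cost_usd(1024, 1024))  # True: a 4:5 portrait has more pixels
```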

Step 2 — Naturalisation ($0.040 per image)

The composition output from Step 1 is processed by a second model with a single instruction: enhance photorealism without changing the subject, composition, or product placement. Add natural skin texture, realistic lighting falloff, subtle depth imperfections. Remove any CGI-smooth surfaces or waxy highlights. This pass costs a flat $0.040 regardless of resolution.

The two-step architecture exists because composition fidelity and photorealism are currently optimised in different model families. A model that excels at multi-reference synthesis tends to produce images that place products and influencers correctly but look slightly rendered. A model trained for photorealism excels at visual naturalism but cannot synthesise from multiple references. Running them in sequence delivers both.

sequenceDiagram
    participant P as Post Spec
    participant S1 as Step 1 — Multi-Reference
    participant QC as Vision Quality Check
    participant S2 as Step 2 — Naturalise
    participant DB as Database

    P->>S1: Prompt + up to 10 reference images
    Note over S1: ~$0.072 at 1024×1024
    S1->>QC: Generated image
    QC-->>S1: Product not visible? → Retry S1
    S1->>S2: Composition output
    Note over S2: $0.040 flat
    S2->>DB: Photorealistic final image
    Note over DB: Total: ~$0.113 per image

Vision quality checks run alongside the generation pipeline and inspect each Step 1 output before it is naturalised, assessing whether the influencer reference is correctly rendered and whether the product is actually visible and unoccluded. Each check costs approximately $0.001 per image. If the product fails the visibility check, Step 1 is retried with a more explicit product placement directive, adding another $0.072 to that image's cost.

Standard per-image cost: $0.113
With one product-visibility retry: $0.185
With optional face-lock refinement pass: $0.193
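
The gate-and-retry flow and its effect on per-image cost can be sketched together. The prices are the figures quoted above. The $0.080 face-lock increment is derived from them ($0.193 minus $0.113), on the assumption that it applies to an image without retries. The callables stand in for the real model calls; their names, the string image handles, and the one-retry limit are hypothetical.

```python
from typing import Callable

# Per-image prices quoted in the text (USD).
STEP1_USD = 0.072
STEP2_USD = 0.040
VISION_CHECK_USD = 0.001
# Derived as $0.193 - $0.113; assumes face-lock is added to an image with no retries.
FACE_LOCK_EXTRA_USD = 0.193 - 0.113


def image_cost_usd(step1_retries: int = 0, face_lock: bool = False) -> float:
    """Reproduce the quoted totals: each retry adds one more Step 1 call.

    The quoted $0.185 does not include a second vision check; if the retry is
    re-checked, add roughly $0.001 more.
    """
    cost = STEP1_USD + STEP2_USD + VISION_CHECK_USD + step1_retries * STEP1_USD
    if face_lock:
        cost += FACE_LOCK_EXTRA_USD
    return cost


def generate_with_visibility_gate(
    step1: Callable[[bool], str],
    product_visible: Callable[[str], bool],
    naturalise: Callable[[str], str],
    max_retries: int = 1,  # hypothetical limit; the text describes a single retry
) -> tuple[str, int]:
    """Step 1, vision gate, explicit-placement retry if needed, then Step 2."""
    composition = step1(False)  # False: standard brand-aware prompt
    retries = 0
    while not product_visible(composition) and retries < max_retries:
        composition = step1(True)  # True: more explicit product placement directive
        retries += 1
    # The text does not say what happens if the last retry still fails;
    # this sketch simply proceeds to naturalisation.
    return naturalise(composition), retries


checks = iter([False, True])  # fake vision check: fails once, then passes
final, retries = generate_with_visibility_gate(
    step1=lambda explicit: "composition-explicit" if explicit else "composition",
    product_visible=lambda image: next(checks),
    naturalise=lambda image: image + "-natural",
)
print(final, retries, round(image_cost_usd(retries), 3))
# composition-explicit-natural 1 0.185
```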

Post Type Mix and Image Counts

Not all posts generate the same number of images. The content type mix significantly affects the total image budget.

xychart-beta
    title "Images Generated per Post Type (6-post free tier)"
    x-axis ["UGC Posts", "Product Posts", "Carousel Posts"]
    y-axis "Images per post" 0 --> 7
    bar [1, 1, 6]

Carousel posts are the cost multiplier. Each carousel generates six images: a hook cover, a UGC moment, three content/feature slides, and a CTA with a hero product shot. The six-slide structure is deliberately fixed — it follows the content formula that performs best for Indonesian SMEs, where educational carousel content consistently outperforms single-image posts by 2–3x on saves.

The free-tier default is six posts. With products, the content ratio allocates roughly two UGC posts, two product posts, and two carousels. That means:

2 UGC posts × 1 image = 2 images
2 product posts × 1 image = 2 images
2 carousel posts × 6 images = 12 images
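Total: ~16 images

At the full $0.113 per-image rate, 16 images come to about $1.81. The ~$1.76 figure used elsewhere in this breakdown equals 16 × $0.110, which corresponds to pricing Step 1 at its $0.070 first-megapixel base rate rather than the ~$0.072 charged at 1024×1024.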

pie title Image Generation Cost, 6-Post Free Tier (~$1.81 at $0.113 per image)
    "Carousel images (12 × $0.113)" : 1.356
    "UGC images (2 × $0.113)" : 0.226
    "Product images (2 × $0.113)" : 0.226
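
The image count and both budget figures follow mechanically from the post mix. The sketch below uses the per-type image counts and per-image rates from this article; the dictionary keys are illustrative labels.

```python
# Images per post type, from the chart above; keys are illustrative labels.
IMAGES_PER_POST = {"ugc": 1, "product": 1, "carousel": 6}


def images_for_mix(mix: dict[str, int]) -> int:
    """Total images for a post mix such as {"carousel": 2, ...}."""
    return sum(IMAGES_PER_POST[kind] * count for kind, count in mix.items())


def image_budget_usd(mix: dict[str, int], per_image_usd: float = 0.113) -> float:
    return images_for_mix(mix) * per_image_usd


free_tier = {"ugc": 2, "product": 2, "carousel": 2}
print(images_for_mix(free_tier))                    # 16
print(round(image_budget_usd(free_tier), 3))         # 1.808
print(round(image_budget_usd(free_tier, 0.110), 3))  # 1.76  ($0.070 Step 1 + $0.040 Step 2)
```

Because a carousel is six images, swapping one single-image post for a carousel adds five images to the run, which is why the mix, not the post count alone, drives the budget.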

The Revision Economics

When a user requests a revision — "change the background to a coffee shop" or "make the product larger" — the pipeline routes differently. Revisions skip Step 1 entirely and run only the instruction-edit naturalisation model against the previous output.

Revision cost: $0.040 — roughly 35% of original generation cost.

This makes the generate-review-iterate loop economically viable. A post that goes through three revisions before approval costs:

$0.113 (original) + $0.040 + $0.040 + $0.040 = $0.233 total — still under twenty-five cents.

flowchart LR
    subgraph NEW["New Generation — $0.113"]
        direction TB
        A1[Step 1: Multi-Reference\n$0.072] --> A3[Vision Check\n$0.001]
        A3 --> A2[Step 2: Naturalise\n$0.040]
    end

    subgraph REV["Revision — $0.040"]
        direction TB
        B1[Instruction Edit\n$0.040]
    end

    NEW --> APPROVE{Approved?}
    APPROVE -- Yes --> DONE[Published]
    APPROVE -- No, needs changes --> REV
    REV --> APPROVE
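
Revision cost grows linearly with the number of revisions. The sketch below uses the quoted $0.113 and $0.040 figures. The `regenerate_cost_usd` comparison, which reruns the full two-step pipeline for every change, is a hypothetical baseline the pipeline avoids; it is included only to show what the instruction-edit route saves.

```python
NEW_IMAGE_USD = 0.113  # Step 1 + Step 2 + vision check
REVISION_USD = 0.040   # instruction-edit pass only; Step 1 is skipped


def post_cost_usd(revisions: int) -> float:
    """Original generation plus N instruction-edit revisions."""
    return NEW_IMAGE_USD + revisions * REVISION_USD


def regenerate_cost_usd(revisions: int) -> float:
    """Hypothetical baseline: rerun the full pipeline for every requested change."""
    return NEW_IMAGE_USD * (revisions + 1)


for n in range(4):
    print(n, round(post_cost_usd(n), 3), round(regenerate_cost_usd(n), 3))
# 0 0.113 0.113
# 1 0.153 0.226
# 2 0.193 0.339
# 3 0.233 0.452
```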

Video Reel Generation: The Premium Step

Video generation is the most expensive single operation in the pipeline, and it's optional — triggered only when a TikTok account is connected and products exist in the system.

Architecture: Three Beats, Fifteen Seconds

A Jiwa AI reel is constructed from three five-second beats, each independently generated and then stitched together. Each beat follows the same two-step process: generate a still image using the standard image pipeline, then animate it using an image-to-video model.

sequenceDiagram
    participant P as Pipeline
    participant IMG as Image Pipeline
    participant VID as Animation Model
    participant FF as Video Stitcher

    P->>IMG: Beat 1 prompt + references
    P->>IMG: Beat 2 prompt + references
    P->>IMG: Beat 3 prompt + references

    IMG-->>VID: Beat 1 image ($0.113)
    IMG-->>VID: Beat 2 image ($0.113)
    IMG-->>VID: Beat 3 image ($0.113)

    VID-->>FF: Beat 1 video 5s (~$0.26)
    VID-->>FF: Beat 2 video 5s (~$0.26)
    VID-->>FF: Beat 3 video 5s (~$0.26)

    FF-->>P: 15-second reel ($0.00 — local ffmpeg)
    Note over P: Total reel cost: $1.13
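
The stitch step needs no paid service, because ffmpeg's concat demuxer can join the three five-second clips without re-encoding. The sketch below is minimal and makes several assumptions: the beats are MP4 files that share codec, resolution, and frame rate (stream copy requires this), and file paths contain no single quotes. The function names are hypothetical, and the article does not show the pipeline's actual ffmpeg invocation.

```python
import subprocess
import tempfile
from pathlib import Path


def write_concat_list(beats: list[Path], list_file: Path) -> None:
    """Write the concat demuxer's input list: one `file '<path>'` line per clip."""
    # Assumes paths contain no single quotes (they would need escaping).
    lines = [f"file '{beat.resolve()}'" for beat in beats]
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_concat_command(list_file: Path, output: Path) -> list[str]:
    """ffmpeg invocation that joins clips in list order without re-encoding."""
    return [
        "ffmpeg", "-y",                # overwrite output if it exists
        "-f", "concat", "-safe", "0",  # concat demuxer; allow absolute paths
        "-i", str(list_file),
        "-c", "copy",                  # stream copy: clips must share codec settings
        str(output),
    ]


def stitch_beats(beats: list[Path], output: Path) -> None:
    """Join beat clips into one reel; requires ffmpeg on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "beats.txt"
        write_concat_list(beats, list_file)
        subprocess.run(build_concat_command(list_file, output), check=True)


# Usage (hypothetical file names):
# stitch_beats([Path("beat1.mp4"), Path("beat2.mp4"), Path("beat3.mp4")], Path("reel.mp4"))
```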

Per reel breakdown:

3 still images × $0.113 = $0.339
3 animation calls × ~$0.263 = $0.789
ffmpeg stitching = $0.000 (local compute)
Total: $1.13 per 15-second reel

The animation model is charged per second of output video, which works out to about $0.053 per second, or roughly $0.263 per five-second beat. Across three beats, animation accounts for about 70% of the reel budget. This is the current reality of high-quality image-to-video generation: motion synthesis remains significantly more compute-intensive than image generation.
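
Because animation is billed per second of output, reel cost scales with beat count and beat length. This sketch infers a per-second rate from the $0.789 animation line above (an inferred rate, not a published price) and recomputes the $1.13 total. The constant and function names are hypothetical.

```python
BEATS = 3
SECONDS_PER_BEAT = 5
STILL_IMAGE_USD = 0.113  # standard two-step image per beat
# Inferred from the $0.789 animation line (3 beats × 5 s); not a published price.
ANIMATION_USD_PER_SECOND = 0.789 / (BEATS * SECONDS_PER_BEAT)


def reel_cost_usd(beats: int = BEATS, seconds_per_beat: int = SECONDS_PER_BEAT) -> float:
    """Still images plus per-second animation; local ffmpeg stitching adds nothing."""
    stills = beats * STILL_IMAGE_USD
    animation = beats * seconds_per_beat * ANIMATION_USD_PER_SECOND
    return stills + animation


print(round(ANIMATION_USD_PER_SECOND, 4))  # 0.0526
print(round(reel_cost_usd(), 2))           # 1.13
```

At these rates, each additional five-second beat adds about $0.376. A hypothetical six-beat, 30-second reel would therefore come to roughly $2.26.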

Only one reel is generated per onboarding run (using the first influencer and first product match). Subsequent reels, if requested from the dashboard, cost the same $1.13 each in compute.

Delivery: Wave 9

WhatsApp delivery via Fonnte costs $0.001 per message. At six posts per onboarding, the delivery step costs $0.006. Not worth optimising — but every service in the pipeline logs costs, so it appears in the admin dashboard alongside every other line item.

Total Cost Summary

xychart-beta
    title "Onboarding Cost Breakdown, 6-Post Free Tier"
    x-axis ["Wave 2 Brand Analysis", "Waves 3-6 Enrichment", "Wave 8 Captions+Checks", "Wave 8 Images (16)", "Wave 9 WhatsApp", "Wave 10 Reel (optional)"]
    y-axis "Cost (USD)" 0 --> 1.8
    bar [0.040, 0.020, 0.035, 1.760, 0.006, 1.130]

Cost Centre	Detail	USD
Brand analysis (Wave 2)	1 premium AI call	$0.040
Enrichment + calendar (Waves 3–6)	~8 fast AI calls, parallel	$0.020
Caption batch + quality scoring	1 batched call + 16 vision checks	$0.035
Image generation (16 images)	Step 1 + Step 2 × 16	$1.760
WhatsApp delivery	6 messages × $0.001	$0.006
Subtotal (no reel)		$1.861
TikTok reel (optional)	3-beat × image + animation	$1.130
Total with reel		$2.991
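
The summary totals can be reproduced directly from the table rows. `Decimal` avoids binary floating-point drift when adding currency amounts. The dictionary keys are illustrative labels for the rows above.

```python
from decimal import Decimal

# Rows of the summary table above (USD); keys are illustrative labels.
COST_CENTRES = {
    "brand_analysis": Decimal("0.040"),
    "enrichment_and_calendar": Decimal("0.020"),
    "captions_and_quality_scoring": Decimal("0.035"),
    "image_generation": Decimal("1.760"),
    "whatsapp_delivery": Decimal("0.006"),
}
TIKTOK_REEL = Decimal("1.130")

subtotal = sum(COST_CENTRES.values(), Decimal("0"))
with_reel = subtotal + TIKTOK_REEL
image_share = round(COST_CENTRES["image_generation"] / subtotal, 2)

print(subtotal, with_reel, image_share)  # 1.861 2.991 0.95
```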

The cost floor — if every image generates cleanly without retries — is approximately $1.86. The ceiling, accounting for product-visibility retries across multiple posts and a reel, is approximately $3.20.

The User-Facing Pricing Model

Jiwa AI charges for content generation at the moment generation occurs — not at publish time. When you click "Generate," your credit balance decrements immediately, and the full AI pipeline runs. Publishing is free. Revisions cost 35% of original generation. This model means your credit balance is always exact: 10 credits means you can generate exactly 10 posts.
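
Charging at generation time means the balance check and the decrement must happen together. Otherwise two simultaneous requests could both pass the check. One common way to do that is a single conditional UPDATE, sketched here with Python's built-in `sqlite3`. The `businesses` table and `credits` column are hypothetical, and the article does not say which database or query Jiwa AI uses.

```python
import sqlite3


def deduct_credits(conn: sqlite3.Connection, business_id: int, posts: int) -> bool:
    """Check and decrement in one conditional UPDATE so concurrent requests cannot overspend."""
    with conn:  # commits on success, rolls back on exception
        cursor = conn.execute(
            "UPDATE businesses SET credits = credits - ? WHERE id = ? AND credits >= ?",
            (posts, business_id, posts),
        )
    return cursor.rowcount == 1  # 0 rows updated: insufficient credits, request blocked


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO businesses (id, credits) VALUES (1, 10)")
conn.commit()

print(deduct_credits(conn, 1, 6))  # True: 10 -> 4
print(deduct_credits(conn, 1, 6))  # False: only 4 left
print(conn.execute("SELECT credits FROM businesses WHERE id = 1").fetchone()[0])  # 4
```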

Content Packs

Pack	Posts	Stories	IDR	SGD
Free	6	0	—	—
Coba	3	2	99,000	$9
Starter	15	10	375,000	$35
Business	30	20	700,000	$65
Premium	60	40	1,200,000	$115

xychart-beta
    title "Content Pack Price by Tier (SGD)"
    x-axis ["Coba", "Starter", "Business", "Premium"]
    y-axis "SGD" 0 --> 120
    bar [9, 35, 65, 115]
    line [9, 35, 65, 115]

The Business tier (30 posts / $65 SGD) is the most popular. At about $2.17 SGD per post slot in user-facing pricing, against roughly $0.11 USD of compute for a single-image post (about $0.68 USD for a six-image carousel, or about $0.31 USD per post averaged across the free-tier onboarding), the margin funds platform costs, infrastructure, ongoing model improvements, and the human QA layer that monitors generation quality across businesses.
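
The per-slot figure generalises to every tier. This sketch divides each pack price by its post count only and ignores the bundled stories, so it slightly overstates the effective price per piece of content. The data layout is illustrative.

```python
# (posts, price_idr, price_sgd) per pack, from the table above.
PACKS = {
    "Coba": (3, 99_000, 9),
    "Starter": (15, 375_000, 35),
    "Business": (30, 700_000, 65),
    "Premium": (60, 1_200_000, 115),
}


def price_per_post(pack: str) -> tuple[float, float]:
    """Pack price divided by post slots; bundled stories are ignored."""
    posts, price_idr, price_sgd = PACKS[pack]
    return price_idr / posts, price_sgd / posts


for name in PACKS:
    idr, sgd = price_per_post(name)
    print(f"{name:<9} IDR {idr:>8,.0f}   SGD {sgd:.2f}")
```

On this basis the per-slot price falls from $3.00 SGD (Coba) to about $1.92 SGD (Premium).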

What Higher Tiers Unlock

Beyond post volume, the tiers unlock capabilities:

Coba / Starter — Direct publishing, WhatsApp delivery, community support
Business — All above + unlimited revisions, caption translations, multiple visual styles, priority support
Premium — All Business features + mood board customisation, dedicated support channel

The Economic Architecture in One View

flowchart TD
    User["Brand Owner\n(Purchases Content Pack)"] --> CreditCheck

    CreditCheck{"Credits ≥ posts\nrequested?"}
    CreditCheck -- No --> Block["Request blocked\n→ Purchase prompt"]
    CreditCheck -- Yes --> Deduct["Credits decremented\natomically"]

    Deduct --> Pipeline

    subgraph Pipeline["9-Wave Pipeline (~$1.86–$3.20)"]
        INT["Intelligence Layer\n~$0.065\nBrand · Calendar · Captions"]
        IMG["Image Layer\n~$1.76 (16 images)\nGenerate → Naturalise → QC"]
        VID["Video Layer (optional)\n~$1.13\n3-beat animation pipeline"]
        DEL["Delivery\n$0.006\nWhatsApp + IG publish"]
        INT --> IMG --> DEL
        INT --> VID --> DEL
    end

    Pipeline --> Log["ApiCostLog\n(every service, model, call logged)"]
    Pipeline --> User2["Content in WhatsApp\nwithin 3–4 minutes"]

    Log --> Admin["Admin Cost Dashboard\n/admin/costs"]

What the Numbers Mean

The cost structure of Jiwa AI reflects a deliberate set of trade-offs: expensive at the image layer (because photorealism requires multi-step inference), cheap at the intelligence layer (because batching and model tiering compound), and optional at the video layer (because reel generation is roughly 10x more expensive per creative unit than still image generation).

The goal is not the lowest possible cost per onboarding — it's the highest quality output that stays within a margin that makes the business model viable. At current pricing, generating a month of influencer content for an Indonesian SME costs Jiwa AI between $1.86 and $3.20. The human cost of equivalent photography, influencer coordination, and caption writing would be several orders of magnitude higher.

That gap is the value proposition. The cost breakdown is how we protect it.

Interested in how the image pipeline evolved from single-model to two-step? Read "Making AI Images Look Real" and "Chasing the Last Mile of Photorealism."