The Complete Cost Anatomy of a Jiwa AI Onboarding
What It Actually Costs to Generate a Month of Content
When a business owner submits their website URL to Jiwa AI, they trigger a nine-wave pipeline (plus an optional tenth wave for video) that runs brand analysis, influencer matching, image generation, caption writing, quality scoring, and WhatsApp delivery, all within three to four minutes. The question we get asked most often is: how much does all of that actually cost?
The answer is somewhere between $1.86 and $3.20 per onboarding, depending on post count, content type mix, retries, and whether a video reel is generated. That range is not an estimate: it comes directly from the cost log that every onboarding writes to the database, tagged by wave, operation, and service.
Here's where every cent goes.
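To illustrate how a log tagged by wave, operation, and service can be rolled up per wave, here is a minimal Python sketch. The `CostLogEntry` record and the `totals_by_wave` helper are hypothetical: the article names the real table `ApiCostLog` but does not describe its schema. The dollar amounts are the per-unit prices quoted further down.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLogEntry:
    """One row of a hypothetical cost log; field names are illustrative only."""
    wave: int
    operation: str
    service: str
    usd: float


def totals_by_wave(entries: list[CostLogEntry]) -> dict[int, float]:
    """Sum logged spend per wave, ordered by wave number, rounded to a tenth of a cent."""
    totals: dict[int, float] = defaultdict(float)
    for entry in entries:
        totals[entry.wave] += entry.usd
    return {wave: round(usd, 3) for wave, usd in sorted(totals.items())}


# Example rows using per-unit prices quoted in this article
log = [
    CostLogEntry(2, "brand_analysis", "premium_text_ai", 0.040),
    CostLogEntry(8, "image_step1", "multi_reference_model", 0.072),
    CostLogEntry(8, "image_step2", "naturalisation_model", 0.040),
    CostLogEntry(8, "vision_check", "vision_ai", 0.001),
    CostLogEntry(9, "whatsapp_send", "fonnte", 0.001),
]
print(totals_by_wave(log))  # {2: 0.04, 8: 0.113, 9: 0.001}
```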
The Nine-Wave Pipeline
The onboarding pipeline is structured as sequential waves, with parallelism within each wave wherever possible. Some waves are intelligence-heavy (text AI, brand analysis), others are generation-heavy (images, video). The cost profile shifts dramatically as the pipeline progresses.
flowchart TD
A[URL Submitted] --> W1
W1["⚡ Wave 1: Scrape + Social Fetch\n(Parallel)\nWebsite scraper · IG media fetch · TikTok data\nCost: $0.00 (HTTP only)"]
W1 --> W2
W2["🧠 Wave 2: Core Brand Analysis\nDeep brand profile: voice, audience, tone, themes\nCost: ~$0.040 (premium AI call)"]
W2 --> W3
W3["⚡ Wave 3: Three Enrichments (Parallel)\nTheme analysis · Product DNA · Influencer matching\nCost: ~$0.015 (fast AI calls)"]
W3 --> W4
W4["⚡ Wave 4: Mood Board + Business Save (Parallel)\nVisual style analysis · DB record created\nCost: ~$0.003 (fast AI call)"]
W4 --> W5
W5["💾 Wave 5: Product Records\nCreates Product DB rows\nCost: $0.00 (DB writes only)"]
W5 --> W6
W6["⚡ Wave 6: Brand DNA + Calendar (Parallel)\nContent strategy · 30-day calendar\nCost: ~$0.005 (fast AI calls)"]
W6 --> W7
W7["📋 Wave 7: Post Specifications\nMaps calendar to influencer + product pairings\nCost: $0.00 (uses Wave 6 output)"]
W7 --> W8
W8["🖼️ Wave 8: Generate Posts (Images + Captions)\nCaption batch · Image pipeline · Quality gate\nCost: ~$1.76 for 6 posts (dominant cost centre)"]
W8 --> W9
W9["📲 Wave 9: Save + Deliver\nDB save · WhatsApp · IG publish\nCost: ~$0.007 (Fonnte messaging)"]
W9 --> W10
W10["🎬 Wave 10: TikTok Reel (Optional)\nIf TikTok connected + products exist\nCost: $1.13 per 15-second reel"]
W10 --> DONE["✅ Delivered"]
style W2 fill:#fef3c7,stroke:#d97706
style W8 fill:#fee2e2,stroke:#dc2626
style W10 fill:#ede9fe,stroke:#7c3aed
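The wave structure (strictly sequential waves, concurrent work inside each wave) maps naturally onto `asyncio.gather`. This is a minimal sketch, not Jiwa AI's code: `fake_step` is a hypothetical stand-in that sleeps instead of calling a scraper, model, or database, and the wave contents and durations are illustrative.

```python
import asyncio
import time


async def fake_step(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one unit of work (HTTP fetch, AI call, DB write)."""
    await asyncio.sleep(seconds)
    return name


async def run_waves(waves: list[list[tuple[str, float]]]) -> list[list[str]]:
    """Run waves one after another; the steps inside a wave run concurrently."""
    results: list[list[str]] = []
    for wave in waves:
        # gather() waits for the slowest step and returns results in input order
        done = await asyncio.gather(*(fake_step(name, secs) for name, secs in wave))
        results.append(list(done))
    return results


waves = [
    [("scrape", 0.05), ("ig_fetch", 0.05), ("tiktok_fetch", 0.05)],  # Wave 1
    [("brand_analysis", 0.10)],                                      # Wave 2
    [("themes", 0.05), ("product_dna", 0.05), ("matching", 0.05)],   # Wave 3
]
start = time.perf_counter()
results = asyncio.run(run_waves(waves))
elapsed = time.perf_counter() - start
# Wall clock follows each wave's slowest step (~0.20 s), not the sum of all steps (0.40 s)
print(results, round(elapsed, 2))
```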
The Intelligence Budget: Waves 2–7
Text AI calls (brand analysis, enrichment tasks, calendar generation, captions) account for roughly $0.065 per onboarding. It feels like a small number, but the architecture decisions that keep it small are deliberate.
Wave 2: The One Premium Call
Wave 2 is the most expensive intelligence step, and intentionally so. This is the full brand analysis: the pipeline reads every scraped page, processes Instagram media if available, and produces a brand profile that every downstream wave depends on. The quality of this single call determines caption tone, influencer selection criteria, calendar theme distribution, and visual style direction.
Using a premium-tier reasoning model here is not indulgent; it's architectural. A weaker analysis at Wave 2 propagates errors through every subsequent wave. We pay more for this call because fixing downstream errors would cost far more in retries, quality gate failures, and regeneration cycles.
Wave 2 cost: ~$0.040
Waves 3–6: The Parallel Fast-AI Layer
Every other intelligence call in the pipeline uses a smaller, faster, cheaper model. Theme extraction, product positioning analysis, influencer matching, mood board interpretation, brand DNA synthesis, and calendar construction are all focused, structured tasks: they need speed and efficiency, not deep reasoning.
Running them on a fast model instead of the premium model reduces per-call cost by roughly 80%. Running them in parallel (Waves 3, 4, and 6 each run their tasks concurrently) means the six AI calls add three serial hops of latency, one per wave, instead of six. Both savings compound: cheaper calls, fewer serial hops.
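The tiering and parallelism argument can be made concrete with a small Python sketch. The routing set, task names, and the `pick_tier` and `serial_hops` helpers are hypothetical labels for the calls listed above. The point it shows: six fast calls spread over three concurrent waves cost three hops of latency, and only one task is routed to the premium tier.

```python
# Hypothetical routing: only the Wave 2 brand analysis gets the premium model
PREMIUM_TASKS = {"brand_analysis"}

# Fast-model calls per wave, as described in the text (names are illustrative)
FAST_AI_WAVES: dict[int, list[str]] = {
    3: ["theme_extraction", "product_positioning", "influencer_matching"],
    4: ["mood_board"],
    6: ["brand_dna", "calendar"],
}


def pick_tier(task: str) -> str:
    """Route the one deep-reasoning task to the premium model, everything else to the fast one."""
    return "premium" if task in PREMIUM_TASKS else "fast"


def serial_hops(waves: dict[int, list[str]]) -> int:
    """Each wave that fires at least one call adds one hop, however many calls run inside it."""
    return sum(1 for calls in waves.values() if calls)


calls = [task for tasks in FAST_AI_WAVES.values() for task in tasks]
print(len(calls), serial_hops(FAST_AI_WAVES))  # 6 3
```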
pie title Intelligence Cost Distribution per Onboarding
"Wave 2 · Brand Analysis (premium)" : 40
"Wave 3 · Enrichments (theme, product, matching)" : 15
"Wave 4 · Mood board analysis" : 3
"Wave 6 · Brand DNA + calendar" : 5
"Wave 8 · Captions (batched)" : 3
"Vision quality checks" : 7
Waves 3–8 intelligence cost: ~$0.025 combined
The caption generation step is worth highlighting: instead of seven separate API calls to write seven captions, the pipeline sends all seven requests in a single batched call. One call, one context payload, seven outputs. This eliminates six round-trips of brand context transmission and cuts caption latency by roughly 6x.
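A batched caption call boils down to sending the brand context once, listing every post, and parsing a single structured response. The sketch below assumes the model is asked to return a JSON array of strings; the prompt wording, the `type`/`brief` fields, and both helper names are hypothetical, and the model call itself is left out.

```python
import json


def build_caption_batch_prompt(brand_context: str, post_specs: list[dict[str, str]]) -> str:
    """Put the shared brand context in the prompt once, followed by every post spec."""
    lines = [
        brand_context,
        "",
        f"Write {len(post_specs)} captions. Return only a JSON array of strings, one per post, in order.",
    ]
    for i, spec in enumerate(post_specs, start=1):
        lines.append(f"{i}. [{spec['type']}] {spec['brief']}")
    return "\n".join(lines)


def parse_caption_batch(raw: str, expected: int) -> list[str]:
    """Check that one response yields exactly one caption per requested post."""
    captions = json.loads(raw)
    if not isinstance(captions, list) or len(captions) != expected:
        raise ValueError(f"expected {expected} captions, got: {raw[:80]!r}")
    return [str(caption) for caption in captions]


specs = [
    {"type": "ugc", "brief": "morning coffee ritual"},
    {"type": "carousel", "brief": "3 brewing tips"},
]
prompt = build_caption_batch_prompt("Brand voice: warm, playful.", specs)
# In practice `raw` would be the model's reply to `prompt`
captions = parse_caption_batch('["Mornings start here.", "3 brewing tips"]', expected=len(specs))
```

A count check like this matters more for batching than for single calls: one malformed reply affects every post in the batch, so it is worth failing loudly and retrying the batch.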
The Image Generation Budget: Wave 8
This is where the money is. Image generation accounts for approximately $1.76 of a typical six-post onboarding, about 95% of the compute spend when no reel is generated. Understanding why requires understanding how each image is produced.
The Two-Step Image Pipeline
Every new post image goes through two consecutive inference passes.
Step 1: Multi-Reference Generation ($0.072 per image)
The pipeline assembles up to ten reference images in priority order: the influencer photograph anchors slot zero (when available), product images fill the next slots, and mood board images, sorted by engagement, fill the remaining capacity. A full brand-aware prompt is submitted alongside these references, and the generation model synthesises them into a new scene.
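The slot priority can be expressed as a short Python function. This is a sketch under stated assumptions: references are plain file names, each mood board image carries a single integer engagement score, and `assemble_references` is a hypothetical name.

```python
from typing import Optional

MAX_REFERENCES = 10


def assemble_references(
    influencer: Optional[str],
    products: list[str],
    mood_board: list[tuple[str, int]],  # (image, engagement score)
) -> list[str]:
    """Order references by priority and cap them at MAX_REFERENCES."""
    refs: list[str] = []
    if influencer:
        refs.append(influencer)  # slot 0 anchors the influencer's identity
    refs.extend(products)        # product shots fill the next slots
    # Mood board images fill the remaining capacity, most engaging first
    by_engagement = sorted(mood_board, key=lambda item: item[1], reverse=True)
    refs.extend(image for image, _ in by_engagement)
    return refs[:MAX_REFERENCES]


mood = [("mb_a.jpg", 120), ("mb_b.jpg", 900)] + [(f"mb_{i}.jpg", i) for i in range(10)]
refs = assemble_references("influencer.jpg", ["serum.jpg", "box.jpg"], mood)
# ['influencer.jpg', 'serum.jpg', 'box.jpg', 'mb_b.jpg', 'mb_a.jpg', 'mb_9.jpg', ..., 'mb_5.jpg']
```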
The pricing scales with output resolution: the first megapixel costs $0.070, plus a small additive charge per additional megapixel. A 1024×1024 image is about 1.05 megapixels, so that small extra charge brings it to roughly $0.072. Portrait and landscape crops cost marginally more.
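The resolution pricing can be written as a small formula. The article does not state the per-additional-megapixel rate, so `EXTRA_PER_MP_USD` below is an assumption picked so that 1024×1024 lands near the quoted $0.072. The sketch also assumes that a megapixel means 1,000,000 pixels and that the extra charge is prorated rather than billed per started megapixel.

```python
FIRST_MP_USD = 0.070     # first megapixel, as quoted in the article
EXTRA_PER_MP_USD = 0.04  # ASSUMPTION: illustrative rate, not a published price


def step1_cost(width: int, height: int) -> float:
    """Estimated Step 1 cost: flat first megapixel plus a prorated charge beyond it."""
    megapixels = width * height / 1_000_000
    extra_mp = max(0.0, megapixels - 1.0)
    return FIRST_MP_USD + extra_mp * EXTRA_PER_MP_USD


print(round(step1_cost(1024, 1024), 3))  # 0.072
print(round(step1_cost(1024, 1280), 3))  # a 4:5 portrait crop costs a little more
```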
Step 2: Naturalisation ($0.040 per image)
The composition output from Step 1 is processed by a second model with a single instruction: enhance photorealism without changing the subject, composition, or product placement. Add natural skin texture, realistic lighting falloff, subtle depth imperfections. Remove any CGI-smooth surfaces or waxy highlights. This pass costs a flat $0.040 regardless of resolution.
The two-step architecture exists because composition fidelity and photorealism are currently optimised in different model families. A model that excels at multi-reference synthesis tends to produce images that place products and influencers correctly but look slightly rendered. A model trained for photorealism excels at visual naturalism but cannot synthesise from multiple references. Running them in sequence delivers both.
sequenceDiagram
participant P as Post Spec
participant S1 as Step 1 (Multi-Reference)
participant QC as Vision Quality Check
participant S2 as Step 2 (Naturalise)
participant DB as Database
P->>S1: Prompt + up to 10 reference images
Note over S1: ~$0.072 at 1024×1024
S1->>QC: Generated image
QC-->>S1: Product not visible? → Retry S1
S1->>S2: Composition output
Note over S2: $0.040 flat
S2->>DB: Photorealistic final image
Note over DB: Total: ~$0.113 per image
Vision quality checks run in parallel with the generation pipeline, assessing whether the influencer reference is correctly rendered and whether the product is actually visible and unoccluded. These checks cost approximately $0.001 per image. If the product fails the visibility check, Step 1 is retried with a more explicit product placement directive โ adding another $0.072 to that specific image's cost.
- Standard per-image cost: $0.113
- With one product-visibility retry: $0.185
- With optional face-lock refinement pass: $0.193
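The per-image accounting can be sketched as a generate, check, retry, naturalise sequence that tallies spend as it goes. The three callables are hypothetical stand-ins for the two models and the vision check. The sketch runs one check and at most one Step 1 retry, which is the accounting that yields the $0.113 and $0.185 figures (a second check after the retry would add another $0.001). The face-lock pass is not modelled.

```python
from typing import Callable

STEP1_USD = 0.072   # multi-reference generation at 1024x1024
VISION_USD = 0.001  # product/influencer visibility check
STEP2_USD = 0.040   # naturalisation pass
# Hypothetical wording for the stronger placement instruction used on retry
RETRY_DIRECTIVE = " Place the product clearly in frame, fully visible and unoccluded."


def generate_post_image(
    prompt: str,
    step1: Callable[[str], str],
    product_visible: Callable[[str], bool],
    step2: Callable[[str], str],
) -> tuple[str, float]:
    """Run Step 1, one vision check, at most one Step 1 retry, then Step 2; return (image, USD)."""
    draft = step1(prompt)
    cost = STEP1_USD + VISION_USD
    if not product_visible(draft):
        draft = step1(prompt + RETRY_DIRECTIVE)  # retry with a more explicit directive
        cost += STEP1_USD
    final = step2(draft)
    cost += STEP2_USD
    return final, round(cost, 3)


def fake_step1(prompt: str) -> str:
    return f"draft({prompt})"


def fake_step2(image: str) -> str:
    return f"natural({image})"


_, clean_cost = generate_post_image("serum on marble", fake_step1, lambda img: True, fake_step2)
final, retry_cost = generate_post_image(
    "serum on marble", fake_step1, lambda img: "fully visible" in img, fake_step2
)
print(clean_cost, retry_cost)  # 0.113 0.185
```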
Post Type Mix and Image Counts
Not all posts generate the same number of images. The content type mix significantly affects the total image budget.
xychart-beta
title "Images Generated per Post Type (6-post free tier)"
x-axis ["UGC Posts", "Product Posts", "Carousel Posts"]
y-axis "Images per post" 0 --> 7
bar [1, 1, 6]
Carousel posts are the cost multiplier. Each carousel generates six images: a hook cover, a UGC moment, three content/feature slides, and a CTA with a hero product shot. The six-slide structure is deliberately fixed: it follows the content formula that performs best for Indonesian SMEs, where educational carousel content consistently outperforms single-image posts by 2–3x on saves.
The free-tier default is six posts. With products, the content ratio allocates roughly two UGC posts, two product posts, and two carousels. That means:
- 2 UGC posts × 1 image = 2 images
- 2 product posts × 1 image = 2 images
- 2 carousel posts × 6 images = 12 images
- Total: ~16 images
pie title Image Generation Cost for the 6-Post Free Tier (~$1.76 total)
"Carousel images (12 × $0.113)" : 1.356
"UGC images (2 × $0.113)" : 0.226
"Product images (2 × $0.113)" : 0.226
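The image count and budget for a post mix take only a few lines of arithmetic. In this Python sketch the per-type image counts come from the list above and `PER_IMAGE_USD` is the standard $0.113 rate; `image_budget` is a hypothetical helper. At that rate the free-tier mix comes to $1.808. The ~$1.76 figure used elsewhere in this article equals 16 × $0.110, which is the $0.070 first-megapixel Step 1 rate plus the $0.040 Step 2 pass. The gap of 16 × $0.003 lines up with the per-image resolution surcharge ($0.002) and vision check ($0.001), and the cost summary lists vision checks separately under quality scoring.

```python
IMAGES_PER_POST = {"ugc": 1, "product": 1, "carousel": 6}
PER_IMAGE_USD = 0.113  # Step 1 + Step 2 + vision check, no retries


def image_budget(mix: dict[str, int]) -> tuple[int, float]:
    """Return (image count, USD) for a mix of post types."""
    images = sum(IMAGES_PER_POST[kind] * count for kind, count in mix.items())
    return images, round(images * PER_IMAGE_USD, 3)


free_tier = {"ugc": 2, "product": 2, "carousel": 2}
print(image_budget(free_tier))        # (16, 1.808)
print(image_budget({"carousel": 1}))  # (6, 0.678)
```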
The Revision Economics
When a user requests a revision (for example "change the background to a coffee shop" or "make the product larger"), the pipeline routes differently. Revisions skip Step 1 entirely and run only the instruction-edit naturalisation model against the previous output.
Revision cost: $0.040, roughly 35% of the original generation cost.
This makes the generate-review-iterate loop economically viable. A post that goes through three revisions before approval costs:
$0.113 (original) + $0.040 + $0.040 + $0.040 = $0.233 total, still under twenty-five cents.
flowchart LR
subgraph NEW["New Generation: $0.113"]
direction TB
A1[Step 1: Multi-Reference\n$0.072] --> A2[Step 2: Naturalise\n$0.040]
A2 --> A3[Vision Check\n$0.001]
end
subgraph REV["Revision: $0.040"]
direction TB
B1[Instruction Edit\n$0.040]
end
NEW --> APPROVE{Approved?}
APPROVE -- Yes --> DONE[Published]
APPROVE -- No, needs changes --> REV
REV --> APPROVE
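The revision arithmetic generalises to any number of rounds. A minimal sketch using the article's two prices; `post_cost` is a hypothetical helper, and it assumes every revision is a single instruction-edit pass with no retries.

```python
NEW_IMAGE_USD = 0.113  # Step 1 + Step 2 + vision check
REVISION_USD = 0.040   # instruction-edit pass only; Step 1 is skipped


def post_cost(revisions: int) -> float:
    """Total USD for one generated image plus N revision rounds."""
    return round(NEW_IMAGE_USD + revisions * REVISION_USD, 3)


print(post_cost(3))                           # 0.233
print(f"{REVISION_USD / NEW_IMAGE_USD:.0%}")  # 35%
```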
Video Reel Generation: The Premium Step
Video generation is the most expensive single operation in the pipeline, and it's optional: it is triggered only when a TikTok account is connected and products exist in the system.
Architecture: Three Beats, Fifteen Seconds
A Jiwa AI reel is constructed from three five-second beats, each independently generated and then stitched together. Each beat follows the same two-step process: generate a still image using the standard image pipeline, then animate it using an image-to-video model.
sequenceDiagram
participant P as Pipeline
participant IMG as Image Pipeline
participant VID as Animation Model
participant FF as Video Stitcher
P->>IMG: Beat 1 prompt + references
P->>IMG: Beat 2 prompt + references
P->>IMG: Beat 3 prompt + references
IMG-->>VID: Beat 1 image ($0.113)
IMG-->>VID: Beat 2 image ($0.113)
IMG-->>VID: Beat 3 image ($0.113)
VID-->>FF: Beat 1 video 5s (~$0.26)
VID-->>FF: Beat 2 video 5s (~$0.26)
VID-->>FF: Beat 3 video 5s (~$0.26)
FF-->>P: 15-second reel ($0.00, local ffmpeg)
Note over P: Total reel cost: $1.13
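The local stitching step can be done with ffmpeg's concat demuxer, which joins the clips listed in a text file. The article only says stitching runs on local ffmpeg, so the flags below are an assumption: `-c copy` skips re-encoding and only works when every beat shares codec, resolution, and frame rate, which is plausible when all three come from the same animation model. The helper names and file names are hypothetical.

```python
from pathlib import Path


def concat_list_text(beats: list[Path]) -> str:
    """Build an ffmpeg concat-demuxer list: one `file '<path>'` line per clip, in play order."""
    lines = []
    for beat in beats:
        # Inside single quotes, a literal ' is written as '\'' (close, escape, reopen)
        escaped = str(beat).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def stitch_command(list_file: Path, output: Path) -> list[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # -safe 0 also permits absolute paths in the list
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]


beats = [Path("beat1.mp4"), Path("beat2.mp4"), Path("beat3.mp4")]
print(concat_list_text(beats), end="")
print(" ".join(stitch_command(Path("beats.txt"), Path("reel.mp4"))))
# To run it: write the list text to beats.txt, then subprocess.run(stitch_command(...), check=True)
```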
Per reel breakdown:
- 3 still images × $0.113 = $0.339
- 3 animation calls × ~$0.26 = $0.789
- ffmpeg stitching = $0.000 (local compute)
- Total: $1.13 per 15-second reel
The animation model is charged per second of output video. At five seconds per beat and three beats per reel, the animation cost dominates the reel budget. This is the current reality of high-quality image-to-video generation โ motion synthesis remains significantly more compute-intensive than image generation.
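Because animation is billed per second, reel cost scales with beat count and beat length. A minimal sketch, assuming billing is linear in output seconds; the per-second rate is back-calculated from the article's $0.789 for 15 seconds rather than taken from a price list, and the four-beat case is purely hypothetical.

```python
STILL_USD = 0.113                      # one still per beat, standard image pipeline
ANIMATION_USD_PER_SECOND = 0.789 / 15  # ASSUMPTION: rate implied by the article's totals (~$0.0526/s)


def reel_cost(beats: int = 3, seconds_per_beat: int = 5) -> float:
    """USD for a reel built from independent beats; ffmpeg stitching is local, so it adds $0."""
    stills = beats * STILL_USD
    animation = beats * seconds_per_beat * ANIMATION_USD_PER_SECOND
    return round(stills + animation, 2)


print(reel_cost())         # 1.13
print(reel_cost(beats=4))  # 1.5, for a hypothetical 20-second reel
```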
Only one reel is generated per onboarding run (using the first influencer and first product match). Subsequent reels, if requested from the dashboard, are charged at the same $1.13 rate.
Delivery: Wave 9
WhatsApp delivery via Fonnte costs $0.001 per message. At six posts per onboarding, the delivery step costs $0.006. It's not worth optimising, but every service in the pipeline logs costs, so it appears in the admin dashboard alongside every other line item.
Total Cost Summary
xychart-beta
title "Onboarding Cost Breakdown, 6-Post Free Tier"
x-axis ["Wave 2 Brand Analysis", "Waves 3-7 Enrichment", "Wave 8 Captions+Checks", "Wave 8 Images (16)", "Wave 9 WhatsApp", "Wave 10 Reel (optional)"]
y-axis "Cost (USD)" 0 --> 1.8
bar [0.040, 0.020, 0.035, 1.760, 0.006, 1.130]
| Cost Centre | Detail | USD |
|---|---|---|
| Brand analysis (Wave 2) | 1 premium AI call | $0.040 |
| Enrichment + calendar (Waves 3–6) | ~8 fast AI calls, parallel | $0.020 |
| Caption batch + quality scoring | 1 batched call + 16 vision checks | $0.035 |
| Image generation (16 images) | Step 1 + Step 2 × 16 | $1.760 |
| WhatsApp delivery | 6 messages × $0.001 | $0.006 |
| Subtotal (no reel) | | $1.861 |
| TikTok reel (optional) | 3 beats × (image + animation) | $1.130 |
| Total with reel | | $2.991 |
The cost floor, if every image generates cleanly without retries, is approximately $1.86. The ceiling, accounting for product-visibility retries across multiple posts and a reel, is approximately $3.20.
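The summary table and the floor and ceiling can be reproduced in a few lines. This sketch uses the table's line items; how many retries push an onboarding to the ceiling is an assumption (three retries plus a reel lands at about $3.21, close to the quoted ~$3.20), and `onboarding_cost` is a hypothetical helper.

```python
LINE_ITEMS_USD = {
    "brand_analysis": 0.040,
    "enrichment_and_calendar": 0.020,
    "captions_and_quality": 0.035,
    "images_16": 1.760,
    "whatsapp": 0.006,
}
REEL_USD = 1.130
RETRY_USD = 0.072  # one extra Step 1 pass after a failed visibility check


def onboarding_cost(with_reel: bool = False, retries: int = 0) -> float:
    """Total USD for one onboarding under the article's line items."""
    total = sum(LINE_ITEMS_USD.values()) + retries * RETRY_USD
    if with_reel:
        total += REEL_USD
    return round(total, 3)


print(onboarding_cost())                           # 1.861 (floor)
print(onboarding_cost(with_reel=True))             # 2.991
print(onboarding_cost(with_reel=True, retries=3))  # 3.207
```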
The User-Facing Pricing Model
Jiwa AI charges for content generation at the moment generation occurs, not at publish time. When you click "Generate," your credit balance decrements immediately, and the full AI pipeline runs. Publishing is free. Revisions cost 35% of an original generation. This model means your credit balance is always exact: 10 credits means you can generate exactly 10 posts.
Content Packs
| Pack | Posts | Stories | Price (IDR) | Price (SGD) |
|---|---|---|---|---|
| Free | 6 | 0 | 0 | $0 |
| Coba | 3 | 2 | 99,000 | $9 |
| Starter | 15 | 10 | 375,000 | $35 |
| Business | 30 | 20 | 700,000 | $65 |
| Premium | 60 | 40 | 1,200,000 | $115 |
xychart-beta
title "Content Pack Price by Tier (SGD)"
x-axis ["Coba", "Starter", "Business", "Premium"]
y-axis "SGD" 0 --> 120
bar [9, 35, 65, 115]
line [9, 35, 65, 115]
The Business tier (30 posts / $65 SGD) is the most popular. At S$2.17 per post slot in user-facing pricing, versus approximately US$0.12 in compute cost for a single-image post (a six-image carousel comes to about US$0.68), the margin funds platform costs, infrastructure, ongoing model improvements, and the human QA layer that monitors generation quality across businesses.
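The per-slot figure is simply pack price divided by posts included. A short sketch over the pack table; like the S$2.17 figure, it ignores the stories bundled with each pack, and it keeps prices in SGD because converting to USD would need an exchange rate the article does not give. The dictionary and helper names are hypothetical.

```python
PACKS_SGD = {  # pack: (posts included, price in SGD)
    "Coba": (3, 9),
    "Starter": (15, 35),
    "Business": (30, 65),
    "Premium": (60, 115),
}


def price_per_post_sgd(pack: str) -> float:
    """Pack price divided by posts included, rounded to cents."""
    posts, price = PACKS_SGD[pack]
    return round(price / posts, 2)


for name in PACKS_SGD:
    print(name, price_per_post_sgd(name))
# Coba 3.0, Starter 2.33, Business 2.17, Premium 1.92
```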
What Higher Tiers Unlock
Beyond post volume, the tiers unlock capabilities:
- Coba / Starter: Direct publishing, WhatsApp delivery, community support
- Business: All of the above + unlimited revisions, caption translations, multiple visual styles, priority support
- Premium: All Business features + mood board customisation, dedicated support channel
The Economic Architecture in One View
flowchart TD
User["Brand Owner\n(Purchases Content Pack)"] --> CreditCheck
CreditCheck{"Credits ≥ posts\nrequested?"}
CreditCheck -- No --> Block["Request blocked\n→ Purchase prompt"]
CreditCheck -- Yes --> Deduct["Credits decremented\natomically"]
Deduct --> Pipeline
subgraph Pipeline["9-Wave Pipeline (~$1.86–$3.20)"]
INT["Intelligence Layer\n~$0.065\nBrand · Calendar · Captions"]
IMG["Image Layer\n~$1.76 (16 images)\nGenerate → Naturalise → QC"]
VID["Video Layer (optional)\n~$1.13\n3-beat animation pipeline"]
DEL["Delivery\n$0.006\nWhatsApp + IG publish"]
INT --> IMG --> DEL
INT --> VID --> DEL
end
Pipeline --> Log["ApiCostLog\n(every service, model, call logged)"]
Pipeline --> User2["Content in WhatsApp\nwithin 3–4 minutes"]
Log --> Admin["Admin Cost Dashboard\n/admin/costs"]
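The credit check and atomic decrement in the flowchart can be collapsed into one conditional `UPDATE`, so two concurrent requests cannot both pass the balance check. The sketch uses SQLite to stay self-contained. The article does not say which database Jiwa AI uses, and the `businesses` table, `credits` column, and `deduct_credits` helper are hypothetical.

```python
import sqlite3


def deduct_credits(conn: sqlite3.Connection, business_id: int, posts: int) -> bool:
    """Atomically take `posts` credits if the balance covers them; False means the request is blocked."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE businesses SET credits = credits - ? WHERE id = ? AND credits >= ?",
            (posts, business_id, posts),
        )
    return cur.rowcount == 1  # 0 rows updated: insufficient credits or unknown business


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO businesses (id, credits) VALUES (1, 10)")
print(deduct_credits(conn, 1, 6))  # True, 4 credits left
print(deduct_credits(conn, 1, 6))  # False, request blocked
```

Putting the balance condition in the `WHERE` clause, rather than reading the balance and then writing it back, is what removes the race between check and decrement.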
What the Numbers Mean
The cost structure of Jiwa AI reflects a deliberate set of trade-offs: expensive at the image layer (because photorealism requires multi-step inference), cheap at the intelligence layer (because batching and model tiering compound), and optional at the video layer (because reel generation is roughly 10x more expensive per creative unit than still image generation).
The goal is not the lowest possible cost per onboarding; it's the highest-quality output that stays within a margin that makes the business model viable. At current pricing, generating a month of influencer content for an Indonesian SME costs Jiwa AI between $1.86 and $3.20. The human cost of equivalent photography, influencer coordination, and caption writing would be several orders of magnitude higher.
That gap is the value proposition. The cost breakdown is how we protect it.
Interested in how the image pipeline evolved from single-model to two-step? Read "Making AI Images Look Real" and "Chasing the Last Mile of Photorealism."