What a Single Jiwa AI Post Actually Costs
The Number First
Each post Jiwa AI generates costs approximately $0.12, including the brand analysis that powers it, the influencer matching, the caption, the image, and the WhatsApp delivery.
Across a typical seven-post onboarding, the total AI spend is under ninety cents.
That number is not a guess or a best-case scenario. It comes directly from the cost log that every onboarding writes to the database: service, model, operation, tokens consumed, pixels generated, duration. Here is how those numbers break down.
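As a rough illustration, a cost-log row carrying the fields listed above could be modelled like this. The type name `CostLogEntry`, the `costUsd` field, and the `costByService` helper are assumptions made for this sketch, not the actual schema.

```typescript
// Hypothetical shape of one cost-log row. Only the list of logged quantities
// (service, model, operation, tokens, pixels, duration) comes from the text;
// the field names and the costUsd column are assumptions.
interface CostLogEntry {
  service: string;
  model: string;
  operation: string;
  tokens: number | null; // tokens consumed, for text calls
  pixels: number | null; // pixels generated, for image calls
  durationMs: number;
  costUsd: number; // assumed: cost computed when the row is written
}

// Sum the logged cost per service for one onboarding.
function costByService(entries: CostLogEntry[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const entry of entries) {
    totals[entry.service] = (totals[entry.service] ?? 0) + entry.costUsd;
  }
  return totals;
}
```

Grouping by `model` or `operation` instead would follow the same pattern.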
The Two Cost Centres
Every onboarding has two distinct spending categories: intelligence (text AI calls for brand analysis, captions, quality scoring) and image generation (the multi-step visual pipeline). They have very different cost profiles.
Intelligence is cheap. Image generation is where money actually goes.
Intelligence: ~$0.065 per Onboarding
The intelligence pipeline runs across Waves 2 through 8 and involves roughly twelve AI calls total.
Wave 2 is the only heavyweight call: a full brand analysis that reads the scraped website content, the Instagram feed if available, and produces the brand profile that every downstream step depends on. This is the one place where we use a premium-tier model, because the quality of this single call determines the quality of everything else. Typical cost: around $0.04.
Waves 3 through 6 run entirely on a faster, cheaper model. Theme extraction, product positioning analysis, influencer matching, mood board analysis, product visual analysis: each is a focused, structured task that does not need the reasoning depth of the premium model. These calls run in parallel where possible. Total cost for all of them combined: around $0.015.
Wave 8 generates all seven post captions in a single batched call rather than seven separate requests. One call, one context payload, seven outputs. This is the most direct cost-saving architectural decision we made: batching eliminates repeated transmission of brand context and cuts latency by roughly 6x compared to sequential calls. Cost: around $0.003.
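The batching idea can be sketched as follows: the brand context is sent once, the seven post briefs are listed in one prompt, and the reply is parsed back into one caption per post. Everything named here (`PostBrief`, `CaptionRequest`, the prompt wording, the JSON reply shape) is hypothetical and only illustrates the pattern.

```typescript
// Hypothetical types for this sketch; the real payloads are not shown in the text.
interface PostBrief {
  id: string;
  theme: string;
  format: string;
}

interface CaptionRequest {
  system: string; // brand context, transmitted once per batch
  user: string; // all post briefs in a single prompt
}

// Build one request covering every post instead of one request per post.
function buildBatchedCaptionRequest(brandContext: string, briefs: PostBrief[]): CaptionRequest {
  const lines = briefs.map((b, i) => `${i + 1}. id=${b.id}; theme=${b.theme}; format=${b.format}`);
  return {
    system: brandContext,
    user:
      `Write one caption per post below. ` +
      `Reply with a JSON array of ${briefs.length} objects shaped {"id": string, "caption": string}.\n` +
      lines.join("\n"),
  };
}

// Parse the single reply and check that every post received a caption.
function parseBatchedCaptions(raw: string, briefs: PostBrief[]): Record<string, string> {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  const captions: Record<string, string> = {};
  for (const item of parsed) {
    if (item && typeof item.id === "string" && typeof item.caption === "string") {
      captions[item.id] = item.caption;
    }
  }
  for (const b of briefs) {
    if (captions[b.id] === undefined) throw new Error(`missing caption for ${b.id}`);
  }
  return captions;
}
```

The validation step matters more in a batched design than in a sequential one, because a single malformed reply affects all seven posts at once.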
Vision quality checks, which assess whether each generated image meets brand standards and whether the product is actually visible, run in parallel with image generation and add roughly $0.001 per image.
Image Generation: ~$0.113 per Image
This is where the money is. Every new image goes through two inference steps.
Step 1 is a multi-reference generation pass. The pipeline assembles up to ten reference images in priority order: the influencer photograph first (if present), then product references, then mood board images sorted by engagement. A detailed prompt, constructed from the brand DNA, the content calendar spec, and the visual format directive, is sent alongside these references. The model synthesises them into a new scene.
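The priority ordering described above might be sketched like this. The `ReferenceImage` type and the `assembleReferences` function are hypothetical, and the sketch assumes "sorted by engagement" means highest engagement first.

```typescript
// Hypothetical reference-image record for this sketch.
interface ReferenceImage {
  url: string;
  kind: "influencer" | "product" | "mood";
  engagement?: number; // assumed to be present only on mood board images
}

const MAX_REFERENCES = 10; // "up to ten reference images"

// Order: influencer (if present), then products, then mood board images
// by descending engagement (assumption), truncated to the cap.
function assembleReferences(
  influencer: ReferenceImage | null,
  products: ReferenceImage[],
  moodBoard: ReferenceImage[],
): ReferenceImage[] {
  // Copy before sorting so the caller's array is not mutated.
  const moodSorted = moodBoard.slice().sort((a, b) => (b.engagement ?? 0) - (a.engagement ?? 0));
  const head: ReferenceImage[] = influencer ? [influencer] : [];
  return head.concat(products, moodSorted).slice(0, MAX_REFERENCES);
}
```

Because truncation happens last, the lowest-engagement mood board images are the first to be dropped when the cap is reached.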
At 1024×1024 pixels, the cost for this step is approximately $0.072. The pricing scales with megapixels: $0.07 for the first megapixel, $0.03 per additional megapixel. A square 1024-pixel image is just barely above one megapixel, so the cost stays close to the base rate.
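A minimal sketch of that pricing rule, under two assumptions that the text does not confirm: one megapixel is 1,000,000 pixels, and the additional-megapixel rate is prorated linearly for partial megapixels. The provider may round differently.

```typescript
const BASE_RATE_USD = 0.07; // first megapixel
const EXTRA_MP_RATE_USD = 0.03; // each additional megapixel
const PIXELS_PER_MEGAPIXEL = 1000000; // assumption: decimal megapixels

// Estimated Step 1 cost for an image of the given dimensions.
// Assumes linear proration of partial megapixels beyond the first.
function step1CostUsd(width: number, height: number): number {
  const megapixels = (width * height) / PIXELS_PER_MEGAPIXEL;
  const extraMegapixels = Math.max(0, megapixels - 1);
  return BASE_RATE_USD + EXTRA_MP_RATE_USD * extraMegapixels;
}
```

Under these assumptions a 1024×1024 image (1.048576 megapixels) comes out at about $0.0715, in line with the approximately $0.072 quoted above, while a 2048×2048 image would be about $0.166.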
Step 2 is a naturalisation pass. The output of Step 1 is processed by a second model with a single instruction: enhance photorealism while changing nothing. Add skin pores, realistic hair strand detail, subtle lighting imperfections. Remove waxy or CGI-smooth surfaces. Do not touch faces, products, clothing, or background. This pass costs a flat $0.04 regardless of image size.
The two-step architecture exists because the best multi-reference generation models are trained for composition fidelity (getting all the references correctly into the scene) while the best naturalisation models are trained to make the result look like a real photograph. Asking one model to do both produces results that look good in reference adherence but fall short on photorealism. Using two models in sequence gets both.
Combined per-image cost: $0.072 + $0.040 + $0.001 = $0.113
For a seven-post calendar, that is $0.79 in image generation alone.
WhatsApp: $0.007
Seven messages at $0.001 per send. Not worth optimising.
Total: $0.86
| Line item | Cost |
|---|---|
| Premium brand analysis (×1) | $0.040 |
| Fast LLM calls: enrichment, captions, quality (×11) | $0.025 |
| Image generation Step 1 (×7) | $0.504 |
| Image generation Step 2 naturalise (×7) | $0.280 |
| Vision quality checks (×7) | $0.007 |
| WhatsApp delivery (×7) | $0.007 |
| Total | $0.863 |
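The table's arithmetic can be reproduced with a short sketch. The unit costs and quantities come from the figures above; the `LineItem` type and variable names are illustrative only, and the eleven fast LLM calls are entered as one combined line because only their combined cost is given.

```typescript
interface LineItem {
  label: string;
  unitCostUsd: number;
  quantity: number;
}

const onboardingLineItems: LineItem[] = [
  { label: "Premium brand analysis", unitCostUsd: 0.04, quantity: 1 },
  { label: "Fast LLM calls (11 calls, combined)", unitCostUsd: 0.025, quantity: 1 },
  { label: "Image generation Step 1", unitCostUsd: 0.072, quantity: 7 },
  { label: "Image generation Step 2 naturalise", unitCostUsd: 0.04, quantity: 7 },
  { label: "Vision quality checks", unitCostUsd: 0.001, quantity: 7 },
  { label: "WhatsApp delivery", unitCostUsd: 0.001, quantity: 7 },
];

// Sum unit cost times quantity across all line items.
function totalUsd(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.unitCostUsd * item.quantity, 0);
}

const onboardingTotalUsd = totalUsd(onboardingLineItems); // approximately 0.863
const costPerPostUsd = onboardingTotalUsd / 7; // approximately 0.123
```

Dividing the total by seven gives the roughly $0.12 per post quoted at the top.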
Time: 3–4 Minutes
Cost is one axis. Latency is the other.
The intelligence waves (1 through 7) take roughly 50–90 seconds in total. Most of that is Wave 2 (the Sonnet brand analysis), with the parallel enrichment waves adding another 20–30 seconds.
Image generation is the bottleneck. Each image takes 25–40 seconds: Step 1 averages around 20 seconds, Step 2 around 15 seconds. We run three images concurrently (pLimit(3)) to avoid overwhelming the API. Seven images across three rounds of parallel work take roughly 2–2.5 minutes.
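To see why seven images at a concurrency of three behave like three rounds, here is a small scheduling simulation. It is a simplified model and not the pipeline's code: it assumes each job starts on whichever slot frees up first and ignores queueing, retries, and network overhead. `simulateMakespan` is a hypothetical helper.

```typescript
// Simulate `concurrency` slots pulling jobs in order, which is roughly how a
// concurrency limiter such as p-limit behaves. Returns the time at which the
// last job finishes, in the same unit as the input durations.
function simulateMakespan(durationsSec: number[], concurrency: number): number {
  if (concurrency < 1) throw new Error("concurrency must be at least 1");
  const slotFreeAt: number[] = [];
  for (let i = 0; i < concurrency; i++) slotFreeAt.push(0);

  let finish = 0;
  for (const duration of durationsSec) {
    // Assign the next job to the slot that becomes free earliest.
    const earliest = Math.min(...slotFreeAt);
    const slot = slotFreeAt.indexOf(earliest);
    const end = earliest + duration;
    slotFreeAt[slot] = end;
    finish = Math.max(finish, end);
  }
  return finish;
}

// Seven images at the average of 35 s each (20 s + 15 s), three at a time.
const averageCaseSec = simulateMakespan([35, 35, 35, 35, 35, 35, 35], 3);
// Same batch if every image takes the upper bound of 40 s.
const slowCaseSec = simulateMakespan([40, 40, 40, 40, 40, 40, 40], 3);
```

With uniform durations the model yields 105 seconds for the average case and 120 seconds for the slow case. The 2–2.5 minutes reported above sits at or beyond the top of that range, presumably because of overhead this model leaves out.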
Wave 9, which saves posts to the database and sends WhatsApp previews, adds another 10 seconds.
End-to-end: 3–4 minutes from URL submission to WhatsApp delivery.
What Revisions Cost
When a user regenerates or revises a post, the pipeline routes differently. Revisions skip Step 1 entirely and run only the instruction-edit naturalisation model against the previous output. One call, one step.
Revision cost: $0.04, roughly 35% of the original generation cost.
This makes the economics of "generate, review, iterate" viable. A user who requests three revisions before approving a post pays $0.04 + $0.04 + $0.04 = $0.12 in revision costs on top of the original $0.113. Total for that post: $0.233. Still under twenty-five cents.
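The revision arithmetic generalises to any number of revisions. The sketch below uses the two per-image figures from the text; the function name `postImageCostUsd` is illustrative.

```typescript
const INITIAL_IMAGE_COST_USD = 0.113; // Step 1 + Step 2 + vision check
const REVISION_COST_USD = 0.04; // one Step 2 call per revision

// Total image cost for a single post after a given number of revisions.
function postImageCostUsd(revisions: number): number {
  if (revisions < 0 || Math.floor(revisions) !== revisions) {
    throw new Error("revisions must be a non-negative integer");
  }
  return INITIAL_IMAGE_COST_USD + REVISION_COST_USD * revisions;
}
```

For example, `postImageCostUsd(3)` gives about 0.233, matching the figure above, and the third revision is the point at which cumulative revision spend ($0.12) passes the cost of the original image.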
Why the Pipeline Is Two Steps Instead of One
The obvious question is why we pay for two API calls per image instead of finding one model that handles everything.
The answer is precision. Multi-reference generation models are optimised to synthesise content from multiple inputs; they excel at composition and reference adherence. Naturalisation models are optimised for a single task: making an image look less like a render and more like a photograph. These are different capabilities that currently live in different architectures.
The cost of running both ($0.072 + $0.040) is the price of not having to choose between an image that correctly shows the product and influencer and an image that looks like it was taken with a camera. Both criteria matter for content that gets posted publicly under a brand.
The Ceiling
At current pricing, the hard floor for a seven-post onboarding is around $0.80: even if intelligence became free, image generation alone costs $0.79, roughly 90% of the total. The ceiling under our architecture is around $1.05 on a bad day with retries.
That band, roughly eighty cents to a dollar, is the economic reality of producing seven pieces of original, brand-consistent, photorealistic influencer content. Set against the cost of a single human-shot lifestyle photo, it is not close.