image-generation
cost-optimization
ai
design

We Made Our AI Images Photorealistic: Here's What It Cost

Jiwa AI Team

The Problem Was Obvious

Look at any AI-generated influencer content long enough and the tells jump out. Skin that's too smooth. Lighting that's too even. A background that can't quite decide if it's a cafe or a living room. And then there's the text: a dark rounded rectangle slapped over the image like a PowerPoint title slide.

We were using Flux Dev for all our image generation. It's a solid model: fast, affordable, decent quality. But "decent" doesn't cut it when your output sits next to real photography on someone's Instagram feed. The gap between our generated content and a real influencer photo was visible at a glance.

Two Changes, Not Twenty

The temptation with quality upgrades is to overhaul everything. New pipeline, new architecture, new models across the board. We resisted that. The fix came down to two targeted changes.

Change 1: Flux Realism Instead of Flux Dev

Fal AI offers a variant of Flux with a realism LoRA, a fine-tuning layer specifically trained to push output toward photographic realism. Same API, same parameters, different model endpoint. The switch was a one-line change.

But we didn't stop at the model swap. We also updated every prompt builder in the pipeline with photographic anchors: specific camera and lens references that prime the model toward photorealistic output.

Instead of ending prompts with "Professional lifestyle scene, natural interaction, vibrant colors," we now close with "Photorealistic photograph, shot on Canon EOS R5, 85mm f/1.4 lens, natural window lighting, shallow depth of field." The model responds to these cues. Specifying a real camera, a real lens, and a real lighting setup produces images that inherit the visual signature of actual photography: bokeh, grain structure, natural color science.
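To make the idea concrete, here is a minimal sketch of a prompt builder that closes every prompt with the same photographic anchors. The function name, the constant, and the example scene are hypothetical; the post does not show the real prompt builders, only the closing phrase they now use.

```typescript
// Hypothetical helper: only the anchor phrases themselves come from the post.
const PHOTO_ANCHORS: string[] = [
  "Photorealistic photograph",
  "shot on Canon EOS R5",
  "85mm f/1.4 lens",
  "natural window lighting",
  "shallow depth of field",
];

// Appends the photographic anchors to a scene description, stripping
// trailing whitespace and punctuation first so the join stays clean.
function buildPrompt(scene: string): string {
  const base = scene.trim().replace(/[.,;\s]+$/, "");
  return `${base}. ${PHOTO_ANCHORS.join(", ")}.`;
}

const examplePrompt = buildPrompt("Woman holding an iced latte at a cafe table. ");
console.log(examplePrompt);
```

Keeping the anchors in one shared constant means every builder ends its prompts the same way, which is what "updated every prompt builder" implies.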

We also added explicit generation parameters: 28 inference steps with a guidance scale of 3.5. More steps give the model longer to refine details. The guidance scale trades creativity against prompt adherence. These numbers came from testing: enough refinement for photorealism without wasting compute on diminishing returns.
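As a rough sketch, the model swap and the new parameters could live in one small request builder. The endpoint string and the two parameter names are the ones this post lists; the interfaces and the helper function are assumptions made for illustration.

```typescript
// Field names mirror the parameters named in the post; the types and
// the helper around them are hypothetical.
interface FluxInput {
  prompt: string;
  num_inference_steps: number;
  guidance_scale: number;
}

interface FluxRequest {
  endpoint: string;
  input: FluxInput;
}

// The one-line change: this constant was previously "fal-ai/flux/dev".
const MODEL_ENDPOINT = "fal-ai/flux-realism";

function buildRequest(promptText: string): FluxRequest {
  return {
    endpoint: MODEL_ENDPOINT,
    input: {
      prompt: promptText,
      num_inference_steps: 28, // more refinement passes per image
      guidance_scale: 3.5,     // balance of creativity vs. prompt adherence
    },
  };
}

const exampleRequest = buildRequest("Photorealistic photograph, natural window lighting");
console.log(JSON.stringify(exampleRequest, null, 2));
```

With the official Fal client, the result would presumably be sent along the lines of `fal.subscribe(exampleRequest.endpoint, { input: exampleRequest.input })`; check the exact call against the client version in use.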

Change 2: Gradient Overlays Instead of Background Panels

The old text overlay was functional but crude. A semi-transparent dark rectangle (rgba(0,0,0,0.65)) with rounded corners, positioned by analyzing which region of the image had the lowest visual complexity. Arial Bold. Heavy stroke outline for contrast.

It looked like what it was: text placed on top of an image by an algorithm.

The new approach borrows from how professional social media designers actually work. Instead of a rectangular panel, we apply a smooth gradient that fades from transparent at the middle of the image to a subtle dark wash at the bottom. The gradient spans the full width of the image, so it feels like part of the lighting rather than something overlaid.
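One way to build such a gradient is as an SVG string sized to the image, which Sharp can composite because it accepts SVG input. The sketch below is an assumption about the approach, not the production code: the function name is hypothetical, and the 0.6 maximum opacity is a guess at what "subtle" means.

```typescript
// Builds a full-width SVG overlay that is fully transparent down to the
// vertical midpoint, then fades to a dark wash at the bottom edge.
// maxOpacity is an assumed value; the post only describes it as "subtle".
function gradientOverlaySvg(width: number, height: number, maxOpacity: number = 0.6): string {
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    `<defs><linearGradient id="fade" x1="0" y1="0" x2="0" y2="1">`,
    `<stop offset="50%" stop-color="#000" stop-opacity="0"/>`,
    `<stop offset="100%" stop-color="#000" stop-opacity="${maxOpacity}"/>`,
    `</linearGradient></defs>`,
    `<rect width="${width}" height="${height}" fill="url(#fade)"/>`,
    `</svg>`,
  ].join("");
}

const exampleOverlay = gradientOverlaySvg(1080, 1350);
console.log(exampleOverlay);
```

With Sharp, the string could then be layered onto the generated image with something like `sharp(imageBuffer).composite([{ input: Buffer.from(exampleOverlay) }])`. Because the first gradient stop sits at 50%, everything above the midpoint stays untouched.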

The text itself got a modern treatment: uppercase lettering with generous letter spacing, a subtle drop shadow instead of a heavy stroke outline, and consistent bottom-left positioning. Professional Instagram content almost always places text at the bottom left. The old "auto-analyze for the calmest region" approach was technically clever but visually inconsistent. Text jumping between top, bottom, and center across a content batch looked random, not designed.
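A hedged sketch of that text treatment as an SVG layer follows. The margin, font size, letter spacing, and shadow values are all illustrative guesses, and the function names are hypothetical. The shadow uses the standard SVG `feDropShadow` filter primitive, so it is worth confirming that the renderer behind your image library supports it.

```typescript
// Escapes characters that are special in XML so captions are safe in SVG.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders an uppercase caption anchored at the bottom-left corner.
// All sizing ratios here are assumptions, not values from the post.
function captionSvg(caption: string, width: number, height: number): string {
  const margin = Math.round(width * 0.06);
  const fontSize = Math.round(width * 0.045);
  const spacing = Math.round(fontSize * 0.12);
  // Uppercase before escaping so entities like &amp; stay lowercase.
  const label = escapeXml(caption.toUpperCase());
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    `<defs><filter id="shadow" x="-10%" y="-10%" width="120%" height="140%">`,
    `<feDropShadow dx="0" dy="2" stdDeviation="3" flood-color="#000" flood-opacity="0.5"/>`,
    `</filter></defs>`,
    `<text x="${margin}" y="${height - margin}" `,
    `font-family="Helvetica, Arial, sans-serif" font-size="${fontSize}" `,
    `font-weight="600" letter-spacing="${spacing}" fill="#fff" `,
    `filter="url(#shadow)">${label}</text>`,
    `</svg>`,
  ].join("");
}

const exampleCaption = captionSvg("Tom & Co", 1080, 1080);
console.log(exampleCaption);
```

Fixing `x` and `y` to a margin from the bottom-left corner is what keeps placement identical across a batch, in contrast to the old per-image region analysis.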

The Cost Math

Here's what the model switch actually costs:

| Component | Before (Flux Dev) | After (Flux Realism) | Delta |
| --- | --- | --- | --- |
| Per image | $0.025 | $0.035 | +$0.01 |
| 6-post batch (6 images) | $0.150 | $0.210 | +$0.06 |
| Batch with carousels (12 images) | $0.300 | $0.420 | +$0.12 |
| Full onboarding (AI + images) | ~$0.38 | ~$0.50 | +$0.12 |

That's a 40% increase on image generation cost per image. Sounds steep as a percentage. In absolute terms, it's one cent.
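The per-image and batch rows of the table can be checked with a few lines of arithmetic. This sketch uses only the two per-image prices quoted above and works in tenths of a cent to sidestep floating-point rounding; the function names are made up for the example.

```typescript
// Per-image prices from the table, in tenths of a cent ("mills").
const FLUX_DEV_MILLS = 25;     // $0.025
const FLUX_REALISM_MILLS = 35; // $0.035

function batchCostMills(images: number, perImageMills: number): number {
  return images * perImageMills;
}

function formatDollars(mills: number): string {
  return `$${(mills / 1000).toFixed(3)}`;
}

for (const images of [1, 6, 12]) {
  const before = batchCostMills(images, FLUX_DEV_MILLS);
  const after = batchCostMills(images, FLUX_REALISM_MILLS);
  console.log(
    `${images} image(s): ${formatDollars(before)} -> ${formatDollars(after)} ` +
      `(+${formatDollars(after - before)})`
  );
}

// Relative increase per image: (35 - 25) / 25 = 40%.
const percentIncrease = ((FLUX_REALISM_MILLS - FLUX_DEV_MILLS) * 100) / FLUX_DEV_MILLS;
console.log(`${percentIncrease}% increase per image`);
```

The full-onboarding row is not reproduced here because the post gives it only as an approximate total that also includes non-image AI costs.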

The text overlay changes are free. They happen locally using Sharp image processing. No API calls, no model inference. Better output at zero marginal cost.

Even after the upgrade, a full business onboarding (brand analysis, influencer matching, content calendar, six posts with images, quality scoring) costs roughly fifty cents. Still well under a dollar. Still viable at the scale we're targeting.

Why One Cent Matters

The argument for photorealism isn't aesthetic vanity. It's engagement economics.

When AI-generated content looks obviously fake, it gets scrolled past. The warung owner in Surabaya isn't going to post something that makes their feed look cheap. The boutique in Jakarta isn't going to share content that undermines the premium positioning they've built.

Photorealistic output crosses the threshold from "AI content" to "content." That distinction determines whether the generated posts actually get used, and whether the businesses we serve see real engagement from them.

One cent per image to cross that threshold is the cheapest upgrade we've made.

What Changed, Concretely

For the technically curious, here's the full scope:

Model: fal-ai/flux/dev → fal-ai/flux-realism

Generation parameters: Added num_inference_steps: 28, guidance_scale: 3.5

Prompt engineering: Every prompt builder now includes camera/lens/lighting anchors ("Canon EOS R5, 85mm f/1.4, natural window lighting, shallow depth of field")

Text overlay: Replaced rgba(0,0,0,0.65) rounded rectangle with full-width gradient fade. Replaced Arial Bold + heavy stroke with modern sans-serif, uppercase, drop shadow. Fixed positioning to bottom-left instead of per-image auto-placement in the calmest region.

What didn't change: The pipeline architecture, the API contracts, the content generation logic, and a total cost that stays under a dollar. The upgrade was surgical: better output from the same system, not a different system.