JiwaAI
Blog
โ†All posts
image-generation
architecture
quality-control
cost-optimization

Different Content, Different Strategy: How We Generate Images by Post Type

Jiwa AI Team

One Pipeline, Three Problems

A product post needs the actual product to look exactly right: real colors, real shape, real packaging. A UGC post needs the influencer's face to look like the influencer. A carousel cover needs neither, just an attention-grabbing hook.

These are fundamentally different image generation problems. Treating them the same way produces mediocre results across the board. Our composite image orchestrator routes each content type through a different strategy cascade, picking the highest-fidelity approach first and falling back to cheaper alternatives when it fails.
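The cascade pattern itself is simple to sketch. The names below (`Strategy`, `runCascade`) are illustrative, not the actual orchestrator API; the point is the shape: try strategies in fidelity order, stop at the first success.

```typescript
// Illustrative sketch of the strategy cascade, assuming hypothetical names.
type StrategyResult = { name: string; cost: number; image: string };

interface Strategy {
  name: string;
  cost: number; // USD per attempt
  // Returns an image URL on success, or null to fall through to the next strategy.
  run: () => Promise<string | null>;
}

async function runCascade(strategies: Strategy[]): Promise<StrategyResult | null> {
  for (const s of strategies) {
    const image = await s.run();
    if (image !== null) return { name: s.name, cost: s.cost, image }; // first success wins
  }
  return null; // every strategy failed; caller decides what to do
}
```

Each content type plugs a different ordered list of strategies into this loop.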

Here's exactly what happens for each type.

Product Posts: Fidelity Is Everything

When someone features a product on Instagram, the product has to look real. Not "AI's interpretation of the product" real. Actually real. If the packaging color is off by a shade, the brand notices immediately.

The strategy cascade for product posts:

Strategy 1: Hybrid Composite ($0.030)
Generate a photorealistic background scene using Flux Realism, remove the background from the actual product photo using BiRefNet, and composite the real product cutout on top. The product in the final image is the actual product photo, with every detail preserved. The AI only generates the environment around it.

Strategy 2: IP-Adapter Product Generation ($0.030)
When the hybrid composite fails (usually because the product cutout doesn't blend naturally with the generated background), we fall back to Flux General with an IP-Adapter. This feeds the product reference image directly into the generation model, which renders an approximation of the product in context. Less faithful than a real cutout, but the product is still recognizable.

Strategy 3: Generic Scene ($0.025)
When there's no product reference image at all, we generate a product-category scene without any specific product. A coffee table for a coffee brand. A vanity setup for a beauty brand. The product exists in the caption but not the image.

The cascade stops at the first success. A typical product post costs three cents. The extra half cent for hybrid over generic buys an order of magnitude more product accuracy.
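The product-post routing depends on which assets exist. A minimal sketch, assuming a hypothetical `ProductAssets` shape and strategy identifiers:

```typescript
// Illustrative asset-based routing for product posts. Field and strategy
// names are assumptions, not the real orchestrator's identifiers.
interface ProductAssets {
  productPhoto?: string; // reference photo of the real product, if uploaded
}

function productStrategies(assets: ProductAssets): string[] {
  if (assets.productPhoto) {
    // Real cutout first, IP-Adapter approximation second, generic scene last.
    return ["hybrid-composite", "ip-adapter-product", "generic-scene"];
  }
  // No reference image: only the category-scene strategy applies.
  return ["generic-scene"];
}
```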

UGC Posts: The Face Has to Match

User-generated content posts are the hardest to get right. The influencer's face needs to be recognizable. The product needs to be visible. The scene needs to feel natural, not staged. And the whole thing has to look like a real photo taken on a real phone.

Strategy 1: Multi-IP-Adapter ($0.030)
When we have both the influencer's face reference and a product reference image, we feed both into a single Flux General call using stacked IP-Adapters. Face adapter at 0.8 weight, product adapter at 0.7. One generation, both references influencing the output. This is the most cost-efficient approach when both assets are available: one call instead of two.

Strategy 2: PuLID Face-Only ($0.035)
When Strategy 1 fails (usually when the two adapters conflict and produce artifacts), we fall back to PuLID, a model specifically designed for identity-preserving generation. It takes one face reference and generates the person in a new context. More expensive than Flux General by half a cent, but significantly more reliable for face consistency. The product appears in the prompt but not as a visual reference.

Strategy 3: Generic Scene ($0.025)
When there's no face reference available (the influencer hasn't uploaded assets, or the matching picked an influencer without photos), we generate a generic lifestyle scene. No face, no product reference. Just a well-composed scene that matches the theme and brand colors.
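The initial UGC strategy pick, before any fallback, follows directly from which references exist. A sketch with the weights quoted above (0.8 face, 0.7 product); the request shape and function name are assumptions, not the actual Flux API payload:

```typescript
// Illustrative initial routing for UGC posts; not the real API payload.
interface IpAdapter {
  referenceUrl: string;
  weight: number; // how strongly this reference steers generation
}

function buildUgcRequest(faceRef?: string, productRef?: string) {
  const adapters: IpAdapter[] = [];
  if (faceRef) adapters.push({ referenceUrl: faceRef, weight: 0.8 });    // face adapter
  if (productRef) adapters.push({ referenceUrl: productRef, weight: 0.7 }); // product adapter

  if (adapters.length === 2) return { strategy: "multi-ip-adapter", adapters };
  if (faceRef) return { strategy: "pulid-face-only", adapters };
  return { strategy: "generic-scene", adapters: [] as IpAdapter[] };
}
```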

The face strategies are where quality matters most. PuLID and multi-IP-Adapter both occasionally produce artifacts: slightly distorted features, extra fingers, uncanny expressions. More on how we handle that below.

Carousel Posts: Curiosity First

Carousel covers serve a different purpose. They're not about showing the product or proving the influencer uses it. They're about stopping the scroll. Making someone curious enough to swipe.

Strategy 1: PuLID Influencer Hook ($0.035)
Generate the influencer in an intriguing setting related to the theme, with no product visible. The product reveal comes in later slides. PuLID gives us face consistency, so the influencer is recognizable from their other content.

Strategy 2: Generic Cover ($0.025)
When no face reference is available, we generate an eye-catching scene that matches the brand's visual style. Bold colors, strong composition, designed to interrupt the feed scroll.

Carousel covers are the cheapest content type to generate because there's only one generated image. The remaining slides (content and CTA) are text overlays on blurred versions of the base image, processed locally with Sharp. No additional API calls needed.
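The slide plan for a carousel can be expressed as a pure function. In our stack the blurring and text compositing would be done with Sharp's `.blur()` and `.composite()`; the sketch below only builds the plan, and the slide roles and blur sigma are illustrative values:

```typescript
// Hypothetical slide plan for a carousel: one generated cover, then
// locally-processed slides built from blurred copies of the same base image.
type Slide = {
  role: "cover" | "content" | "cta";
  blurSigma: number; // 0 = use the base image as-is
  text?: string;     // overlay text for non-cover slides
};

function planCarousel(contentText: string, ctaText: string): Slide[] {
  return [
    { role: "cover", blurSigma: 0 },                       // the one generated image
    { role: "content", blurSigma: 15, text: contentText }, // blurred base + body text
    { role: "cta", blurSigma: 15, text: ctaText },         // blurred base + call to action
  ];
}
```

Because only the cover requires a generation call, the marginal cost of slides two and three is CPU time, not API spend.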

The Cost Picture

Here's what a typical six-post batch costs for image generation alone:

Post   Type       Strategy Used       Cost
1      Product    Hybrid composite    $0.030
2      Product    Hybrid composite    $0.030
3      UGC        Multi-IP-Adapter    $0.030
4      UGC        PuLID face-only     $0.035
5      Carousel   PuLID hook          $0.035
6      Carousel   PuLID hook          $0.035
Total                                 $0.195

Add quality scoring ($0.012), text overlay analysis ($0.004), and caption generation ($0.010), and image generation still accounts for close to ninety percent of the per-batch cost ($0.195 of $0.221). This is why strategy selection matters: the wrong model choice doesn't just affect quality, it affects economics.
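The arithmetic is worth making explicit, since it drives the routing decisions:

```typescript
// Batch economics from the table above: six images plus the fixed overheads.
const imageCosts = [0.03, 0.03, 0.03, 0.035, 0.035, 0.035];
const imageTotal = imageCosts.reduce((sum, c) => sum + c, 0); // $0.195
const otherTotal = 0.012 + 0.004 + 0.01;                      // scoring + overlay + captions
const imageShare = imageTotal / (imageTotal + otherTotal);    // image generation's share of spend
```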

The Retry Problem We Fixed

Here's where things were quietly going wrong.

Our quality scoring system evaluates every generated image using Claude Haiku Vision. It checks for product visibility, brand color presence, AI artifacts, and composition. Posts scoring below fifty trigger an image retry.

But until recently, that retry was broken by design.

When a UGC post scored low (say the PuLID face had artifacts, scoring thirty-eight), the retry code threw away the entire strategy context and called generateImageHQ() with a generic prompt. That's Flux Realism with no face reference, no product reference, no IP-Adapter. The retried image might have fewer artifacts, but it also had no resemblance to the influencer. A different person in a generic scene, replacing a flawed but recognizable face.

The fix was straightforward. Every strategy now records what it did: which model, which prompt, which parameters. When a retry triggers, it replays the same strategy with enhanced negative prompts based on what the scorer flagged.

If the scorer found face distortion, the retry appends explicit negative prompts: "no distorted features, natural facial proportions, correct finger count." If the product wasn't visible, it strengthens the product emphasis in the prompt. Same strategy, better guidance.

This costs nothing extra. Same models, same number of retries. Just smarter prompts the second time around.
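The mechanism can be sketched in a few lines. `GenContext`, `retryContext`, and the flag names are assumptions for illustration; the negative-prompt strings come from the examples above:

```typescript
// Hypothetical smart-retry sketch: replay the SAME strategy, appending
// negative prompts keyed on what the quality scorer flagged.
interface GenContext {
  strategy: string;       // e.g. "pulid-face-only"
  model: string;          // e.g. "pulid"
  prompt: string;
  negativePrompt: string;
}

const NEGATIVES: Record<string, string> = {
  face_distortion: "no distorted features, natural facial proportions, correct finger count",
  product_missing: "product clearly visible, product in focus",
};

function retryContext(ctx: GenContext, scorerFlags: string[]): GenContext {
  const extra = scorerFlags
    .map((flag) => NEGATIVES[flag])
    .filter(Boolean)
    .join(", ");
  // Same strategy, same model, same prompt; only the negative guidance grows.
  return { ...ctx, negativePrompt: [ctx.negativePrompt, extra].filter(Boolean).join(", ") };
}
```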

Scoring Every Image, Not a Sample

The other quality gap was statistical. To save on Haiku Vision costs, we only scored three images out of every six-post batch: first, middle, and last. The other three got an interpolated average score. If the first and last images scored eighty, the unscored middle images were assumed to also score eighty.

This meant an image with a badly distorted face could pass quality checks entirely if it happened to be at index two or four in the batch. The interpolated score would show seventy-five. No retry triggered. The bad image shipped.

We now score every image individually. Six Haiku Vision calls instead of three. The additional cost is six-tenths of a cent per batch โ€” three images at two-tenths of a cent each. For that price, every image gets its own quality assessment, its own artifact check, its own retry decision.
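The retry decision is now a per-image check rather than an interpolation. A minimal sketch, using the retry cutoff of fifty quoted above (the function name is illustrative):

```typescript
// Each image gets its own score and its own independent retry decision;
// no index can hide behind the scores of its neighbors.
const RETRY_THRESHOLD = 50;

function retryIndices(scores: number[]): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .filter(({ score }) => score < RETRY_THRESHOLD)
    .map(({ index }) => index);
}
```

With the old sampling scheme, a batch scored [80, ?, 38, ?, 45, 80] would have interpolated the unscored slots and shipped the defects; scoring all six flags indices 2 and 4 for retry.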

What This Means for Output Quality

Three changes, compounding:

Scoring all images catches defects that sampling missed. Our internal testing showed that roughly one in five batches had an unscored image that would have triggered a retry if it had been checked.

Smart retry means that when a retry does trigger, it produces a meaningfully better image instead of a generically different one. Face consistency is preserved. Product references are maintained. Brand colors stay in the prompt.

Higher thresholds (the image retry cutoff raised from forty to fifty, and the review-flagging cutoff from thirty to forty) mean we catch more borderline content before it reaches the business owner's WhatsApp preview.

The combined cost increase is roughly one and a half cents per batch. The quality improvement is most visible on UGC posts with faces: the content type where businesses are most likely to reject and where a bad image does the most brand damage.

Strategy Selection Matters More Than the Model

It's tempting to think image quality is primarily about which model you use. Upgrade to a better model, get better images. But our data tells a different story.

The biggest quality gap isn't between Flux Realism and Flux Dev. It's between "used the right strategy for this content type" and "used the wrong one." A PuLID face generation with a well-constructed prompt produces better UGC content than any generic model upgrade. A hybrid composite with a real product cutout produces more accurate product posts than the most expensive generation model running without a product reference.

Strategy selection is free. It's just routing logic. The models cost the same whether you pick the right one or the wrong one. The orchestrator's job is to make the right pick every time, and when something goes wrong, to retry with the same strategy that almost worked, not start over with a different one.