
Smarter Quality Gates: Cohesion-Aware Regeneration and Visual Product Identification

Jiwa AI Team

When the Quality Gate Knows Something Is Wrong But Won't Say What

Content quality systems are only useful if they tell you something you can act on. A quality score of 65 is information. A flagged post with no explanation isn't; it's noise.

We discovered this the hard way when reviewing how our batch cohesion check worked in practice. The logic was sound: after generating a full set of posts for a brand's content calendar, we'd score the images as a group. If the batch felt visually incoherent (mismatched lighting, inconsistent color palettes, different visual treatments that wouldn't feel like a single campaign), every post would be flagged for human review. The reasoning is correct. A campaign that looks like a collection of random stock photos undermines brand identity even if each individual post is technically good.
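In sketch form, the check itself is straightforward. A minimal version, assuming L2-normalized CLIP-style image embeddings; the function name and the mapping onto the 1-to-5 scale reviewers see are illustrative, not our production code:

```python
import numpy as np

def batch_cohesion_score(embeddings: np.ndarray) -> float:
    """Score a batch's visual cohesion on the 1-to-5 scale used in review.

    `embeddings` is an (n_images, dim) array of L2-normalized image
    embeddings from any CLIP-style encoder.
    """
    sims = embeddings @ embeddings.T                 # pairwise cosine similarity
    off_diag = sims[~np.eye(len(sims), dtype=bool)]  # drop self-similarity
    mean_sim = off_diag.mean()                       # how alike the batch looks overall
    # Map average similarity onto 1..5; a low score means the batch reads
    # as unrelated images rather than one campaign.
    return 1.0 + 4.0 * float(np.clip(mean_sim, 0.0, 1.0))
```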

The problem: we'd flag everything and say nothing. The cohesion reason existed internally but never made it into the feedback visible to reviewers. Business owners would see a wall of flagged posts with quality scores in the 70s and no explanation. Was the lighting wrong? Were the colors inconsistent? Was the influencer styled differently across images? The system knew. It just didn't say.

Cohesion Failure as a Diagnostic, Not Just a Flag

The first change was simple but important: surface the cohesion reasoning into the quality feedback for every affected post. When a batch fails cohesion, every post's quality reason now includes a plain-language explanation, something like "Batch cohesion low (2/5): images use inconsistent lighting styles across posts." That sentence now appears in the dashboard, in the review UI, and anywhere else quality reasons surface.
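The mechanics are nothing more than writing the batch-level explanation onto each post's quality record. A minimal sketch; `PostQuality` and `flag_batch_for_cohesion` are hypothetical stand-ins for our actual quality types:

```python
from dataclasses import dataclass, field

@dataclass
class PostQuality:
    score: int
    reasons: list[str] = field(default_factory=list)
    needs_review: bool = False

def flag_batch_for_cohesion(posts: list[PostQuality], score: int, detail: str) -> None:
    """Attach the batch-level cohesion explanation to every affected post."""
    reason = f"Batch cohesion low ({score}/5): {detail}"
    for post in posts:
        post.needs_review = True
        post.reasons.append(reason)  # shows up wherever quality reasons surface
```

The point isn't the code; it's that the reason is written where reviewers will actually see it, instead of living only in an internal log.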

The second change was more substantial: before declaring cohesion defeat and blanket-flagging everything, the system now attempts targeted regeneration of the two worst visual outliers. The intuition is that cohesion failures are rarely uniform; there are usually one or two images pulling the batch down. A single post shot in a harsh studio environment surrounded by soft golden-hour lifestyle shots will torpedo the cohesion score for the whole calendar. Regenerating just those outliers, with an explicit instruction to match the campaign's visual language, can restore cohesion without touching anything that was working.
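Which two images count as the worst outliers? One simple way to rank them, sketched here under the same normalized-embedding assumption as above (not necessarily how our scorer does it), is distance from the batch's embedding centroid:

```python
import numpy as np

def worst_outliers(embeddings: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k images furthest from the batch's visual center."""
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)       # re-normalize the mean vector
    dists = 1.0 - embeddings @ centroid        # cosine distance to centroid
    return list(np.argsort(dists)[-k:][::-1])  # worst first
```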

After targeted regen, the batch is re-evaluated. If cohesion is restored, the review flags are dropped. If it's still broken after the two worst outliers are replaced, the flags stand, but now with a clear reason attached to every post.
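Putting the pieces together, the gate's control flow looks roughly like this. A sketch, not production code: `evaluate`, `regenerate`, and `embed` stand in for the surrounding pipeline, and `worst_outliers` is the helper above:

```python
from typing import Callable
import numpy as np

def cohesion_gate(
    images: list,
    evaluate: Callable[[list], tuple[float, str]],  # -> (score on 1..5, detail text)
    regenerate: Callable[[object, str], object],    # one image + a style instruction
    embed: Callable[[list], np.ndarray],            # -> (n, dim) normalized embeddings
    threshold: float = 3.0,
) -> tuple[list, list[str]]:
    """Attempt targeted regen of the two worst outliers before blanket-flagging."""
    score, detail = evaluate(images)
    if score >= threshold:
        return images, []                           # batch already coheres
    for idx in worst_outliers(embed(images), k=2):  # capped at two regens, run once
        images[idx] = regenerate(images[idx], "match the campaign's visual language")
    score, detail = evaluate(images)                # single re-evaluation, no retry loop
    if score >= threshold:
        return images, []                           # cohesion restored: flags dropped
    reason = f"Batch cohesion low ({score:.0f}/5): {detail}"
    return images, [reason] * len(images)           # flags stand, reason on every post
```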

Why This Ordering Matters

The decision to attempt regen before flagging rather than after is deliberate. If we flagged first and let humans review before trying to fix, we'd be asking people to approve posts that the system could have resolved automatically. That adds friction without value.

The reverse risk is that we over-generate: regenerating images that were fine individually, spending API budget on a problem that didn't need solving. The two-outlier cap keeps this bounded. We're not regenerating the whole calendar, just the posts most likely to be causing the visual dissonance, and only once.

This is a general principle we try to follow throughout the pipeline: attempt automated remediation before escalating to human review, but be explicit about what you tried and what happened. A reviewer who sees "Cohesion restored to 4/5 after targeted regen" has context. A reviewer who just sees a clean green checkmark doesn't know anything was ever wrong.

Seeing Products That Were Never Described

The second improvement addresses a different part of the pipeline: earlier, at brand analysis time.

When a business onboards, Jiwa AI analyzes everything it can find: their website, their product catalog, their Instagram captions and descriptions. The goal is to understand what the business actually sells well enough to generate authentic content about it.

For text-heavy businesses (e-commerce stores with detailed product descriptions, services firms with thorough web copy), this works well. For image-first businesses, it often didn't. A restaurant with beautiful food photography but sparse menu descriptions. A fashion brand whose Instagram communicates everything through visuals but whose website says little beyond "Shop the collection." A handcraft business whose products are easier to recognize than to describe.

In those cases, our brand analysis was essentially working with one eye closed. We could see the captions and descriptions, but not the actual product photos. Claude would make its best guesses about what the products were from limited textual signals, and those guesses would propagate through the entire pipeline, influencing which influencer scenes made sense, which visual contexts to use, and which product attributes to highlight.

What Instagram Photos Add to Brand Understanding

The fix is to include the top Instagram post images directly in the Wave 2 brand analysis: the stage where Claude builds the foundational product understanding that everything else downstream depends on.
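Mechanically, this means attaching the photos as image blocks alongside the text in the analysis request. A minimal sketch using the Anthropic messages API; the function name, prompt, photo cap, and model id here are illustrative:

```python
import base64
import anthropic

client = anthropic.Anthropic()

def analyze_brand_with_photos(image_paths: list[str], brand_text: str) -> str:
    """Run brand analysis with top Instagram photos attached alongside the text."""
    content = []
    for path in image_paths[:4]:  # cap how many photos we attach per request
        with open(path, "rb") as f:
            data = base64.standard_b64encode(f.read()).decode()
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": data},
        })
    content.append({
        "type": "text",
        "text": (
            f"Here is what we know from text sources:\n{brand_text}\n\n"
            "Using both the photos and the text, describe what this business "
            "sells: product types, materials, finishes, and visual style."
        ),
    })
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # any vision-capable Claude model works
        max_tokens=1024,
        messages=[{"role": "user", "content": content}],
    )
    return message.content[0].text
```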

With real product photos in hand, the analysis changes substantially. A photo of a green smoothie in a branded cup tells Claude more about a café's aesthetic, product format, and target occasion than three paragraphs of menu description. A photo of a handmade ceramic bowl on a marble surface communicates texture, finish, scale, and lifestyle context in a single glance.

This matters most for two downstream outputs: product identification accuracy and scene selection. When Claude can see that a business sells distinctive matte-finish ceramics rather than glossy factory pottery, the influencer scenes generated (the kitchen vignettes, the slow-morning aesthetic, the styling choices) align with what the product actually looks like. Without that visual anchor, the pipeline would generate contextually plausible content for a generic version of the product category, which may or may not match the real thing.

We're targeting an improvement from roughly 40% to 80% product identification accuracy for image-heavy, text-sparse businesses. The businesses that benefit most are precisely the ones that communicate best through visuals, which, in the Indonesian small business ecosystem we're built for, is a large portion of the market.

The Broader Principle: Better Inputs, Not More Retries

Both improvements share an underlying philosophy. When something goes wrong late in the pipeline (images that don't cohere, products that don't match the content), it's tempting to fix it with more retries at the end. Add another quality check. Add another regeneration pass. Add more human review.

That works, but it's expensive and slow. The better intervention is usually earlier: give the system more signal at the point where understanding is being built, so the content generated from that understanding is right the first time.

Visual product identification is an input improvement. Cohesion-aware regen is a smarter end-of-pipeline recovery. Together, they move the pipeline in the direction of fewer surprises: content that holds together visually because it was grounded in a richer understanding of the brand from the start.

If you're building AI content systems for businesses where the product is the brand, it's worth asking early: what signals are you collecting, and what signals are you leaving on the table?