
We Deleted Our AI Scene Planner and Got More Visual Variety

Jiwa AI Team

The Scene Planner That Wasn't Worth Its Weight

Every time a business generated a batch of posts, we ran a pre-generation step that asked Claude to plan unique scenes for each one. The intent was good: give the image model detailed scene specifications (camera angle, lighting direction, background setting, influencer pose) so the final images would look intentionally varied and not like the same shot repeated six times.

In practice, the step was expensive in two ways. It added several seconds of latency before the actual image generation could start, because the scene plans had to be complete before any parallel work could begin. And despite the planning, the outputs still had a tendency to cluster. Ask an AI to generate six "diverse" scene descriptions for a skincare brand and it will reliably produce three variations on "natural light bathroom" and two on "outdoor golden hour," no matter how many instructions you add about variety.

We eventually asked a harder question: what if the diversity we wanted could be achieved without an AI planning step at all?

Composition as a Deterministic Signal

The insight that unlocked a different approach came from thinking about what "visual variety" actually means at the composition level. Two photos can have identical subjects (same influencer, same product, same brand colors) and still feel completely different if one is shot with stark minimalist framing and the other has a warm scrapbook-collage aesthetic. The content is the same. The composition makes them feel like different posts.

We built a library of eight distinct composition formats, each inspired by a recognizable visual style: the warm nostalgia of a Polaroid collage, the cinematic tension of a film strip, the editorial confidence of a bold hero shot, the organic layering of a scrapbook, and four others. Each format comes with a specific set of composition instructions that gets injected into the image generation prompt: not as a vague style label, but as concrete visual directions about framing, light quality, and depth treatment.

The assignment logic is deliberately simple. We map each content angle (storytelling, educational, lifestyle, trending, promotional) to the formats that suit it best. Brand personality traits add another layer: a luxury brand gets steered toward minimalist and editorial formats, a playful brand toward collage and scrapbook styles. And a rotation rule prevents the same format from appearing in consecutive posts.

The entire assignment runs in under a millisecond. No network call. No token cost. Completely deterministic given the same inputs, which means it is easy to test and reason about.
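To make the shape of this concrete, here is a minimal sketch of that kind of deterministic assignment. The format names, angle mappings, and trait preferences below are illustrative placeholders, not our production values:

```python
# Illustrative sketch of deterministic format assignment: pure lookups,
# no model call, same output for the same inputs.

ANGLE_FORMATS = {
    "storytelling": ["polaroid-collage", "film-strip", "scrapbook"],
    "educational":  ["split-panel", "minimalist-grid", "editorial-hero"],
    "lifestyle":    ["scrapbook", "polaroid-collage", "warm-organic"],
    "trending":     ["editorial-hero", "split-panel", "film-strip"],
    "promotional":  ["editorial-hero", "minimalist-grid", "warm-organic"],
}

TRAIT_PREFERENCES = {
    "luxury":  ["minimalist-grid", "editorial-hero"],
    "playful": ["polaroid-collage", "scrapbook"],
}

def assign_format(angle, traits, previous_format=None):
    """Pick a composition format for one post."""
    candidates = list(ANGLE_FORMATS[angle])
    # Brand personality steers the ordering: preferred formats move to
    # the front (stable sort keeps the angle's own ordering otherwise).
    preferred = [f for t in traits for f in TRAIT_PREFERENCES.get(t, [])]
    candidates.sort(key=lambda f: f not in preferred)
    # Rotation rule: never repeat the previous post's format.
    candidates = [f for f in candidates if f != previous_format] or candidates
    return candidates[0]
```

Because the whole thing is dictionary lookups and a sort over eight items, it costs nothing at runtime and any assignment can be reproduced and unit-tested directly.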

Why Deterministic Beats Generative for Structural Variety

This might seem counterintuitive. AI models are supposed to be better at generating varied, contextually appropriate outputs than rule-based systems. For many tasks that is true. But composition format assignment has a property that makes rules genuinely superior: the space of good answers is finite and enumerable.

There are not infinite valid ways to compose a lifestyle post for a beauty brand. There are a handful of formats that work well and a handful that do not. A generative model does not have better taste than a thoughtfully designed mapping; it just has more words to use when describing the same underlying set of choices. And unlike a generative output, the mapping is transparent. When a post uses a polaroid-collage composition, we know exactly why and we can adjust the logic if the results are not what we want.

The AI scene planner, by contrast, was a black box with latency. We were paying in time and cost for the illusion of unlimited creative possibility when the actual useful output space was narrow and well-defined.

Trend Intelligence Fills the Gap

Removing the scene planner meant we needed to ensure the content itself carried enough contextual signal to drive meaningful image prompts. The answer came from a parallel change: replacing our hallucinated trending topics with real web search signal.

Previously, the system asked Claude to generate "currently trending topics" from memory, which, of course, produced evergreen ideas that were months or years old, dressed up as current events. A trending topic generator that cannot search the web is not a trend generator at all. It is a topic suggestion engine with a misleading name.

We rewired the trending topics refresh to use live web search, pulling actual viral content and real discussion threads from the past seven days. The result is that the content angle for each post now reflects genuine current context: what the brand's audience is actually talking about this week, not what an AI thinks they might generally talk about.
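The seven-day window is a simple timestamp filter over whatever the search layer returns. A sketch, assuming each result carries a `published_at` datetime (the field name and result shape are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

def filter_recent(results, now=None, window_days=7):
    """Keep only search results published within the trend window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    return [r for r in results if r["published_at"] >= cutoff]
```

Anything older than the window is dropped before topic extraction, so stale evergreen content never reaches the content angle in the first place.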

This context flows into the composition assignment. A post about a genuinely trending topic gets the editorial-hero or split-panel treatment that suits high-energy, timely content. A lifestyle post tied to a real seasonal moment gets the warm organic composition that makes it feel grounded rather than generic.

The Numbers That Justified the Change

The composition format injection adds essentially nothing to the cost of image generation. It is a string appended to a prompt that was already being sent. The latency contribution is negligible.
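The injection really is just concatenation. A sketch, with an illustrative format entry standing in for our actual instruction text:

```python
# Hypothetical instruction text; the real entries are more detailed.
FORMAT_INSTRUCTIONS = {
    "polaroid-collage": (
        "Compose as a warm polaroid collage: overlapping instant-photo "
        "frames, soft diffuse light, shallow focus on the hero frame."
    ),
}

def build_image_prompt(base_prompt, format_name):
    # No extra API call, no extra tokens beyond the appended directions.
    return f"{base_prompt}\n\nComposition: {FORMAT_INSTRUCTIONS[format_name]}"
```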

Removing the AI scene planning step eliminated several seconds of sequential work that previously blocked the entire parallel generation batch. That time is now back in the user's pocket on every content generation run.

The trend intelligence refresh runs on a stale-while-revalidate pattern: if the cached topics are fresh, the caller gets them instantly with no API call at all. If they are stale, the caller gets the cached data immediately while a background refresh updates the store for next time. Only the very first run for a new business pays the cost of waiting for real search results, and for that first run the wait is worth it.

The irony of the whole arc is that deleting an AI-powered feature and replacing it with deterministic logic produced better outcomes across every dimension we measure: speed, cost, and the visual consistency that makes a brand's content calendar actually look like a calendar rather than a random assortment of images.

Looking Ahead

The composition format library is a starting point, not a ceiling. We can tune the angle-to-format mappings based on what actually performs on Instagram. We can add new formats as visual trends evolve. And because the assignment logic is a few readable lines rather than a model call, tuning it requires judgment and data โ€” not prompt engineering and token budgets.

Sometimes the right engineering move is to put the AI where it genuinely adds value and use clear, fast rules everywhere else. We are still learning where that line is, and this change moved it in a useful direction.