
When AI Invents Products That Don't Exist

Jiwa AI Team

The Protein Bar Problem

Early in development, we asked our image generation pipeline to create a social media post featuring an influencer holding a protein bar. The AI delivered a beautiful image: perfect lighting, natural pose, great composition. There was just one problem: the protein bar had the wrong colors, an invented brand name, and packaging that didn't exist.

This is the hallucination problem. AI image generators are trained on millions of images and have learned what products generally look like. But they have no concept of what your specific product looks like. Ask for a "protein bar on a gym bench" and you'll get something that resembles a protein bar in the way a dream resembles reality: close enough to recognize, wrong enough to be useless.

Why Prompting Can't Fix This

Our first instinct was to throw more detail at the prompt. We described packaging colors, logo placement, text on the label. It didn't help. Current-generation models treat text prompts as suggestions, not specifications. They'll get the general vibe right while inventing every specific detail.

We tried style transfer and image-to-image approaches too. These got closer but introduced their own artifacts: blurry logos, shifted color palettes, distorted proportions. The fundamental issue is that these models are generative. They create new pixels rather than preserving existing ones. And for product marketing, you need the real pixels.

The Composite Insight

The breakthrough came from reframing the problem. Instead of asking AI to generate the product, we asked it to generate everything except the product.

The approach is conceptually simple: generate a beautiful, contextually appropriate background scene with no product in it, then layer the real product photo on top. The AI handles what it's good at: creating atmospheric, well-lit environments. The real product photo handles what AI can't: accurate colors, correct logos, precise packaging details.

Of course, the execution is more involved than it sounds. You can't just paste a rectangular product photo onto a generated background and call it done. The product needs to be cleanly separated from its original background, properly scaled, positioned naturally within the scene, and given realistic shadows so it looks like it belongs there.

Making It Seamless

We built a multi-step pipeline that handles this automatically. First, a background removal model isolates the product from its original photo, producing a clean cutout with transparent edges. Then, a separate AI call generates the scene (a kitchen counter, a gym bench, a cafe table) with no product mentioned in the prompt.
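To make the cutout step concrete, here's a minimal sketch of the data flow. The real pipeline uses an ML background-removal model; this toy version fakes it with a naive "treat near-white pixels as background" threshold on RGBA tuples, purely for illustration. The `cutout` function and its threshold are our invention, not the production code.

```python
def cutout(pixels, threshold=240):
    """Turn near-white background pixels fully transparent.

    `pixels` is a 2D list of (r, g, b, a) tuples. Returns a new grid where
    background pixels get alpha 0 and product pixels keep alpha 255 --
    the same shape of output a real background-removal model produces.
    """
    result = []
    for row in pixels:
        new_row = []
        for (r, g, b, a) in row:
            if r >= threshold and g >= threshold and b >= threshold:
                new_row.append((r, g, b, 0))    # background -> transparent
            else:
                new_row.append((r, g, b, 255))  # product pixel preserved as-is
        result.append(new_row)
    return result

# A tiny 2x2 "product photo": white background, one red product pixel.
photo = [
    [(255, 255, 255, 255), (200, 30, 30, 255)],
    [(255, 255, 255, 255), (255, 255, 255, 255)],
]
cut = cutout(photo)
```

The key property, whatever model does the real work: product pixels pass through untouched, so colors and logos stay exact.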

The compositing step layers the product cutout onto the generated scene at a natural position and scale. We add a procedural drop shadow (a blurred, semi-transparent copy of the product silhouette, offset slightly) to ground it in the scene. The result looks like the product was photographed in that environment.
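The shadow recipe above can be sketched in a few lines. This is a hedged toy version operating on plain 2D alpha grids (0-255) rather than real images, and it uses a cheap box blur where a production pipeline would use a Gaussian; all function names are illustrative.

```python
def box_blur(alpha, radius=1):
    """Cheap box blur over a 2D alpha grid, standing in for Gaussian blur."""
    h, w = len(alpha), len(alpha[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += alpha[ny][nx]
                        count += 1
            out[y][x] = total // count
    return out

def drop_shadow(alpha, offset=(1, 1), opacity=0.5):
    """Blurred, semi-transparent copy of the silhouette, offset slightly."""
    h, w = len(alpha), len(alpha[0])
    oy, ox = offset
    # Shift the silhouette by the offset...
    shifted = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 <= y - oy < h and 0 <= x - ox < w:
                shifted[y][x] = alpha[y - oy][x - ox]
    # ...then blur it and scale alpha down so the shadow is soft and faint.
    blurred = box_blur(shifted)
    return [[int(v * opacity) for v in row] for row in blurred]

# A 5x5 silhouette with one opaque pixel at the center.
silhouette = [[0] * 5 for _ in range(5)]
silhouette[2][2] = 255
shadow = drop_shadow(silhouette, offset=(1, 1), opacity=0.5)
```

The shadow layer is composited under the cutout, so the crisp product edges sit on top of the soft shadow.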

The cutouts are cached, so the same product photo is only processed once regardless of how many posts feature it. This keeps costs down and speeds up subsequent generations.
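The caching is straightforward content-addressing. Here's a sketch under our own assumptions: key the cache on a hash of the original photo bytes, so the expensive background-removal call runs once per unique product photo. `remove_background` is a hypothetical stand-in for the real model call, and the in-memory dict stands in for whatever store the production system uses.

```python
import hashlib

_cutout_cache = {}
calls = 0  # counts how often the expensive step actually runs


def remove_background(photo_bytes):
    """Placeholder for the real (slow, paid) background-removal model."""
    global calls
    calls += 1
    return b"cutout:" + photo_bytes


def cached_cutout(photo_bytes):
    """Return the cutout, computing it only on the first request."""
    key = hashlib.sha256(photo_bytes).hexdigest()
    if key not in _cutout_cache:
        _cutout_cache[key] = remove_background(photo_bytes)
    return _cutout_cache[key]


first = cached_cutout(b"product-photo-bytes")
second = cached_cutout(b"product-photo-bytes")
```

Hashing the bytes (rather than keying on a filename) means re-uploads of the same photo also hit the cache.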

Why Not Just Use Product Photography?

A reasonable question: if you want accurate product images, why not just use the original photos? The answer is context. A product photo on a white background doesn't make compelling social media content. You need the product in a lifestyle setting: on a yoga mat, next to a cup of coffee, in someone's hand at a sports club.

Traditional product photography for lifestyle shots is exactly what makes influencer marketing expensive. You need a photographer, a location, props, and often a model. Our composite approach gives you unlimited lifestyle contexts from a single product photo.

The Fallback Philosophy

Not every product photo works perfectly with this approach. Some images have complex backgrounds that don't separate cleanly. Some products have shapes that look unnatural when composited. Rather than failing, the pipeline falls back gracefully: first to an AI-guided style transfer approach, then to pure generation as a last resort.
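The fallback chain can be sketched as an ordered list of strategies where each one either returns an image or raises to signal "this approach didn't work here." Everything below is illustrative: the three step functions are hypothetical placeholders, not our actual strategy implementations.

```python
def generate_post_image(product_photo, steps):
    """Try each generation strategy in order; return the first success."""
    last_error = None
    for name, step in steps:
        try:
            return name, step(product_photo)
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
    raise RuntimeError("all strategies failed") from last_error


# Hypothetical strategies: composite fails here, style transfer succeeds.
def composite_step(photo):
    raise ValueError("background did not separate cleanly")

def style_transfer_step(photo):
    return "style-transfer image for " + photo

def pure_generation_step(photo):
    return "generated image for " + photo


strategy, image = generate_post_image(
    "protein-bar.jpg",
    [
        ("composite", composite_step),
        ("style_transfer", style_transfer_step),
        ("pure_generation", pure_generation_step),
    ],
)
```

Returning which strategy succeeded alongside the image is useful for monitoring how often the composite path wins in practice.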

A slightly hallucinated product image is better than no image at all. But in practice, the composite approach succeeds for the vast majority of standard product photography, which is exactly what most e-commerce businesses already have on their websites.

The Bigger Lesson

The product hallucination problem taught us something important about working with AI: the best results often come from constraining what the AI does, not from asking it to do more. By splitting the task into "things AI is great at" and "things that need to be exact," we got better results than any single-model approach could deliver.

This principle (use AI for creativity, use deterministic tools for precision) runs through our entire architecture. It's less glamorous than end-to-end AI generation, but it produces content that businesses can actually use.