The Silent Problem: When Product Images Reach AI as Empty Slots
When Nothing Looks Like Something
There is a class of bug that is especially hard to detect: the one that produces output without producing the right output. A missing image is obvious. An image generated without a product reference (showing a generic lifestyle scene instead of the actual product) is much harder to notice at a glance.
This is the blank URL problem. During onboarding, we scrape a business's website and attempt to persist product images to permanent storage. When that persistence fails (a CDN link has expired, the upload times out, the source returns an error), the original URL is silently discarded. The product record is created with zero images. At generation time, the AI model receives no product reference at all, and produces something that looks plausible but is entirely disconnected from the real product.
The Full Failure Chain
To understand why this happens, it helps to walk through what should happen versus what was happening.
Ideally: scraper finds product images → we upload them to permanent storage → generation uses those permanent URLs as reference slots → AI produces an accurate product image.
In practice: scraper finds images from a time-limited CDN → upload to storage fails → URL is discarded → product has no images in the database → generation runs without a reference → result is generic.
The silent part is the key failure. Every step after the upload failure looks normal from the outside. The product was created, the post was generated, an image was returned. Nothing crashed. But the output was disconnected from the product it was supposed to represent.
Storing Something Is Better Than Storing Nothing
The first fix addresses the persistence failure itself. When we cannot upload a product image to permanent storage, we now store the original source URL as a fallback. This feels counterintuitive (why store a URL that might not work?) but it unlocks the right behavior downstream.
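As a sketch of that fallback (the `upload` callable and its behavior are hypothetical stand-ins for the real storage client):

```python
def persist_product_image(source_url: str, upload) -> str:
    """Try to copy a scraped image to permanent storage.

    `upload` is a hypothetical storage-client callable that returns the
    permanent URL on success and raises on failure.
    """
    try:
        return upload(source_url)
    except Exception:
        # Upload failed: keep the original scraped URL instead of
        # discarding it, so downstream checks can still try to use it.
        return source_url
```

The key design choice is that the function never returns nothing: either a permanent URL or the original one, so the product record always has a candidate reference.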
With the original URL stored, the generation pipeline can attempt to use it. And critically, the system already has a reachability check in place: before passing any URL to the AI model, we HEAD-check it to verify it responds. If the original CDN URL has since expired, the check catches it and the URL is dropped cleanly. If it is still live (which it often is, since generation can happen within hours of onboarding), it becomes a valid product reference.
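A minimal version of such a reachability check, using only the Python standard library (the timeout value and the accepted status range are assumptions, not the production settings):

```python
import urllib.request

def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True only if a HEAD request to `url` gets a non-error response."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Treat 2xx and 3xx as live; 4xx/5xx raise and fall through.
            return 200 <= resp.status < 400
    except Exception:
        # Expired links, DNS failures, timeouts, malformed URLs: all unusable.
        return False
```

Anything that cannot be positively verified is treated as dead, which is exactly the bias this pipeline wants.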
The net result: more products arrive at generation with at least one image to work from, without any risk of expired URLs reaching the AI model.
Catching Expired URLs Before They Reach AI
The second fix extends this reachability checking to the full set of reference images used in generation: product images, moodboard references, and style inspirations. Before building the reference slot array that gets sent to the AI model, we run parallel HEAD checks against every external URL.
This check is designed to be fast. All requests fire concurrently, so the total latency is bounded by the slowest single response, not the sum of all responses. We also apply one important optimization: URLs we have already verified by uploading to our own storage infrastructure are automatically trusted and skipped. There is no point making a network round-trip to confirm that a file we uploaded five minutes ago still exists.
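A sketch of the concurrent filter with the trusted-prefix skip. The storage prefix is hypothetical, and `check` stands in for the single-URL HEAD probe so the concurrency logic stays testable:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical prefix for our own storage; files here were verified at upload time.
TRUSTED_PREFIXES = ("https://storage.example.com/",)

def filter_reachable(urls: List[str], check: Callable[[str], bool]) -> List[str]:
    """Keep only URLs that are trusted or pass the reachability probe."""
    def verdict(url: str) -> bool:
        if url.startswith(TRUSTED_PREFIXES):
            return True  # our own storage: skip the network round-trip
        return check(url)

    if not urls:
        return []
    # All probes run in parallel, so latency ~ the slowest single response.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        results = list(pool.map(verdict, urls))
    return [u for u, ok in zip(urls, results) if ok]
```

`pool.map` preserves input order, so the surviving URLs keep their original priority.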
The result is that the reference slot array passed to the AI model contains only URLs that are known to be live and reachable at generation time. Empty strings, expired CDN links, and timed-out endpoints are all excluded before any AI call is made.
Why Reference Slot Hygiene Matters
AI image generation models that accept multiple reference images treat each slot as a conditioning signal. An empty slot is not a no-op; it is a malformed input. Depending on the model, it may produce degraded results, ignore the slot silently, or behave unpredictably in ways that vary by generation.
Filtering aggressively before the model call means we are always sending a well-formed, minimal set of references: only the images that exist, only in the categories that make sense for the post type, only up to the slot count the model supports. This makes generations more predictable and easier to reason about when results are unexpected.
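That filtering can be sketched as a slot builder. The category ordering and the slot cap below are assumptions for illustration, not the production values:

```python
from typing import List

def build_reference_slots(
    product_images: List[str],
    moodboard_refs: List[str],
    style_refs: List[str],
    max_slots: int = 4,  # hypothetical model slot limit
) -> List[str]:
    # Product images first, on the assumption they are the strongest signal.
    candidates = [*product_images, *moodboard_refs, *style_refs]
    # Drop empty and whitespace-only entries before any model call.
    live = [u for u in candidates if u and u.strip()]
    # Never exceed the number of slots the model supports.
    return live[:max_slots]
```

The result is always a dense, ordered list: no empty strings, no overflow, and nothing the model has to guess about.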
Giving Users a Path Forward
The third change is a product one. When onboarding completes and a business's products have no images (because scraping found none, or all persistence attempts failed), we now surface an immediate upload prompt rather than silently continuing to generation.
Users see a list of their products with no images and a simple upload interface for each. This happens before they are redirected to their business dashboard, giving them the opportunity to add product photos at the moment when it is most relevant. They can also skip this step and add photos later from the business management page.
This closes the loop on the full failure chain. Even when our automated image sourcing comes up empty, there is a clear, low-friction path for the user to provide what the AI needs to do its job well.
Defensive Systems, Better Outputs
The underlying theme across all three changes is defensive hygiene: checking assumptions, storing fallbacks, and filtering bad inputs before they reach components that cannot easily recover from them. AI generation models are powerful but not forgiving of malformed inputs: they will produce something regardless, and that something may look deceptively reasonable while being completely wrong.
Building the defensive layer between data collection and model invocation is what separates pipelines that occasionally produce irrelevant output from pipelines that reliably produce accurate output.