Why We Stopped Trusting External Image URLs in Our AI Pipeline
The Silent Failure Nobody Was Watching
For weeks, our image generation pipeline was producing lower-quality output for certain brands: posts that looked fine but were missing the brand's actual products. The AI had done its job. The images were photorealistic. But the product that was supposed to be featured had quietly disappeared.
The culprit wasn't prompt engineering or model quality. It was a URL that returned a 404 at the wrong moment.
Reference Images Are the Heart of Multi-Model Generation
Our image pipeline works by feeding multiple reference images to a generative AI model: the influencer's face, product photography, and mood-board inspiration, all loaded as visual conditioning inputs. The model uses these to anchor identity, reproduce exact packaging, and match the brand's aesthetic.
When a reference image is missing, the model doesn't crash. It improvises. It generates something plausible, draws on training data, and produces an image that looks professional, just without the specific product the brand paid to feature. At a casual glance, nothing looks wrong.
This is what makes the failure insidious. A timeout or a stale CDN URL doesn't raise an exception that anyone reviews. It silently degrades quality.
The Old Approach: Check First, Persist Second
Our original design ran a reachability check (a fast HTTP HEAD request) on every external image URL before persisting anything to stable storage. The logic seemed sound: why waste time downloading and re-uploading images that are already unreachable?
The problem is that HEAD requests and GET requests are not atomic. A URL can return a successful response to a HEAD request and then fail the actual download milliseconds later. CDN edge caches have inconsistent behavior. Scraped product images from brand websites have short-lived signed URLs. Indonesian e-commerce platforms rotate image hosting frequently.
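The race is easiest to see with a toy model of a short-lived signed URL. The `SignedUrlHost` class below is illustrative, not our production code; it just makes the timing explicit:

```python
class SignedUrlHost:
    """Toy stand-in for a CDN edge serving a short-lived signed URL."""

    def __init__(self, expires_at: float):
        self.expires_at = expires_at  # expiry time, seconds since epoch

    def head(self, now: float) -> int:
        # The cheap pre-check the old pipeline relied on.
        return 200 if now < self.expires_at else 404

    def get(self, now: float) -> int:
        # The actual download, issued moments later.
        return 200 if now < self.expires_at else 404


host = SignedUrlHost(expires_at=100.0)
head_status = host.head(now=99.9)   # pre-check passes: 200
get_status = host.get(now=100.1)    # download moments later fails: 404
```

The HEAD result was true at the instant it was measured and worthless an instant later. No amount of retrying the pre-check closes that window; only the download itself is proof.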
We were checking reachability at the wrong stage and calling the problem solved.
Persist First, Ask Questions Never
The shift we made was conceptually simple: remove the reachability check entirely and go straight to persistence. Every reference image (product photos, influencer assets, mood-board references) gets downloaded and uploaded to our own storage before a single token is sent to the AI models.
If the download fails, the image is dropped from the reference slots. If it succeeds, the AI model receives a stable, permanent URL that will still be valid in ten minutes when the generation completes. The uncertainty window collapses to zero.
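In code, the new flow is roughly the following. The function name and the injected `fetch`/`upload` callables are illustrative; the real pipeline wires in our HTTP client and storage SDK:

```python
from typing import Callable, Optional


def persist_reference(url: str,
                      fetch: Callable[[str], bytes],
                      upload: Callable[[bytes], str]) -> Optional[str]:
    """Persist-first: the only reachability test is the download itself.

    Returns a stable URL in our own storage on success, or None so the
    caller can drop the image from its reference slots.
    """
    try:
        data = fetch(url)    # the real GET; any failure surfaces here
    except Exception:
        return None          # explicit drop, never a half-checked URL
    return upload(data)      # stable URL, still valid when generation ends
```

Because the function either returns a URL we control or returns nothing, there is no state in which the model receives a link that might die mid-generation.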
We also extended this principle earlier in the pipeline. Rather than waiting until the AI call to discover that a product URL had expired, we now resolve and persist all product images at the very start of batch processing, before any captions are written and before any generation jobs are queued. By the time the generation model runs, every URL it receives has already been verified through the only test that actually matters: a successful download.
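Sketched with the same hedged names, the batch-start resolution looks like this (a `persist` callable stands in for the download-and-upload step above):

```python
def resolve_batch(product_urls: list[str], persist) -> tuple[dict, list]:
    """Resolve every product image before captions or generation jobs.

    Returns a map of original URL -> stable URL, plus the list of URLs
    that failed, so failures are recorded rather than silently ignored.
    """
    resolved, missing = {}, []
    for url in product_urls:
        stable = persist(url)
        if stable is None:
            missing.append(url)   # surfaced to logs before any job queues
        else:
            resolved[url] = stable
    return resolved, missing
```

Everything downstream consumes `resolved` only; nothing after this point ever sees an external URL.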
Trade-offs We Accepted
This approach is strictly slower and uses more storage than the previous design. Downloading and re-uploading images takes time and costs bandwidth. For images that were already in stable storage (influencer assets we manage directly), the extra step is a no-op. But for scraped product images, there's real work happening on every generation run.
We decided the trade-off was worth it for three reasons.
First, the cost of a silent failure is higher than the cost of extra bandwidth. A post that generates without its featured product damages the brand's trust in the platform, and that's not recoverable with an engineering explanation.
Second, the deduplication logic means repeat runs don't download the same image twice. Stable references are detected and returned immediately without touching the network.
Third, the failure mode is now explicit. If an image genuinely cannot be fetched, it's logged and dropped from the reference slots with a clear warning. The generation still runs with whatever references are available, but the system records what was missing, rather than silently proceeding as if nothing happened.
What Changed in Practice
The practical result is that our AI models now receive only stable, verified URLs. External CDN links, signed S3 URLs, and e-commerce image hosts no longer have any path into the generation pipeline. Every image reference lives in infrastructure we control.
For brands with complex product catalogs scraped from their websites, this was the difference between consistent product representation and unpredictable outputs. For the generation pipeline itself, it removed an entire category of non-deterministic behavior that had been nearly impossible to debug from logs alone.
Building for the Failure Case
The broader lesson is one we keep relearning: in AI pipelines, the inputs matter as much as the models. A generation model operating on degraded inputs will produce degraded outputs, and it will do so quietly, without complaint, because it's doing exactly what it was asked to do.
Defensive input handling isn't glamorous engineering. But in a system where quality failures are invisible until a brand notices their product missing from a post, it's the most important work we do.
If you're building AI generation pipelines that depend on external media, make your storage layer the first step, not an afterthought.