Persist Early, Drop Loudly: How We Eliminated a Whole Class of Blank Images
The Deceptive Safety of Silent Fallbacks
Every image we show in Jiwa AI comes from Instagram: a brand's own posts, their products, their visual identity. Instagram CDN URLs are temporary: they expire within hours. Our pipeline has always known this, and it tries to download and re-upload every image to our own persistent storage during onboarding.
But what happens when that re-upload fails?
For a long time, the answer was: we stored the original URL anyway. The catch block returned the source CDN URL as a fallback, the pipeline kept moving, and the dashboard looked fine. For a few hours.
Then the URL expired, and a blank rectangle appeared where a product photo used to be.
The Problem with "Better Than Nothing"
Silent fallbacks feel defensive. The reasoning is intuitive: if persistence fails, returning the original URL at least lets the pipeline continue. The image will display briefly. Maybe the failure is transient and the URL will persist on the next run.
The flaw in this reasoning is that the failure isn't transient; it's structural. A CDN URL that we couldn't download at persistence time is a CDN URL we couldn't download, period. Storing it guarantees a future blank. We weren't being defensive; we were deferring a crash to the worst possible moment: when a business owner is showing their dashboard to a team member, or when our AI pipeline tries to use that URL as a visual reference.
What looks like a "soft failure" at persistence time is actually a time-bomb planted in the database.
Moving the Fence Upstream
The deeper fix isn't better error handling at the point of failure; it's changing when persistence happens. In our pipeline, image analysis and AI generation don't start until well into the process. Historically, Instagram images were fetched in Wave 1 and persisted in Wave 4, meaning three waves of computation ran against temporary URLs.
We moved image persistence to immediately after Wave 1. As soon as Instagram data arrives, all image URLs are persisted concurrently, with a concurrency cap to avoid stampeding Supabase, before any downstream logic sees them. If an image can't be persisted, it's dropped from the media set entirely.
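The step described above can be sketched as a small worker pool. This is a minimal illustration, not our production code: MediaItem, PersistFn, persistAllOrDrop, and the default cap of 4 are all hypothetical names and values, and the persist function is injected so the sketch makes no assumptions about the storage client.

```typescript
// Hypothetical shapes for illustration; the real pipeline's types are not shown here.
type MediaItem = { id: string; url: string };
// Assumed contract: resolves to a permanent storage URL, rejects on any failure.
type PersistFn = (sourceUrl: string) => Promise<string>;

async function persistAllOrDrop(
  items: MediaItem[],
  persist: PersistFn,
  limit = 4, // assumed cap; tune to what the storage backend tolerates
): Promise<MediaItem[]> {
  const results = new Array<MediaItem | null>(items.length).fill(null);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted,
  // so at most `limit` uploads are in flight at any moment.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      const item = items[i];
      try {
        results[i] = { ...item, url: await persist(item.url) };
      } catch (err) {
        // Drop loudly: log the failure and leave the slot empty.
        // The temporary CDN URL is never kept as a fallback.
        console.warn(`persist failed, dropping ${item.id}:`, err);
      }
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  // Only successfully persisted items survive, in their original order.
  return results.filter((r): r is MediaItem => r !== null);
}
```

Writing results by index keeps the surviving items in their original order even though uploads finish out of order, so downstream waves see a shorter list rather than a reshuffled one.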
This changes the contract: by the time Wave 2 begins, every URL in the system is either a stable storage URL or it doesn't exist. No temporary CDN references travel deeper into the pipeline.
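One way to make a contract like this checkable at compile time is a branded type. The sketch below is an assumption about how it could be expressed, not a description of our codebase: PersistedUrl, toPersistedUrl, referenceUrls, and the storage.example.com prefix are all placeholders.

```typescript
// Hypothetical branded type: structurally a string, but a plain string
// cannot be passed where a PersistedUrl is required.
type PersistedUrl = string & { readonly __brand: "PersistedUrl" };

// Placeholder prefix for illustration only; substitute the real storage origin.
const STORAGE_PREFIX = "https://storage.example.com/";

// The only place a PersistedUrl is minted. Anything that does not point
// at our own storage is rejected with null.
function toPersistedUrl(url: string): PersistedUrl | null {
  return url.startsWith(STORAGE_PREFIX) ? (url as PersistedUrl) : null;
}

// Downstream stages accept only PersistedUrl, so handing them a raw
// Instagram CDN string is a type error rather than a runtime surprise.
function referenceUrls(refs: PersistedUrl[]): string[] {
  return refs.map((ref) => ref.toString());
}
```

With this in place, a call such as referenceUrls(["https://cdn.example.com/a.jpg"]) does not compile, which moves the "no temporary URLs past Wave 1" rule from convention into the type checker.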
Dropping Instead of Falling Back
The other half of the fix was removing the catch-block fallbacks entirely. When persistence fails (network error, invalid image, CDN URL already expired), the item is now excluded from results rather than returned with its original URL.
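The difference between the two behaviors is easiest to see side by side. The following is a simplified reconstruction under assumed names (persistWithFallback, persistOrDrop, and an injected PersistFn); the actual catch blocks in our pipeline are not reproduced here.

```typescript
// Assumed contract: resolves to a permanent storage URL, rejects on any failure.
type PersistFn = (sourceUrl: string) => Promise<string>;

// Before: the failure is swallowed and the temporary URL is stored as if it were fine.
async function persistWithFallback(url: string, persist: PersistFn): Promise<string> {
  try {
    return await persist(url);
  } catch {
    return url; // looks harmless, but this URL expires within hours
  }
}

// After: the failure is logged and the caller receives null,
// which forces it to exclude the item instead of storing a URL that will expire.
async function persistOrDrop(url: string, persist: PersistFn): Promise<string | null> {
  try {
    return await persist(url);
  } catch (err) {
    console.warn(`dropping image, persistence failed for ${url}:`, err);
    return null;
  }
}
```

The return type carries the design choice: a function that returns string can never signal "nothing usable", while string | null makes every caller decide what a missing image means.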
This feels more aggressive, but it's actually kinder to the system. A missing moodboard reference image causes the AI to work with slightly fewer visual anchors. An expired moodboard image causes the AI to fail on a 403 error and generate nothing. Fewer references produce slightly worse output; expired references produce no output at all.
The tradeoff is that some moodboard images may disappear if their source CDN is already stale when the pipeline runs. In practice, this is rare: Instagram CDN URLs are valid for hours after fetch, and Wave 1 runs at the start of onboarding. The risk of a missing reference slot is far lower than the risk of a stored time-bomb URL.
What Changes for the User
The visible effect of this change is subtle: onboarding now persists images concurrently right at the start, which adds a brief parallel upload phase after Instagram data arrives. The total time impact is small, since uploads are I/O-bound and run in parallel, and the payoff is that every image displayed in the dashboard, used in AI generation, or stored in the database is guaranteed to be backed by a permanent URL.
No more blank rectangles after the first login. No more Gemini 422 errors caused by expired CDN references. No more repair-endpoint runs to retroactively fix URLs that should have been rejected at write time.
The Architectural Principle
Silent fallbacks are appropriate when the fallback value is genuinely safe to use indefinitely. A default color, a placeholder text string, an empty array: these are safe fallbacks. A temporary URL that expires in hours is not a fallback; it's a delayed failure with better UX.
When your fallback has an expiry, treat it as a failure at write time, not a temporary inconvenience. Fail loudly, drop explicitly, and move on with fewer but trustworthy inputs. The pipeline's resilience comes from handling missing data gracefully, not from pretending that ticking time-bombs are stable references.
If you're building pipelines that depend on external media, the rule is simple: if you can't persist it now, don't store the original URL. Log the failure, skip the item, and let the downstream logic handle a smaller but reliable dataset.