
The Case of the Disappearing Images: Why CDN URLs Expire and How We Built Self-Healing Storage

Jiwa AI Team

When Your Dashboard Goes Dark

A brand owner logs in to their Jiwa AI dashboard expecting to see their mood board: the top-performing Instagram posts that define their visual identity. Instead, they see five broken image icons with like counts floating next to empty frames. Their product offerings section looks the same: product names and descriptions intact, but every product photo is gone.

The images existed yesterday. What happened?

The Silent Failure Chain

Instagram's CDN URLs are temporary by design. They're meant to serve images to the Instagram app in real time, not to be stored as permanent references. These URLs typically expire within 24 hours.

Our system knew this. During onboarding, it downloads each image from Instagram, optimizes it, and uploads it to our own persistent storage. The permanent URL replaces the temporary one in the database. Except when it doesn't.

If the upload step fails (a network hiccup, a storage timeout, a brief service disruption), the system silently falls back to storing the original Instagram URL. The dashboard looks fine for the first few hours. Then the CDN URL expires and the image vanishes. No error, no alert, no trace of what went wrong.
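The failure mode above can be sketched in a few lines. This is an illustration under assumptions, not the actual Jiwa AI code: `persistImage`, `Uploader`, and the `PersistResult` shape are hypothetical names. The sketch keeps the fallback to the original URL but returns an explicit `persisted` flag, so the caller can record that the stored URL is still a temporary one.

```typescript
// Hypothetical uploader: downloads, optimizes, and uploads the image,
// then resolves with the permanent storage URL.
type Uploader = (sourceUrl: string) => Promise<string>;

interface PersistResult {
  url: string;        // URL to store in the database
  persisted: boolean; // false means `url` is still the temporary CDN URL
  error?: string;     // reason for the fallback, if any
}

async function persistImage(
  sourceUrl: string,
  upload: Uploader
): Promise<PersistResult> {
  try {
    const permanentUrl = await upload(sourceUrl);
    return { url: permanentUrl, persisted: true };
  } catch (err) {
    // Fall back to the original URL, but say so instead of hiding it.
    const message = err instanceof Error ? err.message : String(err);
    return { url: sourceUrl, persisted: false, error: message };
  }
}
```

With a result like this, a row saved with `persisted: false` can be found and retried later instead of looking identical to a healthy one.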

Three Layers of Defense

We addressed this with a defense-in-depth approach. The first layer is graceful degradation in the browser. Every image tag now handles load failures by replacing the broken source with a subtle gradient placeholder. Users see a dark, branded rectangle instead of a broken icon: not ideal, but not alarming either.
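A minimal sketch of the browser-side fallback might look like the following. The names `ImageLike`, `applyPlaceholder`, and `PLACEHOLDER_SRC`, as well as the gradient colors, are assumptions for illustration. The helper is written against a small structural type so it accepts a real `HTMLImageElement` as well as a plain object.

```typescript
// Minimal structural type: satisfied by HTMLImageElement and by
// plain objects outside the browser.
interface ImageLike {
  src: string;
  dataset: { [key: string]: string | undefined };
}

// Hypothetical placeholder: an inline SVG with a dark gradient.
const PLACEHOLDER_SRC: string =
  "data:image/svg+xml," +
  encodeURIComponent(
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1 1" ' +
      'preserveAspectRatio="none">' +
      '<defs><linearGradient id="g" x1="0" y1="0" x2="1" y2="1">' +
      '<stop offset="0" stop-color="#1f1f2e"/>' +
      '<stop offset="1" stop-color="#3a3a55"/>' +
      "</linearGradient></defs>" +
      '<rect width="1" height="1" fill="url(#g)"/></svg>'
  );

// Swap in the placeholder once. The flag prevents an endless error loop
// if the placeholder itself ever fails to load.
function applyPlaceholder(
  img: ImageLike,
  placeholder: string = PLACEHOLDER_SRC
): boolean {
  if (img.dataset.fallbackApplied === "true") return false;
  img.dataset.fallbackApplied = "true";
  img.src = placeholder;
  return true;
}
```

In the browser this would be wired to the image's `error` event, for example `img.addEventListener("error", () => applyPlaceholder(img))`.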

The second layer is better observability. When image persistence fails, the system now logs structured error messages that include the storage bucket, file path, original URL, and the specific error. This makes it possible to diagnose patterns: whether failures cluster around certain image sources, time windows, or storage operations.
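As a rough sketch, a structured log entry with those four fields could be built as below. The `event` name, the `timestamp` field, and the function names are assumptions added for illustration; only the bucket, path, original URL, and error come from the description above.

```typescript
interface PersistFailureLog {
  event: "image_persist_failed"; // hypothetical event name
  bucket: string;
  path: string;
  originalUrl: string;
  error: string;
  timestamp: string; // assumption: ISO 8601, useful for time-window analysis
}

function buildPersistFailureLog(
  bucket: string,
  path: string,
  originalUrl: string,
  err: unknown,
  now: Date = new Date()
): PersistFailureLog {
  return {
    event: "image_persist_failed",
    bucket,
    path,
    originalUrl,
    error: err instanceof Error ? `${err.name}: ${err.message}` : String(err),
    timestamp: now.toISOString(),
  };
}

function logPersistFailure(entry: PersistFailureLog): void {
  // One JSON object per line so log tooling can filter by field.
  console.error(JSON.stringify(entry));
}
```

Keeping each field separate, rather than interpolating everything into one message string, is what makes it practical to group failures by bucket, source host, or hour.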

The third and most important layer is a repair mechanism. We built an endpoint that scans a business's stored image URLs, identifies any that still point to external CDNs rather than our persistent storage, and attempts to re-download and re-persist them. This means stale URLs can be healed without re-running the entire onboarding process.
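The repair pass could be sketched roughly as follows. Everything here is hypothetical: `StoredImage`, `RepairDeps`, `RepairReport`, and `repairImages` are illustrative names, and the detection, persistence, and database steps are injected as functions because the real implementations are not shown in this post.

```typescript
interface StoredImage {
  id: string;
  url: string;
}

interface RepairDeps {
  // True if the URL still points outside persistent storage.
  needsRepair: (url: string) => boolean;
  // Re-downloads and re-uploads the image; resolves with the permanent URL.
  persist: (sourceUrl: string) => Promise<string>;
  // Writes the permanent URL back to the database.
  save: (id: string, permanentUrl: string) => Promise<void>;
}

interface RepairReport {
  scanned: number;
  repaired: string[];
  failed: { id: string; error: string }[];
}

async function repairImages(
  images: StoredImage[],
  deps: RepairDeps
): Promise<RepairReport> {
  const report: RepairReport = { scanned: images.length, repaired: [], failed: [] };
  for (const image of images) {
    if (!deps.needsRepair(image.url)) continue;
    try {
      const permanentUrl = await deps.persist(image.url);
      await deps.save(image.id, permanentUrl);
      report.repaired.push(image.id);
    } catch (err) {
      // A failed re-download is reported, not silently skipped.
      const message = err instanceof Error ? err.message : String(err);
      report.failed.push({ id: image.id, error: message });
    }
  }
  return report;
}
```

Returning a report with separate `repaired` and `failed` lists keeps the repair step itself from repeating the original mistake of failing quietly.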

Detecting What Needs Repair

The detection logic is deliberately simple: any URL that doesn't contain our storage provider's domain is a candidate for repair. This catches not just expired Instagram URLs, but any external reference that slipped through: website screenshots that weren't persisted, or scraped images from sources that later went offline.
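A minimal sketch of that check is below. `STORAGE_HOST` is a placeholder hostname and `needsRepair` is a hypothetical name. Note one deliberate difference from the description above: instead of a plain substring test, this variant parses the URL and compares hostnames, so the storage domain appearing in a query string does not count as a match.

```typescript
// Hypothetical storage hostname; substitute the real provider domain.
const STORAGE_HOST = "storage.example.com";

function needsRepair(url: string, storageHost: string = STORAGE_HOST): boolean {
  try {
    const host = new URL(url).hostname.toLowerCase();
    const isOurs = host === storageHost || host.endsWith("." + storageHost);
    return !isOurs;
  } catch {
    // Unparseable or relative URLs are not known-good, so flag them too.
    return true;
  }
}
```

For example, `needsRepair("https://storage.example.com/biz/1.jpg")` is `false`, while a URL on any other host is flagged as a repair candidate.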

Trust Is Built in the Fallbacks

The lesson here isn't about CDN expiry; that's well-understood behavior. The lesson is about silent failures in graceful degradation. When your error handling returns a "reasonable default" instead of failing loudly, you need a mechanism to detect and correct those defaults later. Otherwise, your fallback becomes the permanent state, and nobody notices until a user complains about their empty dashboard.