# How We Made Our Content Pipeline More Reliable: Quality Gates, Smart Retries, and Better Diagnostics
## Why Pipeline Reliability Matters
When you're generating AI content at scale — captions, images, delivered via WhatsApp — every silent failure is a missed opportunity. A caption that sounds generic, an image generation strategy that fails without logging which approach worked, a WhatsApp message that disappears into the void: these add up.
This week, we shipped a focused set of improvements targeting the reliability and observability of our content pipeline.
## What We Improved
### 1. Caption Quality: Catching Generic Language Before It Ships
AI language models sometimes fall back to clichéd promotional phrases: "game changer", "must-have", "you need this". These phrases feel inauthentic and undermine the influencer voice we work so hard to match.
We added a zero-cost, deterministic anti-pattern check that scans every generated caption for forbidden phrases before it leaves the pipeline. When a generic phrase is detected, the caption is flagged with a warning that includes the exact phrase and a caption preview. This gives our quality gate an early signal to trigger a retry.
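The check can be sketched as follows; the phrase list, names, and preview length here are illustrative, not our exact implementation:

```typescript
// Illustrative anti-pattern check; real phrase list is larger.
const FORBIDDEN_PHRASES = ["game changer", "must-have", "you need this"];

interface AntiPatternHit {
  phrase: string;   // the forbidden phrase that matched
  preview: string;  // first 60 chars of the offending caption
}

// Returns the first forbidden phrase found in the caption, or null if clean.
function checkAntiPatterns(caption: string): AntiPatternHit | null {
  const lower = caption.toLowerCase();
  for (const phrase of FORBIDDEN_PHRASES) {
    if (lower.includes(phrase)) {
      return { phrase, preview: caption.slice(0, 60) };
    }
  }
  return null;
}
```

Because the check is a plain substring scan, it costs no model tokens and runs deterministically on every caption.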
We also improved malformed response diagnostics. When a batch caption generation returns incomplete data, we now log exactly which fields are missing per post (e.g., "index 3: missing variantA.caption and caption") instead of a generic "malformed" warning. This makes debugging significantly faster.
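A sketch of the per-post diagnostic — the field names follow the example above, but the shape of the batch response and the helper name are assumptions:

```typescript
// Illustrative shape of one post in a batch caption response.
interface BatchPost {
  caption?: string;
  variantA?: { caption?: string };
}

// Lists missing fields per post, e.g. "index 3: missing variantA.caption and caption".
function describeMissingFields(posts: BatchPost[]): string[] {
  const problems: string[] = [];
  posts.forEach((post, index) => {
    const missing: string[] = [];
    if (!post.variantA?.caption) missing.push("variantA.caption");
    if (!post.caption) missing.push("caption");
    if (missing.length > 0) {
      problems.push(`index ${index}: missing ${missing.join(" and ")}`);
    }
  });
  return problems;
}
```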
### 2. Quality Gate: Configurable Thresholds
Our quality gate scores every post on caption quality (70% weight) and visual quality (30% weight), then auto-retries posts that score below threshold.
Previously, the retry limits and score thresholds were buried as magic numbers in the code. We extracted them into named constants:
- Caption retry threshold: 55/100 (posts below this get caption rewrites)
- Image retry threshold: 50/100 (posts below this get image regeneration)
- Review flag threshold: 40/100 (posts still low after retry get flagged for human review)
- Max caption retries: Increased from 2 to 3 posts per batch
This makes it trivial to tune quality standards per business tier in the future โ a luxury brand can set a higher bar than an SME.
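Under the weights and thresholds above, the gate logic reduces to something like the following; the constant names and the decision helper are illustrative:

```typescript
// Thresholds from the post; constant names are illustrative.
const CAPTION_RETRY_THRESHOLD = 55;        // below this: caption rewrite
const IMAGE_RETRY_THRESHOLD = 50;          // below this: image regeneration
const REVIEW_FLAG_THRESHOLD = 40;          // still below this after retry: human review
const MAX_CAPTION_RETRIES_PER_BATCH = 3;   // raised from 2

const CAPTION_WEIGHT = 0.7;
const VISUAL_WEIGHT = 0.3;

type GateAction = "pass" | "retry-caption" | "retry-image" | "flag-for-review";

// Weighted overall score: 70% caption, 30% visual.
function overallScore(captionScore: number, visualScore: number): number {
  return CAPTION_WEIGHT * captionScore + VISUAL_WEIGHT * visualScore;
}

// Decide what to do with a post; `retried` marks a post already retried once.
function gateDecision(captionScore: number, visualScore: number, retried: boolean): GateAction {
  if (retried) {
    return overallScore(captionScore, visualScore) < REVIEW_FLAG_THRESHOLD
      ? "flag-for-review"
      : "pass";
  }
  if (captionScore < CAPTION_RETRY_THRESHOLD) return "retry-caption";
  if (visualScore < IMAGE_RETRY_THRESHOLD) return "retry-image";
  return "pass";
}
```

With the numbers extracted like this, per-tier tuning becomes a matter of swapping the constants for a per-business config lookup.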
### 3. Image Generation: Strategy Observability
Our image orchestrator uses a multi-strategy fallback chain. For product posts: hybrid composite, IP-Adapter, then generic. For UGC posts: multi-IP-Adapter, PuLID face, then generic.
Previously, we only logged when a strategy failed. Now we log when each strategy succeeds, making it possible to answer questions like:
- What percentage of product posts use the hybrid strategy vs. falling back to generic?
- Is PuLID face reliability improving or degrading over time?
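A minimal sketch of a fallback chain that logs the winning strategy as well as the failures; the strategy interface is hypothetical, and generation is shown synchronously for brevity:

```typescript
type Strategy = {
  name: string;
  generate: () => string | null; // image URL on success, null on failure
};

// Tries strategies in order and logs the winner, so we can measure how
// often each strategy succeeds vs. falling back to generic.
function generateWithFallback(
  strategies: Strategy[],
  log: (msg: string) => void,
): string | null {
  for (const strategy of strategies) {
    const image = strategy.generate();
    if (image !== null) {
      log(`image strategy succeeded: ${strategy.name}`);
      return image;
    }
    log(`image strategy failed: ${strategy.name}`);
  }
  return null; // every strategy failed
}
```

Counting the "succeeded" log lines per strategy name over time answers both questions above.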
We also extracted the realism score threshold (minimum score to accept without retry) into a named constant, making it easy to adjust as our image models improve.
### 4. WhatsApp Delivery: Hardened Against Network Failures
Our WhatsApp delivery via Fonnte previously had no fetch timeout, so a slow API response could hang indefinitely. We added:
- 30-second fetch timeout via AbortController: no more indefinite hangs
- HTTP status validation: we now check `res.ok` before attempting the JSON parse, catching 5xx errors that would otherwise produce cryptic parse failures
- Structured response: callers now receive the HTTP status code alongside the API response, enabling upstream retry logic
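A sketch of the hardened call, assuming Node 18+ (global `fetch`/`Response`); the endpoint, payload shape, and result type are illustrative, not Fonnte's actual API:

```typescript
interface DeliveryResult {
  status: number; // HTTP status, surfaced for upstream retry logic
  ok: boolean;
  body: unknown;  // parsed JSON on success, raw text otherwise
}

// Checks res.ok before parsing, so a 5xx HTML error page doesn't
// surface as a cryptic JSON parse failure.
async function parseDelivery(res: Response): Promise<DeliveryResult> {
  const body = res.ok ? await res.json() : await res.text();
  return { status: res.status, ok: res.ok, body };
}

// Sends a message with a 30-second AbortController timeout.
async function sendWhatsAppMessage(url: string, payload: object): Promise<DeliveryResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 30_000);
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal, // aborts the hung request after 30s
    });
    return parseDelivery(res);
  } finally {
    clearTimeout(timer); // don't leak the timer on fast responses
  }
}
```

Because the result always carries `status`, callers can distinguish a 429 worth retrying from a 400 that never will succeed.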
## The Numbers
| Pipeline Stage | Before | After | Improvement |
|---|---|---|---|
| Caption Generation | 4.0/5 | 4.2/5 | Anti-patterns + diagnostics |
| Image Generation | 3.5/5 | 3.7/5 | Strategy logging + constants |
| Quality Scoring | 3.5/5 | 3.8/5 | Configurable thresholds |
| Content Delivery | 2.5/5 | 3.0/5 | Timeout + HTTP checks |
| Overall | 3.5/5 | 3.7/5 | |
## What's Next
These improvements lay the groundwork for our next cycle of enhancements:
- Adaptive quality thresholds per business tier
- A/B variant tracking to learn which caption approach works best per influencer
- Influencer data refresh to keep matching scores current
- Integration tests for the core pipeline stages
Every critique cycle moves us closer to a pipeline that doesn't just generate content but learns what makes content great.