
How We Made Our Content Pipeline More Reliable: Quality Gates, Smart Retries, and Better Diagnostics

Jiwa AI Team

Why Pipeline Reliability Matters

When you're generating AI content at scale (captions, images, and delivery via WhatsApp), every silent failure is a missed opportunity. A caption that sounds generic, an image-generation fallback that never logs which strategy worked, or a WhatsApp message that disappears into the void: these add up.

This week, we shipped a focused set of improvements targeting the reliability and observability of our content pipeline.

What We Improved

1. Caption Quality: Catching Generic Language Before It Ships

AI language models sometimes fall back to clichéd promotional phrases: "game changer", "must-have", "you need this". These phrases feel inauthentic and undermine the influencer voice we work so hard to match.

We added a zero-cost deterministic anti-pattern check that scans every generated caption for forbidden phrases before they leave the pipeline. When a generic phrase is detected, it's flagged with a warning including the exact phrase and a caption preview. This gives our quality gate an early signal to trigger a retry.
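The check itself can be sketched in a few lines. This is a minimal illustration, not our exact implementation: the phrase list, function name, and result shape are hypothetical.

```typescript
// Hypothetical sketch of the deterministic anti-pattern check.
// The phrase list here is illustrative, not our full forbidden list.
const FORBIDDEN_PHRASES = ["game changer", "must-have", "you need this"];

interface AntiPatternResult {
  flagged: boolean;
  phrase?: string;   // the exact forbidden phrase that matched
  preview?: string;  // short caption preview for the warning log
}

function checkAntiPatterns(caption: string): AntiPatternResult {
  const lower = caption.toLowerCase();
  for (const phrase of FORBIDDEN_PHRASES) {
    if (lower.includes(phrase)) {
      return { flagged: true, phrase, preview: caption.slice(0, 60) };
    }
  }
  return { flagged: false };
}
```

Because the scan is pure string matching, it costs nothing per caption and runs before any model call in the retry loop.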

We also improved malformed response diagnostics. When a batch caption generation returns incomplete data, we now log exactly which fields are missing per post (e.g., "index 3: missing variantA.caption and caption") instead of a generic "malformed" warning. This makes debugging significantly faster.
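A rough sketch of that per-post diagnostic, assuming a simplified response shape (the `variantA.caption` and `caption` field names follow the example above; the function name is illustrative):

```typescript
// Illustrative sketch: report exactly which caption fields are
// missing for each post in a batch response, instead of a single
// generic "malformed" warning.
interface RawPost {
  caption?: string;
  variantA?: { caption?: string };
}

function missingFieldReport(posts: RawPost[]): string[] {
  const problems: string[] = [];
  posts.forEach((post, index) => {
    const missing: string[] = [];
    if (!post.variantA?.caption) missing.push("variantA.caption");
    if (!post.caption) missing.push("caption");
    if (missing.length > 0) {
      problems.push(`index ${index}: missing ${missing.join(" and ")}`);
    }
  });
  return problems;
}
```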

2. Quality Gate: Configurable Thresholds

Our quality gate scores every post on caption quality (70% weight) and visual quality (30% weight), then auto-retries posts that score below threshold.

Previously, the retry limits and score thresholds were buried as magic numbers in the code. We extracted them into named constants:

  • Caption retry threshold: 55/100 (posts below this get caption rewrites)
  • Image retry threshold: 50/100 (posts below this get image regeneration)
  • Review flag threshold: 40/100 (posts still low after retry get flagged for human review)
  • Max caption retries: Increased from 2 to 3 posts per batch

This makes it trivial to tune quality standards per business tier in the future โ€” a luxury brand can set a higher bar than an SME.
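Put together, the weighted score and gate decision look roughly like this. The constant and function names are hypothetical; the weights and cutoffs are the ones listed above.

```typescript
// Sketch of the weighted quality score and named threshold constants.
const CAPTION_WEIGHT = 0.7;
const VISUAL_WEIGHT = 0.3;
const CAPTION_RETRY_THRESHOLD = 55; // below this: rewrite the caption
const IMAGE_RETRY_THRESHOLD = 50;   // below this: regenerate the image
const REVIEW_FLAG_THRESHOLD = 40;   // still below this: flag for human review
const MAX_CAPTION_RETRIES = 3;      // per batch, up from 2

function overallScore(captionScore: number, visualScore: number): number {
  return captionScore * CAPTION_WEIGHT + visualScore * VISUAL_WEIGHT;
}

type GateAction = "pass" | "retry_caption" | "retry_image" | "flag_review";

function gateAction(
  captionScore: number,
  visualScore: number,
  retriesUsed: number,
): GateAction {
  if (captionScore < CAPTION_RETRY_THRESHOLD && retriesUsed < MAX_CAPTION_RETRIES) {
    return "retry_caption";
  }
  if (visualScore < IMAGE_RETRY_THRESHOLD) return "retry_image";
  if (overallScore(captionScore, visualScore) < REVIEW_FLAG_THRESHOLD) {
    return "flag_review";
  }
  return "pass";
}
```

Because the thresholds are plain constants rather than inline numbers, swapping them for per-tier config values later is a one-line change per threshold.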

3. Image Generation: Strategy Observability

Our image orchestrator uses a multi-strategy fallback chain. For product posts: hybrid composite, IP-Adapter, then generic. For UGC posts: multi-IP-Adapter, PuLID face, then generic.

Previously, we only logged when a strategy failed. Now we log when each strategy succeeds, making it possible to answer questions like:

  • What percentage of product posts use the hybrid strategy vs. falling back to generic?
  • Is PuLID face reliability improving or degrading over time?

We also extracted the realism score threshold (minimum score to accept without retry) into a named constant, making it easy to adjust as our image models improve.
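The success logging slots naturally into the fallback loop. A minimal sketch, assuming a simplified synchronous strategy interface (our real strategies are async and carry more context; the names below come from the chains described above):

```typescript
// Minimal sketch of a strategy fallback chain that logs successes
// as well as failures, so strategy usage rates become measurable.
type Strategy = { name: string; generate: () => string };

function runWithFallback(
  strategies: Strategy[],
  log: (message: string) => void,
): string | null {
  for (const strategy of strategies) {
    try {
      const imageUrl = strategy.generate();
      // New: log the winning strategy, not just the failures.
      log(`image strategy succeeded: ${strategy.name}`);
      return imageUrl;
    } catch {
      log(`image strategy failed: ${strategy.name}`);
    }
  }
  return null; // every strategy failed, including the generic fallback
}
```

With both success and failure lines in the logs, the two questions above reduce to counting log entries per strategy name.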

4. WhatsApp Delivery: Hardened Against Network Failures

Our WhatsApp delivery via Fonnte previously had no fetch timeout; a slow API response could hang indefinitely. We added:

  • 30-second fetch timeout via AbortController: no more indefinite hangs
  • HTTP status validation: we now check res.ok before attempting the JSON parse, catching 5xx errors that would otherwise produce cryptic parse failures
  • Structured response: callers now receive the HTTP status code alongside the API response, enabling upstream retry logic
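The three changes above combine into a delivery call shaped roughly like this. The function names, payload shape, and result type are illustrative; the 30-second timeout, res.ok check, and status-bearing result are the real changes.

```typescript
const FETCH_TIMEOUT_MS = 30_000;

interface DeliveryResult {
  status: number; // HTTP status, surfaced for upstream retry logic
  ok: boolean;
  body: unknown;
}

// Validate the status before parsing: a 5xx HTML error page would
// otherwise surface as a cryptic JSON parse failure.
async function toDeliveryResult(res: Response): Promise<DeliveryResult> {
  if (!res.ok) {
    return { status: res.status, ok: false, body: await res.text() };
  }
  return { status: res.status, ok: true, body: await res.json() };
}

async function sendWhatsApp(
  url: string,
  payload: Record<string, string>,
): Promise<DeliveryResult> {
  // AbortController enforces the 30-second ceiling on the request.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    return await toDeliveryResult(res);
  } finally {
    clearTimeout(timer); // always clear, success or failure
  }
}
```

An aborted request rejects with an AbortError, which the caller can treat the same as any other transient delivery failure and retry.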

The Numbers

| Pipeline Stage | Before | After | Improvement |
| --- | --- | --- | --- |
| Caption Generation | 4.0/5 | 4.2/5 | Anti-patterns + diagnostics |
| Image Generation | 3.5/5 | 3.7/5 | Strategy logging + constants |
| Quality Scoring | 3.5/5 | 3.8/5 | Configurable thresholds |
| Content Delivery | 2.5/5 | 3.0/5 | Timeout + HTTP checks |
| Overall | 3.5/5 | 3.7/5 | |

What's Next

These improvements lay the groundwork for our next cycle of enhancements:

  • Adaptive quality thresholds per business tier
  • A/B variant tracking to learn which caption approach works best per influencer
  • Influencer data refresh to keep matching scores current
  • Integration tests for the core pipeline stages

Every critique cycle moves us closer to a pipeline that doesn't just generate content, but learns what makes content great.