
From URL to Instagram Posts in Under a Minute: The Complete Pipeline

Jiwa AI Team

What the Pipeline Does

A business owner pastes their website URL into Jiwa AI. Under a minute later, they receive six Instagram posts on WhatsApp, each with a photorealistic image, a caption in their brand voice, relevant hashtags, and a scheduled date. The whole thing costs about thirty-five cents.

This post walks through every step of that pipeline: what happens, in what order, what calls what, and how much each piece costs.

Step 1: Scraping and Data Collection

The pipeline starts by gathering raw material about the business from every available source.

Website scraping uses Cheerio to extract the page title, meta description, visible text content (capped at five thousand characters), up to ten images with alt text, and color hints from CSS and meta tags. If the page is a JavaScript SPA with fewer than fifty characters of content and no images, we fall back to Jina Reader for server-rendered extraction.
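The caps and the fallback rule above can be sketched as two small pure functions. This is only an illustration: the `ScrapeResult` shape and the function names are assumptions, not the pipeline's actual code.

```typescript
// Hypothetical shape of what the static HTML pass returns.
interface ScrapeResult {
  text: string;
  imageUrls: string[];
}

const MAX_TEXT_CHARS = 5000; // single-page text cap from the post
const MAX_IMAGES = 10;       // single-page image cap from the post
const MIN_TEXT_CHARS = 50;   // below this, with no images, treat the page as an empty SPA shell

// Apply the text and image caps described above.
function capScrape(result: ScrapeResult): ScrapeResult {
  return {
    text: result.text.slice(0, MAX_TEXT_CHARS),
    imageUrls: result.imageUrls.slice(0, MAX_IMAGES),
  };
}

// True when the static HTML looks unrendered and a rendered fetch is worth trying.
function needsRenderedFallback(result: ScrapeResult): boolean {
  return result.text.trim().length < MIN_TEXT_CHARS && result.imageUrls.length === 0;
}
```

Both conditions must hold before the fallback triggers, so a short landing page that still exposes images stays on the cheaper static path.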

For businesses with multiple pages, we scrape up to three subpages, prioritizing paths like /products, /menu, /services, and /about, and merge the results. The content cap rises to ten thousand characters across pages, and we collect up to fifteen images.
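One way to express the subpage prioritization is a stable sort by path rank. The sketch below assumes prefix matching on the path and assumes that non-priority pages fill any remaining slots; the post does not specify either detail, and `pickSubpages` is a hypothetical name.

```typescript
const PRIORITY_PATHS = ["/products", "/menu", "/services", "/about"];
const MAX_SUBPAGES = 3;

// Lower rank means higher priority; unknown paths rank last.
function pathRank(path: string): number {
  const lower = path.toLowerCase();
  for (let i = 0; i < PRIORITY_PATHS.length; i++) {
    if (lower.indexOf(PRIORITY_PATHS[i]) === 0) return i;
  }
  return PRIORITY_PATHS.length;
}

// Pick up to three distinct subpages, priority paths first, keeping discovery order on ties.
function pickSubpages(paths: string[]): string[] {
  const unique = paths.filter((p, i) => paths.indexOf(p) === i);
  return unique
    .map((path, index) => ({ path, index, rank: pathRank(path) }))
    .sort((a, b) => a.rank - b.rank || a.index - b.index)
    .slice(0, MAX_SUBPAGES)
    .map((entry) => entry.path);
}
```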

Instagram data comes in parallel if the business has connected their account. We pull their username, bio, follower count, and recent media with captions and engagement metrics. This data feeds into brand analysis and product-to-post matching later.

Image processing downloads up to ten images (six from the website, four from Instagram), resizes each to 512x512 max, converts to JPEG at quality 80, and encodes to base64 for Claude's vision analysis.
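The "512x512 max" constraint reads as fit-inside resizing. Here is a minimal sketch of the dimension arithmetic, assuming aspect ratio is preserved and small images are not upscaled (the post does not say either way). In practice an image library would perform the resize; `fitWithin` is a hypothetical helper.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longer edge is at most `max`, never scaling up.
function fitWithin(size: Size, max = 512): Size {
  const scale = Math.min(1, max / Math.max(size.width, size.height));
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```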

Security: Every URL passes through SSRF validation before fetching. Private IPs (10.x, 192.168.x, 127.x, 172.16-31.x), localhost, and .local/.internal domains are all blocked. Only HTTP and HTTPS protocols are allowed. Responses are capped at five megabytes with a ten-second timeout.
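The blocklist above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, it only inspects the literal hostname (IPv4 and names), and it does not cover IPv6 or hostnames that resolve to private addresses, which a production check would also need to handle. The size cap and timeout are not shown.

```typescript
// True for hosts the post lists as blocked: localhost, .local/.internal, and private IPv4 ranges.
function isBlockedHostname(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost") return true;
  if (/\.(local|internal)$/.test(host)) return true;

  const ipv4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!ipv4) return false;
  const a = Number(ipv4[1]);
  const b = Number(ipv4[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 192 && b === 168) ||
    (a === 172 && b >= 16 && b <= 31)
  );
}

// Only http/https URLs with a non-blocked host pass.
function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isBlockedHostname(url.hostname);
}
```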

Step 2: Brand Analysis

A single Claude Sonnet call analyzes everything we've collected (website text, Instagram data, and product images) and returns a structured brand profile.

The output includes the business name, industry, description, target audience, brand tone, five to ten keywords, and three to five content themes. More importantly, it extracts products, each with a name, description, unique selling angle, content keywords pulled from real Instagram captions, and image indices mapping back to the photos we provided.

Product extraction has a critical guardrail: Claude is instructed to only list products explicitly mentioned in the source material. No inventing generic names. If the website says "Sourdough Cookies" and shows a photo of cookies, that's a product. If it doesn't mention anything specific, the products array comes back empty.

We validate this further by checking that at least fifty percent of each product name's words appear in the source corpus. Products that fail this check get filtered out.
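The fifty-percent check can be sketched as a word-overlap ratio. The tokenization here (lowercase, split on anything that is not an ASCII letter or digit) is an assumption, as are the function names.

```typescript
// Lowercase and split into alphanumeric words.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// True when at least `minRatio` of the product name's words appear in the source corpus.
function isGroundedProduct(name: string, corpus: string, minRatio = 0.5): boolean {
  const nameWords = tokenize(name);
  if (nameWords.length === 0) return false;
  const corpusWords = tokenize(corpus);
  const found = nameWords.filter((w) => corpusWords.indexOf(w) !== -1).length;
  return found / nameWords.length >= minRatio;
}
```

With a corpus of "Try our sourdough cookies, baked fresh daily.", the name "Sourdough Cookies" passes (two of two words found) while "Premium Protein Shake" is filtered out (zero of three).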

The same call also builds the brand's voice DNA: personality traits (three to five adjectives), voice dos and don'ts, sample phrases the brand would use, and keywords to avoid. This voice profile constrains every caption generated later.

Cost: ~$0.015 (one Sonnet call with vision)

Step 3: Visual Identity

Two parallel analyses build the brand's visual profile.

Theme analysis extracts four brand colors (primary, secondary, accent, and neutral) from website color hints (CSS hex codes, meta theme-color, background colors) combined with Claude's interpretation of the brand's industry and tone. Each color gets a hex code, a human name, and a role description. The analysis also produces a mood descriptor and a visual style statement.

If the website has no color hints, Claude infers appropriate colors from the industry and brand tone. A bakery gets warm earth tones. A tech startup gets clean blues. Colors are sanitized against a hex regex and fall back to sensible defaults if invalid.
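Sanitizing against a hex regex might look like the following. The exact pattern (six-digit hex only) and the specific default values are assumptions; the post only says the defaults are sensible and, elsewhere, that they are a blue primary and slate secondary.

```typescript
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

// Hypothetical defaults; the actual hex values are not given in the post.
const DEFAULT_COLORS = {
  primary: "#2563EB",   // a blue
  secondary: "#64748B", // a slate
};

// Return a normalized hex code, or the fallback when the value is missing or malformed.
function sanitizeHex(value: unknown, fallback: string): string {
  if (typeof value !== "string") return fallback;
  const trimmed = value.trim();
  return HEX_COLOR.test(trimmed) ? trimmed.toUpperCase() : fallback;
}
```

Because model output is untrusted, the input type is `unknown` rather than `string`, so a missing or non-string field falls through to the default instead of throwing.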

Mood board analysis identifies four to six visual content styles ranked by relevance, such as "Colorful Product Showcase," "Minimalist Flat Lay," or "Playful Skit." It recommends preferred content types (product showcase, lifestyle, educational, testimonial) and provides an overall visual approach statement.

If the business has Instagram media, the top five posts by engagement are persisted to Supabase storage as permanent reference images, which avoids Instagram CDN expiration issues.

Cost: ~$0.010 (two Haiku calls)

Step 4: Product DNA

For businesses with product images, Claude Haiku Vision analyzes each product photo individually. This step produces the visual description that image generation prompts need to render the product accurately.

The output per product: a detailed visual description (packaging color, material, shape, visible text), three to five packaging hex colors, a physical shape description (for example, "rectangular bar in metallic wrapper, approximately 15cm by 5cm"), visible branding elements, four to six natural interaction modes (holding, biting, showing to camera), and a holding description that specifies how a person would naturally hold the product with branding visible.

These interaction modes and holding descriptions directly feed into UGC image prompts later, telling the AI model exactly how the influencer should interact with the product.

Cost: ~$0.002 per product image (Haiku Vision)

Step 5: Influencer Matching

Matching uses a four-signal composite score to rank influencers from our database against the brand profile.

Keyword overlap (40% weight): Product content keywords matched against influencer niches and bio text. Niche matches use exact or word-level overlap. Bio matches use phrase presence for multi-word keywords and exact word match for single-word keywords. This is purely deterministic: no AI call needed.

LLM content alignment (30% weight): Claude Haiku evaluates whether the influencer's real lifestyle overlaps with product keywords. A fitness influencer matched to a protein bar scores high. The same influencer matched to a luxury handbag doesn't.

Niche alignment (20% weight): How many of the influencer's niches overlap with the brand's industry and keywords. Deterministic.

Visual alignment (10% weight): Whether the influencer has color data that matches the brand theme. A simple heuristic โ€” seventy points if data exists, thirty if not.

The composite score filters to influencers above fifty percent, or the single best match if none qualify. Results are stored on the business record for calendar generation.
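Putting the four weights and the threshold together, the scoring could be sketched like this. It assumes each signal is already on a 0-100 scale; the type and function names are illustrative.

```typescript
// Signal scores are assumed to be on a 0-100 scale.
interface MatchSignals {
  keywordOverlap: number;
  llmAlignment: number;
  nicheAlignment: number;
  hasColorData: boolean;
}

// Weighted sum using the 40/30/20/10 split described above.
function compositeScore(s: MatchSignals): number {
  const visual = s.hasColorData ? 70 : 30; // simple heuristic from the post
  return 0.4 * s.keywordOverlap + 0.3 * s.llmAlignment + 0.2 * s.nicheAlignment + 0.1 * visual;
}

interface ScoredInfluencer {
  id: string;
  score: number;
}

// Keep everyone above the threshold, or the single best match if nobody qualifies.
function selectMatches(candidates: ScoredInfluencer[], threshold = 50): ScoredInfluencer[] {
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  const qualified = ranked.filter((c) => c.score > threshold);
  return qualified.length > 0 ? qualified : ranked.slice(0, 1);
}
```

For example, signals of 80, 90, and 50 with color data present give 32 + 27 + 10 + 7 = 76.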

Cost: ~$0.005 (one Haiku call for LLM scoring)

Step 6: Content Calendar

A Haiku call generates six posts spread across seven days, following a fixed distribution: two influencer UGC posts (feed), two product-only posts (feed), and two carousel threads.

Each slot gets a specific date, time (varied between morning 09:00-11:00 and evening 18:00-20:00), an assigned influencer, a product, and a creative theme describing the scene.

Seasonal context enriches the prompt with Indonesian cultural events. During Ramadan, the calendar leans into iftar moments and sahur themes. During Lebaran, it emphasizes gifting and family. During Hari Kemerdekaan, it uses patriotic red-and-white styling. This comes from a holiday API lookup cached by month.

Post-processing applies three deterministic fixes:

  1. Product ID resolution: Maps Claude's product references to actual database IDs with case-insensitive fallback lookup.

  2. Mismatch fixing: Checks whether each influencer's niches overlap with their assigned product's keywords. If a fitness influencer got assigned to a dessert brand, and a food influencer is available, they swap. This is zero-cost keyword matching on arrays that already exist.

  3. Portfolio diversity: No single influencer gets more than fifty percent of posts. Excess assignments are redistributed to underused influencers.
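The portfolio diversity fix can be sketched as a capped reassignment. Which posts get reassigned and who receives them is not specified in the post; this version assumes later slots are reassigned first and that the least-used influencer with room under the cap takes each one. `enforceDiversity` is a hypothetical name.

```typescript
// assignments[i] is the influencer ID for post i; roster lists every available influencer.
function enforceDiversity(assignments: string[], roster: string[], maxShare = 0.5): string[] {
  const result = assignments.slice();
  const cap = Math.floor(result.length * maxShare);
  const counts: Record<string, number> = Object.create(null);
  roster.forEach((id) => { counts[id] = 0; });
  result.forEach((id) => { counts[id] = (counts[id] || 0) + 1; });

  for (let i = result.length - 1; i >= 0; i--) {
    const current = result[i];
    if (counts[current] <= cap) continue;

    // Find the least-used other influencer that still has room under the cap.
    let target: string | null = null;
    for (let j = 0; j < roster.length; j++) {
      const id = roster[j];
      if (id === current || counts[id] >= cap) continue;
      if (target === null || counts[id] < counts[target]) target = id;
    }
    if (target === null) continue; // nobody has room; leave the assignment as is

    counts[current] -= 1;
    counts[target] += 1;
    result[i] = target;
  }
  return result;
}
```

With six posts the cap is three, so an influencer holding five slots gives two of them to less-used influencers.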

Cost: ~$0.003 (one Haiku call)

Step 7: Caption Generation

All six captions are generated in a single Claude Sonnet batch call. Each post in the batch includes the influencer's name, style, and DNA; the product's name, highlight angle, and content keywords; the creative theme; and the brand's voice profile (personality traits, dos, don'ts, keywords to avoid).

The output per post includes two caption variants, Variant A (emotional/storytelling approach) and Variant B (informational/value-focused), plus five to ten hashtags, overlay text (max six words, sanitized), and a recommended text position.

The prompt enforces authentic product positioning: the product must appear in its real use case, not forced into an aspirational category. A sourdough biscuit is a snack, not a post-workout recovery fuel. The quality scorer checks this later and penalizes forced positioning.

Caption diversity is validated after generation. If multiple posts start with the same opening words, a warning flags the repetition.
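A repetition check of this kind can be sketched by comparing normalized openings. The post does not say how many opening words are compared, so three is an assumption, as is the function name.

```typescript
// Return the openings (first `wordCount` words, normalized) shared by more than one caption.
function findRepeatedOpenings(captions: string[], wordCount = 3): string[] {
  const seen: Record<string, number> = Object.create(null);
  captions.forEach((caption) => {
    const opening = caption
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, "") // drop punctuation and emoji
      .trim()
      .split(/\s+/)
      .slice(0, wordCount)
      .join(" ");
    if (opening.length > 0) seen[opening] = (seen[opening] || 0) + 1;
  });
  return Object.keys(seen).filter((opening) => seen[opening] > 1);
}
```

A non-empty result would be the trigger for the warning; the captions themselves are left untouched.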

Cost: ~$0.010 (one Sonnet call)

Step 8: Image Generation

This is the most complex step and the most expensive. Each post runs through a strategy cascade specific to its content type. The orchestrator tries the highest-fidelity approach first and falls back to cheaper alternatives on failure.
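The try-then-fall-back behavior can be sketched as a loop over an ordered list of strategies. The `Strategy` shape and `runCascade` name are assumptions; returning an empty URL when everything fails mirrors the failure behavior described in the error handling section of this post.

```typescript
// Each strategy resolves to an image URL or rejects on failure.
interface Strategy {
  label: string;
  run: () => Promise<string>;
}

interface GenerationResult {
  url: string;
  method: string;
}

// Try strategies in priority order and return the first success.
async function runCascade(strategies: Strategy[]): Promise<GenerationResult> {
  for (const strategy of strategies) {
    try {
      const url = await strategy.run();
      return { url, method: strategy.label };
    } catch (err) {
      // This strategy failed; fall through to the next, cheaper one.
    }
  }
  return { url: "", method: "none" }; // every strategy failed
}
```

Recording `method` alongside the URL is what would let a later retry replay the same strategy that produced the original image.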

Product Posts

| Priority | Strategy | Model | Cost | When |
|---|---|---|---|---|
| 1 | Hybrid: AI background + real product cutout | flux-realism + BiRefNet | ~$0.030 | Has product image |
| 2 | IP-Adapter: AI renders product in scene | flux-general | $0.030 | Hybrid failed |
| 3 | Generic: category-appropriate scene | flux-realism | $0.025 | No product image |

The hybrid approach generates a photorealistic background scene, removes the product's background using BiRefNet (about half a cent), and composites the real product cutout on top. The product in the final image is the actual photo, with every detail preserved. The AI only generates the environment.

UGC Posts

| Priority | Strategy | Model | Cost | When |
|---|---|---|---|---|
| 1 | Multi-IP-Adapter: face + product in one shot | flux-general (2 adapters) | $0.030 | Has both face and product refs |
| 2 | PuLID face-only: influencer without product ref | flux-pulid | $0.035 | Has face ref, strategy 1 failed |
| 3 | Generic: no references available | flux-realism | $0.025 | No face ref |

UGC posts are the hardest to get right. The influencer's face needs to be recognizable (PuLID with identity weight 0.8), the product needs to be visible (IP-Adapter with scale 0.7), and the whole thing needs to look like a real photo. Multi-IP-Adapter stacks both adapters in a single call when both references are available.

Carousel Posts

| Priority | Strategy | Model | Cost | When |
|---|---|---|---|---|
| 1 | PuLID: influencer-led hook | flux-pulid | $0.035 | Has face ref |
| 2 | Generic: eye-catching cover | flux-realism | $0.025 | No face ref |

Carousels generate three base images in parallel: a hook (influencer via PuLID), content slides (gradient template, zero Fal cost), and a CTA (product via hybrid or IP-Adapter). Six slides are assembled from these three bases:

  • Slide 0 (Hook): Influencer face, bold text overlay, no blur
  • Slide 1 (UGC): Influencer + product interaction, blurred background
  • Slides 2-4 (Content): Educational points on blurred gradient, 44px text
  • Slide 5 (CTA): Product hero shot, call-to-action, blurred

Content and CTA slides reuse base images with an 18px Gaussian blur and text overlay, so no additional generation calls are needed.

Prompt Engineering

Every prompt ends with photographic anchors: "Shot on Canon EOS R5, 85mm f/1.4, natural window lighting, shallow depth of field." These cues prime Flux toward photorealistic output instead of illustrated or stock-photo aesthetics.

Every prompt includes a strict anti-text negative: "ABSOLUTELY NO TEXT of any kind - no words, letters, logos, watermarks, gibberish text." AI models love rendering text. The text always looks wrong. We prevent it at the prompt level and add it ourselves later.

Brand colors are injected as a strict palette directive: "Use primary: Warm Gold (#D4A853), accent: Deep Brown (#5C3A1E)." This makes generated images feel like they belong to the brand.
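The three prompt conventions above can be combined in a small builder. The ordering of the middle parts and the `buildPrompt` name are assumptions; the only ordering the post states is that the photographic anchors come last.

```typescript
interface BrandColor {
  role: string; // e.g. "primary", "accent"
  label: string; // human-readable color name
  hex: string;
}

const PHOTO_ANCHORS =
  "Shot on Canon EOS R5, 85mm f/1.4, natural window lighting, shallow depth of field.";
const ANTI_TEXT =
  "ABSOLUTELY NO TEXT of any kind - no words, letters, logos, watermarks, gibberish text.";

// Assemble scene + palette directive + anti-text rule + photographic anchors.
function buildPrompt(scene: string, palette: BrandColor[]): string {
  const parts = [scene.trim()];
  if (palette.length > 0) {
    const colors = palette.map((c) => `${c.role}: ${c.label} (${c.hex})`).join(", ");
    parts.push(`Use ${colors}.`);
  }
  parts.push(ANTI_TEXT, PHOTO_ANCHORS);
  return parts.join(" ");
}
```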

Cost per batch: ~$0.15-0.21 (six images at $0.025-0.035 each, depending on strategy)

Step 9: Text Overlay and Safe Zones

After the base image is generated, text overlay happens in two phases.

Safe zone analysis (optional) sends the image to Claude Haiku Vision to identify visually uncluttered regions suitable for text. It returns normalized coordinates, a recommended text color (white or black for contrast), and a confidence score. This only runs when the text position is unknown. For carousels (always center) and posts where the caption generator already predicted a position, we skip it, which saves about two-tenths of a cent per skipped image.

Text compositing uses Sharp with SVG overlays. A smooth gradient fades from transparent at the middle of the image to a subtle dark wash at the bottom (or top, depending on position). Text is rendered uppercase with Liberation Sans, 700 weight, generous letter spacing, and a drop shadow instead of a heavy stroke outline. The result looks like professional social media design, not text slapped on an image.

Overlay text is capped at six words (truncated with ellipsis if longer) to prevent layout breakage. Font size is 64px for feed posts, 56px for stories, and varies by slide type for carousels.
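The six-word cap and the fixed font sizes are simple enough to sketch directly. The helper names are hypothetical, and carousel sizes are omitted because the post only says they vary by slide type.

```typescript
// Keep at most `maxWords` words, appending an ellipsis when the text was cut.
function capOverlayText(text: string, maxWords = 6): string {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length <= maxWords) return words.join(" ");
  return words.slice(0, maxWords).join(" ") + "...";
}

// Font sizes from the post: 64px for feed posts, 56px for stories.
function overlayFontSize(format: "feed" | "story"): number {
  return format === "feed" ? 64 : 56;
}
```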

Cost: ~$0.004 (two safe zone analyses at $0.002 each, rest skipped)

Step 10: Quality Scoring and Retry

Quality scoring runs two tracks in parallel after all images are generated.

Caption scoring sends all posts to Claude Haiku in a single call. Each caption is evaluated on authentic product positioning, product stickiness (connection to real activities), brand voice alignment, influencer authenticity (does it sound like this specific person?), and Instagram optimization. Score: 0-100.

Visual scoring sends every image individually to Claude Haiku Vision. Each image is checked for product visibility, brand color presence, AI artifacts (distorted faces, extra fingers, deformed objects), composition quality, and text readability if overlay is present. Score: 0-100.

The final quality score blends both: caption 70% + visual 30%.

If brand DNA is available, a separate DNA evaluation scores brand alignment, influencer authenticity, product visibility, and visual consistency. This blends with the caption score at 60% caption + 40% DNA.
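A sketch of the blending arithmetic follows. The post does not spell out how the two blends combine, so this assumes the DNA score is folded into the caption score first (60/40) and the result is then blended with the visual score (70/30). Rounding to an integer is also an assumption.

```typescript
// All scores are on the 0-100 scale used by the scorers.
function blendQuality(caption: number, visual: number, dna?: number): number {
  // Assumption: DNA adjusts the caption track before the final caption/visual blend.
  const captionTrack = dna === undefined ? caption : 0.6 * caption + 0.4 * dna;
  return Math.round(0.7 * captionTrack + 0.3 * visual);
}
```

For instance, a caption score of 80 and a visual score of 60 blend to 0.7 × 80 + 0.3 × 60 = 74. Adding a DNA score of 50 lowers the caption track to 68 and the final score to 66.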

Retry Gates

Captions scoring below 60 trigger a Haiku rewrite. The retry prompt includes the original caption and the specific quality feedback ("scored 42 - product feels forced into fitness context"). Capped at two caption retries per batch.

Images scoring below 50 trigger regeneration. The smart retry replays the same strategy that produced the original: if PuLID generated the face, PuLID retries the face, with enhanced negative prompts targeting the specific defects found ("no distorted features, natural facial proportions, correct finger count"). If the smart retry fails, it falls back to generic generation. Capped at three image retries per batch.

Posts still scoring below 40 after retry are flagged as needing manual review.
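The retry gates reduce to thresholds plus per-batch caps. In this sketch the thresholds and caps come from the post, while retrying the lowest-scoring posts first when the cap is hit is an assumption.

```typescript
const CAPTION_RETRY_BELOW = 60;
const IMAGE_RETRY_BELOW = 50;
const MANUAL_REVIEW_BELOW = 40;
const MAX_CAPTION_RETRIES = 2;
const MAX_IMAGE_RETRIES = 3;

// Return the indices of posts to retry: those below the threshold, worst first, capped.
function planRetries(scores: number[], threshold: number, maxRetries: number): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .filter((entry) => entry.score < threshold)
    .sort((a, b) => a.score - b.score)
    .slice(0, maxRetries)
    .map((entry) => entry.index);
}

// After retries, anything still below 40 goes to a human.
function needsManualReview(finalScore: number): boolean {
  return finalScore < MANUAL_REVIEW_BELOW;
}
```

Usage would look like `planRetries(captionScores, CAPTION_RETRY_BELOW, MAX_CAPTION_RETRIES)` for captions and the image constants for images.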

Cost: ~$0.014 (caption scoring $0.002 plus visual scoring $0.012 for six images; DNA evaluation adds $0.002 per image when available)

Step 11: Persistence and Delivery

Database persistence creates Post records in a batch Prisma call. Each post links to its business, assigned influencer, product, scheduled date, caption, hashtags, image URL, generation method, and quality score. Review status is set to "PENDING", so nothing publishes without explicit approval.

WhatsApp delivery runs as a fire-and-forget promise that doesn't block the API response. First, a text notification tells the business owner their content is ready with a link to the dashboard. Then each post is sent as an image preview with a truncated caption and formatted hashtags. Carousel posts send multiple images in a single message.

Phone numbers are normalized to E.164 format with Indonesian country code detection (leading 0 becomes +62).
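A minimal sketch of the normalization, assuming three accepted input shapes: already international (leading "+"), local Indonesian (leading 0), and country code without the plus (leading 62). The length bounds are a sanity check; E.164 allows at most fifteen digits, and the lower bound here is arbitrary. `toE164` is a hypothetical name.

```typescript
// Normalize a phone number to E.164, defaulting local numbers to Indonesia (+62).
function toE164(input: string): string | null {
  const hasPlus = input.trim().charAt(0) === "+";
  const digits = input.replace(/\D/g, "");

  let full: string;
  if (hasPlus) {
    full = digits; // already in international form
  } else if (digits.charAt(0) === "0") {
    full = "62" + digits.slice(1); // local format: leading 0 becomes +62
  } else if (digits.indexOf("62") === 0) {
    full = digits; // country code present, plus sign missing
  } else {
    return null; // cannot infer a country code
  }

  if (full.length < 8 || full.length > 15) return null;
  return "+" + full;
}
```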

Cost: ~$0.007 (seven Fonnte messages at $0.001 each)

The Complete Cost Breakdown

| Step | What | Model | Cost |
|---|---|---|---|
| Brand analysis | 1 Sonnet call with vision | claude-sonnet-4 | ~$0.015 |
| Theme analysis | 1 Haiku call | claude-haiku-4.5 | ~$0.005 |
| Mood board | 1 Haiku call with vision | claude-haiku-4.5 | ~$0.005 |
| Product DNA | ~2 Haiku Vision calls | claude-haiku-4.5 | ~$0.004 |
| Influencer matching | 1 Haiku call | claude-haiku-4.5 | ~$0.005 |
| Calendar generation | 1 Haiku call | claude-haiku-4.5 | ~$0.003 |
| Caption generation | 1 Sonnet batch call | claude-sonnet-4 | ~$0.010 |
| Image generation | 6 Fal AI calls | flux-realism/pulid/general | ~$0.180 |
| Safe zone analysis | ~2 Haiku Vision calls | claude-haiku-4.5 | ~$0.004 |
| Quality scoring | 6 Haiku Vision + 1 Haiku text | claude-haiku-4.5 | ~$0.014 |
| Quality retries | ~1 image + ~1 caption | varies | ~$0.035 |
| WhatsApp delivery | 7 messages | Fonnte API | ~$0.007 |
| Total | | | ~$0.29-0.40 |

The range depends on which image strategies trigger (hybrid costs more than generic), how many retries are needed (zero to three), and whether safe zone analysis runs (skipped when position is pre-determined).
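As a quick check on the arithmetic, the typical per-step figures from the table can be summed directly. They come to roughly $0.29, which matches the low end of the stated range. The sketch sums in tenths of a cent to avoid floating-point drift; the variable and function names are illustrative.

```typescript
// Typical per-step costs in USD, copied from the table above.
const typicalCosts: { step: string; usd: number }[] = [
  { step: "Brand analysis", usd: 0.015 },
  { step: "Theme analysis", usd: 0.005 },
  { step: "Mood board", usd: 0.005 },
  { step: "Product DNA", usd: 0.004 },
  { step: "Influencer matching", usd: 0.005 },
  { step: "Calendar generation", usd: 0.003 },
  { step: "Caption generation", usd: 0.010 },
  { step: "Image generation", usd: 0.180 },
  { step: "Safe zone analysis", usd: 0.004 },
  { step: "Quality scoring", usd: 0.014 },
  { step: "Quality retries", usd: 0.035 },
  { step: "WhatsApp delivery", usd: 0.007 },
];

// Sum in tenths of a cent (integers), then convert back to dollars.
function totalUsd(rows: { step: string; usd: number }[]): number {
  const mills = rows.reduce((sum, row) => sum + Math.round(row.usd * 1000), 0);
  return mills / 1000;
}

console.log(totalUsd(typicalCosts).toFixed(3)); // prints "0.287"
```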

Even at the high end, a full business onboarding (brand analysis, influencer matching, content calendar, six posts with photorealistic images, quality scoring, and WhatsApp delivery) costs under fifty cents.

Error Handling: Nine Independent Try-Catches

The pipeline is designed to degrade gracefully. Each major step wraps in its own error handler:

  1. Instagram fetch fails: continues without Instagram data, uses website only
  2. Theme analysis fails: uses default colors (blue primary, slate secondary)
  3. Mood board fails: continues without visual style guidance
  4. Product analysis fails: continues without enriched positioning
  5. Influencer matching fails: returns empty array, calendar generates brand-only content
  6. Calendar generation fails: returns empty array, no posts created
  7. Individual image generation fails: that post gets an empty image URL, other posts unaffected
  8. Quality scoring fails: all posts get a default score of 50
  9. WhatsApp delivery fails: logged silently, user can still access dashboard

Only post generation (Claude returning captions) is truly fatal: without captions, there's nothing to ship. Everything else can partially fail and the pipeline still produces usable output.
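The per-step isolation described above can be sketched as one reusable wrapper instead of nine hand-written try-catch blocks. This is an illustration of the pattern, not the pipeline's actual code; `withFallback` and its logging format are assumptions.

```typescript
// Run a pipeline step; on failure, log it and continue with a fallback value.
async function withFallback<T>(
  label: string,
  step: () => Promise<T>,
  fallback: T
): Promise<T> {
  try {
    return await step();
  } catch (err) {
    console.warn(`[pipeline] ${label} failed, continuing with fallback`, err);
    return fallback;
  }
}
```

A call such as `withFallback("influencer matching", matchInfluencers, [])` (where `matchInfluencers` is a hypothetical step function) would yield an empty array on failure. The fatal caption step would simply not be wrapped, so its error propagates.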

What Happens After Onboarding

The generated posts sit in "PENDING" review status. The business owner sees them on their dashboard and in WhatsApp previews. They can approve, reject (with an optional note), or edit each post individually.

Approved posts can be published directly to Instagram if the business has connected their account. A cron job checks for posts scheduled in the past that are approved but not yet published, and pushes them automatically.

Rejected posts feed back into the mood board: the rejection adjusts style scores by minus five points, making the system less likely to generate similar content in future batches. Approvals add five points, reinforcing what works.
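The plus and minus five-point adjustment can be sketched as a pure update over a map of style scores. Clamping to a 0-100 range and starting unknown styles at zero are assumptions, as is the function name.

```typescript
type ReviewDecision = "approved" | "rejected";

// Return a new score map with the given style nudged up or down by five points.
function applyFeedback(
  styleScores: Record<string, number>,
  style: string,
  decision: ReviewDecision
): Record<string, number> {
  const delta = decision === "approved" ? 5 : -5;
  const current = styleScores[style] || 0;
  const next = Math.max(0, Math.min(100, current + delta)); // clamp is an assumption
  return { ...styleScores, [style]: next };
}
```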

The entire feedback loop, from URL submission to published Instagram post, can complete in under two minutes if the business owner approves immediately. More typically, they review over a few hours, approve the ones they like, and let the scheduler handle publishing at optimal times.