Tags: engineering, ai, brand-consistency, architecture

Brand DNA Architecture -- How We Keep AI Content Authentic at Scale

Jiwa AI Engineering

1. The Problem: Generic AI Content

Ask any marketer who has tried AI content generation and they will tell you the same thing: the output is competent but generic. Studies suggest that general-purpose AI content generators produce on-brand text roughly 60% of the time. The other 40% ranges from subtly off-tone to completely wrong for the brand. Acrolinx's research on brand voice in AI systems confirms this gap -- most LLMs default to a "helpful assistant" register that sounds nothing like a streetwear brand or a Betawi food influencer.

The stakes are higher than awkward phrasing. A 2024 study published in the Journal of Retailing and Consumer Services found that consumers who perceive content as AI-generated rate brands significantly lower on authenticity and trustworthiness. The moment your audience detects the "AI voice," you lose the parasocial connection that makes influencer marketing work in the first place.

The challenge compounds when you introduce virtual influencers. Now you are not just matching one brand voice -- you are maintaining brand voice AND influencer persona simultaneously. A protein bar brand needs to sound like itself, but the caption also needs to sound like the specific AI influencer delivering it. Two constraints, one piece of text.

Holo AI, one of the early entrants in AI influencer generation, tackled this with a 4-section brand profile and a dual-brain architecture that separates brand identity from influencer personality. Their approach demonstrated that structured brand data dramatically outperforms free-form "tone" strings. We studied their architecture closely and extended it into what we call Triple DNA.

2. Our Approach: Triple DNA Architecture

Jiwa AI injects three structured identity layers into every content generation prompt:

  • Brand DNA -- who the brand is (tone, personality, do's/don'ts, keywords to avoid)
  • Influencer DNA -- who the creator is (voice, catchphrases, content style, language patterns)
  • Product DNA -- what the product looks like (visual description, packaging, branding elements, interaction modes)

All three are constructed during onboarding, stored in the database, and formatted into every caption generation and image generation call. The flow looks like this:

                    ┌──────────────┐
                    │  Brand URL   │
                    │  + Images    │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │   Scrape +   │
                    │   Analyze    │
                    │   (Claude)   │
                    └──────┬───────┘
                           │
           ┌───────────────┼────────────────┐
           │               │                │
    ┌──────▼──────┐ ┌──────▼─────┐ ┌────────▼──────┐
    │  Brand DNA  │ │ Influencer │ │  Product DNA  │
    │  (voice,    │ │ DNA (tone, │ │  (Haiku       │
    │ personality,│ │ catch-     │ │   Vision      │
    │  do/don't)  │ │ phrases)   │ │   analysis)   │
    └──────┬──────┘ └──────┬─────┘ └────────┬──────┘
           │               │                │
           └───────────────┼────────────────┘
                           │
                    ┌──────▼───────┐
                    │  Caption +   │
                    │  Image       │
                    │  Generation  │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │  Multimodal  │
                    │  Evaluation  │
                    │  (Haiku      │
                    │   Vision)    │
                    └──────────────┘

The key insight: none of these layers is optional. Drop Brand DNA and the caption drifts from brand guidelines. Drop Influencer DNA and every influencer sounds the same. Drop Product DNA and the image generator hallucinates packaging that does not exist.

3. Brand DNA: Structured Brand Voice Framework

Most AI content tools ask brands for a single "tone" string -- "professional and friendly." That is not enough information for an LLM to reliably produce on-brand copy. Our BrandDNA interface captures a much richer signal:

interface BrandDNA {
  // Core identity
  name: string;
  industry: string;
  description: string;
  // Market positioning
  targetAudience: string;
  keywords: string[];
  // Voice (the critical section)
  brandTone: string;
  personalityTraits: string[];   // e.g. ["bold", "health-conscious", "community-driven"]
  voiceDos: string[];            // e.g. ["Reference local culture and activities"]
  voiceDonts: string[];          // e.g. ["Don't use medical claims"]
  samplePhrases: string[];       // e.g. ["Fuel your hustle"]
  keywordsToAvoid: string[];     // e.g. ["cheap", "artificial", "diet"]
  // Visual identity
  primaryColor: string;
  accentColor: string;
  mood: string;
  visualStyle: string;
}

This entire structure is auto-populated during onboarding. When a brand submits their website URL, Claude analyzes the scraped content -- including page copy, image alt text, and connected Instagram posts -- and extracts personality traits, voice rules, sample phrases, and keywords to avoid. No manual form-filling required.

Here is what a formatted Brand DNA block looks like when injected into a caption generation prompt for a fitness brand:

=== BRAND DNA ===
Brand: SHRED Nutrition | Industry: Sports Nutrition
Tone: Bold, energetic, community-driven
Personality: bold, health-conscious, community-driven, performance-focused
DO: Reference real sports and activities. Use motivational language. Speak like a training partner.
DON'T: Make medical claims. Use the word "cheap." Sound corporate or clinical.
Sample phrases: "Fuel your grind", "Recovery starts now", "Built for athletes"
NEVER use these words: cheap, artificial, diet, supplement
Target: Active millennials and Gen-Z athletes in Indonesia
Keywords: protein, recovery, performance, padel, gym, high-intensity
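A formatter along these lines can turn the structured profile into that block. This is a minimal sketch, not the production code: the `BrandDNALite` type and `formatBrandDNA` helper are illustrative names, and the field set mirrors the interface above.

```typescript
// Sketch: render a structured brand profile into a prompt block.
// Type and helper names are illustrative, not the production implementation.
interface BrandDNALite {
  name: string;
  industry: string;
  brandTone: string;
  personalityTraits: string[];
  voiceDos: string[];
  voiceDonts: string[];
  samplePhrases: string[];
  keywordsToAvoid: string[];
  targetAudience: string;
  keywords: string[];
}

function formatBrandDNA(dna: BrandDNALite): string {
  return [
    "=== BRAND DNA ===",
    `Brand: ${dna.name} | Industry: ${dna.industry}`,
    `Tone: ${dna.brandTone}`,
    `Personality: ${dna.personalityTraits.join(", ")}`,
    `DO: ${dna.voiceDos.join(" ")}`,
    `DON'T: ${dna.voiceDonts.join(" ")}`,
    `Sample phrases: ${dna.samplePhrases.map((p) => `"${p}"`).join(", ")}`,
    `NEVER use these words: ${dna.keywordsToAvoid.join(", ")}`,
    `Target: ${dna.targetAudience}`,
    `Keywords: ${dna.keywords.join(", ")}`,
  ].join("\n");
}
```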

We also compute a completeness score for every Brand DNA profile, broken into four weighted sections -- core (25%), market (25%), voice (25%), and visual (25%). This score is surfaced in the dashboard, incentivizing brands to fill in richer voice guidelines. The principle is straightforward: garbage in, garbage out. A Brand DNA with empty voiceDos and no keywordsToAvoid will produce noticeably weaker content than a fully populated one.
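The completeness computation can be sketched as follows, assuming each section scores the fraction of its fields that are non-empty. The exact field-to-section mapping here is illustrative; only the four equal 25% weights come from the description above.

```typescript
// Sketch: weighted Brand DNA completeness score.
// Field groupings per section are illustrative assumptions.
type Section = { weight: number; fields: unknown[] };

// Fraction of a section's fields that are non-empty.
function sectionScore(fields: unknown[]): number {
  const filled = fields.filter((f) =>
    Array.isArray(f) ? f.length > 0 : Boolean(f)
  ).length;
  return fields.length === 0 ? 0 : filled / fields.length;
}

function completenessScore(dna: {
  name: string; industry: string; description: string;
  targetAudience: string; keywords: string[];
  brandTone: string; voiceDos: string[]; voiceDonts: string[];
  samplePhrases: string[]; keywordsToAvoid: string[];
  primaryColor: string; accentColor: string; mood: string; visualStyle: string;
}): number {
  const sections: Section[] = [
    { weight: 0.25, fields: [dna.name, dna.industry, dna.description] },  // core
    { weight: 0.25, fields: [dna.targetAudience, dna.keywords] },         // market
    { weight: 0.25, fields: [dna.brandTone, dna.voiceDos, dna.voiceDonts,
                             dna.samplePhrases, dna.keywordsToAvoid] },   // voice
    { weight: 0.25, fields: [dna.primaryColor, dna.accentColor,
                             dna.mood, dna.visualStyle] },                // visual
  ];
  const total = sections.reduce(
    (sum, s) => sum + s.weight * sectionScore(s.fields), 0
  );
  return Math.round(total * 100); // 0-100
}
```

A profile with empty voice arrays loses most of its voice-section credit, which is exactly the signal the dashboard surfaces.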

4. Influencer DNA: Preserving Authentic Voice

Each AI influencer in the Jiwa Studio roster has their own structured voice data, independent of any brand they promote. The InfluencerDNA interface mirrors Brand DNA's voice section but adds creator-specific fields:

interface InfluencerDNA {
  name: string;
  slug: string;
  bio: string;
  toneOfVoice: string;
  personalityTraits: string[];
  voiceDos: string[];
  voiceDonts: string[];
  catchphrases: string[];
  contentStyle: string;
  niches: string[];
  visualStylePrompt: string;
  colorPalette: string;
}

The catchphrases array is especially important. It gives the LLM concrete examples of how this influencer actually talks, not just abstract descriptors.

Consider two of our influencers promoting the same protein bar:

Bagas Kuliner -- Jakarta street food energy, Betawi slang, ALL CAPS enthusiasm:

Voice: loud, energetic, authentic Betawi
Catchphrases: "MANTAP JIWA!", "Gila sih ini enak banget!!!", "Wajib coba, bro!"
DO: Use Betawi slang and casual Jakarta street language. Write in ALL CAPS for emphasis.
DON'T: Never use formal or refined language. Avoid muted, quiet, or meditative tone.

Vivi Tan -- Gen-Z Cindo seller, bubbly, cute, Shopee-live energy:

Voice: bubbly, relatable, Gen Z slang, Cindo casual
Catchphrases: "Gemes banget sih ini!", "Wajib checkout sekarang!", "Affordable tapi kualitas sultan!"
DO: Use Gen-Z Cindo slang and Shopee live seller energy. Keep everything affordable and accessible.
DON'T: Never use corporate, formal, or boring business language. Avoid dark or moody aesthetics.

Same product. Completely different captions. When the caption generator receives Bagas's DNA, it produces energetic, ALL-CAPS Indonesian street slang with fire emojis. When it receives Vivi's DNA, the output shifts to cute, bubbly marketplace language with sparkle emojis. The brand's keywordsToAvoid list still applies to both -- neither influencer will call the protein bar "cheap" or make medical claims -- but the surface-level voice is radically different.
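The layering described above can be sketched as a simple prompt builder. The section markers, ordering, and helper name are illustrative; the point is that brand constraints and influencer voice arrive as separate, stacked blocks.

```typescript
// Sketch: stack the two identity layers into one caption prompt.
// Markers and instruction wording are illustrative assumptions.
function buildCaptionPrompt(
  brandBlock: string,      // formatted Brand DNA (incl. hard blocklist)
  influencerBlock: string, // formatted Influencer DNA (slang, catchphrases)
  productName: string
): string {
  return [
    brandBlock,
    "=== INFLUENCER DNA ===",
    influencerBlock,
    `Write an Instagram caption for ${productName}.`,
    "Stay inside the brand's DO/DON'T rules, but write it entirely in the influencer's voice.",
  ].join("\n\n");
}
```

Swapping only the `influencerBlock` argument is what turns the same brief into Bagas's ALL-CAPS street slang or Vivi's bubbly marketplace pitch.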

Research on AI-generated influencer content and brand trust suggests that voice consistency is a primary driver of perceived authenticity. Users do not just notice what is said; they notice how it is said. A single off-voice caption can break the illusion.

5. Product DNA: Multimodal Visual Identity

Text-only product descriptions are insufficient for image generation. Telling an image model "protein bar" does not communicate packaging color, branding placement, or physical shape. The result is hallucinated product visuals that look nothing like the real thing.

We solved this with ProductDNA, populated by Claude Haiku Vision during onboarding at approximately $0.002 per image:

interface ProductDNA {
  name: string;
  description: string;
  highlightAngle: string;
  // Visual identity from Haiku Vision analysis
  visualDescription: string;
  packagingColors: string[];
  productShape: string;
  brandingElements: string[];
  // Interaction modes
  interactionModes: string[];
  holdingDescription: string;
  // Content metadata
  contentKeywords: string[];
  igPostExamples: string[];
  colorTheme: string;
}

When a brand uploads product images, Haiku Vision analyzes each one and extracts structured visual data. The prompt asks for specifics: dominant hex colors, physical dimensions, visible text and logos, and natural interaction modes for Instagram content.

For example, analyzing a SHRED protein bar image might return:

{
  "visualDescription": "Black matte gift box with blue satin ribbon, contains three protein bars in metallic black packaging with blue 'SHRED' branding and blueberry imagery",
  "packagingColors": ["#1a1a1a", "#2563eb", "#1e3a5f", "#c0c0c0"],
  "productShape": "rectangular bar in metallic wrapper, ~15cm x 5cm",
  "brandingElements": ["SHRED logo in blue", "FUEL YOUR GRIND text", "blueberry imagery"],
  "interactionModes": ["holding bar near face", "taking a bite", "showing packaging to camera", "opening the gift box", "placing on gym towel"],
  "holdingDescription": "Person holds the slim protein bar in one hand, packaging visible with SHRED branding facing camera"
}

The holdingDescription field is injected directly into image generation prompts, telling the model exactly how the influencer should interact with the product. The visualDescription gives enough detail for the image model to render recognizable packaging rather than generic shapes.
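A minimal sketch of that injection, assuming a hypothetical `buildImagePrompt` helper and a trimmed-down `ProductVisuals` type (field names follow the ProductDNA interface above):

```typescript
// Sketch: inject Product DNA visual fields into an image generation prompt.
// Helper name and prompt layout are illustrative assumptions.
interface ProductVisuals {
  visualDescription: string;
  packagingColors: string[];
  holdingDescription: string;
}

function buildImagePrompt(
  influencerStyle: string, // influencer's visualStylePrompt
  product: ProductVisuals
): string {
  return [
    influencerStyle,
    `Product: ${product.visualDescription}`,
    `Packaging colors: ${product.packagingColors.join(", ")}`,
    product.holdingDescription, // how the influencer interacts with the product
  ].join(". ");
}
```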

Before Product DNA, our productImageDescription field on post specs was frequently empty, leading to images where the product was either absent or unrecognizable. Now every product that has at least one uploaded image gets a rich visual identity extracted automatically.

6. Cheap Multimodal Evaluation: The Quality Gate

Generating content is only half the problem. You also need to know whether the generated content is any good. Text-only scoring misses visual issues -- wrong brand colors, invisible product, mismatched aesthetic. We needed multimodal evaluation, but Sonnet-class models are expensive for batch scoring.

Our solution: post-generation evaluation using Claude Haiku Vision, which is roughly 10x cheaper than Sonnet for vision tasks.

interface DNAEvaluation {
  brandAlignment: number;        // 0-100
  influencerAuthenticity: number; // 0-100
  productVisibility: number;     // 0-100
  visualConsistency: number;     // 0-100
  overall: number;               // weighted average
  suggestions: string[];         // 1-3 specific improvements
  keywordsViolated: string[];    // brand keywords found in caption
}

The evaluator receives the generated image, the caption text, the full Brand DNA, and the full Influencer DNA. It scores along four dimensions:

  • Brand alignment (25%): Does the caption + image match the brand's tone, personality, and visual mood?
  • Influencer authenticity (30%): Does the caption sound like this influencer? Does it use their catchphrases and language patterns?
  • Product visibility (25%): Is the product visible, naturally placed, and recognizable in the image?
  • Visual consistency (20%): Do image colors, lighting, and mood match the brand palette and influencer aesthetic?
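The weighted overall score falls out directly from those four weights. A sketch (the real scorer may clamp or round differently):

```typescript
// Sketch: overall = weighted average of the four dimensions,
// using the 25/30/25/20 weights listed above.
function overallScore(e: {
  brandAlignment: number;        // 0-100
  influencerAuthenticity: number; // 0-100
  productVisibility: number;     // 0-100
  visualConsistency: number;     // 0-100
}): number {
  return Math.round(
    0.25 * e.brandAlignment +
    0.30 * e.influencerAuthenticity +
    0.25 * e.productVisibility +
    0.20 * e.visualConsistency
  );
}
```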

The keywordsViolated check is a hard constraint. If the brand's keywordsToAvoid list includes "artificial" and the caption says "no artificial ingredients," that gets flagged regardless of the overall score.
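A sketch of that check, assuming keywords are plain words matched case-insensitively on word boundaries (regex metacharacters in a keyword would need escaping in a real implementation):

```typescript
// Sketch: hard blocklist check over a generated caption.
// Whole-word, case-insensitive matching; avoids flagging substrings
// like "dietary" for the keyword "diet".
function findViolations(caption: string, keywordsToAvoid: string[]): string[] {
  const lower = caption.toLowerCase();
  return keywordsToAvoid.filter((kw) =>
    new RegExp(`\\b${kw.toLowerCase()}\\b`).test(lower)
  );
}
```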

Cost breakdown for a typical 6-post onboarding batch: approximately $0.01 total for evaluation. That is cheap enough to run on every generated post, not just spot checks.

7. Continuous Learning: DNA That Evolves

Static profiles degrade over time. Brands evolve, influencer voices sharpen, and product lines change. Our DNA architecture includes feedback loops that keep the system current.

Approval/rejection signals. When a brand manager approves or rejects a post in the dashboard, that signal feeds back into quality scoring baselines. Consistently rejected posts with low influencerAuthenticity scores indicate that the Influencer DNA needs tuning -- perhaps the catchphrases are stale or the voiceDos are too vague.

Caption edit patterns. When a user customizes a caption through our post editor, the delta between the original and edited version is informative. If a brand manager consistently removes exclamation marks or adds specific product terminology, those patterns suggest DNA updates. The customizePost flow already tracks edit instructions as reviewNote, giving us a growing corpus of brand-specific corrections.
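One such pattern can be detected with a trivial heuristic. This sketch is illustrative only (the function names and the 70% threshold are assumptions, not the production signal): it flags brands whose managers keep stripping exclamation marks from generated captions.

```typescript
// Sketch: detect one recurring edit pattern from (original, edited) pairs.
// Threshold and naming are illustrative assumptions.
function exclamationDelta(original: string, edited: string): number {
  const count = (s: string) => (s.match(/!/g) ?? []).length;
  return count(edited) - count(original); // negative => marks were removed
}

function suggestsToningDown(edits: Array<[string, string]>): boolean {
  if (edits.length === 0) return false;
  const removals = edits.filter(([o, e]) => exclamationDelta(o, e) < 0).length;
  return removals / edits.length >= 0.7; // most edits remove exclamation marks
}
```

When a signal like this fires consistently, it is a candidate prompt for updating the relevant voiceDos/voiceDonts fields.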

Completeness score as incentive. The Brand DNA completeness score -- visible in the dashboard -- nudges brands to provide richer input. A brand at 60% completeness with empty voiceDonts and no samplePhrases sees materially worse content quality. When they fill those fields and see the score jump to 90%, the content quality improvement is immediately visible in the next generation cycle.

Memory regeneration. When Brand DNA fields are updated, the system regenerates the business memory document, ensuring that all downstream systems (calendar generation, theme analysis, influencer matching) operate on the latest brand identity.

8. Results and What's Next

The Triple DNA architecture -- Brand DNA, Influencer DNA, and Product DNA injected into every generation call, evaluated by cheap multimodal scoring -- produces measurably better content than single-tone-string approaches.

Brand alignment improves because the LLM receives explicit constraints: personality traits, voice rules, sample phrases, and a hard blocklist of words to never use. The evaluator catches violations that slip through.

Influencer authenticity improves because each influencer has structured voice data with concrete catchphrases and style rules. The same brand sounds completely different through Bagas Kuliner versus Vivi Tan, which is exactly the point.

Product accuracy improves because Haiku Vision extracts real visual descriptions from product images, eliminating the hallucinated packaging problem. The holdingDescription field ensures natural product interaction in generated images.

Cost efficiency makes this practical at scale. Brand DNA extraction costs a single Claude API call during onboarding. Product DNA extraction costs approximately $0.002 per product image via Haiku Vision. Multimodal evaluation costs approximately $0.01 per 6-post batch. The total onboarding cost is under $0.03.

What is next: we are exploring A/B testing of DNA variations -- does a brand perform better with "bold and energetic" personality traits versus "confident and empowering"? By generating content with DNA variants and measuring downstream engagement (likes, saves, shares), we can close the loop between structured identity data and actual audience response. The DNA evolves not just from manual feedback, but from real-world performance.

