Splitting a 1500-Line Monster Into Modules That Make Sense
How a File Becomes a Monster
It starts innocently. You build a function that generates social media post captions. It works, so you add hashtag generation next to it. Then quality scoring, which needs the same context, so it goes in the same file. Then A/B caption variants. Then carousel-specific logic. Then image prompt construction. Each addition makes sense in the moment because it shares data with what is already there.
Eighteen months later you have a 1,533-line file that generates captions, scores quality, builds image prompts, evaluates carousel structure, retries failed generations, and enforces hashtag formatting. It does all of these things competently. It is also impossible to understand, impossible to test in isolation, and terrifying to modify because every change might break something three hundred lines away.
Our post generator had reached this point. Our image orchestrator was following the same trajectory at 950 lines. Both files were accumulating responsibilities faster than they were being organized.
The Case for Decomposition
The trigger was not a bug. Everything worked. The trigger was velocity: how quickly we could make changes to the content pipeline. Every improvement to caption quality required reading through image prompt code to make sure nothing was affected. Every tweak to the quality scoring formula meant scrolling past hundreds of lines of unrelated caption logic to find the right function.
When a critique cycle identified six improvements across captioning, scoring, and image generation, the thought of making all six changes in a single massive file was the breaking point. We needed to split the file before we could improve what was inside it.
What We Split and Why
The post generator became three focused modules plus a thin orchestrator. Caption generation (the prompt construction, A/B variant logic, and hashtag enforcement) moved into its own module. Quality scoring (the evaluation criteria, retry logic, and review flagging) became another. Image prompt templates (the scene descriptions, camera settings, and industry-specific configurations) moved out of the image orchestrator into a shared prompts module.
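As a rough sketch of the resulting layout (the actual file and module names are not given here, so these are illustrative only):

```
content_pipeline/
  post_generator.py      # thin orchestrator: wires the stages together
  captions.py            # prompt construction, A/B variants, hashtag enforcement
  quality.py             # evaluation criteria, retry logic, review flagging
  prompts.py             # shared image prompt templates: scenes, cameras, industries
  image_orchestrator.py  # image generation, now importing from prompts.py
```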
The original post generator file dropped from 1,533 lines to 864. The image orchestrator went from 950 to 599. The extracted modules totaled about 1,100 lines. The total line count actually increased slightly: the overhead of module boundaries, explicit imports, and interface definitions adds some weight. But each individual file now has a single clear purpose that fits in your head.
The remaining orchestrator function is genuinely thin. It calls caption generation, then quality scoring, then carousel evaluation, passing results between them. You can read it in under a minute and understand the entire flow. The details of how captions are generated or how quality is scored are in their respective modules, out of the way until you need them.
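In shape, a thin orchestrator like the one described above can be sketched as follows. This is a minimal illustration, not the real code: the dataclasses, function names, and stage signatures are all hypothetical, and the stages are passed in as callables to keep the sketch self-contained.

```python
from dataclasses import dataclass, field

# Hypothetical stage result types; the real interfaces are not given in the text.
@dataclass
class Caption:
    text: str
    hashtags: list = field(default_factory=list)

@dataclass
class Score:
    value: float
    flagged: bool

def generate_post(brief, generate_caption, score_quality, evaluate_carousel):
    """Thin orchestrator: each stage lives in its own module; this only wires them.

    The whole flow is readable at a glance: caption, then score, then
    carousel evaluation when applicable, with results passed along.
    """
    caption = generate_caption(brief)                 # captions module
    score = score_quality(caption, brief)             # quality-scoring module
    if brief.get("is_carousel"):
        caption = evaluate_carousel(caption, brief)   # carousel-specific checks
    return {
        "caption": caption.text,
        "hashtags": caption.hashtags,
        "quality_score": score.value,
        "needs_review": score.flagged,
    }
```

The point of the shape is that the orchestrator holds no generation or scoring logic of its own; every "how" lives behind a module boundary, so the function stays readable in under a minute.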
The Zero-Behavior-Change Rule
We made a deliberate decision: this refactoring would change zero behaviors. No bug fixes bundled in. No "while we are in here" improvements. No subtle parameter tweaks. The before and after versions would produce identical outputs for identical inputs.
This constraint is painful in the moment. You are reading through the code to extract it and you see an obvious improvement: a hardcoded value that should be configurable, a retry loop that could be smarter, an error message that is misleading. The temptation to fix it while you are moving the code is strong. We resisted it deliberately.
The reason is testability. If the refactored code produces exactly the same outputs as the original, you can verify the refactoring is correct by running the existing tests and end-to-end flows. If you also changed behaviors, a test failure could mean either "the refactoring broke something" or "the behavior change has a bug." You cannot tell which. Pure refactoring gives you a clean signal.
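The "clean signal" idea can be made mechanical with a golden-output check: run the old and new implementations over the same inputs and require identical results. A minimal sketch, assuming both versions return JSON-serializable output (the function names here are placeholders):

```python
import json

def outputs_match(old_fn, new_fn, inputs):
    """Return True only if the old and new versions agree on every input.

    Serializing with sort_keys=True makes the comparison insensitive to
    dict key ordering, so only real behavior differences show up.
    """
    for item in inputs:
        before = json.dumps(old_fn(item), sort_keys=True)
        after = json.dumps(new_fn(item), sort_keys=True)
        if before != after:
            return False
    return True
```

Any mismatch can then only mean one thing: the refactoring changed behavior.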
What It Enabled
The payoff came immediately. The same critique cycle that motivated the split identified improvements to caption resilience, quality score persistence, and image realism. Each of these changes touched exactly one module. Caption resilience was a change to the caption generator; there was no need to read quality scoring code. Quality score persistence touched the quality gate and the database layer; caption generation was irrelevant. Image realism changes went into the prompts module and the image orchestrator; caption code stayed untouched.
Three developers could have worked on all three changes simultaneously without merge conflicts, because the changes lived in different files with clean interfaces. In the monolithic version, all three changes would have been in the same file, creating conflict zones and requiring careful coordination.
The Lesson for Growing Codebases
Files become monolithic gradually, never all at once. The right time to split a file is before it becomes painful, but nobody does it then because there is always more urgent work. The second-best time is when you realize that the file is actively slowing down your improvement cycle. When the cost of understanding the file exceeds the cost of splitting it, the refactoring pays for itself immediately.
The key insight is that decomposition is not about making code prettier. It is about making the next change faster. Every module boundary you create is a wall that says "you do not need to understand this part to change that part." In a fast-moving AI pipeline where improvements come in rapid critique cycles, those walls are what keep the pace sustainable.