# The Upgrade List — how to take a vibe coded app from sloppy to scalable
Building an app isn't hard now. The hard part is what comes next — when "well, it compiles" stops being enough and you need something you can actually scale.
Engineers have spent decades figuring out how to keep codebases from collapsing under their own weight. Those practices weren't invented for AI, but they transfer, because they answer the same underlying question: how do you get dozens of individuals with differing skill sets, preferences, and predilections to produce consistently high-quality work?
The answer is that quality has to come from the system, not the people (or agents) operating it. You need a development system so well structured that bad output becomes structurally impossible.
Agents know these techniques and can put them into practice, but they won't do it unprompted. You have to know what to ask for.
So, here's what to ask for.
## Tier 1: Foundations
### Formatting & Style
Remove style as a variable. If formatting is not enforced mechanically, agents will make inconsistent choices, and those choices will compound across every file they touch. A formatter rewrites code to a single standard automatically. The same principle applies at the editor level: line endings, indentation, and trailing whitespace should be consistent regardless of which tool is writing the file. Setting this up is fast and makes everything else easier.
Biome (JS/TS, also handles linting), Prettier (JS/TS), Black or Ruff (Python), EditorConfig.
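A baseline `.editorconfig` is the quickest win here; the values below are one reasonable choice, not gospel:

```ini
# .editorconfig — respected by most editors and agent tooling
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_style = space
indent_size = 2
```

Pair it with a formatter config committed to the repo, so every tool that touches a file converges on the same output.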
### Type Safety
Make your type system belligerent. Most vibe-coded projects have silent holes (implicit `any` types, liberal use of `as X` to override type errors) that let agents make incorrect assumptions without any warning. Turn on strict mode, ban implicit `any`, and ban type assertions except where explicitly justified. The structure already exists; let it enforce itself properly.
Validate data at every external boundary. Type checks only run at compile time. Any data coming in from outside (API requests, environment variables, third-party responses) should be validated at runtime against a declared schema. Let invalid data fail loudly at the boundary rather than silently corrupting state downstream.
Zod, Valibot, ArkType (TypeScript), Pydantic (Python).
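A minimal sketch of boundary validation, hand-rolled here so it stands alone; in practice a schema library like Zod gives you the same guarantee with a declared schema instead of manual checks. The config shape and env var names are illustrative:

```typescript
type Config = { port: number; databaseUrl: string };

// Validate raw external input (e.g. process.env) once, at the boundary.
// Bad data crashes at startup with a clear message, not mid-request.
function parseConfig(raw: Record<string, string | undefined>): Config {
  const port = Number(raw.PORT);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${raw.PORT}`);
  }
  if (!raw.DATABASE_URL) {
    throw new Error("DATABASE_URL is required");
  }
  return { port, databaseUrl: raw.DATABASE_URL };
}
```

Everything downstream of `parseConfig` gets to trust the `Config` type, because the only way to construct one is through the validator.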
### Single Source of Truth
Every category of information should have exactly one place where it lives. An agent trying to figure out "how do we do x" will search everywhere and grab the first approach that looks reasonable. Reduce mistakes by picking one authoritative location per category.
Your database schema should be a readable file, not a history of migrations. Use a schema-first ORM: it gives you a single file that describes your entire data model in a form an agent can read in seconds.
Define APIs in one place. Types, client code, and documentation should be generated from that definition, not written separately and allowed to drift.
Prisma, Drizzle (TypeScript), SQLAlchemy with declarative models (Python), tRPC (TypeScript full-stack), OpenAPI/Swagger spec with code generation.
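For illustration, here's what that single readable file looks like with a Prisma-style schema (the models are hypothetical): the entire data model, relations included, in one place an agent can scan at a glance:

```prisma
// schema.prisma — the whole data model, no migration archaeology required
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  posts     Post[]
  createdAt DateTime @default(now())
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  author   User   @relation(fields: [authorId], references: [id])
  authorId Int
}
```

Migrations still exist, but they're generated from this file rather than being the source of truth themselves.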
## Tier 2: Enforcement
### Mechanical Enforcement
Enforce code complexity limits. Given the choice, agents will keep adding to existing functions rather than decomposing them, leading to overly complex files that both agents and humans have a hard time reasoning through. Complexity rules (maximum function length, maximum cyclomatic complexity, maximum nesting depth) make this a build error rather than a judgment call. The linter refuses; the agent is forced to decompose.
Make public functions explicit. If an agent can import anything from anywhere inside a module, it will. A barrel file (an index.ts that explicitly declares what's public) means that what's internal stays internal. Combined with import rules in your linter, this enforces intentional API design at the module level.
Catch security issues and code smells automatically. Beyond style, there's a category of problems a linter or static analysis tool will catch before they're ever committed (duplicate logic, dangerous patterns, known vulnerability signatures). Grab a few static analysis tools and let them check your work.
ESLint with complexity rules (JS/TS), dependency-cruiser or Nx module boundaries for architecture, SonarLint/SonarQube, Semgrep, Ruff (Python), CodeRabbit, Greptile.
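As a sketch, ESLint's built-in complexity rules make these limits hard errors; the thresholds below are one defensible choice, tune them to your codebase:

```javascript
// eslint.config.js — complexity limits as build errors, not suggestions
export default [
  {
    rules: {
      // Cyclomatic complexity per function
      complexity: ["error", { max: 10 }],
      // Function length, ignoring blank lines and comments
      "max-lines-per-function": [
        "error",
        { max: 60, skipBlankLines: true, skipComments: true },
      ],
      // Nesting depth of blocks
      "max-depth": ["error", 3],
      // Parameter count per function
      "max-params": ["error", 4],
    },
  },
];
```

Once these are errors, an agent that writes a 200-line function gets an immediate, actionable failure instead of a code review comment three days later.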
### Testing
Test behavior, but also architecture. Most people think of tests as checking that functions return the right values. But you can also use tests to verify the repository itself: that dependencies flow in the right direction, that certain logic doesn't exist where it shouldn't. In an agentic workflow, architectural tests may be the most valuable tests you can write — they catch the category of drift that's hardest to see and easiest to compound.
End-to-end tests as a last line of defense. Use end-to-end tests sparingly, but use them. A single e2e test that walks through a core user flow will catch issues that unit and integration tests miss entirely.
Use tests as specification. A well-written test is the most machine-readable description of what something is supposed to do. Writing tests before or alongside implementation gives agents a spec to target, not just code to not break. Integration tests are usually more valuable here than unit tests.
Playwright, Cypress, Vitest, Jest, RTL (JS/TS), pytest (Python), Testing Library for UI behavior, agent-browser if you want to get fancy.
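A minimal sketch of what an architectural test checks, hand-rolled for illustration (the `@app/ui` module name is hypothetical); tools like dependency-cruiser enforce this properly across a whole repo:

```typescript
// Rule under test: core modules must never import from the UI layer.
// A real test would read every file under src/core/ and run this check on each.
function findForbiddenImports(source: string, forbiddenPrefix: string): string[] {
  const importRe = /from\s+["']([^"']+)["']/g;
  const hits: string[] = [];
  for (const match of source.matchAll(importRe)) {
    if (match[1].startsWith(forbiddenPrefix)) hits.push(match[1]);
  }
  return hits;
}

// This source violates the rule, so an architectural test would fail the build:
const hits = findForbiddenImports(
  `import { Button } from "@app/ui/button";`,
  "@app/ui",
);
// hits → ["@app/ui/button"]
```

The point is that "dependencies flow in the right direction" becomes an assertion that runs on every commit, not a convention that lives in someone's head.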
### Feedback Loops
Agents need to see runtime errors directly, not through you. Every human copy+paste between agent output and agent input is a speed tax and a quality leak. Set up structured error output the agent can read, not a terminal a human has to watch.
Log as structured data, not freeform strings. Humans and agents parse data differently. Whereas we like prose, agents prefer structured data like JSON. Structured logging makes your application's runtime output as legible to agents as your source code is.
Close the feedback loop as tightly as possible. Set up tool-use/pre-commit/pre-push hooks to run formatting, linting, and typechecking scripts automatically. They're generally fast and give agents inline feedback on changes without you having to copy/paste across. Give feedback in seconds rather than waiting for CI.
Sentry, Highlight, Axiom, Husky, Lefthook, lint-staged.
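A toy structured logger to make the contrast concrete; the field names are illustrative, and in practice a library like pino gives you this (plus levels, redaction, and transports) off the shelf:

```typescript
// One JSON object per line: trivially parseable by an agent, still readable by a human.
function logEvent(
  level: "info" | "warn" | "error",
  event: string,
  fields: Record<string, unknown> = {},
): string {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    level,
    event,
    ...fields,
  });
  console.log(line);
  return line;
}

// Compare `console.log("payment failed for order ord_123: card declined")`
// with output an agent can filter, group, and correlate:
logEvent("error", "payment.failed", { orderId: "ord_123", reason: "card_declined" });
```

An agent debugging a failure can now grep for `"event":"payment.failed"` and get back data, not prose it has to re-interpret.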
## Tier 3: Agent-Specific
### Agent Context
Give agents a map, not just a territory. An agent without a codebase index will rebuild things that exist, miss utilities it needs, and import from the wrong places. Instead, create a flat, auto-generated map: modules, responsibilities, exported functions. Bonus points: make this self-updating with a pre-commit hook — a manually maintained index will drift from reality and become another source of misinformation.
Make your errors explain themselves. Every lint rule and architecture test should carry an annotation explaining why it exists and how to resolve it. The error message is not just an explanation of what's wrong — it's the agent's instruction for what to do next.
Capture significant decisions where agents can find them. When you make a meaningful architectural choice, write it down in a lightweight decision log: why you're using this pattern, what alternatives you rejected, what constraint you were working around. An agent asked to change something relevant will know it's undoing an intentional choice rather than correcting an oversight. Architecture Decision Record (ADR) templates can be a good starting point for this.
Make commit history machine-readable. Conventional commit formats (`feat:`, `fix:`, `refactor:`, `chore:`) turn git history into structured data. An agent tracing when a behavior changed can actually get an answer from a clean history. A history of "new feat" and "bug fix" tells it nothing.
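Enforcing that format is a one-regex check you can wire into a commit-msg hook; this sketch covers the common types, and the list is yours to adjust:

```typescript
// Matches the first line of a conventional commit:
// type, optional (scope), optional ! for breaking changes, then ": description".
const COMMIT_RE = /^(feat|fix|refactor|chore|docs|test|perf|ci)(\([\w-]+\))?!?: .+/;

function isConventional(message: string): boolean {
  return COMMIT_RE.test(message.split("\n")[0]);
}

isConventional("feat(auth): add session refresh"); // → true
isConventional("bug fix");                         // → false
```

Tools like commitlint do the same job with configurable rule sets, but the check itself really is this small.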
### Secrets & Dependencies
Prevent credentials from ever entering the codebase. Agents will accidentally commit API keys, tokens, and secrets. Models are getting better about this, but the tools exist to enforce it mechanically, so use them. Automated secret scanning catches leaks before they reach your repository.
Audit dependencies automatically. People find security vulnerabilities in open source software constantly. Unless you've hand-written 100% of your code, chances are a package you use has one. Set up automated scanning and update PRs to catch and fix these as they happen, not after a breach.
Keep your API keys and secrets somewhere else. Make it structurally impossible for an agent to have access to a raw secret it could commit. A secrets manager means agents request credentials through a controlled interface rather than reading them from files.
Gitleaks, TruffleHog, Dependabot, Renovate, npm audit, pip audit. Doppler, 1Password (Secrets Automation), AWS Secrets Manager, GitHub Secrets.
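Dependency auditing in particular is nearly free to set up. A minimal Dependabot config (ecosystems and cadence are a judgment call) opens update PRs automatically:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```

With CI gating merges, each update PR either passes your whole quality pipeline or tells you exactly what broke.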
### CI/CD as the Final Gate
Protect your main branch. No direct commits, no force-pushes. Make every change require a pull request. Make every merge require passing status checks. Make main a reliable snapshot: code that has passed every enforced rule, always.
Make quality a hard requirement, not a best effort. Combine these techniques and embed them in your CI pipeline. It runs on every pull request (format check, lint, type check, tests, architecture rules) and blocks merge on failure. It's the difference between quality being someone's responsibility and quality being the system's responsibility. Agents can try whatever they want. The system decides what actually ships.
GitHub Actions, GitLab CI, branch protection in either.
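Here's the shape of that gate as a GitHub Actions workflow; the `npm run` script names are assumptions about your package.json, so map them to whatever your project actually calls them:

```yaml
# .github/workflows/ci.yml — every PR must pass all of these to merge
name: CI
on:
  pull_request:

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npm run format:check   # formatter in check mode, no writes
      - run: npm run lint           # style, complexity, import rules
      - run: npm run typecheck      # strict compiler pass
      - run: npm test               # behavior + architecture tests
```

Then mark the `checks` job as a required status check in branch protection. Without that last step, the workflow reports failures but doesn't actually block anything.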
AI-friendly infrastructure is a deep rabbit hole, and as an industry we're still digging. There's a lot more you could do, but the tactics outlined above are the foundation upon which all those other techniques rest.
If you've read this far, you know what to ask for.
Now go make it happen.