The $400 Wake-Up Call
I used to have Cursor’s highest tier subscription—$200 a month with $400 in monthly credits. Sounds like plenty, right?
Two weeks. That’s how long it took me to burn through the entire quota.
I found myself rationing my coding time based on API credits. Think about that for a moment—I love coding. I want to work all day. But here I was, checking my usage dashboard like someone monitoring a dwindling bank account, wondering if I could afford one more refactoring session before the month reset.
Something was deeply wrong.
But the cost wasn’t even the worst part. The worst part was watching my carefully planned code dissolve into something I couldn’t recognize anymore.
When Good Plans Meet Eager AI
Picture this: You’ve architected something beautiful. You know exactly what you want. You’ve thought through the design patterns, planned the separation of concerns, mapped out the type system. You’re not just playing around—you’re building something real, something that needs to scale, something that actual users will depend on.
So you explain your plan to your LLM assistant. The plan is solid. You’re feeling good.
And then it generates 500 lines of code.
At first glance, it looks… fine? It runs. But as you read through it:
- The Python typing is sloppy—bare dict and list types everywhere when you need strict typing (see the sketch after this list)
- Data is being passed around as raw JSON objects, making everything impossible to trace
- There’s zero separation of concerns—just one massive function doing everything
- It’s like the LLM threw the entire architecture into a blender and hit “frappe”
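To make “sloppy” concrete, here’s the kind of contrast I mean. These are hypothetical snippets rather than output from any real session: the first passes untyped dicts around, the second uses the explicit models I actually wanted.

```python
from dataclasses import dataclass


# The style the assistant tends to produce: untyped dicts everywhere,
# so nothing tells you what "data" actually contains.
def rate_photo(data: dict) -> dict:
    score = data["votes"]["up"] / max(data["votes"]["total"], 1)
    return {"photo_id": data["id"], "score": score}


# The style I wanted: explicit models, so the data flow is traceable.
@dataclass(frozen=True)
class Votes:
    photo_id: str
    upvotes: int
    total: int


@dataclass(frozen=True)
class Rating:
    photo_id: str
    score: float


def rate_photo_typed(votes: Votes) -> Rating:
    return Rating(photo_id=votes.photo_id,
                  score=votes.upvotes / max(votes.total, 1))
```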
“No problem,” you think. “I’ll just ask it to fix these issues.”
So it rewrites the entire thing.
Now the typing is overly specific. It’s created seventeen helper functions where you needed three. The original logic that actually worked is gone, replaced with something that breaks in subtle ways you won’t discover until production. And you can’t even remember what the original code looked like because it’s been through four complete rewrites in twenty minutes.
You’re sitting there, staring at your screen, thinking: “I had a plan. What happened to my plan?”
If you’ve experienced this sinking feeling, you’re not alone. And if you’ve sworn off AI coding assistants because of it, I don’t blame you. I nearly did too.
The Real Problem (And It’s Not the AI)
Here’s what I eventually realized: The LLM isn’t bad at coding. It’s actually pretty good—when it understands what you want and operates within clear boundaries.
The problem was me letting it jump straight to implementation.
Think of it like this: Imagine hiring a new developer who, the moment you mention a problem, immediately opens their editor and starts frantically rewriting your entire authentication system before you’ve even finished explaining what’s wrong. You’d stop them, right? You’d say, “Hold on, let’s talk through this first. What are the trade-offs? How does this fit with our existing architecture?”
But with LLMs, we often skip that conversation. We describe a problem and—because the tools make it so easy—let them immediately start generating code. No discussion. No design review. No “have we thought through the implications of this?”
The result? Cascading failures:
- File A has a subtle architectural inconsistency
- File B depends on File A, amplifying the problem
- File C builds on File B, creating something that looks functional but is fundamentally fragile
- You’ve burned through thousands of tokens generating code you’ll have to throw away
The Three-Prompt Rule That Changed Everything
After months of frustration (and drained API credits), I stumbled into a pattern that actually works. It’s almost embarrassingly simple:
Discuss, decide, then deploy.
Or more specifically: Make the LLM talk through the solution before it touches a single line of code.
This breaks down into three distinct prompts—three phases of conversation that keep you in control while letting the AI handle the heavy lifting.
Phase One: “Here’s How Things Work Around Here”
The first prompt isn’t about solving a problem. It’s about orientation—making sure the LLM understands your project’s architecture, design patterns, and philosophies before it offers any solutions.
Here’s the first prompt I send:
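It’s short, and the exact wording below is illustrative; what matters is pointing the model at the doc and explicitly ruling out code changes:

```
Read project_big_picture.md, then explore whatever source files you need
to understand how this codebase fits together.

Don't write or change any code yet. When you're done, summarize the
architecture and the design rules back to me so I can confirm you've
understood them.
```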
The magic ingredient here is project_big_picture.md—a document I maintain that’s part architecture diagram, part philosophy statement, part annotated table of contents.
What goes in this document?
Think of it as the thing you’d give to a new team member on their first day. Not a line-by-line code walkthrough, but the high-level understanding they need:
- Architecture overview: How the pieces fit together (I literally draw ASCII diagrams)
- Design patterns we use: “All database access goes through repositories,” “We favor immutability,” etc.
- File directory map: What each file/module does and how they relate
- Design philosophies: The why behind our decisions
- Non-negotiables: “We require strict typing,” “Error handling must be explicit,” that sort of thing
Here’s an example from one of my projects (a cat photo rating API, because why not):
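Condensed heavily, with illustrative names and details, it looks something like this:

```
# Cat Photo Rating API: big picture

## Architecture overview
Client -> API routes -> RatingService -> CatPhotoRepository -> Postgres

## Design patterns
- All database access goes through repository classes.
- Services hold the business logic; routes stay thin.
- Data crosses layer boundaries as typed models, never as raw dicts.

## File map
- api/routes.py           HTTP endpoints, no business logic
- services/rating.py      rating rules
- repositories/photos.py  all SQL lives here
- models.py               typed models shared across layers

## Design philosophies
- Boring and explicit beats clever and implicit.

## Non-negotiables
- Strict typing everywhere; mypy must pass.
- Error handling is explicit; no silent excepts.
```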
Is it extra work to maintain this document? Yes. Does it save me from re-explaining my architecture every single time I start a new chat? Absolutely.
Why this works:
When the LLM has this context up front, it can explore your codebase intelligently. It reads the relevant files. It understands how pieces connect. Most importantly, it learns what you care about—and it won’t suggest solutions that violate your principles.
Plus, you write this prompt once. Then you reuse it across coding sessions until your architecture meaningfully changes. The time investment pays for itself immediately.
Phase Two: “I’ve Got a Problem. How Would You Solve It?”
Now that the LLM understands your project, you can actually discuss the problem at hand.
The key here: Ask for proposals, not implementations.
Good second prompt:
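Something along these lines (illustrative wording, continuing the cat-rating example):

```
The rating service constructs its repository directly, which is making it
painful to unit test. I want to decouple them.

Before changing anything, propose two or three ways we could restructure
this, with the trade-offs of each. Don't write any code yet.
```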
Bad second prompt:
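Versus something like:

```
The rating service is hard to test. Refactor it.
```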
See the difference? The first invites discussion. The second invites the LLM to immediately start rewriting code based on assumptions.
What happens next is what I call the “discussion chain”—a back-and-forth conversation where the LLM proposes solutions and you poke holes in them:
LLM: “I’d suggest extracting an interface for the repository and injecting it into the rating service…”
You: “That makes sense, but what about the rating cache? It’s currently stored in the repository layer.”
LLM: “Good point. We could move caching to the service layer, or introduce a dedicated caching layer…”
This might go on for 3-10 messages. You’re refining the approach together. You’re asking “what if” questions. You’re making sure the solution actually fits your architecture.
And crucially: No code changes happen during this phase.
The LLM is your thinking partner, not your typing monkey. It’s suggesting, explaining trade-offs, showing you where problems might arise. You’re maintaining full control over the design decisions.
I can’t overstate how much stress this eliminates. There’s no frantic “wait, stop, that’s not what I meant!” scrambling. There’s no throwing away thousands of lines of generated code. You’re designing the solution together, in plain English, before a single file gets touched.
Phase Three: “Okay, Let’s Do It”
Only after you’ve fully discussed the approach, considered the trade-offs, and pinned down exactly what you want—only then do you give permission:
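In my chats that’s rarely more elaborate than something like:

```
Let's go with the approach we discussed: extract the repository interface
and put the dedicated caching layer in front of it. Implement it now,
following the patterns in project_big_picture.md.
```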
Now the LLM implements.
And because you’ve thoroughly discussed the solution, the changes will:
- Actually align with your architecture
- Follow your established patterns
- Include the right level of typing
- Solve the problem without creating three new ones
But here’s the beautiful part: You’re not micromanaging the implementation. You’re not specifying every variable name or worrying about which files need imports. The LLM handles those details.
You designed it. The LLM built it.
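To make that concrete, here’s roughly the shape the Phase Two discussion lands on. The names and details below are a sketch, not code from a real session:

```python
from typing import Protocol


class CatPhotoRepository(Protocol):
    """Interface extracted from the concrete repository, as discussed."""

    def get_rating(self, photo_id: str) -> float: ...


class CachingCatPhotoRepository:
    """The dedicated caching layer from the Phase Two discussion.
    It wraps any repository and satisfies the same interface."""

    def __init__(self, inner: CatPhotoRepository) -> None:
        self._inner = inner
        self._cache: dict[str, float] = {}

    def get_rating(self, photo_id: str) -> float:
        if photo_id not in self._cache:
            self._cache[photo_id] = self._inner.get_rating(photo_id)
        return self._cache[photo_id]


class RatingService:
    """Business logic depends on the interface, not on a concrete database."""

    def __init__(self, repo: CatPhotoRepository) -> None:
        self._repo = repo

    def rating_for(self, photo_id: str) -> float:
        return self._repo.get_rating(photo_id)
```

The point isn’t this exact code; it’s that the interface boundary and the cache’s new home were settled in conversation, so the implementation has nowhere surprising to go.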
The Complete Development Loop
Here’s what this looks like in practice:
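Roughly, with one problem per conversation:

```
1. Start a fresh chat.
2. Prompt 1 (orientation): have the LLM read project_big_picture.md and
   explore the code. No changes.
3. Prompt 2 (discussion): describe one problem, ask for proposals, and
   iterate until the approach fits your architecture.
4. Prompt 3 (implementation): give the go-ahead, then review the changes.
5. If the change meaningfully altered the architecture, update
   project_big_picture.md.
6. Next problem, next chat.
```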
When do you update project_big_picture.md?
Update when you’ve made a change that meaningfully alters your architecture:
- Added a new layer (like that caching layer)
- Changed a fundamental pattern (REST to GraphQL)
- Refactored a core abstraction
Don’t update for:
- Bug fixes
- New features that follow existing patterns
- Refactoring that doesn’t change public interfaces
When the document does need updating, I typically ask the LLM to do it:
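With a prompt along these lines:

```
We just added a caching layer between the rating service and the
repository. Update project_big_picture.md so the architecture overview
and the file map reflect it. Keep the document's existing structure
and tone.
```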
What Actually Changed (Besides My Sanity)
After adopting this approach, here’s what shifted:
Cost: I haven’t hit my $400 limit in months. Seriously. By scoping conversations to one problem at a time and discussing before implementing, I’ve cut my token usage by something like 70%. Sometimes I even copy a proposed solution from one chat and paste it into a new conversation to get a fresh perspective—sounds paranoid, but it saves a ton of tokens.
Code Quality: My code actually follows my architecture now. There’s no drift where the LLM subtly introduces patterns I don’t want. Type safety is consistent. Error handling is explicit. Everything feels intentional.
Debugging: When something breaks in production, I know exactly where to look—because I made the design decisions. The LLM just implemented them. The code structure makes sense to me because I architected it.
Mental Load: This is the surprising one. I thought adding all this structure would make coding feel more bureaucratic. Instead, it’s the opposite. I’m not constantly context-switching between “what do I want” and “what did the LLM just do?” I think at the design level. The LLM handles the implementation details. It feels like pair programming with someone who types really fast and never gets tired.
Speed: Yes, I’m actually faster overall. The time I spend in discussion is dwarfed by the time I save not debugging cascading failures or rewriting entire modules.
For the Skeptics (I Was One of You)
Look, if you’ve tried AI coding assistants and decided they write garbage code, you’re not wrong about what you experienced. I’ve been there. I’ve seen the sloppy typing, the tangled messes, the solutions that technically work but are architectural nightmares.
But here’s the thing: The LLM isn’t the architect. You are.
When you let it jump straight to code, you’re handing over design authority to something that doesn’t understand your project’s constraints, your team’s conventions, or your future maintenance burden. Of course it produces code you’d never ship.
But when you keep it in the discussion phase—when you use it as a thinking partner before giving it implementation authority—it becomes remarkably useful.
Think of it this way: You wouldn’t hire a developer and immediately give them commit access to main before they understand your codebase, right? You’d have them read the architecture docs, discuss approaches, and get their code reviewed.
Do the same with your LLM. Orientation, discussion, then implementation. In that order. Every time.
The Golden Rule
Discuss, decide, then deploy. Never let the LLM skip straight to implementation.
The moment you see it generating code during what should be a design discussion, you’ve lost control. Pull it back. “Let’s talk through the approach first.”
Your job: Architecture, design decisions, quality standards.
The LLM’s job: Remembering where everything is, handling boilerplate, implementing the approach you designed.
When you maintain that boundary, AI-assisted coding stops being chaos and starts being productive. You get to code faster without sacrificing quality. You get to work all day without burning through API credits. You get to ship code you actually understand and can debug when things go wrong.
And honestly? It makes coding fun again. I’m not fighting with an overeager assistant that keeps rewriting my work. I’m designing systems and having them built to spec. That’s what I wanted all along.
(Written by Human, improved using AI where applicable.)
