AI Coding Tools for Production Engineers
In this lesson: pick your paradigm, run a mix of models, and put your effort into the harness.
Top 3 takeaways
Decide whether you work in the code or above it
Some tools keep you inside the code as a co-pilot, the way Cursor does, while others put you above it as an architect, the way Claude Code does. The right choice depends on how large your codebase is and how far along the project has come.
Use a different model for each task
Code generation has become a commodity, so you should match the model to the stage of the task and have a separate model review what the first one wrote. A well-built multi-agent workflow can beat a $200 subscription while costing around $45 a month.
The harness is where reliability comes from
Reliability comes from the environment around the model, which should constrain it, inform it, verify its work, and correct it. At 95 percent reliability per step, twenty steps leave you with only about 36 percent end-to-end success, so the harness is where your effort pays off most.

Byron Mackay
Director of Learning, Gauntlet AI
Byron is Director of Learning at Gauntlet AI, where he is training hundreds of engineers to work AI-first. He spent 16+ years as a mobile/iOS engineer before becoming an AI platform engineer (Savant, School AI), where he built eval platforms from scratch. He led curriculum development at BloomTech, where nearly every graduate of one of his cohorts landed an engineering role at Amazon, and ran the Amazon partnership/SDR program that moved non-traditional candidates into engineering roles there. His experience spans platform engineering, AI, mobile, and learning.
Lesson notes
A written walkthrough of the lecture, covering the patterns, the code, and the things that trip people up.
Three Decisions Before You Build Anything
Byron Mackay argues that choosing AI tools comes down to three decisions:
- Paradigm — How you work with AI.
- Models — Which models you use for different tasks.
- Harness — The system that keeps everything reliable.
His central argument is that the model matters less than the system around it. Choose the right workflow, use the right models for each job, and invest most of your effort in building a strong harness.
Decide Whether You Work in the Code or Above It
The first decision is how you collaborate with AI.
Tools like Cursor keep you in the editor. You remain the primary developer while AI accelerates your work through suggestions and code generation.
Tools like Claude Code move you into more of an architect role. You define the objective, review plans, and approve changes while the agent handles implementation.
Neither approach is universally better. Smaller or earlier-stage projects often benefit from the hands-on workflow of Cursor, while larger, more mature codebases are better suited to autonomous coding agents.
Regardless of the tool, the quality of your specification determines the quality of the output.
Use Different Models for Different Jobs
One of the biggest mistakes is treating every model the same.
Instead, assign models based on the task. Code generation, code review, research, and planning all benefit from different capabilities. A common pattern is to let one model generate code while another reviews it, reducing blind spots and improving quality.
The goal isn't to find the single best model. It's to build a workflow where each model contributes where it's strongest.
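The generate-then-review pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Router`, `stub_generator`, `stub_reviewer`, and the task names are hypothetical, and the two stubs stand in for calls to two different real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is any function from prompt text to response text.
# In a real workflow each entry would wrap a different provider or model.
Model = Callable[[str], str]


def stub_generator(prompt: str) -> str:
    """Placeholder for a code-generation model (hypothetical)."""
    return "def add(a, b):\n    return a + b\n"


def stub_reviewer(code: str) -> str:
    """Placeholder for a separate review model (hypothetical rule:
    request changes when the code has no docstring)."""
    if '"""' in code:
        return "APPROVE"
    return "REQUEST_CHANGES: missing docstring"


@dataclass
class Router:
    """Maps a task name to the model assigned to it."""
    routes: Dict[str, Model]

    def run(self, task: str, prompt: str) -> str:
        if task not in self.routes:
            raise KeyError(f"no model assigned to task {task!r}")
        return self.routes[task](prompt)


def generate_then_review(router: Router, spec: str) -> dict:
    """One model writes the code, a different one reviews it."""
    code = router.run("generate", spec)
    verdict = router.run("review", code)
    return {
        "code": code,
        "verdict": verdict,
        "approved": verdict.startswith("APPROVE"),
    }


router = Router(routes={"generate": stub_generator, "review": stub_reviewer})
result = generate_then_review(router, "Write add(a, b).")
print(result["verdict"])
```

Because the routing table is plain data, swapping the model behind a task means changing one entry rather than rewriting the workflow.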
The Harness Is What Makes AI Reliable
The agent writes code. The harness makes sure that code is trustworthy.
Byron compares the relationship to a computer: the model is the CPU, the context window is RAM, and the harness is the operating system.
A good harness performs four jobs:
- Constrains the agent.
- Provides the right context.
- Verifies outputs.
- Corrects failures.
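As a rough illustration, the four jobs can be folded into one small retry loop. The sketch below is not the lecture's implementation; `run_with_harness`, `toy_agent`, and `toy_verify` are hypothetical names, and the "agent" is a plain function so the example runs without any model.

```python
from typing import Callable, List, Optional


def run_with_harness(
    agent: Callable[[str], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    context: List[str],
    max_attempts: int = 3,
) -> dict:
    """Run an agent under a minimal harness.

    verify returns None when the output is acceptable, otherwise an
    error message that is fed back into the next attempt.
    """
    prompt = "\n".join(context + [task])          # provide context
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):    # constrain: attempt budget
        output = agent(prompt)
        error = verify(output)                    # verify the output
        if error is None:
            return {"ok": True, "output": output,
                    "attempts": attempt, "feedback": feedback}
        feedback.append(error)
        # correct: tell the agent what went wrong before retrying
        prompt = prompt + "\nPrevious attempt failed: " + error
    return {"ok": False, "output": None,
            "attempts": max_attempts, "feedback": feedback}


def toy_agent(prompt: str) -> str:
    """Stand-in agent: wrong at first, right once it sees feedback."""
    return "4" if "failed" in prompt else "5"


def toy_verify(output: str) -> Optional[str]:
    """Stand-in check; a real harness would run tests or evals here."""
    return None if output == "4" else f"expected 4, got {output}"


result = run_with_harness(
    toy_agent, toy_verify,
    task="What is 2 + 2?",
    context=["Answer with a single number."],
)
print(result)
```

Here the toy agent fails its first attempt and passes the second after the verifier's message is added to the prompt.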
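Without these controls, reliability falls quickly as workflows grow longer: at 95 percent reliability per step, a twenty-step workflow succeeds end to end only about 36 percent of the time. Production systems succeed because they include planning, verification, testing, and evaluation, not just generation.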
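The 36 percent figure quoted in the takeaways follows from multiplying per-step success rates. A short calculation makes the compounding visible; it assumes each step succeeds or fails independently, which is a simplification, and `end_to_end_success` is a name chosen for this sketch.

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


for p in (0.95, 0.99, 0.999):
    print(f"{p:.3f} per step over 20 steps -> {end_to_end_success(p, 20):.1%}")
# 0.950 per step over 20 steps -> 35.8%
# 0.990 per step over 20 steps -> 81.8%
# 0.999 per step over 20 steps -> 98.0%
```

Raising per-step reliability from 95 to 99 percent more than doubles end-to-end success over twenty steps, which is why verification and correction at each step pay off.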
Where the Real Work Is Now
As AI becomes better at writing code, engineering shifts toward designing reliable systems around it.
Byron recommends limiting agent execution, maintaining progress logs, assigning different models to different stages, mocking external dependencies during testing, and building observability into every workflow. The biggest opportunity isn't creating another coding agent. It's building the harnesses, evaluations, and infrastructure that make AI dependable in production.
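Three of these recommendations (a limit on execution, a progress log, and mocked external dependencies) fit into one small sketch. Everything here is an assumption made for illustration: `run_stages`, the stage functions, the JSON log format, and the `example.invalid` URL are hypothetical, and `unittest.mock.Mock` from the standard library stands in for a real network client.

```python
import json
from typing import Callable, List, Tuple
from unittest.mock import Mock

# A stage takes the fetch dependency and returns a short status string.
Stage = Tuple[str, Callable[[Callable[[str], dict]], str]]


def run_stages(stages: List[Stage], fetch: Callable[[str], dict],
               max_steps: int) -> List[str]:
    """Run stages in order, stop at the step budget, log every step."""
    log: List[str] = []
    for step, (name, stage) in enumerate(stages, start=1):
        if step > max_steps:
            # Limit execution: record why we stopped instead of running on.
            log.append(json.dumps({"step": step, "stage": name,
                                   "status": "stopped",
                                   "detail": "step budget exhausted"}))
            break
        try:
            entry = {"step": step, "stage": name,
                     "status": "ok", "detail": stage(fetch)}
        except Exception as exc:
            entry = {"step": step, "stage": name,
                     "status": "error", "detail": str(exc)}
        log.append(json.dumps(entry))  # one JSON line per step
        if entry["status"] == "error":
            break
    return log


def check_service(fetch) -> str:
    response = fetch("https://example.invalid/health")
    return f"health status {response['status']}"


def summarize(fetch) -> str:
    return "summary written"


def extra_stage(fetch) -> str:
    return "never reached with max_steps=2"


# Mock the external dependency so the run makes no network calls.
fake_fetch = Mock(return_value={"status": 200})
progress_log = run_stages(
    [("check", check_service), ("summarize", summarize), ("extra", extra_stage)],
    fake_fetch,
    max_steps=2,
)
for line in progress_log:
    print(line)
```

Writing each step as a single JSON line keeps the log easy to read by hand and easy to parse later when you add observability tooling.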
His closing advice is simple: if you're looking for where the industry is heading, learn to build harnesses. They are becoming the new foundation for AI software.
FAQ
Which AI coding tool is best?
What AI coding tools do production engineers use?
How do you pick the right model?
When do you need an agent framework over a simple script?
How do you keep an AI setup from getting too complex?
What is a harness in this context?
How often should you re-evaluate your tool choices?