AI Coding Tools for Production Engineers

In this lesson: pick your paradigm, run a mix of models, and put your effort into the harness.

Byron Mackay · 60 min · April 2026
Released April 15, 2026

Top 3 takeaways

01

Decide whether you work in the code or above it

Some tools keep you inside the code as a co-pilot, the way Cursor does, while others put you above it as an architect, the way Claude Code does. The right choice depends on how large your codebase is and how far along the project has come.

02

Use a different model for each task

Code generation has become a commodity, so you should match the model to the stage of the task and have a separate model review what the first one wrote. A well-built multi-agent workflow can beat a $200 subscription while costing around $45 a month.

03

The harness is where reliability comes from

Reliability comes from the environment around the model, which should constrain it, inform it, verify its work, and correct it. At 95 percent reliability per step, twenty steps leave you with only about 36 percent end-to-end success, so the harness is where your effort pays off most.

Byron Mackay

Director of Learning, Gauntlet AI

Byron is Director of Learning at Gauntlet AI, where he is currently training hundreds of engineers to work AI-first. He spent 16+ years as a mobile/iOS engineer before becoming an AI platform engineer (Savant, School AI), where he built eval platforms from scratch. He led curriculum development at BloomTech, where nearly every graduate of one of his cohorts landed an engineering role at Amazon, and he ran the Amazon partnership/SDR program that moved non-traditional candidates into engineering roles there. His experience spans platform engineering, AI, mobile, and learning.

Lesson notes

A written walkthrough of the lecture, covering the patterns, the code, and the things that trip people up.

Three Decisions Before You Build Anything

Byron Mackay argues that choosing AI tools comes down to three decisions:

  • Paradigm — How you work with AI.
  • Models — Which models you use for different tasks.
  • Harness — The system that keeps everything reliable.

His central argument is that the model matters less than the system around it. Choose the right workflow, use the right models for each job, and invest most of your effort in building a strong harness.

Decide Whether You Work in the Code or Above It

The first decision is how you collaborate with AI.

Tools like Cursor keep you in the editor. You remain the primary developer while AI accelerates your work through suggestions and code generation.

Tools like Claude Code move you into more of an architect role. You define the objective, review plans, and approve changes while the agent handles implementation.

Neither approach is universally better. Smaller projects often benefit from the hands-on workflow of Cursor, while larger, more mature codebases are better suited for autonomous coding agents.

Regardless of the tool, the quality of your specification determines the quality of the output.

Use Different Models for Different Jobs

A common mistake is treating every model as interchangeable.

Instead, assign models based on the task. Code generation, code review, research, and planning all benefit from different capabilities. A common pattern is to let one model generate code while another reviews it, reducing blind spots and improving quality.
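
The generate-then-review pattern can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: `Router`, `generate_and_review`, and the two stub models are hypothetical names, and each "model" is just a callable that maps a prompt to text. In a real setup each callable would wrap a provider client.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a "model" is any callable from prompt text to response text.
Model = Callable[[str], str]


@dataclass
class Router:
    """Maps a task name (e.g. "generate", "review") to the model assigned to it."""
    models: Dict[str, Model]

    def run(self, task: str, prompt: str) -> str:
        if task not in self.models:
            raise KeyError(f"no model assigned to task {task!r}")
        return self.models[task](prompt)


def generate_and_review(router: Router, spec: str) -> dict:
    """One model writes the code; a different model reviews it against the spec."""
    code = router.run("generate", spec)
    review_prompt = f"Review this code against the spec.\nSpec: {spec}\nCode:\n{code}"
    review = router.run("review", review_prompt)
    return {"code": code, "review": review}


# Stub models so the sketch runs without any API access (hypothetical behavior).
def stub_generator(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"


def stub_reviewer(prompt: str) -> str:
    return "LGTM" if "return a + b" in prompt else "CHANGES REQUESTED"


router = Router(models={"generate": stub_generator, "review": stub_reviewer})
result = generate_and_review(router, "Write add(a, b) that returns the sum.")
print(result["review"])
```

Keeping the task-to-model mapping in one place means swapping the reviewer for a different model is a one-line change, which is the point of treating generation as a commodity.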

The goal isn't to find the single best model. It's to build a workflow where each model contributes where it's strongest.

The Harness Is What Makes AI Reliable

The agent writes code. The harness makes sure that code is trustworthy.

Byron compares the relationship to a computer: the model is the CPU, the context window is RAM, and the harness is the operating system.

A good harness performs four jobs:

  • Constrains the agent.
  • Provides the right context.
  • Verifies outputs.
  • Corrects failures.
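
As a rough sketch of how the four jobs above might fit into one loop, consider the following. Everything here is an assumption made for illustration: `run_harness`, `constrain`, `ALLOWED_PATHS`, the shape of the agent's output (a dict with `path` and `content`), and the stub agent are all hypothetical, not an API from any particular tool.

```python
from typing import Callable, List, Optional

# Assumption: the agent may only write under these directories.
ALLOWED_PATHS = ("src/", "tests/")


def constrain(path: str) -> bool:
    """Constrain: reject writes outside the allowed directories."""
    return path.startswith(ALLOWED_PATHS)


def run_harness(
    agent: Callable[[str], dict],
    verify: Callable[[dict], Optional[str]],
    task: str,
    context: List[str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Return an accepted change, or None if every attempt fails."""
    prompt = "\n".join(context + [task])  # inform: give the agent the context
    for _ in range(max_attempts):
        change = agent(prompt)
        if not constrain(change["path"]):  # constrain
            error = f"writes to {change['path']} are not allowed"
        else:
            error = verify(change)  # verify: None means the change passed
        if error is None:
            return change
        prompt += f"\nPrevious attempt failed: {error}"  # correct: feed the error back
    return None


def make_stub_agent():
    """Hypothetical agent: misbehaves once, then responds to the correction."""
    attempts: List[str] = []

    def agent(prompt: str) -> dict:
        attempts.append(prompt)
        if "Previous attempt failed" not in prompt:
            return {"path": "/etc/passwd", "content": "oops"}
        return {"path": "src/app.py", "content": "print('ok')"}

    return agent, attempts


def verify(change: dict) -> Optional[str]:
    return None if "print" in change["content"] else "missing print"


agent, attempts = make_stub_agent()
result = run_harness(agent, verify, "Add a hello script.", ["Repo uses Python 3."])
print(result, len(attempts))
```

The stub agent's first attempt is blocked by the constraint, the error is fed back into the prompt, and the second attempt passes verification.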

Without these controls, reliability falls quickly as workflows become more complex. Production systems succeed because they include planning, verification, testing, and evaluation—not just generation.
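
The compounding effect behind the takeaway figure (95 percent per step, twenty steps, about 36 percent end to end) is easy to check. The sketch below assumes each step succeeds independently with the same probability, which is a simplification; the function names are illustrative.

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return per_step ** steps


def per_step_needed(target: float, steps: int) -> float:
    """Per-step reliability required to hit a target end-to-end success rate."""
    return target ** (1 / steps)


print(f"{end_to_end_success(0.95, 20):.0%}")   # 36%
print(f"{per_step_needed(0.90, 20):.4f}")      # 0.9947
```

Read the other way around, a twenty-step workflow that should succeed 90 percent of the time needs each step to succeed about 99.5 percent of the time, which is why verification and correction at each step matter more than a marginally better model.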

Where the Real Work Is Now

As AI becomes better at writing code, engineering shifts toward designing reliable systems around it.

Byron recommends limiting agent execution, maintaining progress logs, assigning different models to different stages, mocking external dependencies during testing, and building observability into every workflow. The biggest opportunity isn't creating another coding agent—it's building the harnesses, evaluations, and infrastructure that make AI dependable in production.
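
Three of these recommendations (a cap on execution, a progress log, and mocked external dependencies) can be combined in a small sketch. The names `run_with_budget`, `step`, and the `payments` client are hypothetical and chosen only for illustration; the mock comes from Python's standard `unittest.mock`.

```python
import json
from typing import Callable, List
from unittest.mock import Mock


def run_with_budget(step: Callable[[int], dict], max_steps: int, log: List[str]) -> str:
    """Run `step` until it reports done or the step budget runs out.

    Every step's outcome is appended to `log` as a JSON line, so a later
    run (or a human) can see how far the agent got.
    """
    for i in range(max_steps):
        outcome = step(i)
        log.append(json.dumps({"step": i, **outcome}))
        if outcome.get("done"):
            return "completed"
    return "budget_exhausted"


# Mock the external dependency so tests never touch the real service.
payments = Mock()
payments.charge.return_value = {"status": "ok"}


def step(i: int) -> dict:
    """Hypothetical agent step that calls an external service."""
    response = payments.charge(amount=100)
    return {"done": i == 2, "status": response["status"]}


log: List[str] = []
status = run_with_budget(step, max_steps=5, log=log)
print(status, len(log), payments.charge.call_count)
```

The same function returns `"budget_exhausted"` when the cap is lower than the work requires, which gives the harness a clear signal to stop and escalate rather than loop indefinitely.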

His closing advice is simple: if you're looking for where the industry is heading, learn to build harnesses. They are becoming the new foundation for AI software.

FAQ

Which AI coding tool is best?
There is no single best AI coding tool. Choose deliberately across four areas: coding tools, model selection, harnesses, and agent frameworks. Pick the simplest option that meets the need, and match the model to the difficulty, latency, and cost of the task rather than defaulting to the biggest one.
What AI coding tools do production engineers use?
They choose across four layers: coding tools, the model, the harness (the scaffolding of prompts, retries, and tool calls), and agent frameworks. They reach for a framework only when they genuinely need multi-step autonomy, and they start minimal, adding a piece only when a concrete need appears.
How do you pick the right model?
Match the model to the difficulty, latency, and cost of the task rather than defaulting to the biggest one, and test your candidates against your actual workload before you commit.
When do you need an agent framework over a simple script?
Reach for a framework only when you genuinely need multi-step autonomy or orchestration, since for many tasks a straightforward harness is simpler and easier to keep reliable.
How do you keep an AI setup from getting too complex?
Start with the minimal setup that works, add a component only when a concrete need appears, and prefer tools you can reason about and debug.
What is a harness in this context?
A harness is the scaffolding around the model, including prompts, retries, tool calls, and control flow, that turns a raw model into a dependable system.
How often should you re-evaluate your tool choices?
Check in periodically, since the field moves quickly, but make a change when a tool is clearly holding you back rather than switching for novelty.

What's next?

Keep building with the rest of Night School, or apply to Gauntlet — twelve weeks of technical intensity with the best AI engineers we can find.
