I’ve been getting a lot of questions about how I manage AI-assisted development and what exactly my workflow looks like when I build features with Claude Code and Codex. So I decided to write it down, not just as a guide, but as a real account of months of trial and error that changed the way I approach software development.
The key insight? Stop treating AI as a single tool that does everything. Split the responsibilities. Make it focused.
Let me show you exactly how I do it.
The problem with “Vibe Coding”
If you’ve done any AI-assisted coding, you’ve probably experienced the moment when you give a prompt, the AI starts implementing, and somewhere around the 50th line of code it completely forgets what you asked for. Or worse: it starts making decisions that don’t align with your project structure, uses outdated patterns, or creates technical debt that you’ll pay for later.
I faced this exact problem while building features on top of Sayna, the open-source voice layer for AI agents I’m working on. The codebase grew, patterns became more complex, and suddenly the AI tools started struggling to keep context.
The typical “give it a big prompt and hope for the best” approach ruined my productivity: sometimes it took longer to fix what the AI generated than it would have taken to write the code from scratch. Not exactly the future I had signed up for!
The mental model shift
Here is the thing that changed everything for me: AI models have different strengths. Just like you wouldn’t ask a backend engineer to design your marketing materials, you shouldn’t expect one AI interaction to handle both research AND implementation.
So I started to split things:
- Codex for research and task composition
- Claude for actual code implementation
Sounds simple, right? BUT the way you structure this workflow makes all the difference between chaos and a smooth development pipeline.
The CLAUDE.md Foundation
Before we dive into the workflow, let me explain the most important file in any AI-assisted project: the CLAUDE.md file.
This is where I keep everything that defines how my project works:
- Project overview and goals
- Commands and scripts
- Project structure references
- Theme guides and design patterns
- Cursor rules and conventions
- Best practice guidelines
Whenever Codex or Claude reads this file, it immediately has all the references needed to complete any task. It is like giving someone a full orientation before their first day at work.
```markdown
# Project: LatestCall

## Overview
Phone call scheduler built on top of Sayna.ai voice infrastructure

## Tech Stack
- Next.js with App Router
- Prisma with SQLite (dev) / PostgreSQL (prod)
- MobX for state management
- Shadcn/ui for components

## Commands
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npx prisma migrate dev` - Run migrations

## Best Practices
- Always use server components by default
- Client components only when needed for interactivity
- Reuse Shadcn components, never create from scratch
- Follow MobX store patterns from /stores
```
This file becomes the single source of truth: when it is referenced in a prompt, the model picks up the project context instantly instead of burning tokens on exploration.
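For example, when I want a one-off answer from the command line, I can prepend the file to the prompt so the model starts with full project context (a minimal sketch; `claude -p` is the same non-interactive invocation used by the scripts later in this post, and the question is just an illustration):

```bash
# Prepend CLAUDE.md so the model starts with full project context
# instead of burning tokens exploring the codebase first
claude -p "$(cat CLAUDE.md)

Explain how scheduling data flows from the API routes into the MobX stores."
```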
The Task Separation Workflow
Here’s the core of my workflow, and this is where things get interesting.
Step 1: Describe what you want
I start by writing a clear description of what needs to be done. Let’s say my database schema has a ScheduleDay table that is completely unnecessary: it should just be a field on the Scheduler model.
Instead of immediately asking Claude to fix it, I go to Codex first with a specific role definition:
```
You are a senior project manager with deep technical knowledge.
Your job is to create detailed task files for implementation.

Reference CLAUDE.md for:
- Best practice guides
- Project structure
- Existing patterns and conventions

Output: Create task files inside /todo folder

Each task should be:
- Independent and self-contained
- Reference specific files to modify
- Include acceptance criteria
- Small enough to implement in one focused session
```
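I keep that role definition in a reusable prompt file and pass it to Codex together with the concrete request. A rough sketch, assuming the Codex CLI’s non-interactive `exec` mode and a hypothetical `prompts/planner.md` holding the role text above:

```bash
# prompts/planner.md is a hypothetical file containing the role definition above
codex exec "$(cat prompts/planner.md)

Request: the ScheduleDay table is unnecessary. Days of the week should be
a field on the Scheduler model. Plan the refactor as task files in /todo."
```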
Step 2: Let Codex plan and research
This is the magic part. Codex will:
- Read the CLAUDE.md file
- Explore the codebase to understand the current implementation
- Identify all files that need changes
- Create a separate task file for each piece of work
The output looks something like this:
todo/task-001-update-prisma-schema.md:

```markdown
# Task 001: Update Prisma Schema

## Goal
Remove ScheduleDay table and add days field to Scheduler model

## Context
- SQLite doesn't support array types
- Use JSON field instead
- Reference: CLAUDE.md best practices for Prisma

## Files to Modify
- prisma/schema.prisma
- Generate new migration

## Acceptance Criteria
- [ ] ScheduleDay model removed
- [ ] days: Json field added to Scheduler
- [ ] Migration runs without errors
```
todo/task-002-refactor-server-logic.md:

```markdown
# Task 002: Refactor Server Side Logic

## Goal
Update all server-side code that references ScheduleDay

## Context
- Check API routes in app/api
- Update any services using ScheduleDay relations
- Reference: MobX patterns in CLAUDE.md

## Files to Modify
- app/api/schedule/route.ts
- services/scheduler.service.ts

## Acceptance Criteria
- [ ] No TypeScript errors
- [ ] API responses maintain same structure
- [ ] Tests pass
```
And so on for each piece of the implementation.
Step 3: Execute tasks one by one
Here is where Claude comes in: I have a simple bash script that:
- Lists all task files in the /todo folder
- Feeds each task to Claude one at a time
- Captures the output for reference
```bash
#!/bin/bash
mkdir -p results

for task in todo/task-*.md; do
  echo "Processing: $task"
  claude -p "$(cat "$task")" > "results/$(basename "$task")"
  echo "Completed: $task"
done
```
The critical part is that each task is executed in isolation: Claude sees only one task file, not the entire todo list. This keeps the model focused and prevents context pollution.
It’s the same advice you’d give a human developer: when there is a specific thing to do, focus on one thing at a time, then proceed based on the changes. Exactly the same principle applies to AI-assisted development.
Step 4: Capture Results
I also keep a results file for each executed task, which becomes invaluable when something breaks later:
```
results/
├── task-001-update-prisma-schema.md
├── task-002-refactor-server-logic.md
└── task-003-update-ui-components.md
```
If a bug appears or a reference breaks, I can tell Claude, “You produced this output in the past; explore what is going on based on these tasks.” That keeps the context alive instead of starting from scratch every time.
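In practice, that debugging prompt looks something like this (a sketch, reusing the task name from the example above):

```bash
# Hand Claude its own earlier output when investigating a regression
claude -p "You completed this task earlier and produced the output below.
Something related is now broken; investigate based on what you changed.

$(cat results/task-002-refactor-server-logic.md)"
```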
Why this works so much better
Focus is everything
When you give an AI a massive prompt with multiple requirements, it tries to solve everything at once; the reasoning gets fragmented and the quality drops significantly.
By breaking work into small, independent units, each Claude execution is laser-focused: it knows exactly what to do, has all the references it needs, and can apply deep reasoning to just that one problem.
Context Window Management
AI models have limited context windows: when you pack everything into one prompt, important details get lost in the middle. With individual task files, each execution starts fresh with exactly the context it needs.
Tasks that would overwhelm a single session work perfectly if they are split into 5-6 focused pieces.
Parallel potential
While I run tasks sequentially to ensure proper dependency resolution, this approach opens the door to parallel execution when tasks are independent. Imagine spinning up multiple Claude instances, each working on a different task, all fed from the same todo list.
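A rough sketch of what that parallel variant could look like, assuming the tasks really have no dependencies on each other:

```bash
#!/bin/bash
# Run every task as a background job, then wait for all of them to finish.
# Only safe when the tasks are truly independent!
mkdir -p results
for task in todo/task-*.md; do
  claude -p "$(cat "$task")" > "results/$(basename "$task")" &
done
wait
echo "All parallel tasks completed"
```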
Quality Verification
Because each task has clear acceptance criteria, verification becomes straightforward. The last task in my workflow is usually “verify that everything is correct”:
```markdown
# Task 003: Verify Implementation

## Goal
Ensure all changes from previous tasks are consistent

## Verification Steps
- [ ] Run build: npm run build
- [ ] Run tests: npm test
- [ ] Verify MobX store patterns
- [ ] Check component rendering
```
This catches issues early before they manifest across the codebase.
The Overnight Development Experience
One thing I want to emphasize: this workflow is not always fast in the moment. Some features require 10-15 task files, and running them all can take hours.
BUT here’s the beauty: you can start it and walk away.
I’ve had sessions where I kicked off a major refactoring before bed and woke up to a fully implemented feature. The bash script keeps feeding tasks, Claude keeps implementing, and the results keep accumulating.
This changes your relationship with development time: instead of being chained to your IDE making small tweaks, you can think about the big picture while the AI handles the execution.
Real Example: Database Schema Refactor
Let me share a concrete example from the project I mentioned earlier.
The problem: I had a ScheduleDay table that was completely unnecessary. Days of the week should just be a field in the Scheduler model, not a separate relationship.
Without this workflow: I would have asked Claude to “fix this” and watched it struggle to track down all the places that reference ScheduleDay, probably breaking things along the way.
With this workflow:
- Codex examined the codebase and created 3 task files
- First task: Update the Prisma schema (Codex figured out that SQLite requires a JSON field, not an array)
- Second task: Refactor all server-side code
- Third task: Verify everything works
Each task ran independently, Claude stayed focused, and the refactoring completed without a single syntax error. The migration passed, the tests passed, and the feature worked on the first run.
That’s the difference between hoping AI gets it right and designing a system where it consistently does.
Integrating with Sayna Development
Since many of you might build voice applications with Sayna, here is how this workflow applies. When you work on voice agent integrations, the complexity multiplies:
- WebSocket handlers
- Audio-processing pipelines
- STT/TTS provider configurations
- Turn detection logic
Breaking these into specific tasks becomes even more crucial. A task like “add Google Cloud TTS support” could become the tasks below (a sketch of the first task file follows the list):
- Task: Add provider configuration to config.yaml
- Task: Implement the GTTSProvider struct
- Task: Register the provider in VoiceManager
- Task: Add the provider to the voice list endpoint
- Task: Write integration tests
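A sketch of what the first of those task files might look like (the config keys and paths are illustrative guesses, not Sayna’s actual layout):

```markdown
# Task: Add Google Cloud TTS Provider Configuration

## Goal
Expose a `google` option for TTS providers in config.yaml

## Context
- Follow the structure of the existing provider entries in config.yaml
- Reference: CLAUDE.md conventions for provider configuration

## Files to Modify
- config.yaml

## Acceptance Criteria
- [ ] `google` is accepted as a TTS provider
- [ ] Existing provider configurations still load without errors
```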
Each piece is manageable on its own, and together they deliver a complex feature reliably.
Tools and setup
Here is my exact setup if you want to replicate it:
Directory Structure
```
project/
├── CLAUDE.md       # Main context file
├── todo/           # Task files (gitignored)
│   ├── task-001-*.md
│   └── task-002-*.md
├── results/        # Execution outputs (gitignored)
│   ├── task-001-*.md
│   └── task-002-*.md
└── run-tasks.sh    # Execution script
```
.gitignore:

```
/todo/
/results/
```
This keeps task files and results out of your repo: they are temporary working artifacts, not permanent documentation.
Task Execution Script
```bash
#!/bin/bash
TASK_DIR="todo"
RESULT_DIR="results"

mkdir -p "$RESULT_DIR"

# The glob expands in sorted order, so tasks run in sequence
for task in "$TASK_DIR"/task-*.md; do
  filename=$(basename "$task")
  echo "========================================="
  echo "Processing: $filename"
  echo "========================================="
  claude -p "$(cat "$task")" > "$RESULT_DIR/$filename"
  echo "Completed: $filename"
  echo ""
done

echo "All tasks completed!"
```
Common Pitfalls to Avoid
Don’t Skip the CLAUDE.md
I’ve seen people try this workflow without a proper context file: the tasks end up being too generic, and Claude makes decisions that don’t fit the project.
Invest time in your CLAUDE.md and update it as your project evolves. It is the foundation everything else builds on.
Don’t Make Tasks Too Large
If a task file is longer than 200 lines or touches more than 3-4 files, split it. Smaller tasks = more focused execution = better results.
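A trivial sanity check you could run before kicking off the script (a sketch; adjust the threshold to taste):

```bash
# Flag task files longer than 200 lines
wc -l todo/task-*.md | awk '$2 != "total" && $1 > 200 {print "Too large:", $2}'
```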
Don’t Run Everything in One Session
The whole point is isolation: if you paste all the tasks into one Claude session, you lose the focus benefits. Trust the sequential process.
Don’t Ignore the Results
Those result files are gold for debugging: when something breaks, you can trace exactly what Claude did and why.
The future of development
I believe that this is how software development will work in the near future - not replacing developers but amplifying what we can accomplish.
The role shifts from typing code to:
- Designing system architecture
- Defining task requirements
- Reviewing AI output
- Making strategic decisions
It’s a higher-level way of building software, and honestly, it’s more fun: you think about what to build instead of how to type it.
Conclusion
This workflow took me months to refine and it is still evolving, but the core principles hold:
- Separate research from implementation: use Codex for planning, Claude for coding
- Maintain a strong context file: CLAUDE.md is the brain of your project
- Create focused, independent tasks: small pieces execute better than large ones
- Run sequentially, capture everything: isolation prevents context pollution
- Trust the process: let AI work while you think about the next feature
If you are building voice applications, check out Sayna – I’d love to hear how you’re using it!
If you try this workflow, let me know what you think; I’m always looking for ways to improve it.
Don’t forget to share this article if it has helped you think differently about AI-assisted development!
