AI Co-Developing and the SDLC
Learnings from a first-pass with AI-DLC
During the last six months of 2025, one of our teams was doing some soul-searching about how to update our Agile rituals to better integrate our AI co-developer tools and practices. Our process was refined over months throughout 2025, but a session I attended at AWS re:Invent 2025 crystallized a lot of our learnings into a concrete framework that shaped our next steps. By the end of 2025, we were delivering in days features that took multiple sprints pre-2025, and we helped to extend the co-developer strategies to other teams as our work integrated with theirs, in order to align our delivery timelines.
As our most recent projects wrapped up, I’ve been capturing here my own observations and learnings about what contributed most to this success. In addition, since the release of Claude Code/Codex/Antigravity, I’ve been using my own projects, playing the roles of TPM/Lead/Dev, as test environments for applying these frameworks in smaller teams, to see how well the learnings translate to the agentic coding tools, with even more responsibilities delegated to coding agents.
From SDLC to AI-DLC
This framework, which the re:Invent session speakers called the AI-DLC, or AI driven development life cycle, was refined within AWS during the growth of Q Developer (previously CodeWhisperer) and Kiro, but could be applied to different Agent-driven co-developer models and workflows as well. We first began applying it in our workflow in mid-2025, with the publication of the AWS blog post “AI-Driven Development Life Cycle: Reimagining Software Engineering” and the accompanying white paper: “AI-Driven Development Lifecycle (AI-DLC) Method Definition”.
With that context, a couple of quick caveats: many of the following observations leverage guidance from the session “Introducing AI driven development lifecycle (AI-DLC)” with speakers Anupam Mishra and Raja SP from AWS, along with an accompanying workshop. In addition, the demo shown in the linked session can look very similar to what might have contributed to some of the recent negative headlines coming out of Amazon (and which they have disputed), so YMMV on these approaches.
In that vein, my first general recommendation: Ensure that guardrails (both technical and procedural) are defined and developed before development automation. Agents are not just automation, they are automation + guardrails. The weight of the latter increases with scale.
Challenges to AI Co-Development
At the re:Invent session, Mishra and SP highlighted where AI co-development helps and hurts teams, starting from the commonly cited observation that efficiency gains in existing development teams are capped at around 10-15%. They also discussed some possible explanations for this cap, drawing on METR’s July 2025 report on AI co-developers slowing down experienced developers.

They also describe two Anti-Patterns in teams leveraging AI co-developers:
AI-Managers: AI autonomously builds and maintains software, with near-zero human involvement. This causes a lack of connection between the human developers supporting the project and the code itself, often leading to near-total rebuilds when the first problem arises. (This ties into the concepts of Cognitive Debt and Intent Debt championed by Margaret-Anne Storey at the University of Victoria, for example in this paper.)
AI-Assistants: Developers still perform almost all coding and only apply AI co-developers to narrow subtasks during development (e.g., code completion). This limits the actual efficiency gains, and those gains are often lost again in other SDLC stages.
The AI-DLC framework is intended to push efficiency gains above the observed 10-15% cap, in a way that can be applied to production systems.
AI-DLC: High-Level Development Cycle
At a high level, AI-DLC involves the following loop for each project, and for each development stage of a project:
AI Creates Plan: For initial projects, the humans define the project and the details, and provide the context for the AI to generate a first-pass of the deliverables. At this stage the goal is finite buildable components, similar to the traditional story-based approaches. This is the stage the loop returns to for iteration on the project deliverables (similar to the iteration prompts within a Claude Code cycle, for example).
Humans Verify the Plan: Given the plan output by the AI, the stakeholders in the project review it to ensure alignment. Often the AI-generated plan will be overkill, so the verification will involve pruning and clarification. The Product Team should recognize the requirements, and the developers should recognize the components as what they would have built themselves. All stakeholders should review it with the understanding that the AI may be off to the races after the handoff, so sources of ambiguity should be identified before moving on.
AI Refines the Plan: Input from human verification is used to update the requirements and to translate them into the individual tasks that will be performed as executable units during the
AI Executes the Plan: together with…
Humans Verify the Outcome: These two steps are intermingled depending on how the project execution occurs. For example, for developer-led projects where the AI coding agents are only able to submit PRs that build on each other, the human verification is frequent and ongoing during development; if all coding is handled by the AI coding agents, then the review may be largely functional and at the end of the loop. On delivery of the project, the stakeholders review the output and continue the collaboration with the AI, either on refinement and feedback, or on extension with additional features and requirements.
Note that attentive users of Claude Code/Codex/Antigravity/etc will see the above in action when observing the actions of these tools during development, and in iterating on the output after task completion.
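As a rough illustration only, the loop above could be sketched in Python as follows. The `Plan` type, the `run_loop` function, and every callable passed into it are hypothetical stand-ins for the real AI and human steps; none of these names come from the AI-DLC material, and the review budget is my own assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Plan:
    """In this sketch a plan is just an ordered list of task descriptions."""
    tasks: List[str] = field(default_factory=list)


def run_loop(
    context: str,
    create_plan: Callable[[str], Plan],              # AI Creates Plan
    verify_plan: Callable[[Plan], List[str]],        # Humans Verify the Plan (empty list = approved)
    refine_plan: Callable[[Plan, List[str]], Plan],  # AI Refines the Plan
    execute_task: Callable[[str], str],              # AI Executes the Plan
    verify_outcome: Callable[[str], bool],           # Humans Verify the Outcome
    max_reviews: int = 3,                            # assumed review budget, not from the framework
) -> Tuple[List[str], List[str]]:
    """Run one pass of the loop; return (accepted results, rejected tasks)."""
    plan = create_plan(context)
    for _ in range(max_reviews):
        feedback = verify_plan(plan)
        if not feedback:
            break  # humans approved the plan
        plan = refine_plan(plan, feedback)
    else:
        # Never execute a plan that the humans have not approved.
        raise RuntimeError("plan was not approved within the review budget")

    accepted: List[str] = []
    rejected: List[str] = []
    for task in plan.tasks:
        result = execute_task(task)
        if verify_outcome(result):
            accepted.append(result)
        else:
            rejected.append(task)  # would feed the next iteration of the loop
    return accepted, rejected


def demo(max_reviews: int = 3) -> Tuple[List[str], List[str]]:
    """Stubbed example: reviewers prune one task and later reject one outcome."""
    return run_loop(
        context="add authentication to the admin console",
        create_plan=lambda ctx: Plan(["login form", "SSO integration", "audit log"]),
        verify_plan=lambda plan: [t for t in plan.tasks if t == "SSO integration"],
        refine_plan=lambda plan, cut: Plan([t for t in plan.tasks if t not in cut]),
        execute_task=lambda task: f"built: {task}",
        verify_outcome=lambda result: "audit log" not in result,
        max_reviews=max_reviews,
    )
```

The one design choice worth calling out is that execution is unreachable without an approved plan, which mirrors the point that ambiguity should be resolved before the handoff.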
AI-DLC Framework: Inception, Construction, Operation
To implement this high-level loop, the AI-DLC is broken into three stages:
Inception (Steps 1 and 2 from loop): This is where the project is defined by the human stakeholders and AI co-developers. The humans meet (in real time) to define requirements, and the AI develops the project plan from the context created by the humans.
The real-time requirement here is important since this step creates the artefacts that feed into the AI’s actions in the rest of the loop. One recommendation from the framework is to have this stage be timeboxed, but prioritized to include all relevant stakeholders.
Construction (Steps 3-5 from loop): The human developers take ownership on the human side at this stage, since technical specs like the domain model and architecture are finalized here (including context that can help guide the AI co-developers), along with code generation, code review, and testing.
Operation (Step 5 and the handoff back to 1): The automated deployment of the approved generated code occurs here, along with the other automated systems in place to handle the underlying project infrastructure (e.g., anomaly detection, incident monitoring, etc.).
The AI-DLC also provides alternative terms for some of the more familiar Agile terms; the most important are the following (with others reviewable here):
Mob Elaboration: the process of defining the team that will inform, supervise, and review the project deliverables (occurs in the Inception stage)
Mob Construction: the process of providing technical context, guidance, and decisions, often in real time during code generation (occurs during the Construction stage)
the “Mob” during Mob Construction is recommended to be small (e.g., 1 full-stack developer, 1 business representative, 1 specialist) and working as closely together as possible
Intent: a high-level statement of purpose that defines what is to be delivered, which the AI will use to guide the decomposition of the context into executable tasks
Unit: a self-consistent, self-contained unit of work, derived from an Intent, often with a measurable value that is intended to be delivered
Bolt: analogous to Sprints in Scrum, the smallest iteration for the implementation of a Unit
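To make the relationship between the last three terms concrete, here is a minimal sketch of how they might nest as a data model. The field names, the `progress` helper, and the example content are my own assumptions for illustration, not part of the AI-DLC definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bolt:
    """Smallest iteration in which part of a Unit is implemented."""
    goal: str
    done: bool = False


@dataclass
class Unit:
    """Self-contained piece of work derived from an Intent."""
    name: str
    measurable_value: str  # hypothetical field: what delivering this Unit is worth
    bolts: List[Bolt] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A Unit with no Bolts has not been started, so it is not complete.
        return bool(self.bolts) and all(bolt.done for bolt in self.bolts)


@dataclass
class Intent:
    """High-level statement of purpose that is decomposed into Units."""
    statement: str
    units: List[Unit] = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of Units whose Bolts are all done (0.0 if there are no Units)."""
        if not self.units:
            return 0.0
        return sum(unit.is_complete() for unit in self.units) / len(self.units)


# Hypothetical example content.
example_intent = Intent(
    statement="Let users reset their own passwords",
    units=[
        Unit(
            name="reset email flow",
            measurable_value="fewer support tickets",
            bolts=[Bolt("send reset link", done=True), Bolt("expire link", done=True)],
        ),
        Unit(
            name="reset form",
            measurable_value="self-service completion rate",
            bolts=[Bolt("validate new password", done=False)],
        ),
    ],
)
```

Here `example_intent.progress()` evaluates to 0.5, since one of the two Units has all of its Bolts done.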
As I’ve been working with others on new projects with Claude Code or Codex, I’ve been adjusting my language where possible into these terms, as I find they carry less baggage during faster AI-guided collaboration than some of the familiar Agile terms.
With all the terminology in place, the three stages of AI-DLC can be summarized as follows:
Inception:
Expose the AI to the existing codebase to populate the context;
Create the Intents with user stories (sometimes with AI generation and human review); and
Use these to decompose the work into executable Units.
Construction:
Populate the technical details and requirements;
Generate code and test;
Add architectural components for integration;
Deploy with CI/CD and tests into dev & staging environments.
Operation:
Deploy in production with CI/CD;
Manage incidents with automated monitoring and other infrastructure agents.
AI-DLC Recommendations
At the re:Invent session on AI-DLC (see above links) I was encouraged by seeing how our team’s practices aligned with the framework already. But what I described to my team afterwards as ‘life-changing’ coming out of it were the “Lessons Learned”, which captured some things that we had been observing already in our early stages of using this framework. They are all available here, but these in particular were immediately actionable and quickly created additional efficiencies:
Team-Centered Learnings
For production systems, avoid “vibe-coding”: Even with AI co-developers, human developers are responsible for the code they ship, so if we/they can’t understand it end-to-end, explain it, and support it, then we shouldn’t ship it. One goal of the Mob Construction stage is to increase the human developers’ ownership of the AI-generated code, so that it is never advanced beyond the stage where it is understood.
Ask AI to do small, simple tasks: The more complicated the task, the more opportunities for ambiguity. These may take the form of unclear goals that lead to building unnecessary features, or deliverables so large that they can’t be reviewed and understood and thus have no path to production.
More context doesn’t always mean better: Blindly attaching an entire codebase to the context window may result in inconsistent coding practices, legacy code providing misleading architecture guidance, and so on. Similarly, in chat dialogs with AI co-developers, earlier messages often help reach early solutions, but can pollute the context window when kept for too long; to avoid this, reset your chat dialogs regularly, possibly at the Unit level.
Unit tests are also AI tools: Unit tests can increase developer efficiency, giving them confidence that bugs will be caught early. But with AI co-developers, unit tests are superpowers, in that they enable AI to check its own work automatically and self-correct when needed.
This is particularly important for managing bugs from generated code: if the number of bugs per line of code stays fixed, but the number of lines of code increases exponentially due to the efficiency gains from AI co-developers generating code, then the number of bugs generated will also increase exponentially. Robust unit tests that the AI can use to test its own work will enable this potential bug growth to stay manageable.
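The arithmetic behind this can be sketched in a few lines. The defect density, the catch rate, and the line counts below are made-up illustrative numbers, and the function names are hypothetical; the only thing taken from the argument above is that expected bugs scale with lines of code when the bug rate per line stays fixed.

```python
def expected_bugs(lines_of_code: int, bugs_per_kloc: float) -> float:
    """Expected bug count at a fixed defect density (bugs per 1,000 lines)."""
    return lines_of_code / 1000 * bugs_per_kloc


def escaped_bugs(lines_of_code: int, bugs_per_kloc: float, test_catch_rate: float) -> float:
    """Bugs that slip past tests catching a given fraction of them."""
    if not 0.0 <= test_catch_rate <= 1.0:
        raise ValueError("test_catch_rate must be between 0 and 1")
    return expected_bugs(lines_of_code, bugs_per_kloc) * (1.0 - test_catch_rate)


# Illustrative assumption: generated code doubles each period.
loc_per_period = [10_000, 20_000, 40_000]
density = 5.0      # assumed bugs per KLOC
catch_rate = 0.8   # assumed fraction of bugs caught by unit tests

total = [expected_bugs(loc, density) for loc in loc_per_period]
escaped = [escaped_bugs(loc, density, catch_rate) for loc in loc_per_period]
```

With these numbers `total` doubles each period along with the code, and `escaped` stays at roughly a fifth of it. In this simple model a higher catch rate scales the curve down rather than changing its shape, so the value of the tests depends on how much of the bug volume the AI can find and fix on its own.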
Invest heavily in reliable pre-prod environments: Just as unit tests are AI tools that enable agents to check their work, reliable dev environments that simulate prod for end-to-end testing give AI agents a safe, representative setting in which to have free rein during testing, and to catch issues that fall outside the scope of the unit tests. (Ideally all environments would have consistent CI/CD workflows to enable that testing as well.)
Maximize semantics-per-token: Describing a task in detail provides many extra tokens, each of which is an opportunity for ambiguity to creep in. Using technical terms which have clear, defined meanings will yield more accurate output. (Note: Another benefit of having experienced, knowledgeable technical team members involved at each stage is that the gap between them and token clarity is smaller.)
Quick aside: This is great advice for any LLM prompting, not just within the AI-DLC framework.
Keep only the MCP servers you actively use: Since each MCP server brings the definitions of all of its tools into the context, and the tools for all MCP servers live together in the same context window, each additional MCP server provides more opportunities for ambiguity to arise for the AI.
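One way to act on this is a periodic audit of which servers are earning their place in the context window. The sketch below is purely illustrative: the server names, tool definitions, and usage counts are invented, and the four-characters-per-token figure is a crude assumption, since real tokenizers differ.

```python
import json
from typing import Dict, List

# Hypothetical tool definitions, keyed by hypothetical MCP server name.
servers: Dict[str, List[dict]] = {
    "issue-tracker": [
        {"name": "create_issue", "description": "Create an issue in the tracker."},
        {"name": "search_issues", "description": "Search issues by free-text query."},
    ],
    "design-docs": [
        {"name": "fetch_doc", "description": "Fetch a design document by id."},
    ],
}

# Hypothetical usage counts, e.g. gathered from your own agent logs.
calls_last_30_days: Dict[str, int] = {"issue-tracker": 42, "design-docs": 0}

CHARS_PER_TOKEN = 4  # crude assumption, only good for relative comparisons


def estimated_tokens(tools: List[dict]) -> int:
    """Rough size of the tool definitions once serialized into the context."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


def servers_to_drop(all_servers: Dict[str, List[dict]], calls: Dict[str, int]) -> List[str]:
    """Servers whose tools were never called in the observed window."""
    return sorted(name for name in all_servers if calls.get(name, 0) == 0)
```

Calling `servers_to_drop(servers, calls_last_30_days)` on this invented data flags only `design-docs`, and `estimated_tokens` gives a rough sense of how much context each remaining server costs.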
Organization-Centered Learnings
Protect contiguous time for builders: For a developer, every trip to Stack Overflow or Google is a distraction from being in the IDE building code. When a development flow was already broken up by many of these interruptions, additional interruptions due to meetings could feel less costly. With AI co-developers providing those entry points inside the IDE, developers can now stay in the flow longer and be more efficient in that time. With this type of workflow in place, the interruption cost of a meeting increases dramatically. Meetings are still important (sometimes), but aim for contiguous meeting time and contiguous building time, with minimal interruptions between them.
AI Agents are distinct from Sr Engineers: Swapping out Engineers for Co-developers is not a 1:1 trade. Coding agents often overthink and over-engineer. They make assumptions in the face of ambiguity, and ambiguity may look different to an AI than to a human. They can make basic errors that get corrected with even minimal pushback or follow-up. An Engineer plus an AI co-developer can be a multiple over two engineers, but an AI co-developer on its own can be a negative multiple over a single engineer. Questioning the AI in detail at every stage of the planning process will yield optimal results… and it won’t take it personally.
Align as an organization on productivity metrics: Further integration with AI agents, particularly during the period of transition, can raise opportunities for distrust, both from management towards developers (are they still working while the AI is generating the code?) and developers towards the process (what is the concrete footprint connected to me within this process?). Unclear expectations can create lack of buy-in for the entire AI-DLC framework, and result in failure to achieve breakout efficiency gains. This is particularly complicated by the fact that traditional metrics are broken by this new way of working: lines of code doesn’t tell you much anymore and can often be an anti-pattern. One proposed metric is time from ideation to delivery, ideally with a baseline pre-AI for comparison.
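As a minimal sketch of that proposed metric, the snippet below computes the median time from ideation to delivery for two sets of features and compares them. All dates are invented for illustration and are not our team’s data; the function names are hypothetical.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, Tuple

Feature = Tuple[datetime, datetime]  # (ideation, delivery)


def lead_time_days(ideated: datetime, delivered: datetime) -> float:
    """Elapsed days from ideation to delivery."""
    if delivered < ideated:
        raise ValueError("delivery cannot precede ideation")
    return (delivered - ideated).total_seconds() / 86_400


def median_lead_time(features: Iterable[Feature]) -> float:
    """Median lead time in days across a set of features."""
    return median(lead_time_days(start, end) for start, end in features)


# Invented example data: a pre-AI baseline and a post-transition sample.
baseline = [
    (datetime(2025, 1, 6), datetime(2025, 2, 3)),
    (datetime(2025, 2, 3), datetime(2025, 3, 17)),
]
with_ai_dlc = [
    (datetime(2025, 10, 6), datetime(2025, 10, 9)),
    (datetime(2025, 10, 13), datetime(2025, 10, 18)),
]

improvement = median_lead_time(baseline) / median_lead_time(with_ai_dlc)
```

On this made-up data the median drops from 35 days to 4 days. I use the median rather than the mean here so that a single stalled feature does not dominate the comparison, but that is a judgment call rather than anything prescribed by the framework.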
Don’t get too attached to your code: Traditionally, the tradeoff between rebuilding vs patching as a path to resolving technical debt was an easy choice, due to the high cost and long turnaround time involved in a rebuild of existing infrastructure. Efficiency gains due to AI co-developer integration change that calculation: if a team of AI co-developers can rebuild existing infrastructure from scratch faster than working from an existing codebase, then that becomes a viable path worth considering. Worth considering in this calculation, however, is the existing internal human knowledge required to support a rebuild: even if the rewrite is inexpensive and fast, it doesn’t help to just replace technical debt with comprehension debt and intent debt.
Be willing to experiment and fail fast: AI-DLC is a shift from traditional Agile practices, and the tradeoff is velocity in exchange for uncertainty and unfamiliarity. A side benefit of gaining velocity is that the cost of an individual experiment is less… but with that same uncertainty. This exchange is best utilized by embracing the uncertainty and allowing some freedom for building some experimental features — in particular narrow ones while transitioning to AI-DLC — and allowing the fast delivery to make up for the cost of any fast failures. (With sufficient guardrails in place!)
One way that our team handled this during the AI-DLC transition was to pair two features together which touched similar stacks or parts of the codebase, and to measure the speedup from the delivery of the first to the second. This enabled us to quickly iterate on the process while it was fresh in our heads, without also adding uncertainty from the change in the components of the system that was being touched.
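The measurement itself is simple enough to sketch. The pair names and durations below are hypothetical placeholders rather than our actual numbers.

```python
from typing import Dict, Tuple


def speedup(first_days: float, second_days: float) -> float:
    """How many times faster the second feature of a pair was delivered."""
    if first_days <= 0 or second_days <= 0:
        raise ValueError("durations must be positive")
    return first_days / second_days


# Hypothetical pairs of features touching similar parts of the codebase:
# (days to deliver the first, days to deliver the second).
pairs: Dict[str, Tuple[float, float]] = {
    "pair A": (6.0, 4.0),
    "pair B": (5.0, 2.5),
}

speedups = {name: speedup(first, second) for name, (first, second) in pairs.items()}
```

A value above 1.0 means the second feature of the pair shipped faster than the first, which is the signal that the process changes made in between are paying off.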




