We are all architects now. Congrats on the promotion.

In the last month or so, I have been working on a greenfield project that does a lot of complex things. The code is robust to a reasonable degree, quick, and works as expected. I, however, barely know the code other than the structure I communicated in the design conversations. In every sense, I have orchestrated a full-fledged system that may very well be a Frankenstein one, but it works as expected. I know a little more than I care to admit here. I also know the memory and space constraints and optimizations that need to be done at scale.
Today I spent some time cooking up a two-pointer fuzzy matching algorithm for an evaluation framework. I wrote up the specs and gave them to Claude, asking it to develop the algorithm in a TDD fashion. It wrote several unit tests covering the two-pointer scenarios and built the evaluation algorithm in one shot. In subsequent iterations, we optimized for performance with review sub-agents.
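To make the idea concrete, here is a minimal sketch of what a two-pointer fuzzy matcher with tests written up front could look like. It is not the algorithm from my project: the function names (`similar`, `fuzzy_align`, `recall`), the 0.8 threshold, the look-ahead `window`, and the use of Python's `difflib` for similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy equality: case-insensitive similarity ratio at or above `threshold`."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def fuzzy_align(
    expected: Sequence[str],
    predicted: Sequence[str],
    threshold: float = 0.8,
    window: int = 2,
) -> List[Tuple[int, int]]:
    """Greedy two-pointer alignment (hypothetical design, not the original).

    `i` walks `expected`, `j` walks `predicted`. For each expected token we
    look at predicted[j .. j + window]; on a hit both pointers move past the
    match, otherwise the expected token is treated as missing and only `i` moves.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(expected) and j < len(predicted):
        hit = None
        for k in range(j, min(j + window + 1, len(predicted))):
            if similar(expected[i], predicted[k], threshold):
                hit = k
                break
        if hit is not None:
            pairs.append((i, hit))
            j = hit + 1  # skip any extra predicted tokens before the match
        i += 1
    return pairs


def recall(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of expected tokens that found a fuzzy partner."""
    if not expected:
        return 1.0
    return len(fuzzy_align(expected, predicted)) / len(expected)


# Tests written first, in the TDD spirit: a typo, an extra token, a missing token.
assert fuzzy_align(["the", "quick"], ["the", "quik"]) == [(0, 0), (1, 1)]
assert fuzzy_align(["brown", "fox"], ["red", "brown", "fox"]) == [(0, 1), (1, 2)]
assert fuzzy_align(["alpha", "beta", "gamma"], ["alpha", "gamma"]) == [(0, 0), (2, 1)]
```

The greedy look-ahead keeps the walk linear in the input for a fixed window, at the cost of occasionally missing an alignment that a full edit-distance table would find. That trade-off is exactly the kind of decision that stays with the human writing the spec.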
The question of identity
Suffice it to say, less than 3 months into the year, my function has changed drastically. My previous hesitations around AI taking away my identity are both true and false; see my previous post, “5,000 Lines of Trust”, for more on this. However, the skill sets I have built over a lifetime around systems thinking and product intuition are helping me function at a pace bottlenecked by how quickly I can consume output and give feedback, and by the processes we have built as guardrails.
2 meta questions on that front (these are exploratory thought experiments, not proposals or recommendations):
A. If your code was written with AI and reviewed by AI, do you need a human reviewer for every PR, other than yourself?
B. If we let a long-running agent build the entirety of the last month’s work with occasional human feedback, do we need human developers to raise PRs?
These aren’t questions of panic. Rather, they are written from a vantage point, reached through reflection on the possibilities, that is incredibly freeing. Setting that context makes me wonder about the purpose of a human in the loop here. From a development view, I see us as architects. From a PR review perspective, we remain policy setters. Mapping back to the architects, the job was always to shape systems and streamline execution without causing roadblocks. If an AI reviewer is able to reliably assess that the code changes follow the architectural principles [with review, tests, integration], inserting myself in the middle is a bottleneck that is vestigial and ceremonial. Essentially, I don’t need to check if the pipes are leaking (the AI agent already did that); I need to check if the pipes are leading to the right room.
We’re entering an era where systems work before they are understood. I expect both personas to see a rapid transformation, where the intent validation gets more emphasis over the implementation specifics.
The paradox of surface architects
“Code was always the medium. Judgment was always the craft.” (Ahmad Al-Dahle, “The Future of Software Engineering Isn’t What You Think”)
Systems thinking has always been the key. Before the rise of AI agents, it was a craft we developed as we hand-wrote systems from scratch, integrated various systems as components of a mega system, and architected a system into a general-purpose platform that accommodates multiple use-cases. Now, AI agents can do that in a few hours, while ironically and simultaneously estimating the effort at several weeks.
In a different setting, we once pushed back against the product team because the specs didn’t make sense. In the new world, we are increasingly expected to be both engineers and product thinkers. We need both the why and the how in our design thinking, as we instruct our coding agents to perform the deed.
Engineers before AI agents built intuition the hard way. What does that mean for new grads? Or even for engineers working in new areas where they don’t have deep expertise? I think there is a risk of creating a generation of “surface architects” who can describe a system but cannot diagnose a failure when the abstractions leak. How can they reason about the failure modes of the system? A recursive loop of relying on agents to fix the code written by agents leaves me unsettled. Today, we can use different models as reviewers to prevent model family biases, but it still leaves a lot to be desired from a trust standpoint.
These days, I find myself sending a review agent to do daily code reviews on top of per-PR code reviews, so that it evaluates a larger slice of the code and comes up with a strategic plan of optimizations. I am still in the loop, asking specific design questions and rejecting solution choices or things that are wrong, most likely because I did not specify enough context.
From that vantage point, perhaps the way we develop new intuitions lies in deep observability and debugging. Effectively auditing systems like a systems pathologist forces one to look at gaps and flaws. We could rely on AI to audit, but to avoid falling into the circular reasoning traps, the focus moves to effective hypothesis building. This requires having a clear mental model of the system before auditing for the answer.
I think there’s value in this “auditing” regardless of experience in the field. First, this adversarial testing mindset can actually help us create sound arguments that can be validated against observed behaviour. Second, intuition is built on the things that cannot be hallucinated, i.e., it must rely on observed behaviour.
Looking forward
We now coordinate AI agents to think across the product stack from design to infra and across various interconnected components. We get continuous feedback from AI like drinking water from a firehose, faster than we can synthesize. In many ways, our job has evolved to synthesize that feedback into a format consumable by the next iteration with our intuition.
For example, I am seeing noticeable productivity gains in assessing existing systems with AI-generated documentation. We can effectively do incisive integration assessments at such a rapid pace that it’s almost unsettling. Now, it’s about validating the spec generated by AI, which more and more is a cursory glance to see if something is off.
I think we have fundamentally new problems in this state space. For example, what should an interview look like? If an agent can solve a convoluted two-pointer problem in minutes, what signal does LeetCode-style interviewing give other than serving as a rejection filter? We should shift toward debugging systems or refining specs, i.e., giving a candidate a flawed AI-generated architecture and seeing if they have the intuition to spot the “off” note, potentially without using AI.
We also risk a free fall into semantic fragmentation in code: slightly different versions of the same logic spread across the codebase. For example, the two-pointer fuzzy match algorithm I asked for could be reused for various use-cases, yet end up rewritten each time with slight variations to accommodate the different needs. A good engineering pattern is to ensure the system maintains a single source of truth, especially when it’s easier to generate a new implementation than to find and refactor an existing one.
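One way to picture the single source of truth is a single two-pointer core that takes the notion of "equal" as a parameter, so each use-case only supplies a small predicate instead of a fresh copy of the loop. The sketch below is hypothetical: `two_pointer_match`, the `MATCHERS` registry, and the three modes are names I made up for illustration, not the structure of my actual codebase.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Sequence

Equal = Callable[[str, str], bool]


def two_pointer_match(expected: Sequence[str], predicted: Sequence[str], same: Equal) -> int:
    """The one shared core: count expected items found, in order, in predicted."""
    i = j = matched = 0
    while i < len(expected) and j < len(predicted):
        if same(expected[i], predicted[j]):
            matched += 1
            i += 1
        j += 1  # always consume one predicted item per step
    return matched


def exact(a: str, b: str) -> bool:
    return a == b


def case_insensitive(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


def fuzzy(threshold: float = 0.8) -> Equal:
    """Build a fuzzy predicate; the threshold is an illustrative default."""
    def same(a: str, b: str) -> bool:
        return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold
    return same


# Variation lives in this small registry, not in copies of the loop above.
MATCHERS: Dict[str, Equal] = {
    "exact": exact,
    "case_insensitive": case_insensitive,
    "fuzzy": fuzzy(0.8),
}


def match_count(expected: Sequence[str], predicted: Sequence[str], mode: str = "fuzzy") -> int:
    return two_pointer_match(expected, predicted, MATCHERS[mode])
```

With this shape, a new use-case means adding one predicate to the registry, and a bug in the pointer logic gets fixed in exactly one place. The harder part is organizational: an agent asked for "a fuzzy matcher" has to be pointed at the existing core, or it will happily generate a new one.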
The question of the need for a human reviewer also comes from data showing that AI-generated PRs actually wait 4.6x longer for review than human ones. (Source: “AI Technical Debt: 30-41% Increase Hits Developers”.) As discussed above, building auditor agents can help, but how do we build long-term trust in this coordinated setup to keep code maintainable? Trust is built over time, but it takes only one major production incident to break it.
TL;DR
The cost of doing has dropped to zero, leaving only the cost of deciding. We are moving away from the aesthetic of clean code toward the pragmatism of viable systems; we have moved from striving for perfection to striving for effectiveness. If the code is a Frankenstein, let it be, provided we are not its victim. It is a terrifying level of responsibility, and an incredibly freeing one. It’s time to take responsibility for the one thing agents cannot: the why.