Vishnu's Pages

You MUST Review AI-generated Code

In 2016, the golden rule of programming was "don't blindly copy-paste code from StackOverflow". Well, it's 2026, and the rule requires an obvious update: "don't blindly copy-paste code from AI".

AI-assisted development provides great value, there's no doubt about that. But your AI tool can quickly turn from a PhD-level developer to a kindergarten-level monkey, without you even noticing. If you miss this tipping point, disaster awaits.

Recently I was working on my pet project April⋅SSG — a tiny static site generator that I built to power this website. I used Claude Code for a complete rewrite of the tool, and it did a great job. However, as the session went on for a couple of hours, I noticed that the code quality started to degrade. I had to read each and every line of code it generated, tell Claude what was wrong, and ask it to fix things.

An Example: Writing Test Cases

I asked Claude Code to write test cases for a function that generates HTML from markdown. My setup includes a custom markdown processor that generates <img> tags with the image dimensions specified in the title field, and I wanted to test that behavior.

Here's the expected input:

![Alt Text](/images/image.jpg "Title 100x50")

And the expected output (multiple lines for readability):

<img src="/images/image.jpg" 
    alt="Alt Text" 
    title="Title" 
    width="100" 
    height="50">

Here's the test case that Claude generated to test if the above markdown is correctly converted to the expected HTML:

it('renders image with title, alt text, and dimensions', () => {
    const html = markdownToHtml(...);
    assert.ok(html.includes('src="/images/image.jpg"'), 'should have src');
    assert.ok(html.includes('title="Title"'), 'should extract title without dimensions');
    assert.ok(html.includes('alt="Alt Text"'), 'should extract alt text correctly');
    assert.ok(html.includes('width="100"'), 'should parse width from title');
    assert.ok(html.includes('height="50"'), 'should parse height from title');
});

At first glance, it looks good. But did you notice the problem here?

If you look closely, you'll notice that the test case is wrong. It checks for the presence of the title, alt, width, and height attributes, but it doesn't check if everything is within the same <img> tag. If these attributes are present anywhere in the generated HTML, the test will pass, even if they are not correctly associated with the same image.

That's the fundamental problem with test cases: if the test case itself is wrong, there's no point in writing it. A test that verifies the wrong thing passes regardless and only gives you false confidence.

I asked Claude, "Shouldn't we check if everything is within the same <img> tag and not just anywhere in the HTML?" It replied, seemingly surprised: "Oh, you're right! Let me fix that."

Then it generated the updated test case:

it('renders image with title, alt text, and dimensions', () => {
    const html = markdownToHtml(...);
    const imgTag = html.match(/<img[^>]*alt="Alt Text"[^>]*>/);
    assert.ok(imgTag, 'should find the image tag');
    const tag = imgTag[0];
    assert.ok(tag.includes('src="/images/image.jpg"'), 'should have src');
    assert.ok(tag.includes('title="Title"'), 'should extract title without dimensions');
    assert.ok(tag.includes('alt="Alt Text"'), 'should extract alt text correctly');
    assert.ok(tag.includes('width="100"'), 'should parse width from title');
    assert.ok(tag.includes('height="50"'), 'should parse height from title');
});

This version first locates the <img> tag (using the alt text as an anchor) and then verifies that all the expected attributes appear within that same tag, so the test can no longer pass on attributes scattered across the document.

The Tipping Point

I had been working with Claude Code for hours in the same session. My hypothesis is that once Claude Code triggers auto-compaction, the model loses important details from earlier in the session. Other AI tools can have their own tipping points too.

Whatever the exact cause, the pattern was clear: quality degraded over time. This may improve in future models, but reviewing the generated code remains critical to ensure that nothing bad slips into a commit without you noticing.

Don't blindly trust AI-generated code. You MUST review the AI-generated code before committing it.

Prediction: Broken Software Everywhere

Thousands of developers are currently writing software with AI tools. I assume many of them are committing code without thoroughly reviewing it — especially junior developers who fancy the idea of AI as a magical code generator.

The consequences may not be evident now, but in a couple of years we will start seeing broken software everywhere, riddled with bugs that were committed years earlier. Imagine an AI-generated, unreviewed authentication check that looks correct but has a subtle logic flaw, or a test suite that passes because the AI wrote both the code and the tests with the same blind spot. These are the kinds of bugs that sit quietly in production for years before someone discovers them.

