David Hillier

Everyone Is Teaching AI To Create Images. I Wanted To See If It Could Judge Them.

I've spent more than 20 years in architectural visualisation, reviewing thousands of renders for architects, developers and design teams.
Like most people, I was fascinated when AI started generating images.
But after playing with the latest models, I found myself asking a different question:

Could AI evaluate images instead of creating them?

Creating and judging are very different skills.

Most 3D artists know when an image feels "off", but identifying exactly why it feels off is much harder. Professional art direction is largely about visual judgement: understanding which few changes will have the biggest impact on an image.

As a non-developer who has spent the last year teaching myself software development with AI-assisted tools, I decided to run an experiment: could professional art direction be delivered autonomously if the AI were supported by the right knowledge system?

What surprised me was that the AI wasn't the hard part. Modern vision models are incredibly capable. The hard part was calibration.
Without guidance, the model was:

  • too generous
  • inconsistent
  • overly focused on surface-level observations
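To make "too generous" and "inconsistent" concrete, here is a minimal Python sketch of how those two problems could be measured. It is not how the project measured them. It assumes you have numeric scores from the model and from a human reviewer for the same images, plus several repeat reviews of one image. The function names and every number are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def generosity_bias(model_scores, expert_scores):
    """Average of (model score - expert score) over the same images.

    A positive value means the model scores higher than the expert,
    i.e. it is too generous.
    """
    if len(model_scores) != len(expert_scores):
        raise ValueError("score lists must be the same length")
    return mean(m - e for m, e in zip(model_scores, expert_scores))


def inconsistency(repeated_scores):
    """Population standard deviation of repeat reviews of ONE image.

    Zero means the model gave the same score every time.
    """
    return pstdev(repeated_scores)


# Hypothetical scores on an assumed 1-10 scale (not results from the project).
model_scores = [8, 9, 7, 8]
expert_scores = [6, 7, 6, 5]
repeat_runs = [7, 9, 6, 8]

print("generosity bias:", generosity_bias(model_scores, expert_scores))
print("inconsistency:", inconsistency(repeat_runs))
```

Tracking both numbers while changing the framework gives a rough way to tell whether a change actually improved calibration, rather than relying on impressions.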

The breakthrough came when I stopped focusing on prompts and started focusing on standards. Instead of asking "What do you think of this image?", I built a structured framework around how professional architectural renders are actually reviewed:

  • Composition & Camera
  • Lighting & Grading
  • Materials & Geometry
  • Landscaping & Context
  • Realism & CG Artefacts
  • Narrative & Mood
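As a rough illustration, the six categories above could be encoded as a rubric that drives both the review prompt and the final score. This is a minimal sketch, not the actual implementation behind the site. The 1-10 scale, the equal weighting, the prompt wording, and all names (`CategoryReview`, `build_review_prompt`, `overall_score`) are assumptions made for the example.

```python
from dataclasses import dataclass

# The six review categories named in the article.
CATEGORIES = [
    "Composition & Camera",
    "Lighting & Grading",
    "Materials & Geometry",
    "Landscaping & Context",
    "Realism & CG Artefacts",
    "Narrative & Mood",
]


@dataclass
class CategoryReview:
    category: str
    score: int    # assumed 1-10 scale
    top_fix: str  # the single most impactful change for this category


def build_review_prompt(categories=CATEGORIES):
    """Turn the rubric into a structured prompt instead of an open question."""
    lines = [
        "Review this architectural render against each category below.",
        "For each one, give a score from 1 to 10 and the single change",
        "that would improve the image most.",
        "",
    ]
    lines += [f"{i}. {name}" for i, name in enumerate(categories, start=1)]
    return "\n".join(lines)


def overall_score(reviews):
    """Unweighted mean score; rejects reviews that skip or repeat a category."""
    seen = sorted(r.category for r in reviews)
    if seen != sorted(CATEGORIES):
        raise ValueError("reviews must cover each category exactly once")
    return sum(r.score for r in reviews) / len(reviews)


print(build_review_prompt())
```

Forcing every review through the same fixed categories is one way to reduce run-to-run variation: the model has to answer the same questions each time instead of choosing what to comment on.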

I also introduced visual reference imagery, because I realised something important: People don't just need feedback. They need to see what good looks like.

The biggest lesson from the project was that AI doesn't magically create expertise, but it can amplify it. A weak framework produces weak results. A strong framework produces surprisingly useful ones.

As AI continues to improve, I think we'll see more focus on evaluation rather than generation. Once everyone can create images, code and content, the scarce resource becomes judgement.

I'd be interested to hear whether others have found the same thing when building AI products. Has the challenge been the model itself, or the knowledge system behind it?

If you want to try it out, the site is Final01.
