A.I., Architects, Creativity, and the Big Red Herring
Why the Whole 'Is A.I. Creative' Debate is a Distraction
In this Post:
You Can’t Measure Creativity!
Putting A.I. Creativity to the Test(s)
Designing Creatively > Creative Designs
I recently gave a talk at the Saskatchewan Association of Architects’ Annual Conference entitled ‘Designing Through Uncertainty: A.I., Disaster and the Search for Transcendence’ where I sketched out a rosy, transcendent future for the profession of architecture. However, I also argued that to get there, we had to dispense with some of the myths we’ve collectively circulated about A.I. and how it might affect the profession.
One of the most stubborn myths is the whole ‘A.I. isn’t a threat to architects because architects are so creative, and machines aren’t’ saw. My basic premise, as I shared with the audience, was that the whole debate is something of a red herring. What matters is how creative machines are relative to normal people. Here’s why:
Architects derive their social (and economic) value from being more creative than the average person. Significantly so, according to this study. Architects score 1.5 standard deviations above the mean in ‘Openness’1 when measured by a Five-Factor personality test.
That is a lot of difference in statistical terms. Like, a lot a lot.
For the highly creative person (the architect), that difference of 1.5 standard deviations means that even their average idea is more creative than 93.32% of all ideas had by the average person. In any situation, against any problem, the architect can be trusted to reliably come up with some kind of solution that’s more creative than what you’d get from the Average Joe. It’s almost a statistical certainty.
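That 93.32% isn’t a mystery number, by the way: it’s just the standard normal CDF evaluated at z = 1.5. A two-line sanity check, under the simplifying assumption that idea quality is normally distributed:

```python
# The share of average-person ideas that fall below a point 1.5 standard
# deviations above the mean, assuming a normal distribution of idea quality.
from scipy.stats import norm

print(norm.cdf(1.5))  # ~0.9332, i.e. 93.32%
```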
But what happens when A.I. starts to approximate the creativity of an Average Joe? Or exceeds it?
A recent paper by psychologists at the University of Montreal went straight to the heart of that question (Divergent Creativity in Humans and Large Language Models). The researchers gave A.I. models a creativity test called the Divergent Association Test (DAT) and also asked them to write short creative pieces like haikus and movie summaries. They concluded that some of the most advanced A.I. language models, particularly GPT-4, can match or even surpass average human performance on creativity tests, and that those results could be further enhanced by adjusting specific settings or using different prompting strategies.
You Can’t Measure Creativity!
Yes, you can. Of course, ‘creativity’ is subjective, but there are a few different tests psychologists use to measure it:
The Alternative Uses Test (AUT): Test subjects are asked to generate novel uses for common objects. While the AUT has been used in earlier studies of A.I. creativity, there have been lingering concerns that A.I.s could be ‘cheating’ if AUT tests were within their training data.
The Divergent Association Test (DAT): Test subjects are asked to generate a list of 10 words that are as semantically different from one another as possible. Not as easy as it sounds, since our brains are semantic engines – we draw connections between things. So, if I say ‘peanut butter’, you think ‘jelly.’ Admittedly, the DAT is purely language-based, and not necessarily the visual or spatial creativity that architects might be thinking of. But still, the DAT makes for a good test because it’s hard to game and easy to administer at scale.
Divergent Semantic Integration (DSI): This test measures the semantic distance (one minus the cosine similarity) between successive word embeddings in a textual narrative. Basically, how semantically divergent, and thus how creative, your writing is. (A toy sketch of both the DAT and DSI scoring methods follows this list.)
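For the technically curious, here’s what those two scores look like in code. This is a toy sketch, not the papers’ exact pipelines: it assumes gensim’s pretrained GloVe vectors (the official DAT uses a larger GloVe model, so absolute numbers won’t match published scores), and the word lists are made up for illustration.

```python
# A toy sketch of DAT- and DSI-style scoring, not the papers' exact pipelines.
# Assumes gensim's pretrained GloVe vectors; the official DAT uses a larger
# GloVe model, so absolute scores here won't match published numbers.
from itertools import combinations

import gensim.downloader
import numpy as np

vectors = gensim.downloader.load("glove-wiki-gigaword-300")  # ~400 MB download

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dat_style_score(words):
    """Mean pairwise semantic distance across the list, scaled by 100 like the DAT."""
    embeddings = [vectors[w] for w in words]
    return 100 * np.mean([cosine_distance(a, b)
                          for a, b in combinations(embeddings, 2)])

def dsi_style_score(words):
    """Mean distance between successive words: a rough proxy for narrative divergence."""
    embeddings = [vectors[w] for w in words]
    return float(np.mean([cosine_distance(a, b)
                          for a, b in zip(embeddings, embeddings[1:])]))

related = ["peanut", "butter", "jelly", "bread", "sandwich",
           "lunch", "snack", "kitchen", "jar", "toast"]
divergent = ["volcano", "algebra", "tambourine", "mercy", "sparrow",
             "asphalt", "whisper", "turbine", "cinnamon", "galaxy"]

print(dat_style_score(related))    # lower: the words all orbit lunch
print(dat_style_score(divergent))  # higher: the words share no neighborhood
```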
Putting A.I. Creativity to the Test(s)
In the Divergent Creativity paper, researchers first used the DAT method, benchmarking the models against the DAT results of 100,000 human test subjects. If you’re still reading, you may have already guessed the results: Gemini Pro performed at the same level as humans. GPT-4 surpassed humans by a statistically significant margin.
The results became even more dramatic once researchers started playing with the temperature.2 With the best models, at the highest temperature condition, the A.I.s were able to achieve a mean DAT score of 85.6, which is higher than 72% of all human scores.
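To give a flavor of what ‘playing with the temperature’ looks like in practice, here’s a minimal sketch of a temperature sweep. This is not the paper’s actual protocol; it assumes the OpenAI Python client, and the prompt wording is my paraphrase of the DAT instructions:

```python
# A minimal sketch of a temperature sweep, not the paper's actual protocol.
# Assumes the OpenAI Python client; the prompt wording here is a paraphrase.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DAT_PROMPT = ("Please list 10 nouns that are as different from each other "
              "as possible, in all meanings and uses of the words.")

for temperature in (0.5, 1.0, 1.5, 2.0):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": DAT_PROMPT}],
        temperature=temperature,  # higher = more entropy in token sampling
    )
    print(f"T={temperature}: {response.choices[0].message.content}")
```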
The style of prompting also seemed to affect an LLM’s creativity. When researchers told the LLMs to use “a strategy that relies on varying etymology”, the models’ DAT scores went up. When prompted with “a strategy that relies on meaning opposition”, the scores went down.
The researchers also used the DSI test to measure the creativity of short pieces written by the A.I. Here, they found that humans still had the edge in creative writing, though not by much: the mean DSI score for humans was about .815, while for GPT-4 it was about .810.
Comparing these results apples to apples against other tests (like the AUT) is difficult, since LLMs are evolving so quickly. An earlier study from the University of Amsterdam in June 2022 (Putting GPT-3’s Creativity to the (Alternative Uses) Test) found that humans outperformed GPT-3 on the AUT. But Good God, that was two years ago, and it was GPT-3. That’s basically a TI-82 Graphing Calculator at this point.
A more recent paper in 2023 (Best Humans Still Outperform Artificial Intelligence In A Creative Divergent Thinking Task) used the AUT and found that, on average, the A.I. programs came up with more original and creative ideas than the humans. However, the authors were very committed to pointing out that the most creative ideas still came from the top-performing humans – which is probably why they named their paper what they did. Frustratingly, the study did not provide any data on how many humans were at these rare heights of creativity. We can infer a bit from the graphs provided with the paper:
All the charts basically show the same thing: the A.I. (in teal) scores higher on average, the humans (in pink) have a wider distribution, and the very highest scores belong to the humans. For a closer look:
Clearly, there’s a bunch of really creative humans up there who are still outperforming the A.I.s. They are probably architects.
Perhaps this gives some solace to the authors of the paper, but as a creative person, I feel threatened. Here’s why:
Assume that A.I. becomes broadly distributed in the near future, granting its ‘creativity powers’ to the Average Joe. That Average Joe can now come up with solutions not at his own creativity level, but at the level of the A.I. So, if an A.I. can exceed 72% of all humans in its creativity, then every human who uses it is effectively operating at that 72nd percentile. To revisit our earlier graphic:
All of those dumb, not-so-creative ideas that the Average Joe used to have simply disappear, because he can now use the A.I. to come up with solutions to his problems. The average creativity of his solutions rises, because he no longer has any ideas below that 72nd-percentile point. The ‘creativity gap’ (the gap between the average creative solution from an architect and one from an Average Joe) has shrunk by more than half.
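That ‘more than half’ figure checks out on the back of an envelope. Here’s a minimal simulation, assuming idea quality is normally distributed with equal spread for both groups (an illustrative assumption, not data from any of the papers):

```python
# A back-of-envelope check on the "creativity gap" claim, assuming idea
# quality is normally distributed (an illustration, not the papers' data).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

joe = rng.standard_normal(n)              # Average Joe: mean 0, SD 1
architect = rng.standard_normal(n) + 1.5  # architect: mean 1.5 SDs higher

# The A.I. floor sits at the 72nd percentile of the human distribution.
ai_floor = np.quantile(joe, 0.72)

# Joe never submits an idea worse than what the A.I. hands him; the
# architect's mean is left unchanged, per the argument above.
joe_with_ai = np.maximum(joe, ai_floor)

print(architect.mean() - joe.mean())          # ~1.50 SDs: the old gap
print(architect.mean() - joe_with_ai.mean())  # ~0.74 SDs: more than halved
```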
Here’s the takeaway: this kind of A.I. ‘creativity’ collapses the gap between architects’ creativity and the creativity of everyone else, diminishing our social and economic value. And before you say ‘well, architects can use A.I. too and stay ahead of the Average Joe’: yes, we can, but because the A.I. is less creative than we are, it only lifts our least creative ideas. Our mean creativity is relatively unchanged.
Designing Creatively > Creative Designs
We have to assume that A.I. will continue to improve, and likely improve in its creativity as well. The three studies I mentioned were from 2022, 2023, and 2024, and they show a definite evolution from ‘humans are more creative than A.I.’ to ‘the most creative humans are more creative than A.I.’ to ‘A.I. is just generally more creative than humans.’ That doesn’t mean they’re more creative than architects, but that’s the red herring. By giving ‘creativity steroids’ to normal people, A.I. erodes the relative position of what is supposedly our core strength: creativity.
I think the solution is to start using A.I. creatively, as opposed to just using A.I. as another tool to make our creations. I saw someone on YouTube the other day who was using GPT Vision as their personal yoga instructor. Their webcam watched them do yoga, and GPT gave feedback on their postures through a voice module (Whisper, I think). I thought to myself ‘Damn, why didn’t I think of that?!’ And then I remembered that I don’t do yoga. But I think the point is solid.
How can we imagine ways to use A.I. to re-imagine what it is that architects do? I think if we turn our creativity inward and point it towards the future, we’ll probably be okay.
Openness is a quality measured by the Five-Factor personality test. While it’s not a perfect analogue for creativity, it’s pretty close, and we’re using it as a stand-in here. For a full definition, see here.
The ‘temperature’ of an A.I. model is basically a dial whereby the user can alter the entropy (and thus the randomness) of the model’s responses. By introducing greater degrees of randomness, a good prompter can extract more creative responses from the model.