AI Image Generators Revert to the Same 12 Photo Styles, Study Calls It ‘Visual Elevator Music’

A collage of 30 images featuring grand architecture, elegant interiors, artistic landscapes, stained glass, and people in formal settings, along with scenes of boats, rain, clownfish, and fantasy castles, arranged in a 5x6 grid.
Each column represents final images from trajectories that converged to similar semantic endpoints, regardless of initial prompt diversity or sampling temperature. | Hintze Et Al., Patterns

A study has found that when left to their own devices, AI image generators converge on a limited set of photo styles no matter what the original prompt is.

In a paper published in the journal Patterns, a research team tested two AI models: Stable Diffusion XL, an image generator, and LLaVA, an image description model. Gizmodo reports that they used a setup modeled on the game of visual telephone.

Stable Diffusion XL was first given a short, unusual text prompt, such as, “As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.” It produced an image, which LLaVA then described in words. That description was fed back into Stable Diffusion XL to generate a new image, and the process was repeated for 100 rounds.
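The loop described above can be sketched in a few lines of Python. This is only an illustration of the image-to-caption-to-image cycle, not the researchers' code: `visual_telephone`, `toy_generate`, and `toy_describe` are hypothetical names, and the two toy functions are simple string stand-ins for Stable Diffusion XL and LLaVA so the example runs without any model.

```python
from typing import Callable, List, Tuple


def visual_telephone(
    prompt: str,
    generate_image: Callable[[str], str],
    describe_image: Callable[[str], str],
    rounds: int = 100,
) -> List[Tuple[str, str]]:
    """Alternate between image generation and captioning for `rounds` rounds.

    Returns the (image, caption) pair produced in each round.
    """
    history = []
    text = prompt
    for _ in range(rounds):
        image = generate_image(text)   # text -> image
        text = describe_image(image)   # image -> new text, fed into the next round
        history.append((image, text))
    return history


# Hypothetical stand-ins: a real run would call an image generator and a
# captioning model here instead of manipulating strings.
def toy_generate(text: str) -> str:
    return f"<image of: {text}>"


def toy_describe(image: str) -> str:
    # Keep only the first three words to mimic detail being lost each round.
    inner = image[len("<image of: "):-1]
    return " ".join(inner.split()[:3])


history = visual_telephone(
    "an old book with eight pages in a forest",
    toy_generate,
    toy_describe,
    rounds=5,
)
print(history[-1][1])
```

Even in this toy version, the caption stops changing after the first round, which loosely mirrors the way the real sequences settled into a stable motif.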

A collage transitions from black and white office scenes at the top to grand, ornate, colorful palace-like rooms with red carpets and gold accents at the bottom, illustrating dramatic changes in environment and style.
Trajectory starts with the prompt ‘The Prime Minister pored over strategy documents, trying to sell the public on a fragile peace deal while juggling the weight of his job amidst impending military action.’ | Hintze Et Al., Patterns
A collage of 81 square images, each showing detailed, colorful scenes including interiors, nature, abstract art, people, and animals, arranged in a 9-by-9 grid. Each square features a unique artistic style and subject.
All endpoints of 100 trajectories that originated from diverse initial prompts over 100 iteration steps. | Hintze Et Al., Patterns

As in the human version of telephone, the original idea quickly degraded. But what stood out to the researchers was not just the loss of detail, but the consistency of the end results. Across roughly 1,000 different runs of the experiment, most image sequences eventually settled into one of just 12 recurring visual motifs. These included scenes such as lighthouses, ornate interior rooms, urban nightscapes, rustic buildings, Gothic cathedrals, pastoral landscapes, and rainy European city scenes.

The transitions were usually gradual, though in a few cases they occurred abruptly. Either way, convergence was the norm. The researchers described the resulting styles as “visual elevator music,” comparing them to generic artwork commonly found in hotels or stock photo frames. Even when the team adjusted parameters such as sampling randomness or swapped in different image generators and captioning models, the same overall pattern emerged.

Extending the game to 1,000 iterations did not fundamentally change the outcome. Most trajectories locked into one of the dominant motifs by around the 100th turn and stayed there, although some later iterations produced minor variations. In rare cases, a sequence jumped from one motif to another after several hundred steps, but these shifts were not well understood. As study co-author Arend Hintze, an AI researcher at Dalarna University, puts it: “Does everybody end up in Paris or something? We don’t know.”


Image credits: Hintze Et Al., Patterns
