We “Designed” Buildings and Urban Spaces Using an AI Image Generator - Here Are the Surprising Results

Design Outsider co-founders Callum O’Connor and Josh Speedie have been inputting architecture and urban design terms and precedents into Midjourney bots, with surprising outcomes. The results highlight some potential challenges and opportunities for built environment professionals and beyond, and the renderings output by the Midjourney image generator reflect the changing relationship between designers and Artificial Intelligence (A.I.). In this article, the duo discuss the basics of Midjourney, share their initial thoughts and findings, and finish with a top ten list of recommendations for using Midjourney to produce your own visualisations.

Initial Thoughts

With a background in architecture and urban design, we are interested in how Midjourney can provide a starting point for discussion and respond to prompts that use architecture and urban design precedents and terms. In what ways could A.I. be useful for concept development, or for conveying an idea of a final project? We're starting to investigate.

While the visuals are interesting, for us the iterative and evolving process of Midjourney is what is most intriguing. A.I. image generation is only as strong as the prompts being input into the machine, and knowing what to write in a prompt is one of the biggest challenges! While flashy visuals produced with A.I. technology have been presented on social media, less has been written about the process behind the visualisations. We hope the following examples and discussion provide a starting point for others, and inspire you to begin your Midjourney… well… journey! Let’s get started.

The Basics

What is Midjourney?

According to Midjourney, Midjourney is “an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.” Uh….say what? Simply put, Midjourney is an artificial intelligence program that creates images from textual descriptions. Write text prompts and Midjourney spits out visuals.

How Midjourney Works

In the Midjourney message feed, first type “/imagine”, then write a ‘prompt’. A prompt is a set of words describing what you want to see in a render. For example, you could write “aerial view, city by the sea paint style”. The Midjourney artificial intelligence will then generate four different visuals from the prompt.

Figure 1

If you don’t like the way any of them turned out, you can run the prompt again by pressing the blue button (see Figure 1). If you see something you like, click V1, V2, V3 or V4, depending on the visual you prefer. Midjourney will then run the prompt again, producing another set of four renders that more closely resemble elements of the chosen visual. Alternatively, you can ‘upscale’ one of the renders by choosing U1, U2, U3, or U4. You can also ‘max upscale’ a visual after it has been ‘upscaled’ once, for even greater detail.

You can repeat the process, again and again, until you are satisfied - or until your trial runs out! 

For a more detailed Quick Start Guide visit: https://midjourney.gitbook.io/docs/

For frequently asked questions visit: https://github.com/midjourney/docs/blob/main/FAQs.md#image-prompt-questions

Where to Begin?

After some initial research we identified a basic structure for some of the prompts. These are not specific rules to follow, but we found this general structure a useful starting point.

- Start with: /imagine followed by the key idea (noun), (action), (setting/environment).

- Use "::" to separate the next set of descriptors (adjectives).

- Then add a (rendering style).

For transparency, we have provided the prompts we used throughout this article. Feel free to update and modify them yourself - changing just a word can have a big impact on the Midjourney A.I. visual output!
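As a rough illustration of the structure above, it can be sketched as a small helper that assembles a prompt string. This function and its name are our own illustration for this article, not part of Midjourney itself:

```python
def build_prompt(subject, action, setting, descriptors=(), style=""):
    """Assemble a Midjourney-style prompt: key idea, then '::'-separated
    descriptor and rendering-style sections (our own helper, not an API)."""
    # Key idea: (noun), (action), (setting/environment), skipping blanks
    parts = [", ".join(filter(None, [subject, action, setting]))]
    if descriptors:
        # "::" acts as a hard break between the idea and its descriptors
        parts.append(", ".join(descriptors))
    if style:
        parts.append(style)
    return " :: ".join(parts)

prompt = build_prompt(
    subject="many people",
    action="walking",
    setting="in sunny Copenhagen",
    style="wide shot, unreal engine 5 render, 4k --ar 16:9",
)
print(prompt)
# many people, walking, in sunny Copenhagen :: wide shot, unreal engine 5 render, 4k --ar 16:9
```

You would then paste the resulting string after “/imagine” in the Midjourney message feed.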

What We Found/Potential Uses

A General Look or Feel of a City

We began by experimenting with whether the A.I. could generate scenes of a specific city or neighbourhood.

Prompt: “view of Toronto street, winter sunny day, wide shot, unreal engine 5 render, 4k“
We chose one image to upscale.

Not bad. We then began to explore whether writing in specific architecture prompts could produce a more realistic rendering of Toronto, Ontario, describing architectural styles in the prompt with greater precision.

Prompt: “an eclectic combination of architectural styles in Toronto, ranging from 19th century Georgian architecture to 21st century, sunny day, wide shot, unreal engine 5 render, 4k”

The prompt produced a general sense of Toronto, but had difficulty producing more realistic buildings and streetscapes. It is interesting to note, however, that the A.I. was able to generate a clear contrast between two cities. For example, using similar language but writing ‘Copenhagen’ in the text prompt in place of ‘Toronto’ produced a distinctly different scene.

Prompt: "many people, walking, in sunny Copenhagen :: wide shot, unreal engine 5 render, 4k --ar 16:9"
Upscaling produced even greater detail.

Replicating a Competition Submission Render

We then attempted to replicate a competition submission that we previously entered. We were trying to produce a building similar to this:

To read more on our competition submission, see here: The Foundry

In the prompt we attempted to describe the chosen building, the specific location, and environmental details, and we also included a chosen rendering equivalent, Lumion. 

Prompt: “wave shape building made of glass and steel frame with green roof:: located on Detroit Waterfront beside Huntington Place:: sunrise:: water in foreground:: reflection of buildings on water:: lumion render”

We thought that, after successfully capturing the feel of other cities such as Toronto and Copenhagen, the A.I. might be able to recognize the Detroit skyline. Midjourney had a difficult time replicating this. While it generated a waterfront with some towers, it was not distinctly Detroit, nor could the A.I. generate a specific building near the site (e.g., Huntington Place). The A.I. also wanted to incorporate many buildings into the design. We tried a different strategy.

Midjourney also has an option to insert an image found online into the prompt. To do this, type ‘/imagine’ as normal, but then paste into the prompt section the web address where the image is stored online. After the link, you can add new text prompts to complement it. An image URL must be a direct link to an online image, which you can usually get by right-clicking on an image and choosing ‘Copy Link Address’. A usable address will usually end in .png or .jpg.
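The direct-link check described above can be sketched in a few lines. The helpers and the example URL here are our own illustration (not a Midjourney API), assuming only that a usable address ends in .png or .jpg:

```python
def is_direct_image_link(url):
    """True if the address looks like a direct image link
    (ends in .png/.jpg/.jpeg once any query string is stripped)."""
    return url.lower().split("?")[0].endswith((".png", ".jpg", ".jpeg"))

def image_prompt(image_url, text):
    """Build an image prompt: the direct image link first, then the text."""
    if not is_direct_image_link(image_url):
        raise ValueError("use a direct image link (usually ends in .png or .jpg)")
    return f"{image_url} {text}"

# Hypothetical example - substitute your own hosted render's address
print(image_prompt(
    "https://example.com/render.jpg",
    "wave shape building made of glass and steel frame with green roof :: lumion render",
))
```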

We inserted the competition render we originally produced (seen above) at the start of the prompt. We also removed ‘Detroit’ and ‘Huntington Place’ to see if the A.I. could focus more on the building itself.

Prompt: “wave shape building made of glass and steel frame with green roof:: located on Waterfront:: sunrise:: water in foreground:: reflection of buildings on water:: lumion render, 16:9”

Overall, this did not produce the desired outcome. The A.I. could not reconcile the description of the building, nor understand that the building details should apply to only one building. We have seen examples online where a PNG cut-out of only the main building is sent to the A.I. Perhaps this could have improved our results… but on to our next experiment.

Producing Renders in a Similar Style to Famous Architects

We then asked: would the A.I. have an easier time recognizing an architectural style, or an architect whose buildings are more readily available through an online search? We started with Bjarke Ingels, keeping the prompt very simple.

Prompt: "Building in the style of Bjarke Ingels"
Choosing one of the visuals to upscale, the A.I. produced this.

This output incorporated some elements of a traditional ‘Bjarke Ingels’ building. Adding more detail in the rendering style and type of shot greatly changed the image output.

Prompt: “Building in the style of Bjarke Ingels, wide shot, unreal engine 5 render, 4k”

We then considered whether another architect might produce different results.

Prompt: “building in the style of architect Frank Gehry, wide shot, unreal engine 5 render, 4k”
We chose this one to upscale.

Inputting Specific Buildings in the Prompt

Having identified that the A.I. could draw inspiration from some architects, we asked ourselves: what if we input specific buildings into the prompt? To test the boundaries, we combined a famous building with an alternative typology (i.e., a skyscraper). Note that we also considered the ‘mood’ of the image, adding descriptors such as ‘dramatic lighting’ and ‘mist’, as well as ‘people walking in foreground’.

Prompt: “Guggenheim Museum Bilbao skyscraper, dramatic lighting, mist, wideshot, people walking in foreground, unreal engine 5 render, 4k”

We upscaled, created variations, upscaled again, and created more variations, choosing the image we preferred each time. We found max upscaling worked best when we were happy with a visual; we would then make a variation of the max-upscaled image. If we liked any of those variations we would continue - if not, we went back to the previous image to create more variations. Here are a few we were most happy with, capturing the mood we were seeking, the scale, and the general feeling of a large-scale Frank Gehry building.

While there were some perceived hits, there were also some misses. This is not to say the A.I. could not have produced different visuals with the same prompts - it certainly could have. Here are a few evocative, but unsuccessful, attempts.

Prompt: “grey kengo kuma building shaped like eastern Scotland cliffs with an arch in the middle that people are walking through, sunny day, wide shot, unreal engine 5 render, 4k”
Prompt: “Modern La Sagrada Familia Tower building, dramatic lighting, mist, wideshot, people walking in foreground, unreal engine 5 render, 4k”

We could have continued to modify the prompt to bring the image outputs more in line with our goal of creating a ‘modern La Sagrada Familia Tower building’. Perhaps adding more detail about the shape of the proposed building could have brought the vision closer to fruition. Regardless, prompt details such as ‘dramatic lighting’ and ‘mist’ made quite an impact on the output.

Combining Buildings and Architectural Styles

We then wondered whether the A.I. could combine buildings and architectural styles. Taking inspiration from our competition submission on the Detroit waterfront mentioned above, we referenced the precedents we used for that submission: the Pompidou Centre in Paris, the Markthal in Rotterdam, and Copenhill in Copenhagen. These are three unique buildings, and we were interested to see whether the A.I. could replicate them in some fashion.

Prompt: “a modern building that looks like the pompidou centre, markthal, and copenhill, moody in a city, wide shot, unreal engine 5 render, 4k”

We felt these images definitely captured elements of the Pompidou Centre and to some extent the Markthal. We chose some for upscaling and here were the results.

Producing Landscapes and People Interacting

Maybe we took the ‘moody’ aesthetic too far. We went in a different direction and thought of some happier themes. Could the A.I. identify a park - for example ‘Superkilen’ park in Copenhagen? Could this be inserted into an alternative setting? We tried.

Prompt: “Superkilen Park in Toronto with children playing and elders sitting watching, sunny day, wide shot, unreal engine 5 render, 4k”
A chosen image upscaled.

It did not recognize ‘Superkilen Park’, but it did understand the other text in the prompt quite well. What about a type of playground?

Prompt: "a naturalized timber playground with children climbing throughout, sunny day, dramatic, wide shot, unreal engine 5 render, 4k"

That was hit or miss - so we upscaled one of the images, and the results were more satisfying. We tried again for more variations; however, the A.I. wanted to move towards a traditional, formalised metal playground, so we stopped while we were ahead. Perhaps alternative prompts could have encouraged a more informal play setting?

A chosen image upscaled.

The Refinement Process

So as we've seen, Midjourney can produce some incredible visualisations. For our last example though, we want to show how Midjourney can slowly build over time. By carefully choosing a preferred generated image, then upscaling or creating multiple variations again and again - Midjourney can produce renders that more closely match your imagined building or site. Take for example a 'cross-laminated timber building in a coastal mountain town' seen below.

Round 1 - Prompt: "cross-laminated timber building in a coastal mountain town, wide shot, unreal engine 5 render, 4k"
Round 1 - Upscaled iterations
Round 1 - Final image upscaled

In 'Round 2' of this process, we inserted the upscaled final image (seen directly above) and then entered the prompt with the addition of 'floor to ceiling windows'. The top-right image from the figure below was chosen to investigate in further detail.

Round 2 - Prompt: (Round 1 final image) + "cross-laminated timber building with floor to ceiling windows in a coastal mountain town, wide shot, unreal engine 5 render, 4k"
Round 2 - Upscaled iterations
Round 2 - Final image upscaled

In 'Round 3' of this process, we inserted the upscaled final image of Round 2 (seen directly above) and then entered the prompt with the addition of 'dramatic lighting, mist, people walking in foreground', taking inspiration from the "Guggenheim skyscraper" prompt. The top-left image from the figure below was chosen to investigate in further detail.

Round 3 - Prompt: (Round 2 final image) + "timber building with floor to ceiling windows in a coastal mountain town, dramatic lighting, mist, people walking in foreground, wide shot, unreal engine 5 render, 4k"
Round 3 - Final image upscaled
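The round-by-round process above can be sketched as a simple loop: each round pairs the previous round's upscaled image with a prompt that adds new descriptors. The image placeholders and the helper function are our own illustration, not Midjourney commands:

```python
def refinement_prompts(rounds):
    """Turn (previous_image, text) pairs into full /imagine commands.
    A round with no previous image is a plain text prompt."""
    prompts = []
    for image, text in rounds:
        body = f"{image} {text}" if image else text
        prompts.append(f"/imagine {body}")
    return prompts

# Placeholders like "<round-1 image link>" stand in for the direct link
# to the upscaled image chosen in the previous round.
rounds = [
    (None,
     "cross-laminated timber building in a coastal mountain town, wide shot, unreal engine 5 render, 4k"),
    ("<round-1 image link>",
     "cross-laminated timber building with floor to ceiling windows in a coastal mountain town, wide shot, unreal engine 5 render, 4k"),
    ("<round-2 image link>",
     "timber building with floor to ceiling windows in a coastal mountain town, dramatic lighting, mist, people walking in foreground, wide shot, unreal engine 5 render, 4k"),
]

for p in refinement_prompts(rounds):
    print(p)
```

Each printed command corresponds to one round, with the chosen image from the round before carried forward as the prompt's starting point.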

This experiment demonstrated that A.I. could prove useful in the early stages of the design process, perhaps as part of a mood-board exercise. Beyond this, it is incapable (at this time) of reliably delivering a complete or coherent structure. The process could also be used in the development of conceptual or final renders through rigorous iteration and some additional photoshopping. This would prove most effective on a project with a non-specific site, referencing instead a proposed backdrop such as a "coastal mountain town".

Top Ten Tips

To conclude, we'd like to share our top ten tips when using Midjourney so far...

  • Describing a specific ‘mood’ in a prompt can make a dramatic impact on a render.
  • The upscaling feature is very powerful - if you prefer one of the generated images, upscale it first, then create variations of the upscaled image.
  • Producing visuals that match your idea can take time and multiple iterations - be patient.
  • Describing the type of shot can help determine the composition (e.g., wide angle).
  • Don’t forget about people; they help provide scale (e.g., people in foreground).
  • Uploading your own images (via an online URL) can help create variations that match your intended visualisation output.
  • Commas are soft breaks; “::” are hard breaks between ideas and concepts. This affects how hard Midjourney tries to blend two concepts together in the image.
  • Midjourney currently struggles with many humans doing specific things - keep it basic.
  • A long and descriptive prompt can produce amazing results or total nonsense... trial and error is needed.
  • Learn from the online community. There are many resources now to help you produce amazing visualisations.

Want to submit your own Midjourney visualizations? DM us on Instagram @designoutsider. We’ll choose the best prompts and showcase the results on our account.

Do you have a piece of writing or project that aligns with the Design Outsider Manifesto? If so, would you like to use this platform to promote the work you are doing? Reach out to the team here.
