Design Outsider co-founders Callum O’Connor and Josh Speedie have been feeding architecture and urban design terms and precedents into Midjourney bots, with surprising outcomes. The results highlight potential challenges and opportunities for built environment professionals and beyond, and the renderings output by the Midjourney image generator reflect the changing relationship between designers and Artificial Intelligence (A.I.). In this article, the duo cover the basics of Midjourney, share their initial thoughts and findings, and finish with a top ten list of recommendations for using Midjourney to produce your own visualisations.
With backgrounds in architecture and urban design, we are interested in how Midjourney can provide a starting point for discussion, and in how it responds to prompts that use architecture and urban design precedents and terms. In what ways could A.I. be useful for concept development, or for conveying an idea of a final project? We're starting to investigate how.
While the visuals are interesting, for us the iterative, evolving process of Midjourney is what is most intriguing. A.I. image generation is only as strong as the prompts being inputted into the machine, and knowing what to write in the prompt is one of the biggest challenges! While flashy visuals produced with A.I. technology have been shared widely on social media, less has been written about the process behind the visualisations. We hope the following examples and discussion provide a starting point for others, and inspire you to begin your Midjourney… well… journey! Let’s get started.
According to Midjourney, Midjourney is “an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.” Uh… say what? Simply put, Midjourney is an artificial intelligence program that creates images from textual descriptions. Write text prompts, and Midjourney spits out visuals.
In the Midjourney message feed, first type “/imagine”, then write a ‘prompt’: a set of words describing what you want to see in a render. For example, you could write “aerial view, city by the sea paint style” as a prompt. The Midjourney artificial intelligence will generate four different interpretations of the prompt.
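In full, the command as it appears in the message feed looks like this, using the example above:

/imagine prompt: aerial view, city by the sea paint style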
If you don’t like the way any of them turned out, you can run the prompt again by pressing the blue button (see Figure 1). If you see something you like, you can click V1, V2, V3 or V4, depending on which visual you prefer. Midjourney will then run the prompt again, producing another set of four renders that more closely resemble elements of the chosen visual. Alternatively, you can ‘upscale’ one of the renders by choosing U1, U2, U3 or U4, again depending on your preferred visual. Once a render has been ‘upscaled’, you can also ‘max upscale’ it for even greater detail.
You can repeat the process, again and again, until you are satisfied - or until your trial runs out!
For a more detailed Quick Start Guide visit: https://midjourney.gitbook.io/docs/
For frequently asked questions visit: https://github.com/midjourney/docs/blob/main/FAQs.md#image-prompt-questions
After some initial research, we identified a basic structure for prompts. These are not strict rules to follow, but we found this general structure a useful starting point, as shown in the example after the list below.
- Start with: /imagine, followed by the key idea (noun), (action), (setting/environment).
- Use “::” to separate the next set of descriptors (adjectives).
- Then add a (rendering style).
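Put together, a prompt following this structure might read something like the line below (a made-up example of ours, not one of the prompts from our experiments):

/imagine prompt: timber pavilion, hovering over a public square, in a dense city centre :: warm, playful, sunlit :: photorealistic render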
For transparency, we have provided the prompts we used throughout this article. Update and modify them yourself; changing just a word can have a big impact on the Midjourney A.I. visual output!
We began by experimenting to see whether the A.I. could generate scenes of a specific city or neighbourhood.
Not bad. We then explored whether writing more specific architectural prompts could produce a more realistic rendering of Toronto, Ontario, describing architectural styles with greater precision.
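A prompt along these lines might read as follows; the styles named here are our own illustrative stand-ins, not the exact wording we used:

/imagine prompt: street view, downtown Toronto, Ontario :: Victorian bay-and-gable houses, glass condominium towers :: photorealistic render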
The prompt produced a general sense of Toronto, but the A.I. had difficulty producing more realistic buildings and streetscapes. It is interesting to note, however, that the A.I. was able to generate a clear contrast between two cities: using similar language but writing ‘Copenhagen’ in the text prompt in place of ‘Toronto’ produced a visibly different city character.
We then attempted to replicate a competition submission that we had previously entered. We were trying to produce a building similar to this:
In the prompt we attempted to describe the chosen building, the specific location, and environmental details; we also named a rendering engine equivalent, Lumion, to steer the visual style.
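A prompt of that kind combines all of those ingredients, roughly as below (the building description is an illustrative stand-in, not our exact competition prompt):

/imagine prompt: terraced timber and glass cultural building on the Detroit riverfront, near Huntington Place :: evening light, reflections on the water :: Lumion render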
We thought that, after successfully capturing the feel of cities such as Toronto and Copenhagen, the A.I. might be able to recognize the Detroit skyline. Midjourney had a difficult time with this: while it generated a waterfront with some towers, the result was not distinctly Detroit, nor could the A.I. generate a specific building near the site (e.g., Huntington Place). The A.I. also wanted to incorporate many buildings into the design. We tried a different strategy.
Midjourney also has an option to insert an image found online into the prompt. To do this, type ‘/imagine’ as normal, but paste into the prompt section the web address where the image is stored online. After the link, you can add new text prompts to complement it. An image URL must be a direct link to an online image, which you can usually get by right-clicking on an image and choosing ‘Copy Link Address’. A usable address will usually end in .png or .jpg.
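An image-plus-text prompt therefore takes this form (the URL below is a placeholder, not a real image address):

/imagine prompt: https://example.com/my-image.png modern waterfront building, evening light, photorealistic render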
We inserted the competition render we had originally produced ourselves (seen above) at the start of the prompt. We also removed ‘Detroit’ and ‘Huntington Place’ to see if the A.I. could focus more on the building itself.
Overall, this did not produce the desired outcome. The A.I. could not recognize the description of the building, nor understand that the building details should apply to only one building. We have seen examples online where a PNG cut-out of only the main building was sent to the A.I. Perhaps this could have improved our results… but on to our next experiment.
We then asked: would the A.I. have an easier time recognizing an architectural style, or an architect whose buildings are more readily available through an online search? We started with Bjarke Ingels, keeping the prompt very simple.
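Keeping it simple means the prompt needs little more than the architect's name, for example (our illustrative wording):

/imagine prompt: building designed by Bjarke Ingels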
This output incorporated some elements of a traditional ‘Bjarke Ingels’ building. Adding more detail in the rendering style and type of shot greatly changed the image output.
We then considered whether another architect might produce different results.
Having identified that the A.I. could draw inspiration from some architects, we asked ourselves: what if we inputted specific buildings into the prompt? To test the boundaries, we combined a famous building with an alternative typology (i.e., a skyscraper). Note that we also considered the ‘mood’ of the image, adding descriptions such as ‘dramatic lighting’, ‘mist’, and ‘people walking in foreground’.
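Our ‘Guggenheim skyscraper’ prompt, referred to again later in this article, followed roughly this pattern (the exact wording here is our reconstruction):

/imagine prompt: Guggenheim Museum Bilbao as a skyscraper :: dramatic lighting, mist, people walking in foreground :: photorealistic render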
We upscaled, created variations, upscaled again, and created variations once more, choosing the image we preferred each time. We found it worked best to max upscale once we were happy with a visual, then make a variation of the max upscaled image. If we liked any variations of the max upscaled image we would continue; if not, we went back to the previous image to create more variations. These are a few we were most happy with, which captured the mood we were seeking, the scale, and the general feeling of a large-scale Frank Gehry building.
While there were some perceived hits, there were also some misses. This is not to say the A.I. could not have produced different visuals with the same prompts; it certainly would have. Here are a few evocative, but unsuccessful, attempts.
We could have continued to modify the prompt to bring the image outputs more in line with our desired goal of creating a ‘modern La Sagrada Familia tower building’. Perhaps adding more detail about the shape of the proposed building could have brought the vision closer to fruition. Regardless, prompt details such as ‘dramatic lighting’ and ‘mist’ made quite an impact on the output.
We then pondered whether the A.I. could combine buildings and architectural styles. Taking inspiration from our competition submission on the Detroit waterfront mentioned above, we referenced the precedents we had used for that submission: the Pompidou Centre in Paris, the Markthal in Rotterdam, and Copenhill in Copenhagen. These are three distinctive buildings, and we were interested to see whether the A.I. could replicate them in some fashion.
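A prompt combining the three precedents might look like this (our illustrative wording, not the exact prompt from the experiment):

/imagine prompt: waterfront building combining the Pompidou Centre, the Markthal Rotterdam, and Copenhill Copenhagen :: dramatic lighting, mist :: photorealistic render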
We felt these images definitely captured elements of the Pompidou Centre and to some extent the Markthal. We chose some for upscaling and here were the results.
Maybe we took the ‘moody’ aesthetic too far. We went in a different direction and thought of some happier themes: could the A.I. identify a park, for example ‘Superkilen’ park in Copenhagen? Could it be inserted into an alternative setting? We tried.
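Such a prompt pairs the park's name with a new environment, along these lines (the setting below is a hypothetical stand-in of ours):

/imagine prompt: Superkilen park, Copenhagen, relocated to a coastal mountain town :: bright, sunny day :: photorealistic render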
The A.I. did not recognize ‘Superkilen Park’, but it did understand the other text in the prompt quite well. What about a type of playground?
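Describing the kind of playground, rather than naming a place, gives a prompt roughly like this (again, our illustrative wording):

/imagine prompt: informal natural playground, timber logs and boulders, children playing :: bright, playful :: photorealistic render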
That was hit or miss, so we upscaled one of the images, and the results were more satisfying. We tried again for more variations; however, the A.I. wanted to move towards a traditional, formalised metal playground, so we stopped while we were ahead. Perhaps alternative prompts could have encouraged a more informal play setting?
So, as we've seen, Midjourney can produce some incredible visualisations. For our last example, though, we want to show how Midjourney can slowly build up an image over time. By carefully choosing a preferred generated image, then upscaling or creating multiple variations again and again, Midjourney can produce renders that more closely match your imagined building or site. Take, for example, a 'cross-laminated timber building in a coastal mountain town', seen below.
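The process begins with the base text prompt alone:

/imagine prompt: cross-laminated timber building in a coastal mountain town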
In 'Round 2' of this process, we inserted the final upscaled image (seen directly above) into the prompt, then entered the prompt again with the addition of 'floor to ceiling windows'. The top-right image from the figure below was chosen to investigate in further detail.
In 'Round 3' of this process, we inserted the final upscaled image of Round 2 (seen directly above), then entered the prompt with the further addition of 'dramatic lighting, mist, people walking in foreground', taking inspiration from the "Guggenheim skyscraper" prompt. The top-left image from the figure below was chosen to investigate in further detail.
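Assembled from the steps described above, the Round 3 prompt takes this cumulative form (the image URL is a placeholder standing in for the Round 2 upscale):

/imagine prompt: https://example.com/round-2-upscale.png cross-laminated timber building in a coastal mountain town, floor to ceiling windows, dramatic lighting, mist, people walking in foreground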
This experiment demonstrated that A.I. could prove useful in the early stages of the design process, perhaps as part of a mood-board exercise. Beyond this, it is incapable (at this time) of reliably delivering a complete or coherent structure. The process could also be used to develop conceptual or even final renders through rigorous iteration and some additional photoshopping. This would prove most effective on a project with a non-specific site, referencing instead a proposed backdrop such as a "coastal mountain town".
To conclude, we'd like to share our top ten tips when using Midjourney so far...
Do you have a piece of writing or project that aligns with the Design Outsider Manifesto? If so, would you like to use this platform to promote the work you are doing? Reach out to the team here.