Prompt Engineering: From Words to Art and Copy

December 21, 2023
Michael Taylor

AI image and text generation tools like DALL-E and Midjourney are exactly what Arthur C. Clarke was talking about when he said “any sufficiently advanced technology is indistinguishable from magic”.

https://openai.com/blog/dall-e/ 

We all knew that AI was coming to take our jobs, we just thought that creativity was something uniquely human. Instead of being the last thing to go before we reach the singularity, it looks like white-collar design and writing jobs will be among the first to go. You might think this is a niche interest, but some AI art communities are already rivaling the size of popular video game communities (approaching 1m in 2020, 13m+ a year later). This is bigger than anyone realizes.

http://midjourney.com/

The output you can get with a few minutes of work writing simple text prompts is shockingly good. Early adopters are regularly blowing minds as they share their creations on social media. Already a nascent community has appeared around “prompt engineering”: sharing tips and tricks for getting the AI to do what you want. Turning words into art.

First, a definition. A “prompt” is the input we give to AI models when communicating with them. “Prompt engineering” is the process of discovering prompts which reliably yield useful or desired results. The output you get from generative AI models is non-deterministic, which is a fancy way of saying that sometimes, at random, it produces crap. To solve for this you have to properly optimize your prompts, A/B testing them to see what works at scale.

This is a topic I know well, and have created an online course on Udemy called "Prompt Engineering Principles: ChatGPT & DALL-E" where you can learn a set of future-proof principles for working with AI. Or you can continue to read on to decide if I know what I'm talking about. :-)

https://twitter.com/karpathy/status/1273788774422441984?s=20 

In this post I’ll start by sharing value up front: a prompt engineering template that you can use right away. I’ll walk you through how the template works, so you can start experimenting immediately. If you want to learn more about how these innovative tools will change the way we do creative work, keep reading. I’ll finish with a case study based on my own experience using AI art to illustrate my book.

The Five Principles of Prompting

The pace of progress in AI is blindingly fast. It feels like something new and innovative drops every single week. According to analysis of the arXiv database where most AI papers are hosted, the number of papers on AI published every month is doubling every 24 months: it's going exponential.

https://www.reddit.com/r/singularity/comments/xwdzr5/the_number_of_ai_papers_on_arxiv_per_month_grows/

All of this innovation is being driven by developer adoption, where legions of researchers, hackers, hobbyists, entrepreneurs, and tinkerers have all converged on this one industry, because all of the smartest people they know have also caught the bug. As Chris Dixon said, “What the smartest people do on the weekend is what everyone else will do during the week in ten years.” Open-source AI projects like Stable Diffusion are among the fastest-growing repositories ever on GitHub (if at first you don't see it, it's the pale blue line that looks like the Y axis because it's so vertical).

https://twitter.com/a16z/status/1592922394275872768/photo/1

This increased adoption of AI among technical folks has filtered through to the mainstream public, in particular since the release of ChatGPT by OpenAI in November 2022. Usage completely exploded, driving more demand than anyone expected, including OpenAI, who has complained of being GPU constrained ever since. ChatGPT has been the fastest-growing consumer app ever, reaching 1m users in 5 days, and 100m users in 2 months.

https://twitter.com/kylelf_/status/1623679176246185985?t=g9wnm52DZEfe42CJAjooRA&s=03

It's impossible to keep up, and that means the tips & tricks that work today aren't likely to work for long. Sam Altman, CEO of OpenAI, said “I don’t think we’ll still be doing prompt engineering in five years,” by which he meant “...figuring out how to hack the prompt by adding one magic word to the end that changes everything else”. The good news is that he followed up with this: "What will always matter is the quality of ideas and the understanding of what you want." What Sam is saying is that no matter what happens with AI, being able to have good ideas and communicate them effectively will continue to be important.

Note: I have been commissioned to write a book based on these principles for O'Reilly, out in 2024, and the unedited version is currently available for pre-release online.

https://learning.oreilly.com/library/view/prompt-engineering-for/9781098153427/

Instead of learning hacks, we should focus on ways of working with AI that are timeless: they've been useful in the past and are likely to stay useful far into the future. I call them the Five Principles of Prompting.

1. Give Direction: Describe the desired style in detail, or reference a relevant persona.

2. Specify Format: Define what rules to follow, and the required structure of the response.

3. Provide Examples: Insert a diverse set of test cases where the task was done correctly.

4. Evaluate Quality: Identify errors and rate responses, testing what drives performance.

5. Divide Labor: Split tasks into multiple steps, chained together for complex goals.

To start with, let's look at a typical beginner's prompt for GPT-4.

Can I have a list of product names for a pair of shoes that fit any shoe size?

How can we improve our communication? The AI is having to do a lot of guesswork, and that won't always give us the results we want. It's also complaining about the task, and has delivered it in a relatively unstructured format. How can this prompt be engineered to reliably yield useful results?

Product description: A home milkshake maker
Seed words: fast, healthy, compact
Product names: HomeShaker, Fit Shaker, QuickShake, Shake Maker

Product description: A pair of shoes that can fit any foot size
Seed words: adaptable, fit, omni-fit
Product names:

The example above has had the 5 Principles of Prompting applied to it. It provides an example of how to answer the prompt (a home milkshake maker), gives guidance on what kind of answer we want (seed words), adjusts the model parameters to change the answers we get (temperature), makes it clear what format we expect (product names), and uses chaining to string multiple AI responses together.

These principles are transferable across models, and work just as well with text-to-image as they do when generating text. Let's take a look at an example of an image prompt that needs to be engineered:

a photograph of a woman in a flapper dress inspired by starry night --v 5

We're not getting the results we want – an image of a woman wearing a dress inspired by Van Gogh's Starry Night – but with a little prompt engineering work we can get there. These models are capable of almost anything, you just need to know how to ask!

https://s.mj.run/2s5NHHniK9E :: 0.1 a photograph, flapper, flapper dress inspired by starry night, light yellow and dark azure vortex swirls, brushwork --v 5

I developed these principles based on what worked for me back when I was using the GPT-3 beta in 2020, and the Midjourney beta in 2022. They still work today with GPT-4 and Midjourney v6, so they're likely to keep working with GPT-5 and Midjourney v7, or whatever models we're using in the future.

Let's dig into more detail:

1. Give Direction: Describe the desired style in detail, or reference a relevant persona.

Product description: A home milkshake maker
Seed words: fast, healthy, compact
Product names: HomeShaker, Fit Shaker, QuickShake, Shake Maker

Product description: A pair of shoes that can fit any foot size
Seed words: adjustable, bigfoot, universal
Product names:

The very first thing you want to start with is giving stylistic direction. In this prompt we do it with seed words, telling the AI what part of latent space to search in for an idea. In our original prompt we seeded with "adaptable", "fit", and "omni-fit"; in this case we changed it to "adjustable", "bigfoot", and "universal", so we got very different results. Providing direction really biases the AI's results, as like any good intelligence (artificial or otherwise) it wants to give you exactly what you want and can take things too literally. Leave some ambiguity in your direction, just as you would with a human copywriter or designer: if you tell them exactly how to do their job, you're not giving them the opportunity to pleasantly surprise you.

With AI image generation, giving direction is the most important principle, because tools like Midjourney know every major artist and art style and can replicate them precisely. This is where going to art school or being educated in culture can be a huge advantage, because you know all the right words to use and how to mix styles in an aesthetically pleasing way. If you're uncultured like me, you can reverse-engineer images you like by uploading them to the Midjourney Describe feature, which gives you four prompts that could generate a similar image. I wouldn't have thought to use the word "vortexes" in my prompt to describe the swirls Van Gogh uses, and I get better results now that I know to include that word.

2. Specify Format: Define what rules to follow, and the required structure of the response.

Product description: A home milkshake maker
Seed words: fast, healthy, compact
Product names:
1. HomeShaker
2. Fit Shaker
3. QuickShake
4. Shake Maker

Product description: A pair of shoes that can fit any foot size
Seed words: adaptable, fit, omni-fit
Product names:

When we're just playing around with AI, especially with something like ChatGPT, the structure of the responses doesn't really matter. However, pretty soon you start wanting to plug AI into production tools, and that's when structure gets important. If the AI doesn't give you consistently formatted results every time, it's impossible to depend on the results for any serious work. The way you prompt the AI really matters: everything from the examples you give to how you finish your prompt guides the AI on how to respond. GPT-4 is a data formatting savant and universal translator. It's not just limited to turning French into English or providing comma-separated or numbered responses; it can even give you back JSON, YAML, or other structured data.
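To make that concrete, here's a minimal sketch of requesting a structured JSON response through the API. It uses the OpenAI Python SDK (v1+); the model name, prompt wording, and the assumption that the reply parses cleanly are my own illustrative choices, not anything from the example above.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompt = (
    "Product description: A pair of shoes that can fit any foot size\n"
    "Seed words: adaptable, fit, omni-fit\n"
    "Return four product names as a JSON array of strings. Respond with JSON only."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# Because we specified the format, the reply can be parsed straight into a Python list.
product_names = json.loads(response.choices[0].message.content)
print(product_names)  # e.g. ["OmniFit", "AdaptStep", "AnyStride", "FlexFit"]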

Similarly, image models are capable of outputting any format you like. Whether you want a stock photo, an oil painting, or an ice sculpture, Midjourney can replicate that format for you with any given concept. Sometimes the formats clash a little with the styles you give it, so experiment with a range of formats and styles to get the best results. If you're struggling for ideas, I built a simple free tool called Visual Prompt Builder that lists all the most common formats and styles.

https://tools.saxifrage.xyz/prompt

3. Provide Examples: Insert a diverse set of test cases where the task was done correctly.

Product description: A watch that can tell accurate time in space
Seed words: astronaut, space-hardened, eliptical orbit
Product names: iNaut, iSpace, iTime

Product description: A home milkshake maker
Seed words: fast, healthy, compact
Product names: iShake, iSmoothie, iShake Mini

Product description: A pair of shoes that can fit any foot size
Seed words: adaptable, fit, omni-fit
Product names:

One of the things that makes GPT-4 so good is that it's capable of zero-shot reasoning, meaning it can give you an answer without any examples. That doesn't mean, however, that giving examples can't radically improve the quality of the response. Giving examples is something we do regularly when briefing humans, so it stands to reason that even when AI gets superhuman, this will still help. Be careful, though: AI has a tendency to learn too much from the examples, and providing too many similar ones can make the AI less creative in its answers, as we saw with the Steve Jobs-style iFit and iAdapt examples above. You can think of providing examples as 'fine-tuning' the AI so that it produces consistent results. Prompt length is limited, so if you can't fit enough context, using LlamaIndex with Pinecone DB can help inject context into each API call. If you regularly need consistent results, it might make sense to actually fine-tune the model by training it on lots of examples, which is available at a higher cost.
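Here's a minimal sketch of assembling that kind of few-shot prompt programmatically, so examples can be swapped in and out as you test. The helper name and data layout are my own, mirroring the "Product description / Seed words / Product names" format above.

examples = [
    {
        "description": "A watch that can tell accurate time in space",
        "seed_words": "astronaut, space-hardened, eliptical orbit",
        "names": "iNaut, iSpace, iTime",
    },
    {
        "description": "A home milkshake maker",
        "seed_words": "fast, healthy, compact",
        "names": "iShake, iSmoothie, iShake Mini",
    },
]

def build_few_shot_prompt(description, seed_words):
    # Each solved example teaches the model the task; the final block is left
    # unanswered so the model completes it.
    blocks = [
        "Product description: {description}\nSeed words: {seed_words}\nProduct names: {names}".format(**ex)
        for ex in examples
    ]
    blocks.append(
        f"Product description: {description}\nSeed words: {seed_words}\nProduct names:"
    )
    return "\n\n".join(blocks)

print(build_few_shot_prompt(
    "A pair of shoes that can fit any foot size",
    "adaptable, fit, omni-fit",
))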

For the image example, I included this still from the movie The Great Gatsby (that's the URL at the front of the prompt – you get that by uploading an image to Discord then clicking on "Copy Link"). Often including an example of what you want is easier than describing your vision, but be careful about copyright infringement if you're using the output for commercial purposes. I downweighted this part of my prompt by 90% by putting `:: 0.1` after it.

https://people.com/tv/the-great-gatsby-tv-series-in-the-works-from-tudors-creator-michael-hirst/

4. Evaluate Quality: Identify errors and rate responses, testing what drives performance.

Product description: A home milkshake maker
Seed words: fast, healthy, compact
Product names: HomeShaker, Fit Shaker, QuickShake, Shake Maker

Product description: A watch that can tell accurate time in space
Seed words: astronaut, space-hardened, eliptical orbit
Product names: AstroTime, SpaceGuard, Orbit-Accurate, EliptoTime

Product description: A pair of shoes that can fit any foot size
Seed words: adaptable, fit, omni-fit
Product names:

Product description: A pair of shoes that can fit any foot size
Seed words: adjustable, bigfoot, universal
Product names:

LLMs work by picking the next word that's likely to appear in a sentence, but they don't choose it deterministically: sometimes they pick words that still fit but are relatively rare. There's an element of randomness, and you can't always control what you'll get back. Some prompts are 'safer' than others, and more robust to errors, mistakes, hallucinations, and other undesirable outcomes. Other prompts perform much better on average and can make a bigger difference than you would have expected. You might even want to test your prompts against other LLMs, not just the ones provided by OpenAI. Whatever you do, running multiple prompts many times will accelerate your learning curve and get you better results. Pro tip: use LangChain with tracing enabled and pump the results into LangSmith for easy analysis and debugging afterwards.
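As a sketch of what that looks like in practice, the snippet below runs two prompt variants several times each and has the model itself score the outputs. It uses the OpenAI Python SDK (v1+); the judging prompt, model name, and run count are illustrative assumptions rather than recommended values.

from openai import OpenAI

client = OpenAI()

def generate(prompt):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def rate(output):
    # Crude self-evaluation: ask the model for a single 1-5 score.
    judge_prompt = (
        "Rate the following product names from 1 to 5 for catchiness. "
        "Respond with a single integer only.\n\n" + output
    )
    return int(generate(judge_prompt).strip())

variants = {
    "no direction": "Product names for a pair of shoes that can fit any foot size:",
    "seeded": (
        "Seed words: adaptable, fit, omni-fit\n"
        "Product names for a pair of shoes that can fit any foot size:"
    ),
}

# Run each variant multiple times and compare average scores, since single runs are noisy.
for name, prompt in variants.items():
    scores = [rate(generate(prompt)) for _ in range(3)]
    print(name, sum(scores) / len(scores))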

In image generation this principle is often a case of trying lots and lots of different words in your prompts, then manually checking the results. Alternatively if you want a quick analysis of your prompt, use the "Shorten" feature in Midjourney, which looks at the tokens in the prompt and gives you intel on what parts of the prompt are most important.

5. Divide Labor: Split tasks into multiple steps, chained together for complex goals.

Please rate the product names for "A pair of shoes that can fit any foot size" based on their catchiness, uniqueness, and simplicity. Rate them on a scale from 1-5, with 5 being the highest score. Respond only with a table containing the results.

Once you start using AI to complete real work, you'll find you often need multiple calls to complete a single task. As prompts are limited to around 4k tokens (roughly 3,000 words) with some models, it works well to break tasks up into multiple prompts. The example given here is that we're asking the model to evaluate the choices it gave based on a set of criteria. Because of the way LLMs work, always predicting the next token, they're actually more coherent when rating their own work after the fact in a separate prompt than when trying to keep their work coherent in the middle of generating tokens. Tools such as LangChain can string multiple actions together, and keep prompts organized and production ready.
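Here's a minimal sketch of that two-step chain, with one call generating candidate names and a second call evaluating them. It uses the OpenAI Python SDK (v1+) rather than LangChain, to keep the example self-contained; the model name is an illustrative assumption.

from openai import OpenAI

client = OpenAI()

def ask(prompt):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: generate candidate names.
names = ask(
    "Product description: A pair of shoes that can fit any foot size\n"
    "Seed words: adaptable, fit, omni-fit\n"
    "Product names:"
)

# Step 2: evaluate those names in a separate call, using the rating prompt from above.
ratings = ask(
    'Please rate the product names for "A pair of shoes that can fit any foot size" '
    "based on their catchiness, uniqueness, and simplicity. Rate them on a scale from "
    "1-5, with 5 being the highest score. Respond only with a table containing the "
    "results.\n\n" + names
)

print(ratings)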

Not every model is good at everything, and as we diversify away from OpenAI we're seeing many prompt engineering workflows incorporate multiple AI models to handle different specialized tasks. For example, DALL-E 3 tends to be good for text on images and composition, whereas Midjourney has a beautiful aesthetic and can do fantasy styles better. For some tasks the model doesn't even have to be generative AI, for example background.bg, an AI model that is great at removing backgrounds from images, and only that.

https://background.bg/

If you're looking to learn prompt engineering, you can check out my prompt engineering courses on Vexpower, or check out my book on Prompt Engineering for Generative AI, published by O'Reilly Media in early 2024.

Image Prompt Engineering Template

The best way to learn AI models like DALL-E, Midjourney, or Stable Diffusion is trial and error. You’ll immediately see the power of generative AI tools, and be hooked.

Even once you have access, it can seriously improve your prompts to learn from what other people have figured out. I pulled this guide together while I was learning, and I’m sharing it with you so you don’t have to learn the hard way.

> PROMPT ENGINEERING TEMPLATE

This is a living document – so please tweet at me (@hammer_mt) with any suggestions, tips & tricks, and I’ll keep adding to it as best practices continue to emerge.

The meat of the template is the workflow for building up a prompt in DALL-E, Stable Diffusion or Midjourney. With prompts you should start simple with the subject term, whatever it is that you’re trying to create (e.g. a “space whale”). Then increase the complexity of the prompt using optional additional modifiers to change the style, format, or perspective of the image. There are certain magic words or phrases that have been found to help boost the quality of the image (e.g. “trending on artstation”) or conjure up an interesting vibe (e.g. “control the soul”), which feature heavily in publicly shared examples.
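Here's a minimal sketch of that build-up in code: start from the subject term, then bolt on optional style, format, and quality modifiers. The modifier lists are a tiny illustrative sample, not the full template.

import random

subject = "a space whale"
format_modifiers = ["oil painting", "stock photo", "3d render"]
style_modifiers = ["in the style of starry night", "synthwave", "studio ghibli"]
quality_boosters = ["trending on artstation", "highly detailed", "4k"]

# Combine the subject with one modifier from each optional category.
prompt = ", ".join([
    subject,
    random.choice(format_modifiers),
    random.choice(style_modifiers),
    random.choice(quality_boosters),
])
print(prompt)  # e.g. "a space whale, oil painting, synthwave, trending on artstation"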

Most of the value of the template comes from the list of hundreds of words that get interesting results when you use them in prompts. You probably aren’t aware of all the different art styles or artists, and it’s helpful to know the different modifiers that improve the results of your prompts. I gathered these from my own experience and from researching what everyone else is finding works. Keep trying them and seeing what resonates with you, and keep your own prompt library for each project you work on.

Finally I’ve kept track of various useful articles, links and tools in my journey of learning prompt engineering. As I’ve learned from my own experience and others, I’ve added hints and tips as well as various terminologies I’ve encountered, so you don’t have to learn the hard way. My best advice would be to follow some of these links and immerse yourself in what others are doing in this space, as well as reading the rest of this article for more insight into prompt engineering.

UPDATE: Thanks to @mostlynotworkin, who took my prompt template and made it randomly generate a prompt every time you refresh the page!

> Randomized Prompt Engineering Template

---

***I have since released a free tool called "Visual Prompt Builder" that helps with prompt engineering by showing you what all the styles look like.***

> Visual Prompt Builder

https://tools.saxifrage.xyz/prompt

Tips and Tricks

Of course in the short term, there are some tips, tricks, and hacks that are working and can help you paper over the cracks in how the AI works today. Tweet at me (@hammer_mt) if you have any prompt engineering hacks and I'll put them here.

Simple Repetition

To make sure DALL-E 2, Midjourney, or other AI art tools really nail important characters when generating images, simple repetition works surprisingly well, e.g. the prompt: "homer simpson, from the simpsons, eating a donut, homer simpson, homer simpson, homer simpson"

Ok not exactly what I wanted
Better results, but still not there
All four of the results are passable now

Invent Fictional Authors/Artists

One ethical concern of tools like GPT-3 and DALL-E is that we're copying the styles of famous authors and artists without attribution. Because of how the models work it's impossible to tell how much of the style we've copied, and where the limit should be. One surprising and yet elegant trick that works is to invent fictional authors and artists. The AI will imagine what style that person would have, and generate consistent results.

https://twitter.com/fabianstelzer/status/1554229352556109825/photo/1

Dreambooth and Textual Inversion

The rate of innovation in this space is tough to keep up with, and time will tell what will prove to be important. However, one immediately obvious leap forward was textual inversion and Dreambooth. These are features currently only available for Stable Diffusion (the open-source competitor to DALL-E) that let you train the AI model on a specific concept from only a handful (3-5) of sample images, then reuse that concept later in your prompts, downloading it from a concepts library (for textual inversion) or as a package (in the case of Dreambooth). What that means is that you can now introduce your own object, character, or style and get back consistent results that match. For example, if you trained Stable Diffusion on the concept of Pikachu, you could reference it again later in your prompts as <Pikachu>.

https://twitter.com/TomLikesRobots/status/1568916040599363586?t=Bmyz1UrXmna_Ds15E1GfCg&s=03

Textual inversion processes the images and finds the point in latent space where that representation would live, essentially putting a marker there in the form of a token. Dreambooth is more resource intensive, but it actually trains the model rather than just finding a point in the latent space, so it can be more accurate and reliable. This opens up a world of creative uses for these AI models, because now they can move from fun toys to consistent, reliable tools. For example, you can imagine a branding agency training Stable Diffusion on the concept of their client's brand, which when included would always produce art that follows the brand style guides.

https://textual-inversion.github.io/

Video game designers or filmmakers could train Stable Diffusion on a character, and then use that character in various scenes. Product designers could train Stable Diffusion to recognize their product, and then easily show that product in different styles, scenarios, and perspectives. This is big.

https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb

To test it out, I trained Stable Diffusion on some concept art from a Reddit user who enjoys photoshopping things from Star Wars into old paintings. I admire this art style, but I would never in a million years have the talent to create something similar. So just for fun I tried to see if I could make the AI generate me a painting of Jar Jar Binks in the Roman Senate! The AI definitely didn't know the style beforehand, as you can see below, but it picked it up from 4 hours of training (on a Google Colab GPU) with just 6 sample images. I could of course keep going and train Stable Diffusion to understand the concept of Jar Jar Binks, but I don't want Disney coming after me!

If you want to try out textual inversion, probably the easiest way is to check out the Hugging Face concepts library and either contribute to it or follow the instructions to create your own private concepts. They provide links to Google Colab notebooks which let you run the full code for free, courtesy of the generous GPU access Google offers at no cost and the fact that Stable Diffusion is completely open source. Many AI tools such as AvatarAI use this functionality under the hood in order to place your own face in pictures.
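If you'd rather run it locally than in Colab, here's a minimal sketch of loading a learned concept with the diffusers library, assuming a recent diffusers install and a GPU. The model ID and the cat-toy concept are the standard examples from the diffusers documentation; swap in whichever concept you trained or downloaded.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Download the learned embedding and register its placeholder token with the tokenizer.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The concept can now be referenced in prompts by its token.
image = pipe("an oil painting of a <cat-toy> in the roman senate").images[0]
image.save("concept_test.png")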

https://twitter.com/hammer_mt/status/1580198284186222592

Recursive Reprompting to Catch Mistakes

One common issue with LLMs is that they make mistakes. Sometimes they hallucinate wrong answers, get the context wrong, or output something in a bad format. You might think the answer is to kick the problem to a human, but you'd be wrong: give the AI a chance to correct its mistake! Often, with the right recursive feedback loop, it can get to the right answer so long as your prompts are good. You can follow up each response with a prompt to check for common mistakes, and you can even feed errors back into GPT-3 and ask it to fix the mistake.
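Here's a minimal sketch of that feedback loop: if the model's reply fails to parse as JSON, the parser error is fed back and the model is asked to correct itself. It uses the OpenAI Python SDK (v1+); the prompts, model name, and retry count are illustrative assumptions.

import json
from openai import OpenAI

client = OpenAI()

def ask(messages):
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

messages = [{
    "role": "user",
    "content": "List three product names for shoes that fit any foot size, as a JSON array of strings.",
}]

for attempt in range(3):
    reply = ask(messages)
    try:
        print(json.loads(reply))
        break
    except json.JSONDecodeError as error:
        # Feed the failing output and the error back so the model can fix its own mistake.
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": f"That wasn't valid JSON ({error}). Respond with valid JSON only."},
        ]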

https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/

The Future of Creative Work

What are all the artists and authors going to do for work when the full impact of this breakthrough technology is realized? Will they all be out of a job?

The answer is no, of course not. As remarkable as Midjourney and GPT are, they’re not magic. They’re tools like any other. After a tumultuous period of change from introduction of new technology, people retrain or switch jobs, and we continue along with higher productivity.

Disney no longer illustrates cartoons by hand, but they own Pixar, where higher-quality renderings can be done by computer. Many illustrators learned to use computers, a lot retired, some progressed to management, and others still do things manually.

Over a long enough time period, even technologies that obliterate whole categories of industry eventually become commonplace, and those people find other things to do.

In 1908, before the car dominated transport, New York alone had a population of 120,000 horses that had to be fed, groomed, and cared for. They produced 2.5 million pounds of manure every day on the city’s streets. Do you know any out of work horse crap shovelers today?

The truth is that most of the time creative professionals spend is on the equivalent of shoveling horse crap. Generative AI tools have the potential to handle that for them.

To capitalize on this opportunity, creatives just need to reframe the value they provide. Let Midjourney be the artist; you can be the curator. Let ChatGPT be the writer; you can be the editor.

Here are a few tangible examples of how AI promises to make creative work better:

  • No more bad briefs: clients can use AI to show you exactly what they want, so there’s no time wasted second-guessing
  • Unlimited variations: rather than charging for or limiting the number of variants, you can generate 10+ new versions with the click of a button
  • Consistent brand guidelines: once you’ve designed a stable prompt that works, it’ll almost always replicate the right style and tone of voice for approval
  • Self-service small jobs: rather than handle unprofitable commissions manually, you can encode your ‘style’ into a prompt library and sell that for passive income
  • Unexpected inspiration: the tight feedback loop between prompt and results lets you take weird and wonderful routes to ideas you would never have thought of

Prompt Engineering as a Job

Much of the work of prompt engineering is persistence: these tools are still in beta, and working with AI takes a lot of trial and error, as well as some technical knowledge. Perhaps that's why some companies are paying $250k - $335k per year for prompt engineers!

https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed

What makes a great prompt engineer is that they’re capable of communicating clearly. When you can create anything you want, the bottleneck becomes your ability to express exactly what that is. 

https://twitter.com/DynamicWebPaige/status/1512851930837843970 

There are some industry hints and tricks that you can adopt to immediately improve your results. For example, telling GPT-3 to be “helpful” increases the truthfulness of its answers. Adding “trending on artstation” tends to boost the quality of images for DALL-E 2. 

However the vast majority of the work of prompt engineering is just being a student of history and literature. Since getting access to DALL-E 2 I’ve become somewhat of an amateur art historian. I’ve been learning the names of different art movements, experimenting with lighting and photography terms, and familiarizing myself with the work of artists I had never heard of before. 

Now simply knowing the name of a style makes you instantly able to replicate it. Every style has its own ‘memes’ – units of cultural information – that taken together are what distinguishes one category or style from another. GPT-3 and DALL-E have seen enough images and descriptions of Van Gogh to know that the prompt “in the style of starry night” means something like

“The painting is composed of thick brushstrokes of paint that are applied in a swirl-like pattern. The colors are mostly blues and greens, with some yellow and red. The stars in the night sky are rendered with white dots. The painting has a dreamlike quality, and the overall effect is one of intense emotion.” 

Painting. Thick brushstrokes. Swirl-like pattern. Blues and greens. White dot stars. Dreamlike. Intense emotion. These are the memes associated with Van Gogh, together they constitute his style. If you asked any artist to paint you something “in the style of starry night”, their brain would conjure up these exact same associations from memory. This is precisely what DALL-E 2 is doing, except its brain is artificial, and its memory is the entire corpus of the internet. That gives it the power to imagine almost anything, like for example what the Mona Lisa would look like if you zoomed out to see the rest of the landscape.

https://mixed-news.com/en/what-would-mona-lisa-look-like-with-a-body-dall-e-2-has-an-answer/

So training for work as a prompt engineer is very similar to training to work as an artist or copywriter. You need to read great novels, learn about famous art movements, understand human nature, and what resonates with your intended audience. Map out all the memes of different categories. Mix them together to achieve something unique. None of that has changed: what has is the ability to translate at blinding speed from your imagination to a computer screen.

Prompt Engineering Case Study

For a tangible example of how this works, take the artwork for the book I’m writing on Marketing Memetics. One of the main themes of the book is how our brains haven’t evolved in the past 200,000 years: we’re essentially cave men trying to make sense of the modern world and all of its fantastic technology. So I had the idea of taking ancient historical people, and dumping them into a futuristic city.

How this would have worked before AI art generators:

  • I would have to find an artist whose aesthetic I liked
  • I’d have to brief them on what I wanted, despite having no art knowledge or background
  • I might have to wait until they’re finished with their current commission
  • I would have to pay them thousands, maybe tens of thousands of dollars
  • It might take days, weeks, or months for me to see the final version
  • Once done, there’s nothing I can do to change the painting
  • If I wanted more than one painting, multiply the time and costs accordingly

I’m writing my book as a passion project. I don’t have the backing of a publisher, I plan to just put this on Gumroad and Amazon. I have no idea if anyone will buy it. So realistically I wouldn’t do any of the above: I’d have to download some free vector art online and keep daydreaming about the custom oil painting I had in my head.

One consideration is that OpenAI, the creators of DALL-E, currently retain copyright over all of the images you produce on their platform, and don’t allow you to generate images for commercial use. Note that just the image itself is copyrighted, not the prompt or style, so you should be ok to use DALL-E to ideate then commission a final version based on the result (I’m not a lawyer, and this isn’t legal advice). I used Midjourney (a DALL-E competitor) in part because they allow commercial use of the content so long as you have an enterprise plan.

Here’s how it actually worked with Midjourney:

I started with the image in mind of Neal Stephenson’s “Snow Crash”, a novel about mind viruses that was a great inspiration for my own book.

Looking back, my first prompt for Midjourney was “a man with a samurai sword standing in front of an ancient babylonian gate, staring through to a futuristic cityscape”, which got me the following result:

I actually found this kind of discouraging, so I played around for a while and looked at what others were doing in Midjourney. Because it’s all run through a Discord server (similar to Slack) you can see what prompts everyone else was using and the results they’re getting.

So after some experimentation and learning the secrets of prompt engineering, I started getting far better results. The prompt “a hyper realistic photo of an ancient samurai warrior in modern day manhattan” is what got me onto the right path.

The top left image was actually pretty compelling, but it wasn’t quite right. I decided I wanted an oil painting, in part to hide the imperfections of the image. So I started researching different artistic styles and found something I was immediately drawn to: “The Fall of Rome” by Thomas Cole, part of the “Course of Empire” series.

https://en.wikipedia.org/wiki/The_Course_of_Empire_(paintings)#/media/File:Cole_Thomas_The_Course_of_Empire_Destruction_1836.jpg 

Thanks to Wikipedia I learned that this style of art was called the “Hudson River School”, of which Thomas Cole was considered the founder. I had no idea! I thought this was painted by an old Italian artist, not someone in North America in the 1800s. Now that I had my aesthetic, the results dramatically improved. My prompt became “Faded oil painting on canvas in the Hudson River School style of an ancient samurai soldier arriving in a futuristic utopian shanghai city. From behind. Wide angle.”

Now that I had the right look, it was just a case of repeatedly iterating by generating more variations of the ones I liked. Within just a few generations of evolving the images in this way, I got the following painting, which is making it into the final book when I self-publish at the end of this year (December 2022 – sign up for updates).

I later replicated it in DALL-E when I got off the waitlist, just to see how it compares. Even though DALL-E looks a lot cleaner, I actually prefer the style of Midjourney. Just for fun I decided to try ‘outpainting’ in DALL-E, a unique undocumented feature people have figured out. It works by taking your original image and editing it in Photoshop, Photopea or Figma to have whitespace around it. Then you upload it to DALL-E and use their edit feature to erase and fill the extra space. The effect is that you can ‘zoom out’ from an image, with the AI filling in the gaps. The result was the picture that adorns the header of this blog post (I've highlighted the original image with a white border so you can see where it fits in).

It’s already remarkable how far we’ve come, and what we’re able to do. I literally know nothing about design, creativity, or art. I have spent most of my career in Excel spreadsheets and Python scripts. Yet here I am illustrating my own book, in the space of a day for $30.

That alone is wildly disruptive, but it’s what I was able to do next that really changes the game. Now that I had a visual style, I could treat my prompt as ‘brand guidelines’ for my book, and generate as many new images as I liked! All I had to do was change the prompt from “...of an ancient samurai soldier arriving in…” to whatever other ancient people I liked. Romans. Greeks. Babylonians. Egyptians. Mongols. I could also change the city to see what pairings looked best. London. New York. Los Angeles. Miami. Delhi. In the end I created over 30 combinations in my spare time over the course of the next week, all with the same concept and a consistent aesthetic, so I could pick one per chapter or even generate one per blog post if I liked, for a completely negligible cost.
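Here's a minimal sketch of how that combinatorial generation looks in code, using an abridged version of my prompt as the template; the lists of peoples and cities are illustrative.

from itertools import product

template = (
    "Faded oil painting on canvas in the Hudson River School style of an ancient "
    "{people} soldier arriving in a futuristic utopian {city}. From behind. Wide angle."
)

peoples = ["samurai", "roman", "greek", "babylonian", "egyptian", "mongol"]
cities = ["shanghai", "london", "new york", "los angeles", "miami", "delhi"]

# Every pairing of people and city, all sharing the same 'brand guidelines'.
prompts = [template.format(people=p, city=c) for p, c in product(peoples, cities)]
print(len(prompts), "prompt combinations")
print(prompts[0])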

It’s not just me. I know ad agencies experimenting with Midjourney (better licensing than OpenAI) to create ads for their clients. People are creating children’s books by combining GPT-3 and DALL-E. Vogue magazine created one of their latest covers with DALL-E 2. A friend of mine even used Midjourney to illustrate his short sci-fi novel to inspire more people to care about climate change. The future of creative work is coming, and it’s best to get ahead of it. Get some experience with prompt engineering.
