With the rise of search engines, we had a corresponding rise in SEO (Search Engine Optimization), the practice of making your content rank higher on Google and other search engines. Now that we’re seeing AI chatbots like ChatGPT grow past 100m users as potential Google competitors, we should expect to see an opportunity in the practice of making your content get mentioned by AI, which I’m calling AIO: Artificial Intelligence Optimization.
This field is a long way off existing, which is why I made the above joke. Having thought about it more, I actually think this is going to be a thing, and it’s worth exploring this area today to establish a few common sense best practices. In this article I’ll cover what makes AIO potentially valuable, and list out some ways you can already curry favor with the current batch of AI tools.
I’m interested in this area as a self-proclaimed prompt engineer, the author of an O’Reilly course on Prompt Engineering and a Udemy course, "The Complete Prompt Engineering for AI Bootcamp (2023)", and because AI is unexpectedly making the book I’m writing on reverse-engineering creativity a whole lot more interesting.
I plan to keep this page updated as the field develops, but everything’s happening so fast I’ll need you to message me with anything I miss: @hammer_mt
What’s the Prize?
First off, what can AIO do for you? At the time of writing only Google Bard, Microsoft's New Bing, and ChatGPT with the browser plugin have the potential to drive any serious traffic to your website, as most AIs don’t yet provide citations or links.
There are early trials with You.com and Perplexity AI, but they’re unlikely to have the volume yet to really make a difference. However, people are already reporting increased traffic from Bing, and Google is rearchitecting its entire search results page around AI.
Don't expect AI awareness to always manifest as traffic either; we are already seeing reports of people listing "ChatGPT" as a source when filling in a "How did you hear about us?" form on checkout. Adding one of these forms to your checkout or registration process with a free text "Other" option, using a vendor like Fairing, is a fantastic way to spot new marketing channels, and keep up to date with which AI bots are sending users your way.
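As a rough sketch of how you might tally those free-text answers, assume you can export them as a list of strings. The list of tool names and the sample answers below are made up for illustration, and a match on "bing" can't tell you whether the person meant the chatbot or plain old Bing search.

```python
from collections import Counter
import re

# Hypothetical list of AI tools to look for; extend it as new ones appear.
AI_SOURCES = ["chatgpt", "bing", "bard", "perplexity", "you.com"]

def count_ai_mentions(responses):
    """Count how many free-text answers mention each AI tool (case-insensitive)."""
    counts = Counter()
    for response in responses:
        text = response.lower()
        for source in AI_SOURCES:
            # Word-boundary match so "bard" does not match "bombard".
            if re.search(r"\b" + re.escape(source) + r"\b", text):
                counts[source] += 1
    return counts

# Made-up sample answers, for illustration only.
sample = [
    "ChatGPT recommended you",
    "A friend told me",
    "Asked Bing chat and it linked here",
    "chatgpt",
]
print(count_ai_mentions(sample))
```

Running something like this weekly over the "Other" answers would show whether mentions of a given tool are trending up.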
That means so far AIO is more of a brand marketing play, similar to going viral on TikTok or working with influencers. Being known by the AI is the new "ranking number 1 on Google", and brands will no doubt start to compete to find ways to get in the training data. For example how much would Converse, Vans, or Reebok pay to be in the list when you ask where to get the best sneakers? What’s the benefit to Nike for having both their main brand and a sub brand (Jordans) listed? Being included for these types of consumer queries will be golden, especially as ChatGPT gets a mobile app, and AI gets incorporated into more and more products.
This is an informational query, which is surely the top prize as it’ll drive the most awareness. However, there will also be a long tail value in simply being known by the AI, when someone looks you up. Just like having your own Wikipedia page, or owning the top slot on Google for your name, being in the AI’s training data will be a status symbol and a form of brand protection. Speaking of protection, it may make sense to start playing defensively right now to protect your good name. If the AI doesn’t know about you, it will surely hallucinate something terrible about you, like this law professor ChatGPT wrongly accused of sexual harassment. Perhaps tools will develop to update the index and fact-check errors, but we all see how well that goes on Wikipedia and Google.
If you already rank well in search, investing early in AIO might put you in a good position for when mainstream adoption occurs. If people really do start getting all their answers via ChatGPT, or via Bard without leaving the Google Search page, you'll see traffic decline. To identify how much of your traffic is disruptable by AI, Seer Interactive recommends pulling a report that shows how many of your search terms have an answer box in position #1. Ranking number one on Google Bard might one day be more important than ranking number one on Google, but the business models for content companies will need to change as they can't serve ads against these results.
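Here is a minimal sketch of that kind of report, assuming you have exported your search terms from a rank tracker with a click count and a flag for whether an answer box holds position #1. The field names and numbers are hypothetical, and this is my own simplification rather than Seer Interactive's method.

```python
def answer_box_exposure(rows):
    """Return the share of clicks coming from terms with an answer box in position #1."""
    total_clicks = sum(row["clicks"] for row in rows)
    if total_clicks == 0:
        return 0.0
    exposed_clicks = sum(row["clicks"] for row in rows if row["answer_box_top"])
    return exposed_clicks / total_clicks

# Made-up rows for illustration; real data would come from your own export.
rows = [
    {"term": "what is aio", "clicks": 120, "answer_box_top": True},
    {"term": "aio agency pricing", "clicks": 60, "answer_box_top": False},
    {"term": "prompt engineering course", "clicks": 20, "answer_box_top": False},
]
print(f"{answer_box_exposure(rows):.0%} of clicks come from answer-box terms")
```

Weighting by clicks rather than counting terms keeps one high-traffic query from being hidden among many small ones.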
AI stands to be a valuable research tool. Yes, AIs are biased, but they’re really a reflection of our own biases. All they have done is learn our existing associations (those available on the web, anyway). I know from attending a recent conference that social scientists are actively dreaming up ways to do primary research using models like GPT: understanding how the AI sees the world might tell us something about what our society thinks too. It also stands to be a useful tool for brands that want to research the zeitgeist and understand what associations their brand has, as reflected in the AI, as well as how similar different brands are.
Software developers will likely have a harder time than other creators getting value out of being used for AI training data. While tools like GitHub Copilot can already handle a significant amount of the work, there’s usually no attribution back to the source, and there have even been cases of leaked passwords and API keys. That said, AI code generators stand to benefit the authors of popular libraries and platforms, which are now more accessible than ever, as even inexperienced developers can just tab tab tab to autocomplete what used to be a complicated implementation.
This isn’t limited to Large Language Models (LLMs): diffusion models like DALL-E, Midjourney, and Stable Diffusion will also be key. Companies pay hundreds of millions of dollars every year for brand advertising campaigns to reinforce their distinctive brand assets. Now one of those avenues is going to be AI. For example, Coke’s brand is so distinctive that when you ask DALL-E for a can of cola, it returns images that are quite clearly of the famous brand. If your visuals become synonymous with your category, expect to get a whole lot of free branding.
Individual artists and designers stand to go viral from AI art, like the suddenly famous Greg Rutkowski, whose magical and otherworldly fantasy style became incredibly popular amongst early Midjourney and Stable Diffusion users. Because these communities are surprisingly massive – Midjourney has 13m users at the time of writing, up from 1m in just under 7 months – this can be as impactful as going viral on TikTok is for a musician.
Brands and artists will need to figure out their strategy with respect to AI. For some with valuable IP to protect, the answer might be litigation, or for those that can't afford a lawyer, data poisoning. I can’t imagine Disney wanting to allow fan-made Star Wars or Marvel movies, and they have sued for less. We’ll have our Metallica vs Napster moment I’m sure, before platforms are forced to mature and the AI equivalent of Spotify emerges, with a business model that aligns incentives for both sides.
However, smaller, more enterprising start-ups and creators might be delighted with the exposure, and embrace fan communities that build on and extend their brand. We might even see some brands release open-source fine-tuned models for faithfully replicating the brand’s tone of voice, embedding characters and products, or following the brand style guide. Expect innovation in this space, with early innovators quickly skyrocketing to household names.
How do you get in the training set?
Both search engines and chatbots have the same goal: to help you get the information you’re looking for. That means most of the things you’re already doing to optimize for SEO will also be beneficial for AIO.
- Use natural language and avoid keyword stuffing - AI platforms are becoming more sophisticated in their ability to understand natural language, so focus on creating content that uses natural language instead of stuffing it with keywords.
- Leverage structured data to help AI platforms better understand the content on your website - Structured data such as schema markup can also help search engines better understand your content and improve your rankings.
- Optimize for featured snippets by answering common questions in a clear and concise manner - Featured snippets appear at the top of search results pages and can improve your visibility and click-through rates.
- Use descriptive and engaging meta titles and descriptions to attract clicks and improve click-through rates - This helps improve your rankings and visibility on search engines.
- Focus on user intent and creating content that meets the needs of your audience - This can help improve engagement, click-through rates, and ultimately, your rankings on search engines.
- Get backlinks from reputable websites to improve your authority - Backlinks are a key factor in improving your rankings on search engines, and being featured in lots of places online maximizes your chances of being picked up in training the AI.
- Monitor your website's performance metrics, such as bounce rates and click-through rates, to identify areas for improvement and optimize your content - This helps ensure that your content is meeting the needs of your audience and providing value.
- Optimize your website's loading speed to improve user experience and reduce bounce rates - A faster loading website can improve engagement and reduce bounce rates, which can in turn improve your rankings.
- Use header tags (H1, H2, H3, etc.) to organize your content and improve readability - This helps AI platforms and search engines better understand the structure of your content and can improve your rankings.
- Use multimedia content such as images and videos to make your content more engaging and shareable - This can improve engagement and click-through rates, which can improve your rankings and visibility on search engines, and if you have a unique style you might get featured in AI art communities.
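To make the structured data tip above a little more concrete, here is a minimal sketch that builds schema.org FAQPage markup as JSON-LD. The question and answer are placeholders, and it's an assumption on my part that AI tools will use this markup the way search engines do.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder question and answer; swap in your own content.
pairs = [("What is AIO?", "The practice of making your content get mentioned by AI.")]
markup = json.dumps(faq_jsonld(pairs), indent=2)
# Embed the JSON in your page inside a JSON-LD script tag.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keeping the questions and answers identical to the visible text on the page is the safer choice, since the markup is meant to describe your content rather than add to it.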
It's unclear whether Bing can actually "read" the articles it finds in search, or if it's using some cached version, or a limited portion of the cache. I have seen instances where it seems to have provided answers based on content within the article, for example based on the heading sections. However, when you ask, Bing says that it can only read the title and meta description from search results, and is unable to visit a specific URL.
I've had similar issues with Google, which is also hooked up to the internet and supposed to be able to search in realtime. It frequently hallucinates in a way that is quite funny because it's directionally right, but factually wrong. When I asked it to tell me about Vexpower, it hallucinated a co-founder (the brother of someone who created a course on my platform), and that we had raised money (we haven't).
Search engine datasets such as Google’s and Bing’s indexes are important for AIs that browse the web, so simply ranking well in search is going to be doubly important, because it'll help for AIO too. Of course there will be people who try to bias the index, just like they do with SEO. The open-source CommonCrawl is probably the most important index to be in, because it already forms the bulk of what these AI models are trained on. Google and Microsoft are the biggest players in both search and AI (Microsoft reportedly took a 49% stake in OpenAI), and anyone else training a model will start with CommonCrawl. For example, the training data for Meta’s recently leaked model LLaMA was 67% CommonCrawl.
The other obvious place to get yourself inserted is Wikipedia. The open-source encyclopedia is almost always going to be included in anybody’s dataset, because it’s not just high quality and well structured data, it’s also freely available and easy to access. As is the case with SEO, becoming notable in real life is a great way to build authority online too. Coining new terms, publishing books, speaking at events, will all be ways to build your reputation not just with the public, but with AIs too.
There's this idea of an ai.txt file, by analogy with the robots.txt file that governs web scrapers for search engines. Honestly I think this is a dubious analogy, because robots.txt already serves its purpose for web scraping. My bet is that an additional convention will emerge in robots.txt instead, dictating whether the data can be used in training and/or in browsing plugins.
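Part of the reason a separate file seems unnecessary is that robots.txt can already address individual crawlers by user agent. Here is a minimal sketch using Python's standard library, assuming a site that wants to keep CommonCrawl's crawler (CCBot) out of one directory. The rules and URLs are made-up examples, and this only covers crawling, not any future training-specific convention.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents. "CCBot" is CommonCrawl's crawler; the paths are made up.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""

def can_crawl(robots_txt, user_agent, url):
    """Check whether the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical URLs on a placeholder domain.
print(can_crawl(ROBOTS_TXT, "CCBot", "https://example.com/blog/aio"))
print(can_crawl(ROBOTS_TXT, "CCBot", "https://example.com/private/draft"))
```

Pointing the same check at your live robots.txt is a quick way to confirm you aren't accidentally blocking the crawlers you want to be picked up by.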
I think PR is going to see a real boom with this shift in technology. Suddenly being featured in a major publication isn’t just worth it for the exposure: it potentially gets you into the corpus of data AI companies use to train the next model. OpenAI’s ChatGPT is only trained on data up to September 2021 as it stands today, but Bing can look things up on the internet, and it’s likely Google’s Bard will have the same functionality.
For developers, making sure your code is on GitHub is likely enough, though answering questions on Stack Overflow should also be a big help. There has never been a better time to write that package or library, because if it gets widely used, it will create a feedback loop where more and more people will get it in autocomplete via Copilot and ChatGPT. Perhaps we’ll also see libraries and products developed that are optimized for AI, for example using syntax that makes autocomplete easier. I’ve already found, for example, that Copilot struggles with React’s nested components.
For artists who actually want to appear in open-source datasets, rather than sue the AI companies, having your work available in the public domain, and featured on websites such as ArtStation, is a good way to get noticed: “trending on Artstation” is a common phrase people add to prompts to enhance the quality of a generated image. There are also stock photo sites like Unsplash: particularly if your work is open source, it's likely to be featured. The communities for AI art are highly mimetic – everyone copies what everyone else is doing – so seeding each Discord and Reddit group with a few stunning examples of your work, replicated or enhanced by AI, can help spread the word. You can then check whether you made it into the index on Lexica Art.
One non-obvious place to insert yourself is in the prompts you use when working with AI. It’s widely expected that the words we input to tools like ChatGPT and DALL-E will be recorded and used to fine-tune the model. These tools even have a thumbs up / thumbs down button on the side to get feedback on whether the generation met expectations. It may be that simply using your own brand name and correcting the AI is enough to make it into the next version.
Ultimately though, the best way to insert your brand in the conversation has always been to have a strong brand. I'll leave you with this Seth Godin quote:
"If Nike announced that they were opening a hotel, you’d have a good guess what it would be like. If Hyatt said that they were going to start making shoes, you would have NO IDEA WHATSOEVER what those shoes would be like. Because Nike owns a brand and Hyatt owns real estate."