My Local Link Building Playbook

If you’re having trouble coming up with ideas on how to build high impact links for a local business or local landing page, I have some good news: you’re probably way overthinking things!

In this post I’ll walk you through a repeatable process, which requires zero creativity and zero years of experience in the field, to generate link building ideas for your exact vertical that will beat out any level of local competition.

Stop Brainstorming, Start Stealing

Whether the local business you’re trying to build links for is a photographer, criminal defense attorney, lawn care service or any other common type of local business, it’s important to realize they’re far from the only one of their kind in the world.

In that sense, your business is not special.

But in this specific situation, that’s actually good!

You have a huge advantage when it comes to link building that so many national businesses and unique startups don’t have. This is because:

  • 100s, if not 1000s, of people have dealt with your exact same problem of building links for a local business in that same industry.
  • Modern SEO tools (e.g. Ahrefs or Moz) let you see exactly(!!) how the successful ones were able to solve that problem.

So, here’s the plan:

  1. Find the highest ranking, same-vertical businesses in OTHER cities.
  2. Pull their best links from a tool like Ahrefs.
  3. Determine how they may have gotten those links.
  4. Steal the best of those ideas and copy them for your business in your city.

The insights from this process will yield far greater results than any freestyle brainstorming session. Plus, it’s simple and repeatable.

What You’ll Need to Do This

You’ll need 3 things to complete this process:

  1. A subscription to Ahrefs – other link indexes are OK, but some specifics in the steps below will be different.
  2. Access to spreadsheet software like Google Sheets or Microsoft Excel.
  3. A dozen or so hours of your (or someone else’s) time.

To give an example of the process in action, I’ll be running through it in this article as if I were trying to build links for a local florist or flower delivery service (and if that happens to be your niche, then it’s your lucky day!).

Step 1: Find Who’s Winning in Other Cities

If you’re in the U.S., start with a list of the top cities by population.

Since the biggest cities are usually the most competitive (given their market size / opportunity), the businesses in them have most likely spent more on SEO than ones in smaller cities and, as a result, have usually built better (and more) links than businesses elsewhere. Those links are the ones we want to try to replicate.

So, start by pairing a city name with your head keyword, then Google it to find the top ranking local businesses in each of those cities. For flower delivery, I’d start with a search like “new york flower delivery”.

From the organic results for that keyword, look for the local businesses that rank on pages 1 and 2. Ignore national listing sites like Yelp and Thumbtack.

Grab their URLs and enter them into the first tab of a spreadsheet, like this one I’ve built here:

https://docs.google.com/spreadsheets/d/1Ac2P4jO_pjmC47qmhpMEiwO8d6D-NtqBfcp7E7BMid0/edit?usp=sharing

Note: local store pages of national brands (e.g. 1-800-Flowers’ New York location page) might not yield many insights for this process because there’s a good chance they’re mainly ranking because of overall domain authority, not locally specific link authority. Feel free to include them in your research, but standalone local businesses usually yield the most interesting insights.

For my example, I’m pulling the first 4 ranking flower shops for each of the top 4 U.S. cities by population, yielding a total of 16 sites. Feel free to go way deeper than that (e.g. the top 10 shops for each of the top 15 U.S. cities), especially if you’re planning on doing link building in the same vertical in multiple cities.
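By the way, if you do go deeper, a few lines of code can generate all of the prospecting search queries for you. Here’s a minimal sketch (the city list and keyword are placeholders, so swap in your own):

```python
# Generate "city + head keyword" prospecting queries for Step 1.
# The cities and head keyword below are just example placeholders.
cities = ["New York", "Los Angeles", "Chicago", "Houston"]
head_keyword = "flower delivery"

for city in cities:
    print(f"{city.lower()} {head_keyword}")
    # e.g. "new york flower delivery"
```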

Step 2.A: Pull Their Historical Link Data

Plug each of the websites in your spreadsheet into your favorite link index.

If it’s a local business with 1 location, look at all links to the entire domain (“subdomains” filter in Ahrefs). If it’s the local store page of a national / multi-city business, then just look at the links to that local store page URL folder (“path” in Ahrefs).

Now navigate to their Backlinks report page.

Finally, filter for “one link per domain” and then export the data as a CSV.

Do this for each site in your list.

Step 2.B: Combine CSVs and Clean Up Data

To make our lives easier, we’re going to combine all these CSV reports into a single sheet.

To save time, it’s possible to combine a group of CSV files into a single file with automation. I’ve always used the Terminal app on my MacBook, although Googling might yield an easier solution these days.
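If you’d rather not fiddle with Terminal commands, here’s a minimal Python sketch that does the same thing. It assumes all of your exports sit in one folder (with nothing else in it) and share the same headers, which exports from the same Ahrefs report should:

```python
# Combine every Ahrefs backlink CSV export in the current folder into one file.
import glob
import pandas as pd

# Note: if a file fails to parse, your export may be UTF-16 encoded; try
# pd.read_csv(path, encoding="utf-16", sep="\t") instead.
frames = [pd.read_csv(path) for path in glob.glob("*.csv")]
combined = pd.concat(frames, ignore_index=True)
combined.to_csv("combined_backlinks.csv", index=False)
```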

Otherwise, you can just manually copy & paste the contents of each CSV into a single sheet.

In this new master sheet, we now need to delete some extra columns that Ahrefs gives us but aren’t useful for this use case. I like to immediately delete the following:

  • Platform
  • Referring domains
  • Linked domains
  • Content
  • Nofollow
  • UGC
  • Sponsored
  • Rendered
  • Raw
  • Drop reason
  • Discovered status
  • First seen
  • Last seen
  • Lost
  • Links in group

And then delete these ones as well, after filtering out certain rows:

  • Language – after filtering for / deleting foreign-language rows (for me, rows whose language is neither “en” nor blank).
  • Referring page HTTP code – after filtering for / deleting rows with “404” or “403”.
  • Lost status – after filtering for / deleting rows that aren’t blank (i.e. any link that has since been lost).

Finally, we’ll add one new column, which I’ll call “Duplicates”. For each referring page, we’re going to check how many of the sites we pulled links for have that same referring page URL in common (e.g. a single resources page that linked to 3 different flower shops). To do this, we’ll use the following formula in that new column: =COUNTIF(A:A,A2) (this assumes the referring page URLs are in column A).
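And if you’d rather script this entire cleanup step instead of doing it by hand, here’s a minimal pandas sketch. The column names are my best guess at Ahrefs’ export headers, so adjust them to whatever your actual CSV uses:

```python
import pandas as pd

df = pd.read_csv("combined_backlinks.csv")

# Drop the columns that aren't useful for this exercise.
drop_cols = ["Platform", "Referring domains", "Linked domains", "Content",
             "Nofollow", "UGC", "Sponsored", "Rendered", "Raw", "Drop reason",
             "Discovered status", "First seen", "Last seen", "Lost",
             "Links in group"]
df = df.drop(columns=drop_cols, errors="ignore")

# Keep English (or blank) languages, drop 404/403 pages, drop lost links,
# then drop the three columns we just filtered on.
df = df[df["Language"].isin(["en"]) | df["Language"].isna()]
df = df[~df["Referring page HTTP code"].isin([403, 404])]
df = df[df["Lost status"].isna()]
df = df.drop(columns=["Language", "Referring page HTTP code", "Lost status"],
             errors="ignore")

# Add the "Duplicates" column: how many rows share each referring page URL
# (the pandas equivalent of =COUNTIF(A:A,A2)).
df["Duplicates"] = df.groupby("Referring page URL")["Referring page URL"].transform("count")

df.to_csv("cleaned_backlinks.csv", index=False)
```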

Step 2.C: Prepare Sheet for Scanning

During our research process in the next step, we won’t be opening up each and every referring page URL in our spreadsheet. That would be way too time consuming. Instead, the goal is to get good at quickly scanning a list of links / rows, identifying particular ones of interest and then opening those particular URLs for further investigation.

To make the scanning process easier, I recommend doing the following:

  • Freeze and bold the first row. You’ll do a lot of scrolling, so this helps you know which columns are which.
  • Set the text wrapping option to “Clip” for all columns (in Google Sheets: Format > Wrapping > Clip). This allows you to easily see which values are in which cells, and which cells are blank.

  • Widen the Left Context, Anchor & Right Context columns. They’ll have longer values that you’ll want to see the majority of without always having to click to see (and for this reason, I recommend completing the next step using a wide screen so all columns are easily visible).
  • Re-organize columns. To make things easy to scan, my personal preference is to have Referring Page Title / URL in columns A/B, External Links as C and Target URL / Left Context / Anchor / Right Context as D/E/F/G. (These are the values you’ll be scanning the contents of, while all the other values/columns are mainly for filtering and sorting.)

Here’s an example of what that end result spreadsheet might look like (second tab).

Finally, we can now sort the spreadsheet to see the best links of the entire group first. Since “best” is subjective, there are a number of metrics you can switch between sorting by:

  • UR (descending) – to see the most authoritative pages. I usually start here.
  • Domain Rating (descending) – to see the most authoritative domains.
  • External Links (descending) – to see which linking pages also have a lot of other external links on the page. This helps discover resource pages and other page types that have an abundance of external links.
  • Domain Traffic (descending) – to see the highest trafficked domains first, based on search traffic estimates.
  • Page Traffic (descending) – to see the highest trafficked pages first (also based on search traffic estimates).
  • Keywords – to see the pages that rank in Google’s top 100 results for the highest number of unique keywords.
  • Duplicates (descending) – to find which pages link to multiple sites in our original list (for the same reason we sort by External Links).

In the next step, whenever you’ve scrolled enough on a particular Sort that you’re no longer seeing any authoritative or interesting links at all, change it up and sort by a different column.

Step 3.A: Scan Rows for Interesting Finds

Let’s clarify which columns you’ll be scanning and what they mean:

  • Referring Page Title – the HTML <title> tag of that referring page. Site owners might not always set these correctly, but it’s your best shot at getting an instant idea of what’s on that specific page. Additionally, some websites append their name to the end of the title tag, which could help explain what kind of website they are (e.g. “Pro Golf Weekly” would probably mean it’s a publication about golf).
  • Referring Page URL – just like with the <title> tag, URLs (when clean & readable) can also hint at what’s on the page and what kind of website it is. For the overall website, scan the domain name for clues (e.g. “mikesgardeningblog.com”). For the page itself, check the text in the URL that comes after the domain name (e.g. “/gardening-resources”).
  • External Links – the number of unique external links on the page. It’s helpful to know if the link is on a page by itself with few to no other external links, or, if it’s on a page with dozens or even 100s of others.
  • Left Context – if there’s text preceding a link (e.g. if the link is mid-paragraph), you’ll see that preceding text here.
  • Anchor – the anchor text of the link (or for image links, the Alt text of the image).
  • Right Context – the text that comes after the link (if any).

At first, you might not instantly be able to determine, just by scanning these values, (1) what content is on that referring page and (2) why that particular site has a link on it. That’s OK.

If you’re ever unsure or straight up clueless about either, open up the referring page URL and do a full investigation. You might do this a lot at first, but don’t worry, this is part of the learning process. Open as many as you need to get some answers. Then, compare your findings with the spreadsheet values you tried scanning. Check if there were any subtle hints that you could use for future predictions.

For example, I’ve discovered that pages that have the word “Resources” or “Links” in their title and/or URL, along with a higher than normal External Links value, are likely to be resource pages.

As another example, any URLs that have subfolders for the date (e.g. “/2022/04/07/”) are usually some form of dated content, such as a news story or a blog post. This is a distinctly different type of content from, say, a resource page, which matters because dated content is unlikely to be regularly updated, whereas a resource page usually is meant to be.
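Both of those heuristics are easy enough to encode if you’d like to pre-flag likely candidates before scanning manually. A rough sketch (again, the column names are assumed from Ahrefs’ export, and the threshold of 25 external links is just my starting guess):

```python
import pandas as pd

df = pd.read_csv("cleaned_backlinks.csv")

# Heuristic 1: "resources"/"links" in the title or URL, plus a high External
# Links count, often signals a resource page.
looks_like_resource_page = (
    df["Referring page title"].str.contains("resources|links", case=False, na=False)
    | df["Referring page URL"].str.contains("resources|links", case=False, na=False)
) & (df["External links"] > 25)

# Heuristic 2: a /YYYY/MM/ subfolder in the URL usually signals dated content
# like a news story or blog post.
looks_dated = df["Referring page URL"].str.contains(r"/20\d{2}/\d{2}/", na=False)

df["Flag"] = ""
df.loc[looks_dated, "Flag"] = "dated content?"
df.loc[looks_like_resource_page, "Flag"] = "resource page?"
df.to_csv("flagged_backlinks.csv", index=False)
```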

Step 3.B: Investigate Interesting Finds

Eventually, you’ll come across rows that really get the wheels turning. You might find yourself asking a question like: “Why in the hell is a dentist being linked to from (what appears to be) the homepage of a homeless shelter?”

These are rows that I like to label as “interesting finds”. They’re links from quality local sites/pages that don’t appear to fit into any box of link building ideas you’ve had before. These are the ones you’re looking for.

Let’s now go through the kinds of interesting things that I find to be worth investigating further. Below are 5 primary examples (but know there are others out there).

Local Organization/Entity Types

Ask yourself: “What kinds of local organizations/entities are linking to businesses in my industry? What can I learn from these links in order to get links to my website from those same types of organizations/entities in my own city?”

Most cities have the same types of local organizations/entities. Libraries exist in almost every city. Food shelters exist in almost every city. Even programming meetups exist in almost every city.

And you know what? Every library generally has the same goals, deals with the same issues and has the same kinds of webpages on their websites. The same goes for food shelters, programming meetups and other common local entity types.

So, if you can figure out why a business got a link from, say, a library in their city, you can probably use that same thinking to get a link from a library in your city. Then replace “library” with any other local entity type you come across.

Example: while looking at links of other pet groomers, I notice a link from “XYZ Animal Shelter” or “XYZ Humane Society”. Given the relevance (animals/pets) and the fact that there are shelters and humane societies in every city, I’d definitely want to look into the context of those links to deduce why they’re linking and use that idea to get links from shelters / humane societies in my own city.

Example #2: now imagine looking at links of personal injury lawyers and seeing those same animal welfare organizations linking to them. That seems… odd? I probably couldn’t guess why they’re linking based solely on what type of organization they are, so I’d definitely want to investigate further (e.g. are they a general sponsor? Did they partner to create some kind of animal defense legal fund? Etc.).

So, in conclusion, be on the lookout for links from local non-profits, publicly funded entities, companies and community groups that exist in nearly every city. Those examples usually yield some of the most easily transferable/replicable ways to get links.

Local News Coverage

Ask yourself: “What press coverage have local businesses in my industry gotten? Why did they get it? Can we do the same (or similar) to earn press coverage too?”

While you might not be able to carbon copy a campaign that earns national press, usually you can successfully copy a campaign that earns local press.

Example: while looking at links of other barber shops, I notice a local news article about a charitable event in which free haircuts were given out to the homeless. In the article, a specific barber shop was mentioned as the one that organized / hosted it. I won’t win any points for originality by running the same type of event for my barber shop in my city, but chances are, it’ll work just as well for getting similar local press coverage.

Local Business Partnerships

Ask yourself: “What types of local businesses might link to the like-for-like businesses in my industry? Do they have a formal business relationship, or if not, why might they be linking?”

In some industries, there are natural synergies between businesses of one type and another (e.g. a hotel linking to nearby vacation excursion companies).

Example: while looking at links of other home inspectors, I notice some of their best links are from local real estate agents listing off preferred vendors. While it could be worthwhile to pursue a partnership with local agents solely for the leads, it might be tough with established agents that already have formal / contractual relationships in place with another inspector. In those cases, it might be worth figuring out a price tag for solely being mentioned as “a” preferred vendor on their website, nothing more, which is likely something they’d at least consider.

Local Blog Mentions

Ask yourself: “What kinds of local themed blogs have mentioned businesses in my industry? What were the specific blog posts about that included the link, and why exactly were they mentioned in that context? Can I use those insights to get links from the same kinds of blogs in my area?”

While there’s potential for getting links from national blogs (ones that aren’t specific to a given geography), we’re once again going to focus more on local ones due to the ease of replication.

Example: while looking at links to other escape rooms, I notice a local blogger wrote a full article review of an escape room in their city. Their blog has a big focus on recreational activities for families, and includes reviews on other local attraction types (e.g. theme parks, kids’ museums, mini golf courses, etc.). Based on that information, I can do some Google prospecting searches (e.g. for “[local attraction in your city] review”) to find the highly similar local blogs that would probably be down to visit and review your escape room too (if you reached out and offered them 2 free tickets!).

Resource Pages

Ask yourself: “What kinds of pages (with high external links counts) link to businesses in my industry? What are the topics of those pages, and what kinds of websites are they on? Do those same kinds of webpages exist in my city?”

Example: while looking at links to other bike shops, I notice that one of them is getting a link from a page with 100 other external links. The page’s <title> tag says “Where to Buy | [Small Bike Brand]”. It turns out the page is from a bike part manufacturer that lists off all the stores across the country that sell their products. This finding might lead me to make a list of all the different manufacturers/brands we carry products of, find each of their websites and look to see if they have the same type of page / link to their retailers, too. If they do, I’d reach out and make sure we get included as well.

My Findings for Florists

Beyond the previously stated examples, here are some interesting links I discovered for local flower shops and what ideas they might lead to:

  • Discount Programs – a NY performing arts center offers memberships to those who want to show their support. Membership perks include discounts at local area businesses, and listed on that page is a florist. I could easily make a list of local organizations in my area with a membership perks webpage (or similar), and reach out to offer their members a 10% discount or whatever so that we could be listed too.
  • Event Credits in Blog Posts – a wedding blogger published a post about a specific wedding with dozens of photos of the bride & groom. At the end of the post, they included links to the various vendors involved (e.g. photographer, caterer, etc.), which included a florist. I also found an example of a different type of event (birthday party) with the same result. I could find local bloggers that do full wedding write ups and offer to give free flowers for an upcoming wedding that they’re going to write a recap about.
  • Non-Profit Partnerships – a Chicago flower shop partnered with a local parks & rec foundation by donating 25% of one month’s profits, winning them both a link and a potential boost in revenue. I doubt you’d have to copy this campaign exactly to win links from nonprofits – the important thing is making sure they economically benefit in some way.
  • Wedding Vendor Lists – a local newspaper has a huge list of wedding vendors that included one of the florists in our list. There might not be a lot of these per city, but doing Google searches for wedding vendors in your city could yield a few resource pages / directories / etc a florist could be included in.

To be honest, the above 4 finds are only scratching the surface. If we spent another few hours digging through the data, I’m sure we could come up with 2-3x more examples like those.

Step 4: Choose the Best Ideas and Replicate Them

At this point, you should have a running list of ideas of possible campaigns to choose from to replicate in your city. Here are some things to keep in mind when choosing which to pursue.

Think at Scale

The best ideas are scalable – one idea being applicable to getting a link from dozens, or even hundreds, of obvious, viable targets.

So for example, if you figured out a great angle to get links from local art museums, and there’s only 1 art museum in your city, then the full potential of that campaign is 1 link (which is… not ideal).

Think by Resource Requirements

The best ideas are also ones that are viable from a resource requirement perspective. For example, you might find a company in another city got local press coverage by offering a $20,000 scholarship. For you, getting internal buy-in on a $20k expense that’s solely for link building purposes (and those links aren’t even guaranteed) is usually quite difficult.

National vs Local Opportunities

To echo what I mentioned earlier, I usually try to key more into opportunities that get links from other local websites, as opposed to national ones (although this can vary by niche). This is because the bar is (usually) lower for getting links from local websites, and you can usually carbon copy a campaign from another city and apply it directly in your city, with similar effect.

However, if you notice something you can replicate from relevant non-local sites, by all means, go wild. Just realize you don’t necessarily need to in a lot of industries.

Business Directories / Listing Sites

It’s inevitable that you’ll come across a directory or listing site in which you can just submit your website/business to be listed. Are these worth it?

I don’t think they’re all complete garbage, but in general, most national business listing sites won’t do much for your rankings (and in some cases, are crappy enough that I’d rather not have them). I really wouldn’t recommend going past the first 15 on this list (and maybe even the top 6).

The reason is that users don’t actually give two shits about a lot of these types of sites. That, and there’s no real vetting process for being listed, so all Google sees is noise that’s better filtered out than utilized.

With that said, if you find an industry or city specific directory, that could be a worthwhile link to get. They’re just rare. Check to make sure they’re not only indexed by Google but also, according to a tool like Ahrefs, actually rank for a decent number of organic keywords / get at least a bit of organic search traffic (100+ visits/month, maybe more).

Final Thoughts

Don’t reinvent the wheel. Look at links to the same types of sites/businesses in other, more competitive cities, take the best of the ideas you can glean from them, then rinse and repeat in your city. It’s really that simple.

The Purge: Why Generative A.I. is Coming for SEO Content Traffic, Jobs & Dependent Websites

If you create content primarily for SEO purposes, or if your website is dependent on content that gets organic traffic from search engines, then there’s a very real possibility your skills and/or business will become redundant and lose significant value in the near future.

Because after what Google just announced on May 10th, showing off examples like this and this of what future SERPs might look like, Shit. Just. Got. Real.

I’m writing this so that we as an industry of SEOs and content creators collectively wake the fuck up to what’s currently unfolding, potential best/worst case scenarios of the (very near) future and what decisions we should consider making right now to adapt and survive.

Quick Disclaimers

  • I cannot predict the future.
  • I myself am an SEO consultant and owner/operator of the types of sites most at risk.
  • I’m U.S. based and therefore will focus most on the situation in the U.S., i.e. on regulation.
  • This is a rapidly developing situation. My personal views are constantly adapting as well. 
  • The changes I talk about will affect some SEO content jobs/businesses more than others.

Too Long; Didn’t Read

Google, ChatGPT and others are using new generative A.I. technology to exploit your website’s content for their profit without providing any benefit to you. They’re doing so without your knowledge or consent, taking answers from your content to give to their users without them having to visit your website and without even crediting you (they don’t cite their sources). What they’re doing has not been declared illegal under current U.S. copyright law. And there is little to nothing you can do, as of now, to get them to stop. As a result, unless something drastic happens, there’s a very real scenario in which you will soon see a steady, continuous decline in organic search traffic for certain types of content, and it will have nothing to do with your content’s organic search rankings.

Table of Contents

The Past: What Just Happened & How We Got Here

The Present: Important Additional Context You Should Know

The Future: Wild Speculations About What Happens Next

The Past: What Just Happened & How We Got Here

Let’s start by reviewing the following events:

  1. A new, powerful and disruptive form of A.I. was developed (“Large Language Models”).
  2. That technology, made famous by ChatGPT, blew up and became super popular, super quickly.
  3. In response, Google made some big announcements around A.I. and Search.

Let’s walk through each.

1. What Exactly are “Large Language Models (LLMs)”?

Any text that Google (or anyone else) labels as “Generative A.I.” is most commonly generated by a Large Language Model (LLM), a new form of artificial intelligence.

LLMs first emerged in 2018 but only became popular in the last year or so, thanks to the first LLM to go mainstream: OpenAI’s ChatGPT.

If you’re not familiar with ChatGPT, it’s difficult to define because its capabilities expand by the day. But for the sake of this post, we’re going to focus on the use of ChatGPT and LLMs as chatbots: you “prompt” them with a question (or command), they send back a response.

Chatbots aren’t new. But they were never really that good. At least, not until November 2022. That was when ChatGPT (running GPT-3.5) was released. While past versions of GPT showed promise, it wasn’t until this particular release that the potential of this new technology put the whole world on notice: ChatGPT, and Generative A.I. tools in general, are an absolute game changer.

How Do LLMs (Like ChatGPT) Work?

LLMs are complex, but I’ll try my best to explain the basic idea of how they work.

To start, an LLM begins by being given “training data”, the information it uses to “learn” from. Think: scanned pages of books, historical financial databases, copies of every Wikipedia page, etc.

It uses this corpus of information to “learn” the laws of human language and the relationships between words. We can think of this as breaking down into two key parts (although the model itself thinks of them as one combined thing, not separate):

  1. How humans talk/write – mainly to understand what we’re asking and how to answer it.
  2. Various facts, ideas, concepts, etc. – the raw inputs it pulls from, which form its collective “knowledge”.

The reason for #2 is obvious: if you want an answer to a question, the LLM needs to know certain facts and background information to answer it.

But #1 is the real magic of LLMs and why they’re such a technological breakthrough: by feeding it articles, forum posts, video transcriptions, etc., it can start to understand the words we use in the order we use them and what that means (as if the answer to a question is a math formula).

Once it’s done analyzing what you’re asking for, it “thinks”, and then generates a response in natural human language that (hopefully) gives you what you’re looking for.

But here’s the key thing you need to understand: LLM responses/answers are usually unique strings of text. As in: if you were to Google the full bit, in quotes, you wouldn’t find that paragraph or bullet list or whatever word-for-word somewhere on the internet. It’s usually not using a single source and repeating an answer exactly from a particular place. Each response is the culmination of all its combined learnings from all its training data (which is important to remember when we discuss copyright law).

Hence, the name “Generative A.I.”. It’s generating a response on-the-fly.
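If it helps to make that concrete, here’s a toy sketch of the same idea. It’s nothing like a real LLM (no neural network, no attention, a laughably tiny “training dataset”), but it shows the spirit: learn which words tend to follow which from training data, then generate novel strings on the fly:

```python
# A toy next-word generator -- the same spirit as an LLM, minus the scale.
import random
from collections import defaultdict

training_data = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": record which words follow which in the training data.
next_words = defaultdict(list)
for current, following in zip(training_data, training_data[1:]):
    next_words[current].append(following)

# "Generation": start from a prompt word, repeatedly sample a plausible next word.
word = "the"
output = [word]
for _ in range(6):
    word = random.choice(next_words[word])
    output.append(word)

print(" ".join(output))  # might print "the dog sat on the mat ." --
                         # a sentence that never appears in the training data
```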

2. Why Does Google Care About LLMs?

Google makes money because the internet goes to search engines to get answers (whether that answer is another website, a product, a fact, idea, etc.) and because they’re the most popular search engine (92% market share).

They make 56% of their revenue (as of Q4 2022) from search ads.

So, if people were to stop going to search engines for answers, and/or stop using Google in particular, then their trillion dollar market cap would start to deteriorate. They have more to lose than anyone when it comes to changes in behavior for how, and where, users go to get answers on the internet.

Which is why, just 3 months after launching in November 2022, when Reuters reported ChatGPT had crossed over 100 million users (making it the fastest product to achieve that level of adoption in all of human history), Google must have been sweating. Profusely.

For the first time since becoming the most popular search engine in the world, Google seemingly has real competition. And ironically, when I Googled to find out when they first became the most popular search engine, they incorrectly say 2000. I then asked ChatGPT, which correctly told me 2004.

That example, along with so many others, leads us to the next big reason why Google must be sweating – it doesn’t take a rocket scientist to see just how good LLMs like ChatGPT already are for a lot of things.

NOTE: if you haven’t tried ChatGPT yourself – please – stop reading this, sign up and start asking it things. Ask it for advice with an issue you’re dealing with, or recipe ideas based on the ingredients in your fridge, or to write you a poem that’s tailored to your partner’s specific interests (more ideas here). Reading about it doesn’t do it justice – the longer you wait to try it out, the further you’ll fall behind.

Now don’t get me wrong – ChatGPT and other LLMs are far from perfect. In particular, they’re heavily criticized for “hallucinating”, or confidently saying incorrect things.

But between that Reuters report, what everyone can see with the naked eye, and with what Google’s own A.I. engineers must have been telling their own executives behind closed doors, it’s pretty obvious this emerging technology is a real threat to Google’s trillion dollar monopoly.

Which brings us to the innovator’s dilemma. Google doesn’t want to disrupt themselves. They hate what’s happening with generative A.I. and LLMs in particular. At the very least, it creates a lot of uncertainty around future revenue and profits. At the very most, it could destroy profit margins or kill their monopolistic hold on question and answering entirely.

But if they don’t disrupt themselves, the thinking is, someone else will. And ideally, this “disruption” happens as slowly as possible so they have time to adapt their search ads business with minimal negative impacts. But the slower they adapt, the more they risk becoming the next Blockbuster.

So, because of this existential threat and the dilemma they’re in, Google had no choice but to respond in some big, scary ways.

3. What Exactly Did Google Announce on May 10th?

May 10th, 2023 is the day Google held their I/O conference. With investors nervous about ChatGPT breathing down their necks, it’s understandable the conference focused heavily on A.I.

A lot of things were announced that day, but the announcement I previously highlighted was, in my opinion, the biggest. It included this (excruciatingly upbeat-sounding) 92 second video.

These “SGE” features shown are only in beta. They’re currently not shown to users who haven’t specifically signed up to start seeing them (by going here). But if and when that changes, there will be massive negative ramifications for global SEO traffic.

This is because, if you look closely at the example screenshots shown, you’ll see that information and answers are provided without citing any sources. And, as opposed to current Featured Snippets, they’re showing much larger amounts of content, much richer information and even clickable links for additional follow up searches (i.e. for items listed) so that you never have to leave Google (unless you’re clicking an ad).

Which means that, if you thought the current trend in zero-click searches was already bad, buckle up. It could get so, so much worse.

And as if that’s not crazy enough – Lily Ray pointed out on Twitter that this content is sometimes a word-for-word copy from a particular website and therefore literally not Generative A.I. That’s instead just a straight up un-attributed Featured Snippet and, arguably, a blatant violation of U.S. copyright law. Even if Google considers this particular issue a bug and prevents it from happening again (although the Google rep who responded certainly didn’t treat it like one), it shows us how irresponsibly fast the Web’s biggest gatekeeper is moving here.

Make no mistake: this is how a scared, trillion dollar incumbent acts when going through an existential crisis. And if you don’t believe they are, then take it from one of Google’s own researchers.

It’s impossible to know how quickly Google’s SGE feature will improve and be adopted (whether users opt-in or not). There’s also a glimmer of hope that Google might change course and start citing their sources in this new experience.

But with that said, given the environment they’re operating in, I wouldn’t count on anything at this point.

The Present: Important Additional Context You Should Know

That brings us to the present. But before we can start speculating about the future, we also need to understand:

Where Do LLMs Currently Excel (and Not Excel)?

The distribution of disruption to online content providers by generative A.I. will be unique. It won’t all hit us the same. It will hit certain industries more than others. To understand why, it’s important to understand where LLMs currently do well, and where they fall short.

As of right now, today, LLMs are a way better experience than traditional search for:

  • Idea generation – e.g. “recipes”, “places to go” and “fan fiction” are all examples of keyword groupings that focus less on objective truths and more on possibilities that fit within certain guidelines.
  • Information with a low (to no) rate of change – e.g. “how to fill a nail hole in drywall”, “what happened at the battle of Gettysburg” and “what is entrepreneurship” are all keyword examples that don’t really need much updating based on new information.
  • Non-fringe topics / information that has been replicated in numerous places – e.g. 1000s of articles have previously been written on the topic of “how to get over a breakup”. This gives LLMs lots and lots of practice to figure out a perfect answer during the training process. This is why I expect LLMs will impact B2C search way more than B2B.

On the flip side, the most popularly criticized LLM use case today is when they’re asked to provide objective truths in “high stake” situations.

An example of high stakes would be a law firm using it to find case law examples to cite in a lawsuit, or a doctor using it to diagnose cancer. In both situations, their users have a much greater need for certainty. Even if an LLM were right 95% of the time, they’d still want to fact check it (given the stakes), and therefore they wouldn’t really replace a traditional Google search with an LLM prompt, they’d end up just doing both. Which wouldn’t make LLMs that useful.

So in informational areas where the stakes are higher for users, I expect LLM disruption will take an extra year or two or five to reach them.

Where do ChatGPT & Other LLMs Get Their Answers From?

Every LLM, including ChatGPT, is only as good as its training data.

An LLM’s training data is the information it’s fed to “learn”. This could be things like scanned pages of books, statistical databases (e.g. stock prices) or scraped webpages (e.g. from Wikipedia).

While clean, non-public datasets can be a huge differentiator for LLMs, the last category (scraped webpages) is the most scalable and, as a result, what the majority of LLMs depend most on for generating their answers.

The only problem is, scraping the whole internet is hard. While Google and OpenAI have the necessary resources to do so, most open source LLMs do not. Instead, they most commonly turn to one of two publicly available datasets.

The first dataset comes from Common Crawl, a nonprofit that’s been scraping the Web since 2013. Over those 10 years, they’ve gotten really good at it – their March/April 2023 dataset contains 3.1 billion webpages from 34 million unique domains. Their mission to “democratize access to web information” was originally popularized for lowering the barrier to entry for new startup search engines to compete with Google. However, most mentions of them and their datasets these days are in relation to their use by LLMs for training purposes.

The second dataset comes from Google, the most experienced and successful web scraping company of all time. What’s interesting though is that the dataset they provide, known as the C4 dataset, is simply Common Crawl’s data cleaned up (meaning: a smaller index of webpages with low quality ones removed).

Google’s move is understandable – they won’t, and probably never will, just freely give away what is ultimately their trillion dollar secret: the database of webpages they’ve scraped and the answers they derive from that content. Instead, they can just clean up an existing dataset, barely giving away any of their special sauce, and get all of the social credit of contributing to open source.

In the end though, it’s worth noting that Google has one of the biggest long-term advantages for building an industry leading LLM. Their decades of experience scraping the web and their internal database of already-scraped-webpages makes them an early favorite to win out in the LLM wars to come.

Do LLMs Cite Their Sources?

No. ChatGPT, Google Bard and most other LLMs do not cite their sources.

Google Bard is the only one that will freely provide sources for an answer, and even then, only after you directly ask it for them. For ChatGPT specifically, it takes clever prompt engineering to get it to provide any sources in any situation, and even then, it apparently hallucinates and gives made-up sources often.

This isn’t necessarily as much about ill intent as you might think. A core issue with LLMs is that they don’t even know exactly how they came up with their answer and, hence, where that answer came from. As of today, only Google and Bing have made an intentional effort to curtail this characteristic (by sometimes opting to answer based on a specific high ranking source’s information instead of a normal, “pure” LLM response).

Here’s ChatGPT’s own explanation for why it doesn’t provide sources:

> “OpenAI’s ChatGPT is a machine learning model that was trained on a massive amount of text data from the internet. The specific sources of this text data are not retained in the model because the primary focus during the training process was to create a model that can generate coherent and informative text, rather than tracking the sources of the data used to train it. Additionally, keeping track of the sources for all the data used in the training process would require a significant amount of computational and storage resources”

Is My Website Being Used to Train LLMs?

One way to find out is by searching here to see if your website is included in Google’s C4 dataset. If it is, then it means your website’s content is being used to train numerous LLMs.

Note: if you’re wondering, the answer is “no” – there isn’t anything you can do about it. There is currently no way to formally request Google, or Common Crawl, to remove previously made copies of your website’s content from their indexes. The only thing you can do is block Common Crawl from making new copies.
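For what it’s worth, Common Crawl documents its crawler’s user agent as “CCBot”, so blocking future crawls is a short robots.txt addition:

```
# Block Common Crawl from making new copies of this site's content.
User-agent: CCBot
Disallow: /
```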

A second way to find out is by asking an LLM questions about your company, product, employees, etc. that are only answered on your website and not elsewhere online.

What is the Current State of U.S. Regulation on Copyright & A.I. Training Data?

Note: I’m not a lawyer and this section does not constitute legal advice. 

Congress issued a report on May 11th clarifying that it has not yet been determined whether the current use of copyrighted works to train A.I. models is illegal:

> “A.I. companies may argue that their training processes constitute fair use and are therefore noninfringing…”

> “These arguments may soon be tested in court, as plaintiffs have recently filed multiple lawsuits alleging copyright infringement via A.I. training processes. On January 13, 2023, several artists filed a putative class action lawsuit alleging their copyrights were infringed in the training of A.I. image programs, including Midjourney and Stable Diffusion. The class action lawsuit claims that defendants “downloaded or otherwise acquired copies of billions of copyrighted images without permission” to use as “training images,” making and storing copies of those images without the artists’ consent. Similarly, on February 3, 2023, Getty Images filed a lawsuit alleging that “Stability A.I. has copied at least 12 million copyrighted images from Getty Images’ websites . . . in order to train its Stable Diffusion model.” Both lawsuits appear to dispute any characterization of fair use…”

While those lawsuits focus on copyrighted images, they should lead to some kind of legal precedent that applies to copyrighted text as well. The only issue is, lawsuits can take a while. The average class action lawsuit in the U.S. takes two to three years to resolve. So in the meantime, A.I. models will continue to train using copyrighted content, without anyone’s permission, whether we like it or not.

What is the Current State of the LLM Industry Landscape?

While ChatGPT (and ChatGPT alone) blazed the trail for widespread user adoption of LLMs, competition is quickly heating up.

Today there are dozens of open-source LLM models. Some already perform better than ChatGPT, depending on the method of evaluation. One recently released open source model (MosaicML) performed quite competitively despite costing only $200,000 to train. Another (MLC) aims to be so lightweight that it can be deployed on an iPhone. Those two examples, and many others, show that the core technology behind LLMs is quickly becoming commoditized.

On the opposite side of things (closed source), the most important one to watch is Google’s own LLM interface, Google Bard, which received a massive upgrade on May 10th. Bard can now check the Web for more recent info when its training data falls short (example). While ChatGPT’s Browsing plugin was created to do the same, Bard’s experience is much more seamless. You don’t need to tell it when it should do this – it figures out when it needs to do so all on its own. This feature, in my opinion, makes Bard arguably the most useful LLM currently available (for question and answering).

Important Note: U.S. tech giants aren’t the only ones working on this. China’s Alibaba and Baidu are rapidly developing their own LLM models, too.

The Future: Wild Speculations About What Happens Next

Finally, the fun (or not so fun) part: making wild speculations about the future of our industry.

Here are the areas I’m thinking about in order to determine what possible best and worst case scenarios could play out (and their probabilities of happening).

Will Copyright Holders Get Regulatory Protection?

Whether our government chooses to regulate A.I. or take a “laissez-faire” approach will have a huge impact on the futures of all content creation industries (e.g. music, photography, art, etc.), not just our own.

On one hand, massive amounts of job loss could result in the necessary political pressure to force regulators to step in. While this scenario is more reactive than proactive, I think it’s the best argument in the “hopeful for regulation” camp.

On the other hand, I’m personally not too hopeful for these main reasons:

  • What didn’t happen with crypto – the crypto industry has been asking for clear regulation for years. Despite the industry’s decade plus of existence and current trillion dollar plus market cap, regulators remain reluctant to provide any kind of clear regulatory guidance.
  • What was (and wasn’t) said in the OpenAI congressional hearing – if you check the transcript, the word “copyright” was barely mentioned (only 8 times out of ~30,000 spoken words). Instead, the hearing mainly focused on A.I. licensing and safety measures and how A.I. relates to each representative’s personal agenda.
  • Competition from global political powers – stopping or slowing down A.I. development in the U.S. will just mean that tech giants in other countries get ahead. Japan just announced they won’t protect copyright holders from A.I. training. Expect China, which has always shown little to no regard for copyright law, to do the same. On the bright side, the European Union seems to be on the precipice of new protections. But overall, while it’s impossible to know how this unfolds, there’s no way regulators aren’t weighing the risk of falling behind the advancements made by rival political powers.

If They Do, When Might That Happen?

Even if regulatory protection happens, it could be too little, too late for those who currently have the most to lose.

For example, if you’re a stock photographer, you might be struggling for work right now. You don’t have 2 years to wait for regulation to protect your industry. I can’t help but think that Shutterstock, Getty Images and other companies will incur massive layoffs well before their pending lawsuits get resolved.

If They Do, What Might the New Rules Be?

The most obvious rule (in my personal opinion) that regulators could put in place would be requiring A.I. companies to get the consent of copyright holders before using their works to train A.I. models.

But at this stage it’s really impossible to say what the rule(s) would be.

Maybe A.I. companies will only need to inform copyright holders that their work is being used? Or maybe just allow for copyright holders to manually request their works not to be used (like a DMCA takedown request)? Or maybe different types of content are handled differently (like text content being OK to use if properly cited as a source alongside the answer)?

Will LLMs Continue to (Rapidly) Improve?

If you’ve been paying close attention to generative A.I. and LLMs in particular since the second half of 2022, you might have had the same reaction of absolute disbelief that I had when GPT-4 was released.

I reacted that way because GPT-3.5, which on its own was a huge improvement on the previous model, had come out less than 5 months earlier.

It doesn’t look like that pace will continue (at least in the short-term) – in April, Sam Altman said they hadn’t yet started training GPT-5, making a release before the end of 2023 unlikely.

It’s impossible to know where we’re at on the curve of technological progress with generative A.I. and LLMs specifically.

Past technological hype cycles have shown us that short periods of rapid progress can be followed by years of little to no progress, such as with self-driving cars and virtual reality. Both examples made a lot of progress between 2010-2016, giving rise to grand predictions of mass adoption that would soon follow but didn’t exactly pan out.

How Might LLMs Continue to Improve?

It’s important to consider the ways that LLMs might continue to improve and which particular use cases they will improve most for next.

At the very least, LLMs could meaningfully improve without any changes in the underlying technology. This is because, if you remember, LLMs are “only as good as their training data”, and that cuts both ways. We could see factual accuracy increase from ChatGPT and others simply from the cleaning of existing datasets or through training on entirely new datasets.

(And if popular LLMs were to ever start training on customer data, we could see another order of magnitude improvement in quality.)

How Else Might the Training Data Landscape Change?

Reddit and Stack Overflow have both declared they’re cool with letting A.I. models train on their users’ content, as long as A.I. companies pay for the right to do so.

I expect every major UGC site will follow their lead and have the same stance. Think: Twitter, Yelp, etc.

Which creates a very unfortunate dynamic for professional content creators: if the information contained in their content has been created and left by users on platform websites (e.g. Amazon reviews, Reddit comments, Facebook posts, etc.), that content will probably be unapologetically used against them.

Which means that, even if copyright holders win the legal right of consent for training A.I., UGC platforms will happily line up to cut deals with A.I. companies and sell everyone out.

For example, if Amazon decided to license their customer review data to Google for training their LLM, all Amazon affiliate sites would essentially be screwed. Because it means Google would then be able to provide no-attribution-needed content directly on SERPs (like this) in a way that’s not legally or ethically questionable long-term.

How Will the User Adoption of LLMs Play Out From Here?

Even if the tech and user experience of LLMs gets unanimous acclaim as being objectively better than traditional web search, it doesn’t guarantee new users will stop using Google.

According to the founder of Neeva, a now-defunct startup search engine, the hardest part of mainstream adoption of LLMs might not have anything to do with the tech or UX:

> “Throughout this journey, we’ve discovered that it is one thing to build a search engine, and an entirely different thing to convince regular users of the need to switch to a better choice,”

> “Contrary to popular belief, convincing users to pay for a better experience was actually a less difficult problem compared to getting them to try a new search engine in the first place.”

So not only does Google have a huge advantage in the coming LLM wars due to its extensive experience with web scraping, they also have what’s likely the biggest advantage of all: pre-existing mass adoption by an incredibly sticky user base. This is why, out of all scenarios, it’s most likely that users will “adopt” LLMs without a single change in behavior by continuing to go to Google.com but getting generative A.I. content served to them instead of featured snippets.

But, let’s put that scenario aside for a second and consider the adoption of LLMs beyond Google’s SGE feature.

At What Rate Might ChatGPT & Other LLMs Win User Adoption?

To assess how likely native LLM interfaces are to gain mainstream user adoption (and at what pace they might do so), let’s consider arguably the best thing ChatGPT (currently) has going for it: Plugins.

Plugins allow developers to integrate ChatGPT with their products, essentially functioning as ChatGPT’s app store and increasing ChatGPT’s utility by an order of magnitude.

ChatGPT can now solve complex math equations, order groceries and talk to all your favorite work and non-work apps. And those capabilities are just a few examples of what’s to come. Most developers that signed up to start building their own Plugin are either still on the waitlist or only recently got off it.

This is why you can’t write off LLM interfaces as Google replacements. Because so far, we’ve barely seen what they’re capable of. With Plugins, ChatGPT could completely change the game of question and answering by allowing users to then do things with that information without any break in between.

Imagine this LLM prompt: “look up the best 10 sushi restaurants in New York, call them to see if they have any open reservations for Friday night, then text my girlfriend the list of available options, which times are available to book and pictures of their menus.”

In a future where that prompt is flawlessly executed, information gathering is just one piece of the application’s directive. The information gathering process essentially becomes invisible to the user. That future is what Google should most be scared of.

Will Users Trust LLM Answers?

Mainstream user adoption of LLMs requires users to trust their answers in two important ways:

  • Factual accuracy
  • Bias

The first will be much easier for ChatGPT and others to deal with long-term. This is because factual accuracy is (relatively more) measurable. There’s most likely some kind of break-even point where users are OK with the rate of error, at which point they could start being convinced to blindly trust responses as being objectively true.

The other type of trust, bias, is a much thornier obstacle to future user adoption of LLMs. If users are skeptical of answers and think LLMs are trying to manipulate them, it might be incredibly difficult to change their minds.

Enter: the U.S. mainstream media.

How the media covers ChatGPT and other generative A.I. tools in the near future will most certainly play a big role in their adoption. We’ve barely seen the potential of this so far but I believe this could be the biggest dark horse to mainstream adoption and could change everything on a dime.

Imagine negative anecdotal headlines like:

  • “ChatGPT caught spreading misinformation about new statewide book ban.”
  • “Racist logo for kids’ soccer team was ‘Midjourney’s idea, not mine, I didn’t even notice,’ says local mom.”
  • “Dog owner trusted ChatGPT to find out if their beloved pet, Snickers, could safely eat avocados. ChatGPT was wrong and Snickers is now gone.”

While only 1 example (the first) attempts to stoke fears of intentional bias, all are headline types that could sway public sentiment on A.I. tools before they even try them.

What Are Google’s Next Moves?

Undoubtedly, the biggest question surrounding A.I.’s future impact on our industry is the potential rollout of Google’s new SGE feature (in its current form).

On that front, it’s important to consider the following:

  • Will a meaningful number of users find out about and opt-in to SGE?
  • Will Google reverse course on providing clearer sources/citations for answers?
  • How quickly might Google roll out SGE for non opt-in users?
  • Will Google roll out SGE content for some query types but not others?

The answers to those questions are anyone’s guess. I’ve barely formed any kind of real opinion of substance on any of them so, for now, I’ll keep my thoughts to myself.

Instead, I want to focus on another developing Google situation worthy of a place on your radar.

How Will Google Crawl the Web in the Future?

The best thing going for Google Bard at the moment is its ability to scrape the web for more recent information when it needs to (e.g. when it’s asked to summarize a recent event).

Since Google Bard is scraping a website and providing little to no benefit to that website in return (stealing a potential web visit, without asking, and not even crediting the site as a source), the owner of that website might wisely wonder: “Should I block Google Bard from crawling my website?”

For now, I believe the answer to that is a resounding “Yes”. Don’t let them eat your lunch for free.

But what if it wasn’t that simple?

Consider this terrifying reality:

What if Google crawls websites for Search and Bard using the same crawling agent name? That would mean it’s impossible to block Bard from crawling your website without also de-indexing yourself from Search.

While I’m not 100% sure what the answer to that question is today, I did find circumstantial evidence that indicates Google isn’t currently doing this.

On April 20th, just a few weeks before Bard rolled out live web scraping, Google launched “GoogleOther”, a new crawling agent. It’s currently unclear if this agent name is what Bard crawls under – there is nothing in that release that actually indicates when Google uses it, and their own support docs have zero mentions of “Bard”.

Whether or not Google is currently crawling for Search and Bard under different crawling agent names is, to be frank, not too important. Bard has barely any user adoption at this point.

But what if that were to change? What if Bard goes neck-and-neck with ChatGPT in terms of users? If it does, Google has a trap card in its back pocket to extort the Web into playing ball with it while giving back increasingly less value to the websites it’s pulling answers from.

How Will We as an Industry Respond to the Threat of A.I.?

Lastly (and probably of least importance, unfortunately), it’s time to consider how we individually and collectively respond to this growing threat.

Will the threat of A.I. be taken seriously in our industry (by listening more closely to these people), or will we continue to be dismissive of technology that’s currently being compared to nuclear weapons?

If we do finally take it seriously, what will we do in response?

  • As individuals, will we choose to block Common Crawl, GoogleOther (Bard), ChatGPT’s Browsing plugin and others in our robots.txt files? (A sketch of what that looks like follows this list.)
  • As a group, will we support organizations like the Human Artistry Campaign that are fighting for our rights as copyright holders?
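
For reference, here’s roughly what those robots.txt rules could look like. This sketch assumes the crawler tokens each operator has documented so far (CCBot for Common Crawl, GoogleOther per the release above, and ChatGPT-User, which OpenAI lists for ChatGPT’s browsing traffic); double-check each token against the crawler’s current documentation before relying on it:

    # robots.txt — opt out of the known A.I.-related crawlers
    User-agent: CCBot
    Disallow: /

    User-agent: GoogleOther
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

Keep in mind robots.txt is honor-system only; a crawler that ignores it has to be blocked at the server level instead.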

I have no idea what the answers to those questions are. I do hope, though, that more of us are now at least considering them.

Special thanks to Kai Isaac, Paul May, Ryan Mclaughlin, Grant Merriel and Alan Morte for the discussions that led to the takeaways in this article.

SEO & Generative AI: Roundtable Discussion

After Google’s May 10th announcement of new generative AI functionalities coming soon to search, I wanted to host a discussion (recorded on May 16th) with a few of the SEO and digital marketing industry’s brightest and most respected visionaries to talk about what this new future may look like and what it could mean for SEO as a marketing channel.

My guests for this discussion were:

  • AJ Kohn of Blind Five Year Old
  • Marie Haynes of Marie Haynes Consulting
  • Cyrus Shepard of Zyppy SEO

With that said, there are other experts who weren’t in this discussion that I equally recommend following and keeping up with if you’d like to stay up to date with how generative AI is changing SEO (and digital marketing in general). They include:

Beyond that – I hope you enjoy the discussion!

Note: to help those out of the loop, I started the video by recapping the specifics of Google’s announcement, Google’s history of increasingly displaying info directly on search results pages, what “Large Language Models” are (a.k.a. the tech that Google brands as “Generative AI”) and how LLMs work. If none of that is news to you, then skip straight to 17:01 for when the real discussion begins.

Video Transcript

Jon Cooper (00:00:00):
Awesome. Okay. Hello everyone. My name is Jon Cooper from Hyperlinks Media, and I am joined by three colleagues of mine that are incredibly intelligent and experienced in the world of SEO and digital marketing. I brought them here today to have what I think is a very important discussion related to an announcement that Google made. At the time of this recording, it was six days ago, on May 10th, 2023, in which Google announced their plans for new technology being integrated directly into search results in a much more native experience. And so I thought this could be something that could really impact the field of SEO. This could really impact the way that people behave on the internet. And for a lot of people that do SEO, whether as a consultant or as an online business that just depends on SEO as an important marketing channel.

(00:01:14):
I think that there, at the very least, is potential for some big shakeups in terms of those traffic channels potentially being in jeopardy. And just making sure that everyone’s aware of what’s happening, what’s changing based off of all these different advancements in AI, and specifically what Google is working on and building into their native search experience. So with that said, I’m going to introduce these three amazing guests. So we’ve got AJ Kohn from Blind Five Year Old. We’ve got Marie Haynes from Marie Haynes Consulting. And Cyrus Shepard from Zyppy SEO. Three people that have been in this industry for a decade plus, worked with some of the biggest companies on the net, and also people that I read, people that have helped me understand how search is going to change. And so that’s why I thought these three people would be great to be pulled into this discussion.

(00:02:26):
So to start off, some of you watching might not necessarily be familiar with the broad context that we have about the announcement Google made, exactly what’s happening with AI in general, and why it’s important that we understand how this new artificial intelligence works, because it’s going to change the internet, and specifically where information comes from on the internet. And so I’m going to spend the first 5 or 10 minutes with a quick presentation. I put together some PowerPoint slides just to make it easier with visuals. For those listening, I will do my best to explain exactly what’s on the screen. But with that said, I’m going to rapid fire, sort of bring everybody up to speed here. So first off, starting this discussion. So yes, the announcement was made on May 10th. Google talked about integrating AI into their native search experience. It was first talked about by the New York Times back in April under a different term called Project Magi.

(00:03:49):
So if you see different names for this, it’s because there was an internal name that the New York Times brought to light, but essentially Google’s now public about this new foray into using AI with search. They gave examples, and we’re going to go way more into some of the examples that Google gave. But one really quick example: if you were to search, what planet is most similar to Earth, this is what a search results page would look like in a new era of what they call generative AI. It’s a much richer experience, very different than maybe a snippet of text and a list of links. And so what we’re going to be talking about is exactly what we can glean from Google’s announcement, and then also bringing people up to speed with how we got to this point, which is important to understand.

(00:04:50):
Also, part of the reason we’re doing this is because a lot of you are mad, me included. I didn’t include the screenshot that I tweeted pretty angrily, not thinking Google’s treating the web particularly fairly. But essentially there’s a lot of people talking about this. And again, that’s why we’re doing this discussion. So quick background: Google’s been getting their feet wet with pulling information directly into search results without people ever leaving Google. The earliest examples I can find, and I’m sure the three people on this call might have more accurate dates and timelines here, but I found examples as far back as 2011 of people typing in super basic facts that are super common knowledge, such as the population of a city. I found examples in 2012 where, if you Google a very famous thing, Google might pull in, let’s say, a description, in this case of what the Taj Mahal is. Instead of people having to click through to a Wikipedia page, you get a quick little answer of what the heck the Taj Mahal is.

(00:05:59):
And from there, Google has more and more aggressively wanted to keep people on Google.com, not leaving their app, not leaving their website, getting the answers and the information they want, while also playing nice with the producers of that content. Because Google is a search engine; they were not, up until very recently, the ultimate providers of information. They are simply pulling information from across the web. And SEOs and web publishers have had to play nice, because they either help Google provide this information and hopefully get some of those clicks back to their website, or they don’t play ball. And not showing up in Google is a big detriment to growing a business in the internet age. So anyways, Google’s been slowly creeping into showing information directly in their search results pages, and that leads us to now.

(00:07:06):
Now, the term that Google’s using to explain this whole new big change is generative AI. That is their way of explaining things, but a term that you should be familiar with, the more industry term, is what’s known as a large language model. ChatGPT, if you’ve heard of that, you know what a large language model is. It’s the most popular one. It was the most pioneering one, but it’s important to know there’s a bunch out there. There are four different screenshots in the presentation that I just gave, but there are tons of them being built, and it’s important to understand exactly what they are and why they’re different. So, early internet, you may have come across a chatbot, something that you can ask questions to and get answers from. 10 years ago, it was never that good. It misunderstood your question. It didn’t give you the information you were looking for. It felt like a robot. It didn’t feel like a human being.

(00:08:14):
But essentially, new technology was developed. Ironically, it was Google that put out a paper in 2017 that talked about a new way of essentially learning, a new way for computers to learn just like brains. And that opened up a whole new technical revolution. So to think about how that works and what generative AI is doing, think about your brain. You’re learning information from a bunch of different places, and when you’re asked a question, you’re usually not pulling it from one file in your brain. Usually you’re taking all this information you learned, all these different experiences about all these different subjects. If you ask any long phrase question, you have to understand what each one of those words means. You need to understand what context those words are used in. And so what this technology is doing is consuming large amounts of information and understanding it just like a human brain does.

(00:09:23):
And that is what’s leading to, again, the closest comparison I can give you is a much more sophisticated chatbot. So that [inaudible 00:09:35], and by all means, that was a super brief explanation. You should learn more about large language models and the revolution because it’s happening way more than in just text. It’s happening in images, it’s happening in all kinds of different content. Or essentially, artificial intelligence is taking large amounts of information, making sense of it, and then being able to provide answers to it, and being able to prompt it. But let’s look at live examples of what Google showed in their video. I went frame by frame in their video. They tried so hard to make it seem so cool, new age, and gen Z, and all the change in the world. But as SEOs, what we really want to know is what do these changes actually look like?

(00:10:19):
So for those, again, that are just listening, unfortunately, it’s going to be hard to exactly see what these changes are, but I’ll explain it. The first example is just imagine asking an open-ended question like, which planet is most similar to Earth? And some really key differences in this screenshot that you might notice: instead of just a snippet of information that gives you an answer, they’re giving much richer information. They’re giving pictures of those planets. They’re making it very easy for you to continue your search by clicking on each of those different planets and learning more about them. Another example is if you Googled for what are good plants to use in a dark dorm room, it was just text. It was maybe the names of five different plants. In this new world, you’ve got pictures of each plant. You’ve got facts about each plant in a super formatted way, in a format that is exactly what you would look for in an article that just shows organized information very presentably, and whatnot.

(00:11:34):
A silly example they threw in there, just to make them feel like they’re not an evil company, is using generative AI to write a poem from scratch. So these AIs analyze the history of all poems written. They understand sort of what makes a poem and what doesn’t. You give it a subject, and in this case, it came up with a totally made up poem about somebody’s mischievous cat, providing that information directly in Google. Other examples I found, I’ll skip ahead a little bit. I did make a list of every single possible example I found. The only interesting examples that I found were: imagine you’re asking Google for a good lunch spot, you’ve narrowed it down to two different ones, and you have a follow-up prompt to Google where you can say, “Okay, those two restaurants sound good, but how do they compare to a third restaurant?” And Google is able to pull in information about a third restaurant and compare all of these things side by side with as simple as a text-based search, pulling in photos of restaurants, their reviews, what people thought.

(00:12:49):
It’s just going to be a much more interactive experience where, instead of just doing a Google search, you’re doing a lot more. You’re doing a lot more on Google. You’ve got follow-up questions. You’re interacting with information in a much greater way. One last example to look at before we get into our discussions, that I think is super important: Cyrus did a great job of pointing out that a lot of the examples that Google liked to show were shopping related. So an example that they used in their video that, to be honest, is the one that blew me away, was somebody typing in Bluetooth speaker for a pool party, which is a specific search. In the past, if you Googled this, what Google would show you is a bunch of news sites, a bunch of editorial sites that considered a bunch of different Bluetooth speakers. They considered what would make a Bluetooth speaker better for a pool party or not, and then gave you their recommendations of which Bluetooth speakers meet that criteria of being better or worse for a pool party.

(00:14:07):
Well, an example they give [inaudible 00:14:09] for you, they showed you exactly what are the things that you should be taking into account when it comes to choosing a Bluetooth speaker natively on Google search results pages, and then linking you directly to where to buy those specific speakers. Again, in the past, if you did this search, you would just maybe get a text list from a major publication that did these reviews that just showed you here’s their top five picks. In this case, they’re not just showing you their top five picks, they’re actually, Google’s giving you direct links to make those purchases. They’re giving you additional information about each of those products. They’re giving you the decision making information on how to choose directly in search results. And again, it’s something that we will talk more about that Cyrus has pointed out.

(00:15:05):
Where is this information coming from? There are no citations, and that is essentially the biggest change to content in a world with generative AI. The same way that your brain works, it’s pulling information from a bunch of different places. It’s learning about what is a pool party, what makes a speaker bad or good for a pool party. It learns that it should be waterproof, it learns that it should have a long battery life. There are so many different experiences you have to have, pieces of knowledge that you have to have. As an adult, those things come super innately. But in this case, as a neural network that is figuring this stuff out, it’s not figuring out all this information to present on a search results page from one place. Historically, with featured snippets, when you ask a question, you get an answer from a specific website that specifically answered it.

(00:16:08):
In this case, because it’s pulling information from a ton of different places, there isn’t a source for this information. And so that is part of why publishers are worrying, and why I think this is a great discussion to have. Last few slides, I guess; there’s a lab example of what it looks like to do a search for something product related, showing how people are sort of staying on search results pages. But honestly, no need to go too much further into this. I’ve got more slides, but I want to talk to you guys. So I’m going to stop sharing my screen. And I want to start with you, Cyrus. Based off the questions I sent you, I want to start super broad: how seriously should the average SEO be taking this? I know I’ve inserted my opinions into my explanations here throughout these slides, but completely honest opinion-

Jon Cooper (00:17:03):
Completely honest opinion: as the average SEO, as the average online business that takes organic traffic seriously as an important traffic source, how seriously should they be taking these changes that Google seems to want to make at some point in the future?

Cyrus Shepard (00:17:20):
Thanks for the intro, Jon, and that presentation. I want to first say none of us know what’s going to happen. I don’t think anybody can predict; it’s such a fluid situation with so many moving parts. But from an emotional point of view, when I first saw ChatGPT released, I mean within 15 minutes, my initial thinking was: oh crap, within three years, SEO traffic is going to shrink by 50% across the entire industry. That was my initial gut reaction. And today, so many unknown factors have been thrown into that.

(00:17:57):
I don’t know if that 50% number is true or what’s going to happen, but I think something significant is going to happen, especially to certain verticals and industries. And it’s going to impact everybody, and it’s not going to be all at once. It’s going to take a lot of time. These features aren’t available in Google right now, but it’s going to be a boiling frog situation. I think of it as analogous to when Google introduced “not provided” in keyword reports, for those longtime SEOs. And Google was like, this only affects 1% of queries. Don’t worry about it.

(00:18:27):
Don’t worry about it. And 1% turned into 5%, 10%, 20%, and now it’s 99.9% of everything. So it’s going to sneak up on us and we’re going to have time to adjust, but some businesses aren’t going to make it and it’s going to be the most significant thing that’s hit our industry in a long time, I think. That’s my prediction. I’d love to hear what others think.

AJ Kohn (00:18:51):
I’ll jump in real quick because I want to be sure whoever watches this understands: large language models do not understand things. They do not understand any of these things. They’re not thinking machines. The best thing you could read is the Stephen Wolfram piece about what ChatGPT is and does. It is autocomplete on steroids. That’s what it is. It just looks for what the next word is, and it does it amazingly fast. I’m not saying this to say that it isn’t great technology, but I’m very cognizant of the Arthur C. Clarke quote: any sufficiently advanced technology will seem like magic.
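
Note: AJ’s “autocomplete on steroids” description is easy to see for yourself. A minimal sketch using the small open GPT-2 model from Hugging Face (the prompt is just an example); all the model does is score candidates for the single next token:

    # pip install transformers torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The best Bluetooth speaker for a pool party is"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # scores for the next token only
    top5 = torch.topk(logits, 5).indices
    print([tok.decode(int(t)) for t in top5])    # the five likeliest next words

Chaining that one-step guess over and over is, at bottom, the whole generation loop behind ChatGPT-style tools.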

(00:19:40):
And I think too many people think that this is magical: oh my gosh, if I ask it to reconsider, it actually thinks about it and reconsiders it. No, it doesn’t. It doesn’t do any of that. So I want to be sure that there’s this understanding: it’s super cool technology, but I don’t think anybody in the AI community thinks that this will lead to AGI, which is artificial general intelligence. This is not the way forward to that. If you follow anybody on Mastodon who tracks this, that’s not where this is going.

(00:20:15):
There are other things that could lead to it, but this is not one of them, is my sense. That is completely divorced from the impact that this can have. And yes, obviously, if you put a whole bunch of this at the top of the results, it’s going to have an impact. I think Rand hit it pretty well, in my opinion, in saying the rich are going to get richer in many respects, and that ranking well organically is probably your best shot at appearing in the SGE part of the results. And so, I don’t know, I picture this as: SEOs will be in even more demand, because it’s not going to be enough to rank sixth.

(00:21:01):
Because at that point, sixth is being on page two. So you better rank either in the SGE or at the top of the organic results to really get any amount of traffic. So I think that’s certainly a big difference. I am though, and I don’t know if we want to go there, I’m not convinced that these are going to be available for that much longer. There are a lot of legal headwinds. Steve Huffman has already said: you want to use Reddit, you’re going to have to pony up and pay for that content. I work with other clients. I know they are saying the same thing.

(00:21:45):
And so I actually am not sure that the large ChatGPT-style models, the ones trained on datasets like C4, are going to persist. I actually find more interest in the Google I/O announcement around Project Tailwind, which, if you were paying attention, was actually really like a micro LLM. That to me is more interesting. And I think if people understood… actually, there’s Luke W. I don’t know if anybody’s familiar with Luke W. He’s a famous UX designer, used to work at Google. He actually had a post where he trained an LLM on his own content, thousands of posts, thousands of all these things.

(00:22:34):
And then you could actually ask questions of his corpus, and it would answer based on his corpus of content. That, to me, is where a lot of this will go: people are going to figure out, hey, Hugging Face has a model I can just download, and then I can use that with my own corpus and deliver these answers. If that’s the case, more and more people will basically say, you can’t use my content in your large language models, and it kind of falls apart. So I’m not saying it’s over, but I don’t know where it will lead.
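
Note: the setup AJ is describing with Luke W.’s corpus is usually built as retrieval: embed your own posts, embed the incoming question, and let a model answer from the closest passages. A minimal sketch of the retrieval half, assuming the sentence-transformers library and a small open embedding model (the corpus strings are placeholders):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

    corpus = [
        "Post about mobile-first design...",   # your own articles go here
        "Post about form usability...",
    ]
    corpus_emb = model.encode(corpus, convert_to_tensor=True)

    question = "What did I write about mobile-first design?"
    q_emb = model.encode(question, convert_to_tensor=True)

    best = util.cos_sim(q_emb, corpus_emb).argmax()  # most similar passage
    print(corpus[int(best)])                         # what the LLM gets grounded on

From there, the retrieved passage gets pasted into the LLM’s prompt, so answers are grounded in your content rather than the open web.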

(00:23:10):
But I would say OpenAI made it really clear that the deal people had with Google was actually better than they thought, which is pretty amazing. Few people can make Google look good to a lot of people. But Google was like, hey, we’ll scrape your content and we’ll use it in the index, and in return we’ll send you some free traffic. And people went, okay, I guess that’s good. But then OpenAI, with ChatGPT, is like, oh, we’ll scrape all your content, we’ll munch it all together and homogenize it, and then we’re not going to tell anybody that it’s yours or send them to you.

(00:23:48):
And people were like, that seems less good to me now. And so because of that, I don’t think that the economic forces are aligned for this to be a sustainable long-term endeavor. So I’m interested in it all, but I’m reserving my sort of sky-is-falling predictions until I see how the landscape clears. I’m more interested in unlocking the content. And I talked to Jon about this offline. There is content that we literally couldn’t produce and optimize for before, but with these types of generative AI models, we actually could bring some of it to search results.

(00:24:41):
So I think the one that I like the most is school board meeting minutes. No journalists, no paper can afford to send a journalist to cover school board meetings and to report on them. There’s still a lot of interest in it, but if you can actually throw a large language model at that and say like, hey, write a story based on the meeting minutes for the school board, that’s pretty cool. Then suddenly there’s lots of new capabilities and lots of new niches. And if you’re an SEO, it’s all about finding the niches where people aren’t.

(00:25:15):
Where are people not? That’s where I can make my money. So I see lots of danger, but I also see lots of opportunity.

Jon Cooper (00:25:25):
Gotcha. Marie, what do you think?

Marie Haynes (00:25:29):
I think that the search generative experience is something that, I mean, yes, Google says it’s an experiment at this point. And so there’s two ways that it could go. One is that it remains an experiment and we say, well, that’s kind of cool that AI can do that thing, and maybe people don’t like it. When Bard first came out (and it’s still not great for a lot of things), I thought it was going to blow me away more than it did. And I think people are underestimating what Google can accomplish with SGE.

(00:26:04):
So in one sense, it’s possible that this remains an experiment and it doesn’t impact websites, and we continue to go to websites the way that we always have. I think the second scenario is more likely, though, and I don’t want to say that I’m in the sky-is-falling category, but I think that things are changing dramatically, and they’re going to change really quickly. Sundar Pichai from Google said that he believes that AI technology, what we’re seeing happen today with large language models, with the things that we’re doing with AI, is more profound than electricity or fire.

(00:26:42):
That’s a pretty big statement. I don’t think that’s something that he would say lightly. So now, it’s possible maybe that’s just marketing, and maybe Google’s trying to make it sound like they’ve got more in store than they really do. But I do believe that what we saw announced at I/O is the start of world-changing things. So we can talk about SEO. SEO will still exist, because businesses exist and businesses need to be found. The problem is that a lot of the businesses that exist today only exist because of the opportunity that Google gave us.

(00:27:17):
So for many, many years, a lot of businesses that I consult with create content that answers questions for people, and Google gave us that opportunity to create content and then they would send us traffic. And as we’ve seen over the years, they’ve sent us less and less traffic. Well, Google’s mission has always been to organize the world’s information and make it accessible to everybody. That’s their mission. And what they’re doing with AI today is not like I hear people saying, oh, Bing forced them to do this.

(00:27:57):
It wasn’t in the plans, and now Google’s scrambling, scraping together a plan. It is very interesting that you said in your presentation, Jon, that you first started seeing Google provide these answers in 2011. So Panda is what started… the Panda algorithm started in 2011, early 2011. And when you look at the blog post that Google gave us to announce that Panda was coming out, we all took it as, oh, this is about thin and duplicate content. But they gave us this list of questions, which are essentially the helpful content questions today.

(00:28:31):
And they told us that they build algorithms to try to reward content that fits these criteria. And at first, when that happened, SEOs, we couldn’t grasp it. We understood information retrieval, we understood PageRank, we understood many ways that a rules-based algorithm works, but we had no concept at the time of what deep learning could do here. And so I don’t know how much we want to get into here, and a lot of this is theory. Just like Cyrus said, we’re all guessing. We’re all taking our best guess.

Jon Cooper (00:29:04):
Sure.

Marie Haynes (00:29:05):
But what I see is that Google is successfully rewarding content that aligns with what they’ve told us, and they’re using AI to do that. And this is just the logical next step: that Google provides the information. And a lot of the websites that currently provide information are not going to be needed. And that’s huge. And I think that a lot of our industry has been, some rightfully so, critical about AI, about ChatGPT, about Bard, saying that large language models make all sorts of mistakes and do some things that are not good.

(00:29:44):
But I think we’re doing our clients and our industry a disservice to not be finding all the things that it can do-

Jon Cooper (00:29:51):
Sure.

Marie Haynes (00:29:51):
… Because it’s incredibly powerful. So I personally believe that this is not just an experiment from Google, and that we’re going to see them answering questions. Pretty much any informational question can be answered by AI. And the next model that they talked about at I/O, Gemini, is going to involve Google actually generating images and video and all sorts of things that I think we can’t even comprehend. I’ll finish with this: I’ve been telling clients that me making predictions is sort of like a blacksmith trying to describe how to merge onto the freeway. I don’t think we can comprehend the things that are in store, especially with the tool set that Google’s giving us to use AI in our businesses as well. So I’ll stop there, because I could babble on forever, but I do think this is a big change.

Jon Cooper (00:30:46):
No, this is awesome. I’m also, can I pay you, Cyrus and AJ, to not have to leave in 20 minutes? Because I would kill to have you guys, I would have so many questions for you. We can condense it, but I’m putting in the back of your mind. If you-

AJ Kohn (00:31:03):
I’m fine.

Jon Cooper (00:31:05):
… Can somehow get out of that, I’ll pay you money.

Cyrus Shepard (00:31:07):
Let me send a message.

Jon Cooper (00:31:09):
Okay.

Cyrus Shepard (00:31:10):
I’ll see what I can do.

Jon Cooper (00:31:12):
We’re scratching the surface, and there are so many implications. Each one of you talked about entirely different facets that I want to explore, so I’ll definitely prioritize the ones that have to go first. But I guess, the first thing I want to clarify. So, AJ, you mentioned regulation, and that there’s a big potential roadblock ahead with the law surrounding the training data used to generate these models. And trust me, from every conversation that I’ve had with you, every other conversation that I have behind closed doors, I mean, we’re all rooting for the system to come in here.

(00:32:04):
And especially for some industries, like graphic designers. I don’t know if you guys have seen the stories about graphic designers being absolutely laid off in bulk in some of these industries, because Stable Diffusion and Midjourney are already good enough that they just don’t need nearly as many graphic designers. And yet the training data is literally based off of their previous work. So listen, regulation is this big thing. What I want to explore with you guys in this conversation, though, is a scenario where there is no savior.

(00:32:43):
No savior comes, no regulation comes, but it takes a while to play out, because I think we could see class actions. I think we could see lots of internet publishers come in and say, hey, this is not cool. We think this is illegal, what you’re doing with generative AI and ChatGPT, what you’re training on. So for context, I have blocked Common Crawl from all my websites now. They are used to train ChatGPT and some other models. I’ve formally reached out to Common Crawl to remove my websites from their previous datasets. They didn’t budge and they don’t care.

(00:33:27):
They’re not doing anything. I’ve looked into removing my websites from the C4 dataset, which is what Google uses. No luck. Okay. So I’ve made those attempts. But I want to have a conversation where, let’s say, there isn’t a savior here from the legal system. So with that in mind, that regulations aren’t coming, or let’s pretend that they don’t come, based off of the other questions that I had pulled up for you guys-

Jon Cooper (00:34:03):
But I guess, yeah, my next question is, AJ: looking down the road years from now where, let’s say, Google greatly expands these generative AI searches, who are the biggest winners, and who are the biggest losers here?

AJ Kohn (00:34:23):
I mean, I think it depends on how they cite things and how they’re doing that. I can’t imagine that they’re going to cite the smaller publications. It’s just not how I see it working. I also, just to be clear, I don’t think there’s a governmental regulation solution, at least here in the US. The European Union might do something. They got more teeth to try and do stuff, but I don’t think Chuck Grassley, who still thinks that the internet is made of tubes or something, is going to get something done. So I don’t think so.

(00:35:02):
What I think will happen is people will yell loud enough at Common Crawl or at Google and the C4 dataset and say, “Yeah, if you don’t do this, we’re just going to sue you.” And that’s how things get done in the US.

Jon Cooper (00:35:19):
Sure.

AJ Kohn (00:35:20):
Enough people with enough clout sue enough people. If we want to talk about “not provided,” that’s a great one. That’s a woman in Texas who sued Google. That’s why we have “not provided.”

Jon Cooper (00:35:31):
But how long did that take? How long did that take?

AJ Kohn (00:35:34):
Took a long time. Took a long time. Took about three years for that to go through the process before Google was finally like, “Fine. We’ll just hide your damn queries from people.” And that was it. But that was the story.

(00:35:48):
So all that said, at least for the time being, yeah, the rich are going to get richer from what I can tell. I think if you are doing well, if you’re a highly regarded source for this content, you’re going to continue to do well. So I think that’s the interesting part. Again, I’m more interested in the business alignment. There’s going to come a time where Google makes money not just by having ads on search results, but by sending people to sites which also have, oh, ads on them that Google controls as well.

Jon Cooper (00:36:31):
Bingo.

AJ Kohn (00:36:32):
And they were very clear at the beginning of this whole episode that they support an open ecosystem of publishers, and it was like, “Well, no duh, because that’s how you make a lot of your money.” If the SGE results do not send enough traffic offsite to make them money, they’re not going to deploy them largely over the corpus of queries. That would be detrimental to their business.

(00:37:04):
So as much as the search team, and I chat with a lot of them, they’re doing God’s work in their view. They are about doing the right thing, about satisfying users, about getting them to the right information. But there’s somebody on top who’s always thinking, “Yeah, but we can’t do that that way because we lose money, and that’s not good.” So I think there’s a give and take here, and I think that’s part of the experience that they have to figure out when they’re rolling this out and seeing how people interact with it.

(00:37:52):
I will say, there is ample evidence that if you put something in front of people, the clicks go towards that area. I’m always surprised at how many clicks go to knowledge panel sort of sections, for those of you in food ordering or books or all those. I have clients in both of those. It’s pretty amazing how much traffic goes there.

(00:38:19):
That said, I make a fair amount by targeting instant answers queries, which I know for a lot of folks is like, “Why would you target instant answer queries?” I’m not going to tell you how I do it, because that would be bad, but people aren’t going to stop just because they get the first answer. And a lot of that has to do with the way people compile information, which is called berry picking. The way that it was described to me by search professionals, people hardcore developing search methods and all that sort of stuff, was: when you go to get raspberries, you have to look at each one, okay, is that ripe or not? And you pick it, and that’s how you fill your basket. You don’t just walk up to the tree and shake it and have everything come down and be done.

(00:39:23):
And so do I think it will change a lot of this? Yes. But I think people are still willing to say, “I need to see more. I need to see more.” And if you work in the health vertical, you see this all the time. People triangulate. “Okay, that’s what WebMD says. Now what does Healthline say? Okay, what does [inaudible 00:39:46]? Okay, what does the NIH say? Okay, [inaudible 00:39:48]?” And it’s like you’re going to check four, five sites until your paranoia finally goes away.

Jon Cooper (00:39:54):
Google said they’re not doing the Your Money or Your Life stuff, the YMYL stuff.

AJ Kohn (00:40:00):
Yeah.

Marie Haynes (00:40:00):
No, no, no. That got misinterpreted.

AJ Kohn (00:40:03):
Oh, really?

Marie Haynes (00:40:04):
Well, The Verge published an article saying, and that’s what everybody quoted, saying that Google would not generate the answer for YMYL topics. But there’s a PDF that Google published to describe all about SGE, and it says it will be more selective in deciding when to do it. We should probably get the exact wording. You know what? I can find it for you because it’s important. You know what? It’ll take me a second. But it did not say that they were going to exclude YMYL. So let me find it.

Jon Cooper (00:40:37):
Okay. You look that up. Cyrus.

Cyrus Shepard (00:40:39):
I would be highly surprised if Google did exclude those queries, because we’ve seen them already. Any health query you do now, they have the little conditions summary on the side. I think ChatGPT is actually better than Google at summarizing health information, even though it’s prone to hallucination. It’s just easier to read, and it summarizes the vast amounts of data that people are looking for. I’d be very surprised if they didn’t. I think it’s a superior experience.

Marie Haynes (00:41:10):
Good.

AJ Kohn (00:41:10):
One thing I do find interesting is, in all of this, there’s some research on it, but there are interesting ways to poison these things as well. And I think that’s a vector that you will see rise if this gains more traction, that great, I’ll spin up, I mean, we’re used to private blog networks and splogs and whatever. Well, now you can create these things that are literally there for like, “Hey, Common Crawl. Come and crawl this.” And now when it says, what’s the best Bluetooth speaker? It’s my speaker, damn it. And so if it’s munged together, suddenly, you unsettle it. So there’s interesting-

Jon Cooper (00:41:56):
That’s your new profession.

AJ Kohn (00:41:57):
… that you could do.

Jon Cooper (00:41:58):
That’s your new profession.

AJ Kohn (00:41:58):
Absolutely.

Jon Cooper (00:41:58):
SEOs become…

AJ Kohn (00:42:02):
LLM poisoners. Yeah.

Jon Cooper (00:42:05):
Eos, or I don’t know. Yeah. Holy shit.

AJ Kohn (00:42:08):
Yeah, so interesting-

Jon Cooper (00:42:10):
That’s a good point.

AJ Kohn (00:42:11):
Interesting ways. And they’ve already shown how it can happen. There’s some interesting papers on it, but that’s the other one.

(00:42:20):
Also, there’s a privacy issue with a lot of these things because you actually can, there are attack vectors where you can actually figure out where this stuff comes from and dig some stuff out of it. That’s also, again, one of the other things that people are dealing with.

Jon Cooper (00:42:37):
Awesome.

AJ Kohn (00:42:37):
So lots of fun stuff to talk about.

Jon Cooper (00:42:39):
So Cyrus, thank you so much for pushing your call, because I want to hear more from you. I want to come back to a question, which is: what sites do you think will be the biggest winners and biggest losers coming out of these kinds of changes, if they get rolled out? Obviously, we’ve talked about all kinds of issues; it’s never going to be as smooth as we think it is. It might not happen quickly. But if you are an online business owner and you’re looking at these things, which ones should be the most concerned about being on the wrong side of this versus being in the clear?

Cyrus Shepard (00:43:22):
I have a strong opinion, so I will speak up.

Jon Cooper (00:43:25):
Yes, good.

Cyrus Shepard (00:43:27):
And the stronger my opinion, the more wrong I usually am. So we can talk about that. I liked your premise, Jon, of no saviors coming to the rescue. Just to touch on that point, I think one of the biggest risks… I was really impressed with Google I/O. I thought they took a semi-responsible approach to this generative search experience. I think one of the risks is that Google loses market share to someone who isn’t as responsible. If they lose their Apple deal, or Nokia phones don’t want to put Google search on, I think they’re thinking very hard about these things.

(00:44:04):
But going specifically to your question, and this might be an oversimplification: I think it’s where people are looking for very simple answers. The two biggest areas where we’re going to see Google’s generative search experience appear start with any query right now that generates a featured snippet, because these are ripe to put AI answers on.

(00:44:29):
I think Google might hesitate to put generative search experience on very ad-heavy queries right now because they don’t want to cannibalize their ad business, even though they’re showing ads in their screenshots and their presentations.

(00:44:45):
But going back to your opening example, best Bluetooth speaker: I think affiliate sites that generate reviews and links to various content, I think those are the biggest at risk. And there’s a big reason for that: those experiences right now suck. Google created this expectation by sending all of this search traffic, and affiliate content right now is kind of a mess. It’s the number one thing people complain about on Hacker News, on Reddit. Review sites are terrible. The truth is, you go to most review sites, they’re just using the affiliate links that have the biggest payout, or the top five. They just do lazy research. They’re listing the top five products on Amazon. Google can create an experience now where they’re collating all the mentions of a product on Reddit, on Hacker News, on forum sites. And they can provide such a superior experience for reviews and directly monetize it. I think review sites are most in trouble.

(00:45:52):
Then you have a chicken and egg problem. Who’s going to produce that content? Who’s going to do the reviews? I don’t know, but that’s my simplification.

Jon Cooper (00:46:02):
I do know. I do know. And so last night, preparing for this, because I was so excited to talk to you guys, I just wanted to have my shit together when talking in public about it. I was thinking about it, and one of my clear takeaways is those affiliate review sites are, for lack of a better word, I think they’re fucked. I think their days are numbered.

(00:46:25):
And in terms of how do you replace authentic reviews: there are already plenty of people that give reviews that aren’t expecting anything out of it. Amazon reviews; anybody leaving a review anywhere on Amazon, on Google Local, on TripAdvisor. There are already plenty of people giving authentic experiences about whether they liked or didn’t like something.

(00:46:56):
And you know what’s an amazing piece of technology to analyze large amounts of text and pull out answers from it? Large language models. So imagine a scenario where you’ve got 2000 user reviews of a Bluetooth speaker, and imagine Google had a deal in place with Amazon to be able to have that data. They can then analyze these 2000 reviews to figure out what are the most common issues people have with this product? What are the reasons people like this product the most? What are the types of users these are best for, that these are not best for?

(00:47:39):
That’s what affiliate review sites are doing. If you look at any search for the best dog collar, you’ve got all these affiliate review sites that haven’t tried the product, and the general format of information the user seems to want is pros and cons, deciding factors, who is this best for, what’s the best within this budget. All of that information is sitting in these large user datasets, from people who don’t give a shit about consenting for it to be used or being compensated for it. And so I just think all of that data is there, all those text databases are there, to be used to create better, more authentic experiences in the exact way that current affiliate sites are doing it. So I don’t even think it’s going to be hard to replace them in this new future.
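
Note: the analysis Jon describes here maps almost one-to-one onto a single LLM prompt. A hypothetical sketch using the OpenAI Python SDK; the model name, the reviews file, and the Amazon data deal itself are all assumptions:

    # pip install openai   (assumes OPENAI_API_KEY is set in the environment)
    from openai import OpenAI

    client = OpenAI()

    # Pretend these are the 2,000 user reviews Jon mentions
    reviews = open("speaker_reviews.txt").read().splitlines()

    prompt = (
        "Here are user reviews of a Bluetooth speaker:\n"
        + "\n".join(reviews[:300])  # keep the sample inside the context window
        + "\n\nList the most common complaints, the most praised features, "
          "and who this product is (and isn't) best for."
    )

    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any capable chat model would work here
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)

That one prompt produces the pros, cons, and “who it’s for” sections that an affiliate review site would have written by hand.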

(00:48:31):
And I think you hit the nail on the head that for anything commerce related, review related, that’s going to be one of the clearest, just not immediately. It never happens immediately. But if you own a product review site, and you don’t have, let’s… If you’re the Wirecutter, listen, you’ve got name recognition. People trust you. They know you do your homework. You buy things. You try things. I think if you are a review site that has built a reputation, that’s one thing. And I think also, if you’re an influencer that people come to because they trust, and they’re like, “I trust Rand Fishkin. I know what he’s talking about. So if he recommends this product, I trust that.” But I think these general non-brand recognition review sites on the internet are just, their days are numbered. And that’s just one of the clear takeaways.

(00:49:28):
So didn’t mean to interrupt you. I just wanted, I’m still on the same wavelength as you on that particular group. Are there other winners and losers you’ve thought about in this new generative AI world?

Cyrus Shepard (00:49:45):
No, I’d like to hear what the others think, winners and losers. Marie, who’s going to win? Who’s going to lose?

Marie Haynes (00:49:54):
Who’s going to win? Who’s going to win is companies with data. So we have all this talk about privacy and about restricting our information from being crawled and perhaps being used to train language models. But what about the opposite? Google talked in their I/O presentation about privacy, and it really sounds like they said that we’d be able to train models on our own content, on our own websites. I think AJ was talking about Project Tailwind. That sounds to me like you’ll be able to train an LLM on your Google Docs. Now, AJ, is that how you interpreted it?

AJ Kohn (00:50:36):
That is how I interpret it. They didn’t cast it in that light. They made a pitch for more educational purposes. But it’s a pretty easy pivot to just sort of say, well, why not just link it all to my Google Docs-

Marie Haynes (00:50:50):
Exactly.

AJ Kohn (00:50:51):
… that are on this topic, and suddenly I can ask it any question I want.

Marie Haynes (00:50:55):
So picture my own business. I, for 15 years, have been documenting everything I can about-

(00:51:03):
… what Google said, what Google Search said, what changed in Google’s documentation. Like everything. And I’m not sure why I do it. I mean, obviously I enjoy it and it makes me money, but obsessively, the number of Google Docs I have, I’ve paid people the… It’s insane. Imagine that I can take that, and now, when people come to me for help with assessing a traffic drop, or assessing what the heck is happening with AI when SGE comes out, I can only talk to so many people. And what happens to somebody who is seen as a professional is that you are limited: you either have to see fewer clients or you charge significantly more. And my goal would be to help as many people as I can.

(00:51:52):
Well, what if I have a language model that you’re essentially dialoguing with… Not me, but all of the data that I’ve collected. All of my, here’s set of advice for this situation. Here’s the advice I’ve given to other clients on this situation and putting it all together where the questions that come to me like, “Oh my goodness, my traffic dropped starting on November 15th last year, and I can’t figure out why.” Well, my LLM could answer all of that, combined with the knowledge in the knowledge graph, grounded by that knowledge. So to me, I see this huge opportunity. Now I’m just an individual, but I think there’s two ways that I can go with this excitement. One is for other businesses. So say you are a large business that has tons of financial data that you’re collating and you’re doing all sorts of things with. I think there’s tons of stuff that you could do in allowing people paid access to that data in certain ways.

(00:52:56):
But also for individuals. So we talk about, I do agree that if you have an affiliate website where all you’re doing is collating information and saying like, “Oh, these reviews say this.” And stuff that could be answered by an AI generated answer, those sites are not going to last for very long. But there’s a reason why Google’s pushing experience, and they told us just recently that an upcoming update to the helpful content system is going to be rewarding personal experience. And so there’s certain things that can be told to you by information, but then there will also be people who seek out that blogger who has used the product and is like, “Here I am using it right now. Here’s actual firsthand experience.” I believe those are going to be the types of websites that appear next to the AI generated answer. And the opportunities for content creation are huge.

(00:53:49):
Now people would say, “Well, why would anybody create content? How am I going to make money from it?” Nobody’s paying attention to this: Google just announced the Reader Revenue program, and I mean, it’s been around for a while, but now it’s open not just for news sites but for-

Jon Cooper (00:54:03):
The what program?

Marie Haynes (00:54:04):
It’s Reader Revenue Manager. And I got an email a few weeks ago saying that it’s open to… I don’t know if bloggers is the right word, but to individuals. And it’s super easy. You set it up on your website and then you can charge for subscriptions. And so I could very easily create a folder on my website and say, look, I’m going to put all the new information I learn about AI, all the stuff that I’ve been advising clients, all the strategy that I’m allowed to share with people, in that folder. And very easily with Google Reader Revenue Manager, and Subscribe with Google is the program it works with, I can get people to subscribe to that.

(00:54:45):
Now, I think I’m seeing way ahead, because I don’t think it’s quite ready yet, and I can’t actually use it for what I want to do right now. But I see the day where, imagine that I’ve got my LLM that’s trained on my Google Docs and all of the advice that I’ve not only given over the years but continue to give, and it’s continually updating. I don’t know if that capability will be there. And then I can charge access to that, of which, with Reader Revenue Manager, I believe Google takes 5%. So to me, that’s the new business model, because people will stop going to websites if the AI generated answer gives them the information that they’re looking for. And the only reason they’ll go to websites is when people have real world experience that they want to share.

(00:55:32):
So that’s why experience is so important to Google. And so I think we’re going to see a massive shift from, not people who know how to do SEO and know how to create content that are succeeding, but instead the people who are succeeding are the ones who actually have the skills to share. And it cuts away all of the… When I publish my newsletter, the red tape that I have to go through, the WordPress stuff and the like, oh Stripe’s not working and all of these things. I think Google’s going to strip away all of that over the next few years and make it so that content creation is actually what it’s supposed to be.

(00:56:12):
Not let’s create content to see what we can make the most money on by gaming Google’s algorithms. But instead, let’s actually create content that people are searching out and seeking beyond the AI generated answer. And then people are going to profit from that. So that’s what I see. I’m grieved over the… I think there’s going to be a lot of job loss as we see this shift. But I also see so much opportunity for making people’s lives better and for making money as well in this new system.

Jon Cooper (00:56:44):
Totally. So just to jump in, we’ve got 10 minutes left. I’ve got one question that I’m going to ask each of Cyrus and AJ, and I just respect your guys’ time here. So Cyrus, answer first if you want to; AJ, if you want to jump in, by all means. One of the questions I put in the question list I sent over is: as SEOs, and as online businesses that have heavily invested in content marketing in the past, how do you think about investing in content marketing in the future, in the medium to long term?

Cyrus Shepard (00:57:28):
Yeah, I want to jump in, because I want to piggyback off something Marie said, which is companies with data. And Jon, you came up with an example of graphic designers getting decimated across certain industries. That’s really relevant to me because my wife, who’s in the next room working her job, is a graphic designer. But she makes maps, highly specialized maps, and her job is not at risk, because it’s going to be years before AI could do what she does, I think. And going back, we mentioned Rand Fishkin: his classic advice is, if you want to succeed in online marketing, do things that don’t scale. And I think those are always going to be the winners in any sort of online marketing. And that’s not producing content that anybody else can produce, but producing content that’s hard to produce. With unique perspectives, unique data, whether it’s tools, images, videos.

(00:58:27):
We saw a lot of videos linked to in these screenshots by Google. Sure, they go to YouTube, they go to other Google owned properties. And I think that’s something we haven’t talked about, that a lot of the links go to Google owned properties. But if I were a business right now, there’s two ways to think about it. One, there’s a group of people right now that are using AI to generate as much content as they possibly can, and it’s cheap, and it’s not the greatest content in the world, but they want to take advantage of whatever gold rush they can before the tide starts to shift. But long term, I think you’re investing in things that are very hard, that are unique to your business, your unique knowledge, and doing things that cannot be produced by ChatGPT or generative search, because those are the things that are going to win in the long run, that are going to earn links, attention and market share. And that’s my five minute nutshell: things that don’t scale.

Jon Cooper (00:59:23):
Oh yeah. Awesome.

AJ Kohn (00:59:24):
Yeah, I’ll piggyback on that as well. I always tell people, most people are allergic to work, and if you are not allergic to work, you will usually win. That’s just basically how it works, which is just sort of another… That’s another kind of a twist on what Rand said, do the thing that doesn’t scale because no one wants to do the things that don’t scale, right? It’s like, oof, that’s work. I’m allergic to work. I don’t want to do that, right? I mean, there are times when I talk to clients and I’m like, they’re like, “Oh, well, we’d have to do this over 20,000 products.” I was like “Uh-huh, what’s the problem? Do it.” Do you not have someone who can just make their eyeballs bleed for two weeks to update all these things? It’s not tough. It’s not fun sometimes, but it’s not tough.

(01:00:21):
So I agree. My advice generally has been, don’t use generative AI to write article content. I just don’t think that’s a great use of it. “Hey, write an article about the best tips on international SEO”: it’s just going to be garbage at the end of the day. It’s not going to do that much. I mean, there were people who were doing interesting stuff, like, “Hey, take the S1s and write a story based on the S1.” That’s more interesting. That’s stuff where it’s like, all right, that’s a dense freaking document. I mean, Bill Slawski, may he rest in peace; we don’t have somebody who’s looking at Google patents anymore. But okay, can we have articles written about Google patents based on this? Might be interesting, right? No one else is doing it now.

Cyrus Shepard (01:01:23):
That’s a fantastic idea.

Marie Haynes (01:01:25):
It is, yeah. I like it.

AJ Kohn (01:01:28):
And just have it spit it out, right? New patent filed, here it is. So I think there are interesting methods. I’ll use it all day long to write alt text, to write meta descriptions. Go for it. It’s things that I don’t care that much about, that don’t scale well, that I don’t think you should invest the time in. I mean, Azure is a favorite of clients; it does a great job with alt text, and that’s not a large language model, but it’s another type of AI, and they work fantastically. So I think anybody who’s willing to implement these technologies the right way… I do like the data angle. If you have data to use, or if you can find the data sources that no one else is using. I mean, think about all of… I know Jon knows a little bit about this.

(01:02:29):
All the government data feeds, the stuff that they have that’s kind of not particularly accessible. Well, drag it down and point a model at it, and suddenly you’ve got stuff. So I think there’s lots of creative uses for it. But the way that most people are using it is like, let me vomit up 20,000 pages of aggregated stuff of this, that, and the other thing that looks like everything else. And so I guess what I’m saying is, I think most people are viewing this as article spinning 2.0, and that’s what they’re using it for, instead of figuring out how to unlock different content and then bring it to people and find those niches.
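
Note: the alt text use AJ mentions is one of the easiest of these to try. A minimal sketch that swaps in an open image-captioning model from Hugging Face for the Azure service he names (the file name is a placeholder):

    # pip install transformers pillow torch
    from transformers import pipeline

    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    result = captioner("product-photo.jpg")    # a local path or a URL both work
    print(result[0]["generated_text"])         # e.g. "a blue speaker on a table"

Wrap that in a loop over an image folder and you’ve drafted alt text for a whole catalog, exactly the kind of low-stakes, tedious-by-hand task AJ is pointing at.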

(01:03:25):
All of my side hustles and sites are built on very niche, long tail queries that even when I found them, I was like “Really? There’s that much volume here?” Yep, there is. People search for this stuff and not many people will optimize for it, right?

Jon Cooper (01:03:45):
Yeah.

AJ Kohn (01:03:46):
So I think there’s a world out there-

Jon Cooper (01:03:50):
Totally.

AJ Kohn (01:03:51):
Of stuff to do. And you know what I’m most interested in, and I have no idea on this: I tell clients all search is going more long tail, right?

Jon Cooper (01:04:07):
Yeah.

AJ Kohn (01:04:08):
I’ve never been someone, because of the clients that I work with, I don’t chase head terms or even torso terms. It’s always long tail, right? Let me go to the long tail; that’s where I get all my traffic. So I’m most interested in, well, how much are they going to try and hit the long tail? And does this just extend the long tail even more? Instead of going for seven-word queries, do I need to go for nine-word queries now? I don’t know, but-

Jon Cooper (01:04:34):
Yeah, definitely.

AJ Kohn (01:04:40):
I think it will… Unfortunately, the last thing that I would say is, we have a really bad dynamic in this industry right now between the people who need our services and the people who provide the services. Service providers like us have way too much power and the people who need it don’t, and the loop that we’re talking about is going to make that much worse. Because I just think the number of people who are going to be able to implement something ethically, or well done, is going to shrink. So more demand, fewer good providers; it’s going to be rough.

Jon Cooper (01:05:30):
Yes, awesome. Thank you, all three of you, so much for joining me, for doing this totally out of the blue, totally last second. Y’all are so busy and I appreciate your time. I am looking forward to seeing the discussion that comes from this video, and I hope I get a chance to have more of these kinds of discussions. I think there’s so many more things for us to cover, but thank you all for joining me. And if you’re listening, thanks, and I hope you have a good rest of your day.