Should You Block ChatGPT from Crawling Your Site?

Aug 17, 2023 | featured, It Depends - An SEO Podcast, SEO

One of the many recent AI developments is the ability to block ChatGPT from crawling your website using your site’s robots.txt file. This opens the door to countless questions, like:

  • How do I stop ChatGPT from crawling my site using robots.txt?
  • Should I block ChatGPT from crawling my site?
  • What are the benefits of letting ChatGPT crawl my site? What are the risks?
  • What are the SEO impacts of ChatGPT crawling my site and using my information?

Check out our podcast, It Depends – An SEO Podcast, to discover our thoughts and conversation around blocking ChatGPT. 

What is ChatGPT?

ChatGPT is OpenAI’s natural language processing tool, driven by AI technology, that allows users to have “human-like” conversations with a chatbot. The underlying technology has been around for several years at this point, but really we’re talking about the version that has taken the world by storm in the last 12 months, with a simple web interface where you can type something and have a continuous conversation.

How do I stop ChatGPT from crawling my site using robots.txt?

According to OpenAI’s site, you can block ChatGPT from crawling your site using a simple robots.txt directive. 

[Image: how to block ChatGPT using robots.txt]
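OpenAI’s crawler identifies itself with the user agent token GPTBot, and the rule OpenAI documents for blocking it is a two-line robots.txt entry. The sketch below shows that entry and uses Python’s standard urllib.robotparser module to confirm what it does and does not block. The helper name can_crawl and the example.com URLs are placeholders for illustration, not anything from OpenAI’s documentation.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule that blocks OpenAI's GPTBot crawler from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    can_crawl is a hypothetical helper name used only for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
gptbot_allowed = can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/blog/")
googlebot_allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/blog/")

print(gptbot_allowed, googlebot_allowed)  # prints: False True
```

The rule only names GPTBot, so other crawlers such as Googlebot are unaffected. Changing the Disallow value from / to a path prefix (for example, a hypothetical /private/ directory) would block only that part of the site.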

Should I block ChatGPT from crawling my site?

Short answer: probably not.

The reason it’s not a straightforward yes or no is that nobody really knows for sure what will happen if you block ChatGPT from your site. The whole idea of blocking these crawlers is that some websites aren’t thrilled about ChatGPT using their content to build its language skills. However, just because you block the ChatGPT crawlers doesn’t mean that they won’t retain what they have already learned from your site.

There is also a lot of potential risk in reducing ChatGPT’s ability to crawl your site. People are using this platform to ask questions, look for products and more. You don’t want to exclude your site from being shown in conversations that are relevant to your brand.

 

Need Technical SEO Support?

Are you struggling to understand the ever-evolving world of technical SEO? Reach out to us to learn more about how we can support your business or your agency.

What Sites are Blocking ChatGPT?

As of October 2023, we are starting to see more and more sites block ChatGPT from crawling their content. Ecommerce sites like Amazon have blocked the crawler, as have major news publishers like The New York Times and CNN.

Full Podcast Transcript - Should I Block ChatGPT Using Robots.txt?

Jay:
Hey Lindsie.

Lindsie:
Hey Jay, how’s it going?

Jay:
It’s going alright, how are you?

Lindsie:
Oh, I am just fine. Should we just jump right into it? I know we have a lot to talk about.

Jay:
What are we talking about?

Lindsie:
ChatGPT. I mean, we’re gonna talk about ChatGPT because that’s all that people talk about now, right?

Jay:
I wasn’t ready for this. I thought that we were gonna talk about something else, but okay, what are we talking about with ChatGPT?

Lindsie:
Okay, ChatGPT. I think everybody at this point, any marketer, knows what it is, has probably tested it, tried it. The newest thing that we’re hearing is that you can actually block ChatGPT or OpenAI from crawling your site. And the conversation becomes: should a marketer do that? Should a webmaster do that? What is the benefit of it? So tell me a little bit more about what you’ve heard and read about using robots.txt to block ChatGPT from crawling a site.

Jay:
Okay. Well, first let’s, uh, let’s set some background. Like, ChatGPT, I mean, I think they’ve been around for six or seven years at this point, but really we’re talking about the version that has like taken the world by storm, where there’s a simple web interface that you can just like type things into. And it, it talks back to you.

Lindsie:
Almost like a human.

Jay:
So we’ve got like a few months now of really being in this world. It’s evolving all the time. Google has probably just like blindly launched five new AI features, uh, while we’ve been having this discussion.

Lindsie:
Oh my god.

Jay:
So to, to like spoil this a bit, the idea that like anybody knows whether or not you should block ChatGPT from crawling through your site is kind of insane. Uh, we don’t know. I think a lot of this comes from the idea that there are websites that are pretty unhappy about ChatGPT. Like, Reddit’s a really popular one. You know, it’s one of the biggest sites on the internet, and just tons of original content is created by its user base every second.

Lindsie:
Right.

Jay:
ChatGPT spent a lot of time digesting the way Redditors talk in order to learn and develop its language model. So Reddit feels like they should get some sort of credit for the development of these large language models. And that probably means money.

Jay:
That’s led to a lot of people just feeling like we’ve been ripped off, you know, like our intellectual property has been stolen by this tool and they’re mad. And so now there’s a way to block it from crawling your site. So people are jumping on board or theoretically jumping on board with doing that. Problem is it’s already crawled your site or it’s already crawled a lot of the internet.

Lindsie:
Right, well and I think that like the conversation too of like, okay, there’s Reddit, there’s Wikipedia, there are these huge platforms, but the average website, so a client that we work with or a marketer for, you know, even a large site that produces content about their products, solutions, their industry, is there like any reason that they wouldn’t want ChatGPT or any kind of these language tools to crawl and consume their site and their information? Are they really providing anything so astonishingly new that there is any benefit to blocking ChatGPT or any of these types of tools?

Jay:
That’s a good question. And this is hard because normally you just like turn around and say, you know, what are you trying to accomplish by doing this? And I think the answer for everyone is we don’t know. It’s just we need to make a decision. And like the status quo is a decision. You know, if we let it continue to crawl our site, we’re deciding to do that. So.

Lindsie:
Mm-hmm.

Jay:
I mean, one important ground rule for this is this new robots.txt directive is only for ChatGPT. Like Bing has its own separate AI tool, even though Bing acquired, or I forget, they partially acquired OpenAI.

Jay:
Google has its own tool, Facebook has its own tool. There are more being developed. So like you’re stopping one of them. There’s no way to stop the others currently. It’s just ChatGPT was the first, so it has all the attention.

Lindsie:
So question backing up a little bit, why do we think that ChatGPT has published this, allowed for their bot to be blocked? Was this a request? Was this, again, we’re not seeing it from these other AI tools, so why has ChatGPT come out and said, okay, you can block us if you want? What is their benefit to this?

Jay:
I think one, there’s always a legitimate concern of resources for anything that crawls websites. Every website that gets any amount of traffic to it from humans is probably getting an equal amount of traffic from bots, whether it’s like Google and Bing, the search engines trying to find everything, or all of these random tools that like monitor websites and traffic and, you know, foreign countries trying to take down the internet, whatever it is. So, well, it’s a thing that happens. So like ChatGPT is another thing that is coming to your site. It has like its own user agent, which is kind of, it’s declaring the name of the bot when it comes to the site. You know, some bots are just anonymous, but the bigger companies that use bots for legitimate reasons, they usually give a name to their bot and it’s saying like, hi, I’m ChatGPT, I’m ready to crawl your site now. And that consumes some amount of resources as it’s going through every page that it finds on your site. And that costs people money, in terms of you spend money for servers and this stuff, and some of that money is going to all of these bots and supporting their traffic. So like any good company that is deploying bots to just scan huge chunks of the internet every day should follow some protocol. Either, you know, you can go into your robots.txt and say I want to block all bots, which is bad if you want to keep getting traffic.

Lindsie:
Don’t do that.

Jay:
Yeah. So, you know, ideally your bot has a name and people can be selective about like, you know, I don’t want you coming to my site at all, or maybe you can come to my site, but like this part, this like big part of my site, you really have no business going there. So I’m going to keep you out of that for whatever reason. So like resources is one thing. And I would, I would say lawsuits are another, um, and regulation, because there are big companies that are making these copyright claims that like ChatGPT has learned using their original information. Yeah. And I, you know, if, if I was running ChatGPT or OpenAI, if I was running one of these companies, I would probably bet on the idea that like Congress and regulators aren’t very smart about this stuff. And we can say like, hey, we implemented a way to block people or block our bot from coming to your website. We did our job.

Lindsie:
We did it. We allowed people to not have us look at their site. So I guess to transition this a little bit from maybe why they did it to why somebody would choose to block ChatGPT or any of these types of tools. I mean, it seems like this may be like blocking Google? Why would you block something that crawls and pulls your content? Is it because it’s not credited? Where at least in many cases, not all of them, Google is giving credit and driving traffic to your site. Whereas these are just essentially pulling information as if it knows the information. But is it like blocking Google? I mean, are we talking about the same type of thing here?

Jay:
For all the hype that these large language models are getting as being the future of search on the internet, it sure seems like you’re blocking Google at this point because,

Lindsie:
Right?

Jay:
you know, like, yeah, there’s some things that lawyers are going to have to fight out about, you know, is it like breaking copyright law or something like that. But right now, what we’re seeing is these generative AI large language model things in search will give answers to questions and in the cases where like there’s more reading to do or products to explore or things like that, they’re linking to sources. Um, they’re maybe not doing it as much as we would hope.

Jay:
Yeah, but it’s also changing every day. I mean, it’s the typical tech company thing of like, let’s just roll some stuff out and figure it out as we go. So the latest thing from Google when they use their large language model answers is they’ll have like, you know, here’s a complicated word and you can hover over it and we’ll give a definition. And if they pull that definition straight from another website, it seems like they’re linking to that website.

Lindsie:
Right.

Jay:
you’re preventing yourself from showing up in those things. And those things could be a bigger and bigger portion of the search results.

Lindsie:
Yeah, and I mean, just thinking through the downsides. So blocking them or blocking ChatGPT or any of these types of tools at this point seems like there is risk there in terms of not being, I mean, one, it’s already crawled your site. Those bots have been there, whether you block it or not. So you could argue it’s future content, protecting yourself in the future perhaps, but as it sits today, I think there is likely more risk. But what is the reward here? Why would somebody make a decision to block these bots today?

Jay:
Fireworks. Anyway, so yeah, like let’s say we’re afraid of the downside risk. We want to allow ChatGPT to keep crawling our site. There’s still the resource question. You know, if you have a big website that has hundreds of thousands or millions, tens of millions of pages, these bots are presumably coming back every day or on a regular basis and it’s going to keep growing. So, you know, at some, as this stuff expands, how much of your hosting budget goes to supporting bots just scanning your content so they can regurgitate it in a slightly differently worded way somewhere else?

Lindsie:
Right, and I feel like I can hear a developer in a conversation like, it’s slowing down our site, we need to block all these things, it’s using up our budget, we have to block it. But, I mean, is there a benefit to blocking portions? Like let’s say it’s an e-commerce site, right? Blocking all products, okay, ChatGPT, any kind of these tools that we can or cannot block, don’t look at our products, like you’re not gonna gain anything interesting there, but like, our blog or our resource center, because there’s actually content there, or maybe vice versa. When do you make that decision?

Jay:
I mean, I think you have to look at the, the financial drivers of things. So products I would probably leave open to crawling because you can say I’m looking for, I don’t, I don’t know. I’m looking for a microphone pop filter, which I’m staring at right now, but I want one that is, is like neon green. You know, the AI tools can give you a link to a product. So if you block them from crawling, then that won’t happen. And then you’re not gonna sell that, you know, obnoxiously colored $15 pop filter. But like, you could see how that could scale into a pretty big problem for revenue. Where…

Lindsie:
Well, but then I mean at the same time, but if you talk about, okay, I wanna block content, right? I wanna protect what I’m saying about like that my pop filter, right, is the best one because it has all these wonderful features and I wrote this long form content piece about it and how garbage the other ones are. Well, I would want the AI tools to crawl that. I want them to pull that information just as much as the product information. So are we just back to… Just don’t do it. Just don’t block anything.

Jay:
Maybe. Yeah. I, I think this kind of, to me reminds me a lot of all of the different search features that, that Google has rolled out over the years to keep people on Google.

Lindsie:
Mm-hmm.

Jay:
And so if you’re asking Google for, you know, what time is it? There are websites out there that the only purpose that website serves is to show you what time it is in your time zone. You don’t need to go to a website. I mean, for one, we all have phones and stuff, but like Google will tell you that. And

Lindsie:
Right.

Jay:
they’ve rolled out, you know, much more complex iterations of that. And it’s kind of come down to like websites and people that run them have gotten upset. Like we’re losing all this traffic. We made all these lists and it’s not working.

Lindsie:
I mean, it’s back to the same idea that just writing content, providing information, like it has to be unique. You have to have a perspective. You have to essentially have an opinion and something quality and unique or else why are you doing it? I mean, we’ve had this conversation over the years about like, you know, these tourism sites where it’s like, well, here is this, you know, that everybody in San Francisco goes to and sees, and I have a page about it. It’s like, well, so does every other website that ever talked about this, and Google doesn’t need you anymore. So how do you make yourself valuable for all of these different spaces?

Jay:
Right. And, and then we’re going to see an influx of SEO people calling it like optimizing for the large language model instead of optimizing for search order. But it’s still kind of the same concept. We’re just kind of right now hoping that these large language models are going to continue linking out to websites where they get their source information or where you can learn more, take some action, whatever.

Lindsie:
Right, right. So that there is some, again, back to the money of it, all these sites that are investing in creating content and unique content and quality information with a perspective, and then it’s just getting pulled into a tool like ChatGPT, like what? They need some sort of financial benefit from all this. They need an ROI, period. And so, I mean, maybe that’s where ChatGPT or these types of AI tools have to go is like, how do they give an ROI to pulling information from their sites?

Jay:
Right. And I’m still running with the belief that search works and has worked for so many years because people like to feel like they’re in control of their decision-making process

Lindsie:
There’s options.

Jay:
And yeah. And voice search hasn’t taken off in part because the technology kind of sucks, and you don’t always get great answers or answers at all. But the other problem with it is you get one answer. And now Google, Siri, whoever is making the decision for you. They’re saying, you have a question, you’re looking for something, I get to decide what’s right for you. And I’m gonna give you my one view on this issue or question. And search, for, you know, all the, the things you can point out as flaws, like you have 10 or 12 results on a page of different viewpoints, different options, and you get to decide which one you think is the path you want to go down. Or you get to pick a whole bunch of them. You can go through pages of results and click on a hundred different websites if you really want to.

Lindsie:
I do agree with that. So I think to get us back to this main conversation of, okay, ChatGPT has come out and said, you can block us. You can put us in your robots.txt file. You can block parts of your website, all of your website. There are going to be people that want to do this or don’t want to do this. Do we have an answer on that, or are we going to come to “it depends”? Do we have an answer for people today?

Jay:
You said the magic word.

Lindsie:
Hahaha

Jay:
So don’t get me wrong. I don’t know that I would word it quite this way if it was like one of our clients asking the question, but as of this recording, I looked at, uh, Amazon, Wikipedia, Reddit, Google, Facebook, X, uh, some, some sites I won’t mention the name of that are in the largest, most visited sites on the internet, uh, that are not safe for work, and looked at the robots.txt file. None of them are blocking ChatGPT. So you can say, you know, like, oh, the SEO team or whatever, or the DevOps team at Facebook doesn’t know what they’re doing or whatever, and make those like one-off declarations of opinion.

Lindsie:
Mm-hmm.

Jay:
But if you’re saying I am going to block ChatGPT, you’re saying like, I know what to do with this better than the largest websites on the internet. And that’s a bold statement to make.

Lindsie:
Yes, but what I also wonder is, do most people even know that this exists? Like, and I know I’m kind of derailing a little bit in terms of what our final conversation is, but this is all new. So maybe it takes the smaller sites, or the medium-sized sites to say, we’re gonna try this and we’re gonna test this, and then we’ll see these larger sites take action. I don’t know that it’s fair to say, well, Reddit’s not doing it, so we’re not gonna do it either. Is there value in being the guinea pig in trying it and taking that step? And I think there’s a lot more risk for these larger sites than maybe some of the smaller ones.

Jay:
That might be fair. I just think for all of the hand wringing that’s gone on from some of these big sites about, like, ChatGPT has, has like stolen our information, you know, they’re, they’re making claims of like actual harm being caused. Like, like I get big companies can move slow on decision-making, but you would think that at least some of them, they would be doing this day one.

Lindsie:
Like right away. Yeah, I hear

Jay:
Yeah.

Lindsie:
that for sure. Hmm.

Jay:
Uh, yeah. So I would still lean towards, you’d have to, you’d have to like individually sell me on the idea that your business is going to get some benefit out of blocking ChatGPT, not just we don’t like it or it’s scary or we think it’s stolen from us, and so we’re going to stop it from coming back. Like, again, what is the benefit? I think we can outline the benefit of like, this looks like it’s going to be a huge part of search in the future and probably

Lindsie:
Right.

Jay:
pretty critical for driving traffic to websites and building awareness of businesses. Like there is a benefit to staying in this world going forward. What is the benefit to blocking yourself out? Sell me on that for you as an individual company or individual website. And I haven’t heard the case yet, other than it’s scary and we don’t like it.

Lindsie:
Right, they’re stealing from us. But even that is like, well, you’re also putting

Jay:
Yeah.

Lindsie:
information into the public ether. I mean, it’s the same conversation of like, anything on social media is public forum and if you put it out there, it’s public to a large degree. I mean, you’re publishing content online, bots are crawling it every day.

Jay:
Yeah, and like, I like to play music. I play the same 12 notes that Jimi Hendrix played.

Jay:
Do them in a slightly different order at different times. You know, I mean, we all, we all like use things that we’ve learned and experienced in order to create new things in the future. And the “ChatGPT stole from us,” I kind of feel like it’s the idea that like, there’s no such thing as an original idea.

Lindsie:
All right, so I think what we’ve come down to, well, is one, this is just new, right? Like there is no true testament of what ChatGPT or these large language tools are going to be for search, what it means to block it or not block it. But right now we don’t see any great benefit to pretty much any site at this point from blocking it to a large degree. If you do it, let us know. I think that would be interesting. How does that work for you? Is there any level of impact? Do you stop seeing traffic from like Bing because of how they use these OpenAI tools within their search results? Not that anybody’s getting a ton of traffic from Bing, but you know, maybe they are. So I think it would be interesting if anybody does play with it, what happens?

Jay:
Yeah, let us know and good luck.

Lindsie:
Good luck.

Jay:
Alright Lindsie, we’ll do this again.

Lindsie:
Okay, let’s do it again. Talk about the next cool thing or interesting thing or thing we can argue about, we’ll see.

Jay:
Cool, talk to you later.

Lindsie:
Bye.

Want to learn even more cool stuff? Lindsie Nelson can help!

Reach out today! We’re happy to help talk more about whatever search marketing issue is keeping you up at night.