Your website can now opt out of training Google’s Bard and future AIs

Large language models are trained using a wide range of data, much of which appears to have been gathered without individuals’ knowledge or consent. You are now faced with a decision regarding whether to permit Google to utilize your web content as a resource for its Bard AI and any forthcoming models it opts to create.

The process is straightforward: you disallow the `Google-Extended` user agent in your site’s robots.txt file, the standard that tells automated web crawlers which content they may access.
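A minimal robots.txt entry for this opt-out might look like the sketch below (blocking the whole site here is illustrative; you can disallow narrower paths instead):

```
# Block Google-Extended, the product token Google uses for AI training
# (Bard / Vertex AI), without affecting normal Search indexing.
User-agent: Google-Extended
Disallow: /
```

Because `Google-Extended` is a separate token from `Googlebot`, a rule like this should leave ordinary search crawling and ranking untouched.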

Despite Google’s assertions that it is developing its AI in an ethical and inclusive manner, it’s essential to acknowledge that AI training differs significantly from the web indexing process.

“We’ve also received feedback from web publishers expressing their desire for increased choice and control over how their content is utilized for emerging generative AI applications,” notes Danielle Romain, the company’s VP of Trust, in a blog post. This statement may appear somewhat surprising.

Interestingly, the term “train” does not appear in the post, even though it is evident that this data is employed as raw material for training machine learning models.

Instead, the VP of Trust poses the question of whether you genuinely do not wish to “contribute to the improvement of Bard and Vertex AI generative APIs” – “to aid these AI models in becoming more precise and capable over time.”

In essence, this isn’t about Google taking something from you; it’s about your willingness to assist.

On one hand, this approach could be seen as the most appropriate way to present the question, as consent is a critical element in this equation, and actively choosing to contribute is precisely what Google should be requesting. However, on the other hand, the framing loses credibility because Bard and its other models have already been trained on vast amounts of data extracted from users without their consent.

The undeniable truth demonstrated by Google’s actions is that it leveraged unrestricted access to web data, achieved its objectives, and is now requesting consent retroactively to give the appearance that consent and ethical data collection matter to the company. If they genuinely did, this option would have been available years ago.

Incidentally, Medium announced today that it will block crawlers like this universally until a more refined solution is in place. It is far from the only publisher taking this approach.
