OpenAI is expanding its internal safety processes to counter the threat of harmful AI. A newly established “safety advisory group” will sit above the technical teams and make recommendations to leadership, and the board has been granted veto power over deployment decisions, although whether that power will actually be used remains an open question.
Normally the details of policies like these go unnoticed: they amount to closed-door discussions and a web of functions and responsibilities that outsiders rarely get to see. However, given the recent leadership turmoil and the ongoing debate over AI risk, it is worth examining how the prominent AI development company is approaching safety.
In a recent document and blog post, OpenAI outlines its updated “Preparedness Framework,” likely refined after the upheaval in November, which saw the removal of two “decelerationist” board members, Ilya Sutskever (still with the company in a modified role) and Helen Toner (no longer part of the company).
The primary objective of the update is to establish a clear process for identifying, analyzing, and addressing “catastrophic” risks associated with the models under development. These risks are defined as events that could result in significant economic damage or cause severe harm or death to many individuals, encompassing existential risks such as the “rise of the machines.”
The governance structure involves a “safety systems” team overseeing models in production, addressing issues like systematic abuses that can be mitigated through API restrictions or tuning. Models in the developmental stage fall under the purview of the “preparedness” team, which aims to identify and quantify risks before the model’s release. Additionally, the “superalignment” team focuses on establishing theoretical guidelines for “superintelligent” models.
The risk evaluation process involves categorizing models into four risk categories: cybersecurity, persuasion (e.g., disinformation), model autonomy (acting independently), and CBRN (chemical, biological, radiological, and nuclear threats). Mitigations are considered, and if a model is assessed as having a “high” risk after accounting for known mitigations, it cannot be deployed. Models with “critical” risks will not undergo further development.
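To make the gating logic above concrete, here is a minimal, hypothetical sketch of how post-mitigation risk ratings across the four tracked categories might translate into deploy-or-develop decisions. The category names come from the framework itself, but the function, data structures, and thresholds shown are assumptions for illustration, not OpenAI’s actual tooling.

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# The four risk categories tracked by the Preparedness Framework.
CATEGORIES = ("cybersecurity", "persuasion", "model_autonomy", "cbrn")

def evaluate(post_mitigation_scores: dict[str, RiskLevel]) -> dict[str, bool]:
    """Hypothetical gate: a model rated 'high' or worse in any category
    (after accounting for known mitigations) cannot be deployed; any
    'critical' rating halts further development."""
    worst = max(post_mitigation_scores[c] for c in CATEGORIES)
    return {
        "can_deploy": worst <= RiskLevel.MEDIUM,
        "can_develop_further": worst <= RiskLevel.HIGH,
    }

# Example: a model rated 'high' for cybersecurity after mitigations
# could continue development but would be blocked from deployment.
print(evaluate({
    "cybersecurity": RiskLevel.HIGH,
    "persuasion": RiskLevel.LOW,
    "model_autonomy": RiskLevel.MEDIUM,
    "cbrn": RiskLevel.LOW,
}))
```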
The framework explicitly documents these risk levels, dispelling concerns about subjective judgments by engineers or product managers. For instance, in the cybersecurity section, a “medium” risk involves enhancing operators’ productivity in key cyber operation tasks, while a “high” risk pertains to identifying and developing high-value exploits against hardened targets without human intervention.
To ensure an impartial evaluation, OpenAI is establishing a “cross-functional Safety Advisory Group” positioned above the technical teams. This group will review reports from the technical side and make recommendations from a broader perspective. Those recommendations go to the board and leadership simultaneously: leadership decides whether to ship or shelve a model, but the board can reverse those decisions.