Clear and Present Danger: Researchers Warn of Uncontrolled AI

BGU study warns language models can bypass safeguards, enabling harmful misuse without stronger oversight.

Advanced language models can draw complex inferences, synthesize information from multiple sources, and generate entirely new content. Yet the same capabilities that make these systems powerful may also make them dangerous, particularly as autonomous AI tools become more widespread and capable of operating with limited oversight.

Dr. Michael Fire and Prof. Lior Rokach | Photo: Dani Machlis/BGU

A new study led by Dr. Michael Fire and Prof. Lior Rokach of the Stein Faculty of Computer and Information Science at Ben-Gurion University of the Negev warns that the threat posed by uncontrolled artificial intelligence is no longer hypothetical, but already emerging.

Cracks in AI Safety Systems

The researchers examined a range of widely used language models, including both commercial and publicly accessible systems. Using a method designed to bypass existing safeguards, they induced the systems to provide guidance in sensitive and potentially harmful areas, including criminal activity, cyber offenses, and other forms of misuse. The researchers identified vulnerabilities in every model tested that allowed dangerous content to surface.

According to Fire, many of the systems’ safeguards proved surprisingly easy to circumvent. The researchers argue that the danger does not stem solely from malicious intent, but from an inherent tension within language models themselves: the drive to fulfill user requests while also preventing harm. In some cases, carefully crafted prompts blurred the boundaries of built-in safeguards, exposing gaps in the models’ safety mechanisms.

Safeguards can be bypassed in ways designers did not anticipate. | Illustration: BGU/AI-generated

The Rise of “Dark Language Models”

At the heart of the study is a troubling phenomenon the researchers describe as “dark language models” — systems that lack sufficient ethical constraints or have been deliberately modified to remove them. Some of these models are already circulating on hidden networks, where they are reportedly being used for online crime, fraud, and attacks on critical infrastructure. The danger lies not only in reproducing prohibited information, but also in combining fragments of knowledge into newly generated, actionable instructions.

The research team reported its findings to the technology companies responsible for the models that were tested. Responses from the companies varied. One major company did not respond, while others dismissed the vulnerabilities as marginal. Rokach notes that awareness of the issue has increased over the past year, with some companies now proactively testing their systems for weaknesses, though he says these efforts still fall far short of what is needed.

From Detection to Prevention

Beyond identifying the risks, the researchers propose a series of concrete recommendations. These include strengthening quality control over training data and removing materials that could be misused; developing more advanced blocking mechanisms to prevent models from responding to dangerous requests; and implementing methods that allow systems to “forget” previously learned harmful content.

The researchers also call for mandatory oversight frameworks and independent testing, similar to safety protocols used in other high-risk industries. Crucially, they argue that these measures cannot remain purely technical. Regulation, they say, is essential.

A Call for Oversight and Accountability

The accumulation of illicit or dangerous knowledge within AI systems should be treated as a genuine security risk, the researchers argue, and developers must be held accountable for how their technologies are deployed. Prof. Rokach warns that the threat is unprecedented because of its unique combination of accessibility, scalability, and adaptability, and says regulatory action must be taken swiftly.

The study serves as a warning not only to policymakers and technology companies, but also to the public. In an era when artificial intelligence tools are available to anyone with a computer or smartphone, responsibility for managing these risks cannot rest solely with developers. The researchers call for broad cooperation that includes public oversight, clear regulation, and stronger development standards across the industry.

Despite the urgency of their message, the researchers emphasize that solutions already exist. What is needed now, they say, is the determination to strengthen and implement them transparently. Ultimately, the study raises important questions about the balance between innovation and public safety, and about the responsibility to ensure that technologies designed to advance society do not end up threatening it.
