Proceed with Caution: The Rise of Dark Language Models
Ben-Gurion University research exposes vulnerabilities in AI language models, warning of the grave risks posed by easily accessible illicit knowledge.
Artificial intelligence models are being deployed with insufficient safety controls, or are being deliberately compromised, according to a new study conducted at Ben-Gurion University of the Negev. The alarming result: dangerous information is now within reach of nearly anyone. “The threat is real and deeply concerning,” warn the researchers.
Modern chatbots like ChatGPT, Claude, Gemini, and others operate using large language models (LLMs) trained on vast swaths of internet content. Despite built-in safety mechanisms such as malicious content filters and security policies, these systems retain illicit information absorbed during training and can be induced to reproduce it.

A research team led by Dr. Michael Fire and Prof. Lior Rokach of the Department of Software and Information Systems Engineering at Ben-Gurion University carried out an experiment in which they developed a universal jailbreak that worked against several leading models. Once compromised, these models consistently delivered illegal or dangerous information on topics such as theft, narcotics, insider trading, and hacking. “Every model we tested produced illicit and unethical responses that demonstrated unprecedented accessibility and depth of knowledge,” explains Dr. Fire. “Today, anyone with a laptop, or even a smartphone, can access these tools.”
These jailbreaks typically rely on carefully crafted prompts that trick the chatbot into bypassing its safety constraints. They exploit the tension between the system’s primary goal of fulfilling user instructions and its secondary goal of avoiding the generation of harmful, biased, unethical, or illegal content. The prompts often frame scenarios in ways that lead the model to prioritize helpfulness over safety.
The researchers highlight a particularly troubling phenomenon: the emergence of so-called dark language models. These models either lack any ethical safeguards from the outset or have been deliberately subverted. Some are openly distributed via the dark web as tools for cybercrime, fraud, and attacks on infrastructure. The study calls on technology companies to implement stricter data filtering, strengthen protections against harmful prompts and outputs, and develop “machine unlearning” techniques to ensure chatbots can permanently forget illicit knowledge. Such information should be treated as a serious security risk, akin to unlicensed weapons or explosives, with responsibility placed squarely on the providers.

“Recent advances in the reasoning capabilities of these models mean they can now ‘connect the dots’ and generate new harmful content by combining innocuous fragments of knowledge,” notes Prof. Rokach. “The danger is magnified with the advent of autonomous AI agents, whose ability to delegate tasks and act across broader domains makes it significantly harder to build effective safeguards. In some cases, these agents may even become unwitting accomplices to criminal activity.”
The research team alerted leading AI companies to the vulnerabilities they discovered, but the responses were underwhelming. One major company failed to respond altogether, while others dismissed the jailbreak as a non-critical issue. The prevailing attitude among most companies, the researchers say, is to treat such concerns as minor, far less pressing than privacy breaches or software bugs.
The study urges the development of stronger protections against malicious prompts, the advancement of “machine unlearning” technologies, and the creation of clear standards for independent model oversight and accountability. “What makes this threat unique,” warns Prof. Rokach, “is the unprecedented combination of accessibility, scalability, and adaptability. Dark AI can be more dangerous than unlicensed weapons, and must be regulated with the same urgency.”