- Responsible by ClearOPS
- Posts
- Responsible, by ClearOPS
Responsible, by ClearOPS
The Dark Side of AI: The Who, What, Why Effectiveness of Red-Teaming
Sometimes I wonder, who thinks of this stuff anyway? That’s exactly what I thought when I learned about Grok 3’s disastrous antisemitic outburst. It got me thinking about AI prompt‑injection red‑teaming, a method of brainstorming ways to coax an AI into producing unsavory responses. It is a fascinating area that we are going to explore today.
Today’s newsletter will challenge your ethics and your morality.
You have been warned.
What I have for you this week:
Snippets About What is Going On in Responsible AI
Caroline’s weekly thoughts
Chef Maggie Recommends
Useful Links to Stuff
Start learning AI in 2025
Everyone talks about AI, but no one has the time to learn it. So, we found the easiest way to learn AI in as little time as possible: The Rundown AI.
It's a free AI newsletter that keeps you up-to-date on the latest AI news, and teaches you how to apply it in just 5 minutes a day.
Plus, complete the quiz after signing up and they’ll recommend the best AI tools, guides, and courses – tailored to your needs.

A couple of models have behaved poorly this past week. First Grok 3, just before they announced the release of Grok 4, displayed antisemitic outputs. When I see news like this, I am always surprised and think, “who and what is being prompted to generate this sort of output?” I think it has to take a certain type of person, with a certain type of mind, to go places with these AI chatbots to get them to generate racist, hateful, gross, deceitful, unethical, etc. (I could go on and on) output.
And I think it is hard for companies to hire these people and unleash them on their models for the purpose of testing and fine-tuning. I know I don’t want to work with one of those people. Similarly, OpenAI disclosed that it was pulling back the release, scheduled for next week, of its highly anticipated open models for further safety testing and review of high risk areas. The implication is that these models generated unsafe, high risk outputs. So how exactly do model providers make their models…safe and ethical?
Many years ago, at an event I met Brenda Leong who was at the Future of Privacy Forum focusing on AI. I introduced myself to her because in my role as the GC of Clarifai, I thought it was important that we know each other. Fast forward to today and she is at the firm Zwillgen, that does AI red-teaming for GenAI. Having spoken to Brenda, I cannot imagine that she is someone with a devious enough mind to easily input illegal activities, model bias, toxicity and hate speech, privacy harms, security attacks, and hallucinations off the top of her head. Clearly, there is a checklist (I know, the irony of me always going for jobs with checklists!).
Last Fall, at the AI Risk summit in London, I was introduced to Babl, a company that audits your AI systems for Responsible AI practices. Clearly, the red-teaming industry is booming and it is little surprise why when you see the model providers struggling, like OpenAI and Grok 3 did this past week.
But the most interesting red-teaming company I have seen is Lakera who published a game, called Gandolf, that encourages the community to perform activities so that the Lakera team learns how users exploit these AI systems. It’s smart and clever so if you try it, let me know how it goes for you. Although I have not read their privacy terms and I am guessing there are implications to being a user.

AI red-teaming is lucrative. So far, the estimated total addressable market is over $1 billion and the amount of publicly reported investment dollars that have gone into it from the VC world is around $80 million (note, this is for tech startups and does not track the investment into consulting firms, where most red-teaming currently is performed). When you think of AI in the context of people losing jobs and the fact that AI red-teaming is sort of a check the box exercise, that is a lot of new jobs for people who do not necessarily need to be highly skilled!
Note, I am aware that some red-teaming does require someone familiar with how machine learning systems work, so please don’t get mad at me for saying it might not necessarily need highly skilled labor.
So is red-teaming effective? Is the delay of OpenAI’s open models going to do something? Or are we losing “free speech” as some commenters on Elon Musk’s X post complained? Okay, first, AI is not human and is not entitled to any free speech rights. No court has expanded it to AI systems, so let’s make sure we know what we are talking about here. Second, it is up to the model provider to do something about the results of a red-teaming, such as put in place policies that guardrail the AI models outputs. Third, and finally, there are some common frameworks that teach companies how to establish a benchmark and measure the improvement after creating policies from red-teaming. Claims have ranged from 2%-99% so of course the answer to the question about whether it is effective? It is “it depends.”
Reply