Responsible, by ClearOPS
Welcome to the dark side of the force.
Responsible is a newsletter about Responsible AI and other responsible business practices. Announcement! I am officially a teacher of AI governance and my next course is selling out. Need templates, forms, and how-tos? Well, that’s what my course provides. It isn’t theoretical.
What I have for you this week:
The dark side of the web, and I am not talking about Tor
What the heck is distilling?
Caroline’s weekly AI Governance tips
Chef Maggie Recommends
AI Tool of the Week
AI Bites
I know I have talked about this before, but it is worth repeating. AI works because you feed an algorithm, called a model, with data. We call this data “training” data.
It’s an oversimplification, of course, because that data is actually transformed; we aren’t literally feeding the model pictures or text. We are feeding it mathematical renderings of those pictures or text. Can those renderings be reverse engineered? Possibly; it depends.
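If you want to picture what a “mathematical rendering” is, here is a toy sketch in Python. It is not any real company’s pipeline, just an illustration of the idea: text becomes a list of token IDs, and an image becomes a grid of pixel values.

```python
# Toy illustration of "mathematical renderings": the model never sees
# the raw sentence or photo, only numbers derived from them.

# Text becomes a list of token IDs looked up in a vocabulary...
vocab = {"hot": 0, "dog": 1, "not": 2, "a": 3, "<unk>": 4}

def tokenize(sentence: str) -> list[int]:
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

print(tokenize("not a hot dog"))  # [2, 3, 0, 1]

# ...and an image becomes a grid of pixel intensities scaled to 0-1.
image_pixels = [[34, 120, 255], [0, 17, 200]]  # a pretend 2x3 grayscale image
rendered = [[p / 255 for p in row] for row in image_pixels]
print(rendered)  # [[0.133..., 0.47..., 1.0], [0.0, 0.066..., 0.784...]]
```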
But if you watched the show Silicon Valley, then the “hot dog, not hot dog” episode is relevant to the ethical dilemma I am about to explore. Plus, it had me laughing out loud, so it is worth watching. When I was working at Clarifai, I learned that in order to build a model that recognizes a hot dog, you had to train it on hundreds of thousands of pictures of hot dogs, and the same number of pictures of things that were not hot dogs but looked similar.
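For the curious, here is a minimal sketch of that idea: a binary classifier trained on labeled examples of both classes. It assumes scikit-learn and uses tiny made-up feature vectors where a real system would use hundreds of thousands of actual photos and a deep network.

```python
# Minimal "hot dog / not hot dog" sketch: a binary classifier trained on
# examples of both classes. The feature vectors below are synthetic
# stand-ins for real images. Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hot_dogs     = rng.normal(loc=1.0, scale=0.3, size=(500, 8))  # class 1
not_hot_dogs = rng.normal(loc=0.0, scale=0.3, size=(500, 8))  # class 0

X = np.vstack([hot_dogs, not_hot_dogs])
y = np.array([1] * 500 + [0] * 500)

clf = LogisticRegression().fit(X, y)

new_photo = rng.normal(loc=1.0, scale=0.3, size=(1, 8))  # a "photo", already featurized
print("hot dog" if clf.predict(new_photo)[0] == 1 else "not hot dog")
```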
As the de facto AI Ethics officer, I often thought that Clarifai’s technology could be used for so much good, like taking revenge porn pictures or child porn off the web. If you could train the model to recognize those pictures, then it could automatically identify them and take them down.
Do you know how it is done right now? Humans do the work. And it causes those humans to suffer immense mental health effects, as I am sure you can imagine.
So, back in my Clarifai days, I tentatively embarked on a mission to uncover what was possible, and here is the problem I found: it is illegal to hold those images, let alone use them as training data, even if they are transformed into mathematical renderings. So no company can build a model that automatically detects illegal pictures on the web. This means that, while the laws are full of good intentions, the ability to detect those images remains poor.
So while I applaud the UK’s announcement to target child sexual abuse images generated by AI, I still think legislation should go further, going after the actual criminals without inadvertently removing the good that technology can do to fight the bad. This is not a new problem, but it is getting worse, as the Guardian just reported.
We need stronger laws and stronger tech to fight these crimes.
Last week, I wrote about DeepSeek, and many of you commented on the post. Shortly after I released that newsletter, OpenAI came out and claimed that DeepSeek had used “distilling.”
What the heck is distilling?
Distilling is sort of like reverse engineering a generative chatbot: using the outputs of one chatbot as the training data for another model. In this case, OpenAI’s o1 model became the teacher and DeepSeek the student, learning how to generate the same outputs as o1.
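If you want to see the shape of that teacher-student loop, here is a rough sketch. The teacher_model and fine_tune functions below are placeholders I made up for illustration; they are not OpenAI’s or DeepSeek’s actual APIs or methods.

```python
# Rough sketch of distillation as described above: the teacher's answers
# become the training set for the student. All functions are placeholders.

prompts = [
    "Explain photosynthesis in one sentence.",
    "Write a haiku about compute costs.",
]

def teacher_model(prompt: str) -> str:
    # Placeholder: imagine this calls the expensive teacher model (e.g., o1).
    return f"[teacher's answer to: {prompt}]"

# Step 1: harvest the teacher's outputs.
distillation_data = [{"prompt": p, "completion": teacher_model(p)} for p in prompts]

# Step 2: fine-tune the cheaper student on those (prompt, completion) pairs
# so it learns to imitate the teacher's behavior.
def fine_tune(student_name: str, dataset: list[dict]) -> None:
    print(f"fine-tuning {student_name} on {len(dataset)} teacher-labeled examples")

fine_tune("student-model", distillation_data)
```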
Why would someone use this method? Because it is cheaper and requires less compute. The irony is not lost on me that my whole argument last time was that using less compute made what DeepSeek did really compelling.
But there is an ethical question here: is distilling stealing?
I have heard a lot of people say, “well, OpenAI stole our data to train the models in the first place, so serves them right that someone stole from them.”
Is it though? Is getting as good as you give creating balance or tipping the scales in the other direction?
Besides, isn’t it just further exploitation of that original data?