Responsible, by ClearOPS

Does open source mean privacy preserving?


Welcome back to Responsible! A newsletter about Responsible AI and other responsible business practices. This is the best newsletter about the AI revolution we are currently experiencing because it contains original thought. No AI in this AI.

What I have for you this week:

  • The story behind DeepSeek and the market upset

  • Open source vs closed source

  • Caroline’s weekly AI Governance tips

  • Chef Maggie Recommends

  • AI Tool of the Week

  • AI Bites

Let’s address the elephant in the room: DeepSeek

This past weekend, my social media feeds exploded with posts about how astounding the R1 model is.

Honestly, I don’t get it. An open source model that is as good as OpenAI’s o1 model, which was released five months ago? Why are people impressed with a model that merely reaches parity with one that has been on the market for a while?

Let me tell you a little story about a boy from China who went to school and studied engineering. He was obsessed with making money and found a hit when he applied AI algorithms to finance, so he rejected all the offers from big companies and, at the age of 30, launched his own hedge fund with a few buddies from school.

Within five years, they had 5 billion yuan under management, and the boy, now a man, realized that AI-driven trading was his future, so he bought a stockpile of chips from a company called Nvidia. Little did he know that the government of the country Nvidia operates from would prevent Nvidia from shipping more chips to China, which meant he was sitting on a gold mine. But how to optimize it?

Well, he had enough AI chips to build an LLM, so he launched DeepSeek in May 2023 and, by the next year, pushed out the first version of the technology, sharply cutting its commercial price, only to discover just how sensitive the tech market was to price.

Now this genius, who understands how algorithms affect markets and how sensitive those markets are to the price of foundation models, comes up with a plan to take advantage of both lessons. He launches a new open source model that his team built from the best open source model currently available and gets it to parity with the best closed source model on the market. Then he severely undercuts that closed source model’s pricing by launching a commercial version of his own. Finally, he initiates a huge marketing campaign that preys on geopolitical fears, shorts public AI company stocks, and reaps a huge profit.

The question is, will he use that profit to continue building LLMs?

On Monday morning, the first thing I did was visit DeepSeek’s website. Two things sparked this.

First, I was curious how easy it would be for ClearOPS to connect to it as an open source model. If it is as good as GPT-4, then maybe it would be worth it for us to switch.

Second, I heard that they also offered a chatbot (like ChatGPT) and an API. People started making waves: since DeepSeek, the company, is located in China, there are probably backdoors. Someone responded, “not in the open source version, but there might be in the chatbot or the APIs.” Then someone else argued that the US LLM providers do the same thing, so what’s the fuss? And then someone else said that maybe the open source version calls a library or contains code that could still send data back to servers in China. And then, well, you get the picture.

And here is what I want to say about all of it. The best thing about DeepSeek is that they proved exactly what I predicted: that these LLMs can be run on much less compute, which does less damage to the environment and also brings down costs. That is a win-win-win in my book.

And if you are worried about the Chinese government having access to your inputs or outputs, then don’t use the DeepSeek models. Pay more for US model providers to violate your privacy.

See what I did there?

But in all seriousness, open source means that you are running the code on your own machine and that all the documents, etc., that you have the algorithm evaluate are also kept locally on your machine. Using open source is privacy preserving. Maybe there are some libraries or code paths that could potentially be a hack into your device, but I am pretty sure that if that were possible, someone would find it within about a month.

The APIs and chatbot, however, are fair game and most likely the data will be sent to a server in China, if not also stored there. So, pick your battles according to your own ethics and geopolitical views.
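To make the local-versus-hosted distinction concrete, here is a minimal Python sketch. The hosted URL is DeepSeek’s OpenAI-compatible API endpoint, and the local URL assumes you are serving the open weights yourself behind an OpenAI-compatible local server (the port shown is Ollama’s default); both endpoints are illustrative assumptions, not an endorsement of either choice:

```python
from urllib.parse import urlparse

# Hosted API: prompts and documents travel to DeepSeek's servers.
HOSTED_BASE_URL = "https://api.deepseek.com"  # assumption: DeepSeek's hosted endpoint

# Local inference: the open-weight model runs on your machine, so inputs
# and outputs never leave it (assumes a local OpenAI-compatible server,
# e.g. Ollama on its default port).
LOCAL_BASE_URL = "http://localhost:11434/v1"

def data_leaves_your_machine(base_url: str) -> bool:
    """Return True when requests to this endpoint exit the local host."""
    host = urlparse(base_url).hostname
    return host not in ("localhost", "127.0.0.1", "::1")

print(data_leaves_your_machine(HOSTED_BASE_URL))  # True: chatbot/API traffic goes out
print(data_leaves_your_machine(LOCAL_BASE_URL))   # False: self-hosted weights stay local
```

The point of the sketch is simply that the privacy question is decided by where the endpoint lives, not by which model answers the prompt.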
