Responsible

Chevron, Web Scraping and VC Pitching

Hello, and welcome to Responsible, by ClearOPS, a newsletter about Responsible AI and other responsible practices.

If you don’t know me, I’m a nice person, with a strong sense of responsibility, who happens to be an attorney, a mom, a wife, an amateur chef and a dog lover. I also geek out on technology and beat Super Mario about 10 times when I was young. By the way, I feel compelled to apologize for the two-week hiatus these last two Tuesdays: I am the sole creator of this email, and I went to Italy on vacation :)

Here’s what we have for you this week:

  • The Chevron decision, here’s my take.

  • Is scraping data online a bad thing?

  • Caroline’s weekly musings

Funny that the generated image is a hooded man. I did not prompt for that!

Last night, driving back from Manhattan, I was listening to a program where they explored a highly controversial topic, euthanasia, which is legal in the Netherlands. In this particular episode, a couple had been approved to receive the toxic chemicals and die together. I realize this is a sensitive topic, so if you want to learn more about their story, please go to the link.

What I want to focus on is that the Netherlands has put in place a lot of process and ethics oversight before approving a euthanasia case. Clearly, this is a topic and situation where ethics is very, very difficult, but important. How do governments balance the desire of the individual to have autonomy and the need for government imposed safeguards against abuse? Something we are grappling with in AI.

It made me think about the recent Supreme Court case overturning their previous Chevron decision, which is all the rage in my legal circles. In a nutshell, the Chevron decision gave federal agencies significant leeway in how they interpreted silence or ambiguity in a law passed by Congress: the judiciary deferred to the executive branch. Its reversal means the opposite could be true.

While we can only speculate if this will be terrible or great for our country, I want to pose this as a responsibility question. In the Netherlands, the euthanasia law is enforced by a committee, similar to the US using agencies. The committee must review a doctor’s decision and determine if they agree or disagree. In the U.S., it is possible that decisions will now be made by a judge or jury. While similar, the distinct difference that everyone is talking about is the expertise of those involved in making the ultimate call. Do you want it to be a group of individuals with expertise in the field and also with experience in making this specific decision? Or would you rather it be a group of dispassionate non-experts who are highly experienced in making decisions, but more broadly?

Who should be responsible?

Snapshot of a report generated by ClearOPS on AWS via scraping

I’ve spent a lot of time studying web scraping. Back at Clarifai, I actually gave a presentation on it to the developers and engineers. Web scraping has been around a long time and has been challenged in court several times. With the use of scraped content for AI training, it is coming under intense scrutiny these days, but I don’t think it will become illegal, nor do I think it should.

But I am incredibly biased!

We use scraping at ClearOPS, and while it is not for training an AI model, it is meant to give businesses information about themselves and their vendors. I think that is a “higher purpose” because we are “tracking the trackers,” i.e., fostering transparency.

Search engines like Google, and the Wayback Machine, scrape web data, but most companies want them to scrape their websites. Frankly, I don’t see much difference between an AI company scraping to train a chatbot and Google. Isn’t everyone using ChatGPT as a search engine anyway? 😉

And does Google get a pass because we all want it scraping our websites for SEO purposes? That is an interesting point. Maybe Google wins the AI race for this simple reason alone: we ask Google to scrape our websites and ignore our copyright!

I’m being a little sarcastic here because I hate the trackers: the companies that scrape personal information in order to use it for an unintended purpose. Clearview AI is one of those companies.

In this short piece, I have described three uses of web scraping: one for a higher purpose, one that is useful, and one that is a clear violation of so many privacy rights. In short, this issue is complex, but my hope is to convince you that not all web scraping is bad.
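For the curious, the line between welcome and unwelcome scraping is often drawn by a site’s robots.txt file, which tells crawlers what they may and may not fetch. Here is a minimal sketch of a polite check, using Python’s standard-library `urllib.robotparser`. The function name, bot name, and the example robots.txt are my own illustrations, not ClearOPS’s actual code.

```python
from urllib import robotparser

def allowed_to_scrape(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse a site's robots.txt text and report whether `user_agent`
    is permitted to fetch `url`. A polite scraper checks this first."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A hypothetical robots.txt: everyone may crawl, except /private/.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_to_scrape(ROBOTS, "MyBot", "https://example.com/about"))
print(allowed_to_scrape(ROBOTS, "MyBot", "https://example.com/private/x"))
```

Honoring robots.txt is a convention rather than a law, which is partly why the debate over scraping ends up in courtrooms instead of config files.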
