Microsoft AI CEO: Content on the open web is "freeware" for AI training

zohaibahd · 2024-07-01T08:36:00-0400

What just happened? The use of copyrighted material to train AI has become a hot-button issue, with experts divided on whether it constitutes theft or a legitimate form of study akin to artistic training. Microsoft's top AI executive thought it would be a good idea to add fuel to the fire by making some bold claims about what companies can legally do with online content when training their AI systems.

Mustafa Suleyman, who's been heading Microsoft's AI efforts since March, told CNBC in an interview that material published openly on the web essentially becomes "freeware" that anyone can copy and use as they please.

"I think that with respect to content that's already on the open web, the social contract of that content since the '90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it," he stated. "That has been 'freeware,' if you like, that's been the understanding."

That's certainly a spicy take, and an inaccurate one. You only need to look at the FAQ page from the US Copyright Office. One answer therein states that "your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device."

The same FAQ adds that you do not even need to register "to be protected." The only time registration is needed is when you wish to file a lawsuit for infringement. So it's safe to say fair use doesn't come from any "social contract" as Suleyman suggests.

Suleyman did seemingly acknowledge the importance of the robots.txt file, stating that mentioning "do not scrape or crawl" on a website might make scraping a "grey area." But adhering to this basic protocol blocking web crawlers is more of a courtesy, not something that needs to "work its way through the courts," as he suggested.

Not surprisingly, even robots.txt is being ignored by various AI companies including Anthropic, Perplexity, and OpenAI.
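For readers unfamiliar with how robots.txt works in practice, here is a minimal sketch of the check a well-behaved crawler could run before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name "ExampleAIBot", the helper `may_fetch`, and the example.com URL are all made up for illustration and do not describe any real company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The blocked crawler is refused; any other user agent falls under "*".
    print(may_fetch("ExampleAIBot", "https://example.com/article"))
    print(may_fetch("SomeOtherBot", "https://example.com/article"))
```

Nothing in this check is enforced by the server. The crawler has to choose to run it and honor the answer, which is why the protocol amounts to a courtesy.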

This isn't the first time an executive working on AI advancement has made controversial claims. A big reason behind the prevalence of such statements is likely that, more than a year after ChatGPT's launch, the legal ground rules for training data and copyright are still being mapped out.

Microsoft and partner OpenAI are indeed facing multiple lawsuits from publishers over allegations of using copyrighted online articles to train their powerful language models without permission. However, these cases have yet to reach final resolutions that could provide more legal clarity.

Suleyman's statements reflect a view that AI scraping the internet is similar to how artists have always studied great works while learning their craft. "What are we, collectively, as an organism of humans, other than a knowledge and intellectual production engine?" he mused in the same interview.

However, the difference between AI and artists is that only one is capable of ingesting and regurgitating the world's content into profitable AI products and services on an unprecedented scale.


wiyosaya · 2024-07-01T09:42:18-0400

What else would this jack *** say when his job depends on his, and everyone else's, AI committing copyright infringement?

Years ago, IIRC, M$ was strictly against this sort of thing. M$, you cannot have it both ways.

NumberNine · 2024-07-01T09:46:38-0400

Big corporation no like give.

Only take.

Benses · 2024-07-01T10:04:16-0400

The most valuable company in the world doesn’t want to use things without paying. What a shocker!

Squid Surprise · 2024-07-01T10:13:59-0400

Eventually the laws will change - there is no way to enforce protection of the net, so they’ll just give up.

Burty117 · 2024-07-01T10:21:15-0400

Surely, any website that wants to protect its content, Netflix, Disney+, AppleTV, they all force you to log in with an account, right? So they aren't "open web"?

Even YouTube, you have to log in to see stuff that's marked as more adult content or "not kid friendly" content.

Soo... if you don't want your content crawled, why not lock it behind a login of some kind? What am I missing here?

Let's say you're walking down the public high street: female models in windows for clothing brands, Warhammer 40k giant figures at the Games Workshop, shoes all over the walls in the shoe shops. If I took photos of any of these publicly accessible places, am I breaking copyright laws in some way?

If these shops weren't showing stuff in their front windows, and doors were closed, and the only way to get in was a pin (enter your email address, okay here's the pin to get in, as an example), and then once I was inside I started taking photos, then I'd get it: I'm actively exploiting something they don't want the world to see.

I absolutely might not have read this correctly though and I've got the wrong end of the stick.

Squid Surprise · 2024-07-01T10:26:22-0400

Burty117 said:
Surely, any website that wants to protect its content, Netflix, Disney+, AppleTV, they all force you to log in with an account, right? So they aren't "open web"?

Even YouTube, you have to log in to see stuff that's marked as more adult content or "not kid friendly" content.

Soo... if you don't want your content crawled, why not lock it behind a login of some kind? What am I missing here?

Let's say you're walking down the public high street: female models in windows for clothing brands, Warhammer 40k giant figures at the Games Workshop, shoes all over the walls in the shoe shops. If I took photos of any of these publicly accessible places, am I breaking copyright laws in some way?

If you start selling those photos you are…

Burty117 said:
If these shops weren't showing stuff in their front windows, and doors were closed, and the only way to get in was a pin (enter your email address, okay here's the pin to get in, as an example), and then once I was inside I started taking photos, then I'd get it: I'm actively exploiting something they don't want the world to see.

And now they get fewer visits…

gingerbill · 2024-07-01T10:33:23-0400

I am glad you pointed out in the article that his claims are complete bullshit. Surprised even he would stoop to such obvious bare-faced lies.

Squid Surprise · 2024-07-01T10:43:31-0400

gingerbill said:
I am glad you pointed out in the article that his claims are complete bullshit. Surprised even he would stoop to such obvious bare-faced lies.

While, legally, they are certainly lies… in reality, they are the truth.

MS, OpenAI and every other AI company have been using, are currently using, and will always use the internet for training, and there is virtually nothing law enforcement can or will be able to do about it. At most, they will get a fine - which will represent a tiny percentage of their ENORMOUS profits - and will proceed to train their AIs as they please.

viperfl · 2024-07-01T11:48:42-0400

Going by his way of thinking, I guess if a copy of Windows 11 or Microsoft Office was in the wild, we could download it, install it, and use it for free? I could use the same excuse and say that if I am using it for training purposes I shouldn't have to pay for it. I guess now we can go to Microsoft and let them know we are using their programs and not paying for them because their CEO of AI said we are allowed.

wujj123456 · 2024-07-01T13:32:08-0400

More such statements, please. I expect they will be very handy in court to prove the companies either intentionally violated copyright law or willfully neglected copyright protection.

GoldenGoat · 2024-07-01T14:03:28-0400

Mustafa Suleyman is obviously wrong. The articles on the "open web" are absolutely copyrighted. Plagiarizing them is illegal. However, the question for AI should not be whether the "open web" is copyrighted. The question is whether it is OK for an AI to read in copyrighted material to learn from. I don't think there is any law saying it's not. A good AI never exactly reproduces the original material. It doesn't store the original material, so it can't cut and paste it back into a new article. It's no different from a person reading an article, learning something from it, and then telling a friend about it in their own words. Doing that is not a copyright violation. But the government could change copyright law to make it illegal to train an AI without paying. I don't think there is any law saying that is required right now.

gingerbill · 2024-07-01T15:06:04-0400

Squid Surprise said:
While, legally, they are certainly lies… in reality, they are the truth.

MS, OpenAI and every other AI company have been using, are currently using, and will always use the internet for training, and there is virtually nothing law enforcement can or will be able to do about it. At most, they will get a fine - which will represent a tiny percentage of their ENORMOUS profits - and will proceed to train their AIs as they please.

true

Karlos95 · 2024-07-02T00:09:42-0400

All M$ products are now free to use. Pirate them I say. If they don't want to pay for their "data" then we don't have to pay for theirs. I think it is that simple.
