Microsoft AI CEO: Content on the open web is "freeware" for AI training

zohaibahd

What just happened? The use of copyrighted material to train AI has become a hot-button issue, with experts divided on whether it constitutes theft or a legitimate form of study akin to artistic training. Microsoft's top AI executive thought it would be a good idea to add fuel to the fire by making some bold claims about what companies can legally do with online content when training their AI systems.

Mustafa Suleyman, who's been heading Microsoft's AI efforts since March, told CNBC in an interview that material published openly on the web essentially becomes "freeware" that anyone can copy and use as they please.

"I think that with respect to content that's already on the open web, the social contract of that content since the '90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it," he stated. "That has been 'freeware,' if you like, that's been the understanding."

That's certainly a spicy take – and an inaccurate one. You only need to look at the FAQ page from the US Copyright Office. One answer therein states that "your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device."

The same FAQ adds that you do not even need to register "to be protected." The only time registration is needed is when you wish to file a lawsuit for infringement. Fair use, meanwhile, is a legal doctrine that courts weigh case by case, so it's safe to say it doesn't come from any "social contract" as Suleyman suggests.

Suleyman did seemingly acknowledge the importance of the robots.txt file, stating that specifying "do not scrape or crawl" on a website might make scraping a "grey area." But adhering to this basic protocol for blocking web crawlers is more of a courtesy, not something that needs to "work its way through the courts," as he suggested.

Not surprisingly, even robots.txt is being ignored by various AI companies including Anthropic, Perplexity, and OpenAI.
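
For readers unfamiliar with robots.txt, here is a minimal sketch in Python of how a well-behaved crawler might consult it before fetching a page. The bot name `ExampleAIBot`, the example.com URL, the sample rules, and the `may_crawl` helper are all hypothetical and only for illustration; the check itself uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named crawler is asked to stay out,
# everyone else is allowed. Real sites publish this at /robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-story"
    print(may_crawl(ROBOTS_TXT, "ExampleAIBot", page))
    print(may_crawl(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that nothing in the file enforces the answer. A crawler that never runs a check like this can still request the page, which is why honoring robots.txt comes down to courtesy.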

This isn't the first time an executive working on AI has made controversial claims. A big reason such statements are so prevalent is likely that, more than a year after ChatGPT's launch, the legal ground around training data and copyright is still being mapped out.

Microsoft and partner OpenAI are indeed facing multiple lawsuits from publishers over allegations of using copyrighted online articles to train their powerful language models without permission. However, these cases have yet to reach final resolutions that could provide more legal clarity.

Suleyman's statements reflect a view that AI's scraping of the internet is similar to how artists have always studied great works while learning their craft. "What are we, collectively, as an organism of humans, other than a knowledge and intellectual production engine?" he mused in the same interview.

However, the difference between AI and artists is that only one is capable of ingesting and regurgitating the world's content into profitable AI products and services on an unprecedented scale.

 
Surely, any website that wants to protect its content, Netflix, Disney+, AppleTV, they all force you to log in with an account, right? So they aren't "open web"?

Even YouTube, you have to log in to see stuff that's marked as more adult content or "not kid friendly" content.

Soo... if you don't want your content crawled, why not lock it behind a login of some kind? What am I missing here?

Let's say you're walking down the public high street: female models in windows for clothing brands, Warhammer 40k giant figures at the Games Workshop, shoes all over the walls in the shoe shops. If I took photos of any of these publicly accessible places, am I breaking copyright laws in some way?

If these shops weren't showing stuff in their front windows, and doors were closed, and the only way to get in was a pin (enter your email address, okay, here's the pin to get in, as an example), and once I was inside I started taking photos, then I'd get it: I'm actively exploiting something they don't want the world to see.

I absolutely might not have read this correctly though, and I may have got the wrong end of the stick.
 
"Let's say you're walking down the public high street: female models in windows for clothing brands, Warhammer 40k giant figures at the Games Workshop, shoes all over the walls in the shoe shops. If I took photos of any of these publicly accessible places, am I breaking copyright laws in some way?"
If you start selling those photos you are…
"If these shops weren't showing stuff in their front windows, and doors were closed, and the only way to get in was a pin (enter your email address, okay, here's the pin to get in, as an example), and once I was inside I started taking photos, then I'd get it: I'm actively exploiting something they don't want the world to see."
And now they get fewer visits…
 
I am glad you pointed out in the article that his claims are complete bullshit. Surprised even he would stoop to such obvious bare-faced lies.
While, legally, they are certainly lies… in reality, they are the truth.

MS, OpenAI and every other AI company have been using, are using, and will always use the internet for training, and there is virtually nothing law enforcement can or will be able to do about it. At most, they will get a fine - which will represent a tiny percentage of their ENORMOUS profits - and will proceed to train their AIs as they please.
 
Going by his way of thinking, I guess if a copy of Windows 11 or Microsoft Office was in the wild, we could download it, install it, and use it for free? I could use the same excuse and say that if I am using it for training purposes I shouldn't have to pay for it. I guess now we can go to Microsoft and let them know we are using their programs and not paying for them because their CEO of AI said we are allowed.
 
Mustafa Suleyman is obviously wrong. The articles on the "open web" are absolutely copyrighted. Plagiarizing them is illegal. However, the question for AI should not be whether the "open web" is copyrighted. The question is whether an AI reading copyrighted material to learn from it is OK. I don't think there is any law saying it's not. A good AI never exactly reproduces the original material. It doesn't store the original material, so it can't cut and paste it back into a new article. It's no different from a person reading an article, learning something from it, and then telling a friend about it in their own words. That is not a copyright violation. But the government could change copyright law to make it illegal to train an AI without paying. I don't think there is any law saying that is required right now.
 
"While, legally, they are certainly lies… in reality, they are the truth.

"MS, OpenAI and every other AI company have been using, are using, and will always use the internet for training, and there is virtually nothing law enforcement can or will be able to do about it. At most, they will get a fine - which will represent a tiny percentage of their ENORMOUS profits - and will proceed to train their AIs as they please."

true
 
All M$ products are now free to use. Pirate them I say. If they don't want to pay for their "data" then we don't have to pay for theirs. I think it is that simple.
 