OpenAI will soon ask to use your content as training data

OpenAI will soon ask to use your content as training data | TechCrunch Minute

Hi, this is Wayne again with a topic “OpenAI will soon ask to use your content as training data | TechCrunch Minute”.
OpenAI is building a tool that's going to allow creators to opt out of being training data, but it's not clear at this juncture exactly how that's going to work. So meet Media Manager, the new tool that OpenAI hopes to have in place by 2025. It will allow creators and content owners to identify their works to the AI giant and specify how they want them to be included in or excluded from both AI research and AI training. As OpenAI wrote in a company blog post, quote, "This will require cutting-edge machine learning research to build a first-ever tool of its kind to help us identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences."

So Media Manager is, it kind of seems, a response to the growing criticism of OpenAI's method of training its AI models, which largely involves scraping the internet for any available data that it can. That can be articles, social media sites, blogs, you name it. So far, if someone somewhere hits the post button, OpenAI and its peers can use it. That doesn't necessarily mean that they should be able to, but given how things have gone, that's been the setup. It is worth noting, though, that people in the EU have been able to opt out of their information being used as training data. So there is some precedent for this setup; just here in the US, we don't have it.

All right, let's back up a little bit and provide some context. Generative AI models, like those used by OpenAI, are trained on a simply enormous amount of data.

They need a ton of examples in order to generate high-quality text, images, and video, and roughly, the more data they have, the smarter they are and therefore the better the result. So more data is more good. Luckily for AI companies, there is a ton of information out there on the internet today. Now, it's not quite so great for many media brands and creators, who weren't exactly asked if their information could be scraped and then used to help make models to date, and so some companies have gotten a little bit mad about this. Recently, eight US newspapers, including the Chicago Tribune, sued OpenAI for IP infringement.

This is kind of similar to, but a little bit different from, the suit that The New York Times also brought against OpenAI. Clearly, legacy media brands are not cool with some upstart tech company showing up, taking their stuff, and then charging other people money for what they have built, often without even crediting the source. And while AI companies, including OpenAI, have argued that this is all under the umbrella of fair use, OpenAI has also said that it would be impossible to create useful AI models absent copyrighted material.

So the question is: where should the value flow, and how should that value transfer happen? It does seem that AI companies are at least trying to appear more thoughtful about these issues today, which is welcome. With the announcement of Media Manager and the recent pledges to counter the risks of deepfakes during an election year, AI companies are trying to find ways to ensure that their impact is positive. Or they might just be trying to avoid controversy, but in the world of corporate comms, isn't that the same thing? Either way, a system that allows creators and media companies to opt out of the training process is the answer for those who are savvy enough to do so. I'm excited about this.

It's the right move. But OpenAI, where is my check? More tomorrow.