It's unfortunate that we're seeing all of these data platforms get locked down. This isn't going to affect AI development at big companies; it's only going to affect the ability of individuals to do any form of AI development at home.
I hope the data that has been gathered so far is going to be big enough going forward, but it's incredibly unfortunate that this is happening.
I hope all the people making these decisions wake up with a bad headache and severe heartburn tomorrow.
Suppose that deep-pocketed AI companies were paying Reddit, Stack Overflow, etc. to make it harder for other AI upstarts to access those data, i.e., to build a moat by denying competitors access to previously accessible data sets.
Would that violate antitrust laws in various major markets?
You are only helping it with puns, and we know that puns are the gateway to consciousness! They are like the corpus callosum of language, serving as the bridge that spans the moat between the castles of wit and creativity.
> What we need is a legal way for companies to keep the data open, but also require OpenAI and friends to pay them for it.
Couldn't that be accomplished by a law or ruling that using something for training AI doesn't exempt you from having to follow its license? OpenAI is already in blatant violation of both the "BY" and "SA" parts of the existing license.
Arguably, a model created by training on a corpus of data is a derived work of that corpus.
Let's say I take a collection of images and use a program to compress them. When decompressed, the images are close to, but not exactly the same as, the originals. Even though the compressed images are in a different format and aren't identical to the originals, the copyright is still held by whoever held it before.
If I take the collection of images from earlier and train a diffusion model based on it, I'm essentially just compressing it a different way. With the right prompt, you can get out something very similar to what you put in.
By this logic, isn't remembering something in your brain also a derived work? But it wouldn't make sense to protect that until you create and distribute something based on the memory. The same logic should apply here.
If you remember it from your brain and perform it live, that's perfectly fine. So there's a line to be drawn somewhere, and I don't think it's clear-cut in most cases.
No, under copyright law, if you remember something from your brain and perform it live, you are very much in violation.
If you remember it and make something that is a distinct work, something that maybe steals the idea without reproducing any of its elements, that has never been covered by copyright.
I think that's going to be the litmus test for these AIs. If you can get them to produce output that is distinct from anything else, it's not going to be a copyright violation, because it's not a copy of anything.
"Open, except you have to pay if you get big enough" seems perfectly reasonable to me.
I understand the objection that it's not truly open in that case, but as long as the ability to build new things on it and prosper from it is preserved, I'm all right with it.
The key is that it's not doing something like trying to restrict you from using it in a certain way, only requiring you give a fair share of profits.
This was, fun fact, the original purpose of patents. They weren't designed to keep things closed and owned by individuals; they were designed to let people freely share and still make a profit, so that ideas could be built on by others. The patent system has turned into a corrupted, terrible mess where things are almost never shared through licensing or payment, and nowadays it's just a way to build monopolistic enterprises.
An open source system that allows for this sort of payment would also allow many, many more things to be open, whereas currently there are bad actors who will take that work, build on it, and never pay you back for it.
There are ways to require payment for some uses of things that are legitimately open. As an example, consider the practice of selling exceptions to the GPL, as is done for Qt.
You're not understanding. Why is it not open? Who is the authority on "open"? Why is CockroachDB not open? I can see their source on GitHub.
"Open" is not like 1+1=2; ultimately it is arbitrary. One definition of open is "to make available", and by that definition all of them, including CockroachDB, are "open".
Open does not necessarily mean you can use it, just like an open door does not necessarily mean you can enter the house.
"Open" is a pretty vague word which could mean all sorts of things.
"Open source" is defined by the Open Source Definition maintained by the OSI [1]. In saying that, I realize that every couple of years somebody tries to claim that their understanding of the term "open source" should trump the one the community has settled on. I personally am not ready to acquiesce to this semantic drift, at least not yet.
It's not even real semantic drift. It's basically the astroturfing version of it. The people trying to change the meaning are doing so because they want to capitalize on its good name without meeting the true definition.