It's unfortunate we are seeing all of these data platforms get locked down, because this is not going to affect AI development at big companies; it's only going to affect the ability of individuals to do AI development of any form at home.

I hope the data that has been gathered so far is going to be big enough going forward, but it's incredibly unfortunate that this is happening.

I hope all the people making these decisions wake up with a bad headache and severe heartburn tomorrow.



IANAL, but I'm curious:

Suppose that deep-pocketed AI companies were paying Reddit, Stack Overflow, etc. to make it harder for other AI upstarts to access those data. I.e., to build a mote by denying competitors access to previously accessible data sets.

Would that violate antitrust laws in various major markets?


Nitpick, also because the contrast is kind of funny:

mote: a small particle, speck, atom, "mote of dust"

moat: a deep ditch, often filled with water, as a first line of defence around a castle.


Hopefully this comment won't be demoated by the algorithm - it truly holds water on its own!


You are only helping it with puns and we know that puns are the gateway to consciousness! They are like the corpus callosum of language, serving as the bridge that spans the moat between the castles of wit and creativity.


Given that this seems to happen all the time without antitrust issues it probably wouldn't, even though I feel like it should.

What we need is a legal way for companies to keep the data open, but also require OpenAI and friends to pay them for it.


> What we need is a legal way for companies to keep the data open, but also require OpenAI and friends to pay them for it.

Couldn't that be accomplished by a law or ruling that using something for training AI doesn't exempt you from having to follow its license? OpenAI is already in blatant violation of both the "BY" and "SA" parts of the existing license.


Arguably, a model created by training on a corpus of data is a derived work of that corpus.

Let's say I take a collection of images and use a program to compress them. When decompressed, the images are close to, but not exactly the same as the originals. Despite being in a different format, and despite not being exactly the same as the originals, the copyright to the compressed images is still held by whoever previously held it.

If I take the collection of images from earlier and train a diffusion model based on it, I'm essentially just compressing it a different way. With the right prompt, you can get out something very similar to what you put in.


By this logic, isn't remembering something in your brain also a derived work? But that would not make any sense to protect until you create and distribute something based on that memory. The same logic should be applied to this.


If you remember it from your brain and perform it live, that’s perfectly fine. So there’s a line to be drawn somewhere and I don’t think it’s super clear cut in most cases.


No, according to copyright if you remember something from your brain and perform it live you are very much in violation.

If you remember it and make something that is a distinct work, something that maybe steals the idea without reproducing any of its elements, that's never been considered a violation under copyright.

I think that's going to be the litmus test for these AIs. If you can get them to produce output that is distinct from anything else, it's not going to be a copyright violation because it's not a copy of anything.


> If I take the collection of images from earlier and train a diffusion model based on it, I'm essentially just compressing it a different way

Not really. If diffusion models were compression they'd be so lossy as to be totally worthless


> What we need is a legal way for companies to keep the data open, but also require OpenAI and friends to pay them for it.

inherently not possible as then it would not be "open" to begin with.


Open except you have to pay if it gets big enough seems perfectly reasonable to me.

I understand the idea that it's not truly open in that case, but so long as the ability to build new things on it and prosper from it is preserved, I'm alright.

The key is that it's not doing something like trying to restrict you from using it in a certain way, only requiring you give a fair share of profits.

This was, fun fact, the original purpose of patents. They weren't designed to keep things closed and owned by individuals; they were designed to allow people to freely share and make a profit so that ideas could be built on by each other. The patent system has turned into a corrupted, terrible mess where things are almost never shared through licensing or payment, and it's just a way to build monopolistic enterprises nowadays.

An open source system that allows for this sort of payment would also allow many more things to be open that currently aren't, because of bad actors who will take that work, build on it, and never pay you back for it.


There are ways to require payment for some uses of things that are legitimately open. As an example, consider the practice of selling exceptions to the GPL, as is done for Qt.


> As an example, consider the practice of selling exceptions to the GPL, as is done for Qt.

There are people who do not consider that "open".

See the whole debacle about what exactly constitutes "open source"


> There are people who do not consider that "open".

Who? Even Richard Stallman is okay with what they do: https://www.fsf.org/blogs/rms/selling-exceptions


Richard Stallman isn't necessarily the sole authority on such things. Consider Creative Commons vs. AGPL. Is CockroachDB "open"? etc.

In any case the software world has changed drastically since that article was published.


Those all seem black and white. Creative Commons' NC and ND licenses are not open, but the rest are. The AGPL is open. CockroachDB is not.


you are not understanding. why is it not open? Who is the authority on "open"? Why is CockroachDB not open? I can see their source on GitHub.

"open" is not like 1+1=2. ultimately it is arbitrary. one definition of open is "to make available", by that definition all of them, including CockroachDB are "open".

open does not necessarily mean you can use it, just like how an open door does not necessarily mean you can enter the house.

in any case, we can agree to disagree.


You're confusing "open" with "visible".


I literally quoted a definition from Webster for "open". let's just stop this pedantry. I'm going to go back to "Open"AI.


"Open" is a pretty vague word which could mean all sorts of things.

"Open source" is defined by the Open Source definition according to the OSI [1]. In saying that, I realize that every couple of years somebody tries to claim that their understanding of the term "open source" should trump the one the community has settled on. I personally am not ready to acquiesce to this semantic drift, at least, not yet.

[1] - https://opensource.org/osd/


It's not even real semantic drift. It's basically the astroturfing version of it. The people trying to change the meaning are doing so because they want to capitalize on its good name without meeting the true definition.


Does this prevent any external contributions under the GPL?


It means external contributors need to agree to a CLA for their changes to be incorporated into upstream Qt.


> It's unfortunate we are seeing all of these data platforms get locked off

Are there any AGPL-like licenses that address this?



