stalluri's comments

I always wondered whether FT and Census might move into each other's territory. Good to see the two joining forces now!


Models absorbed the pirated content. Now Meta is distributing those models. Is that considered distribution?


For that argument I believe the question becomes "is the output of a model considered a derivative work of the training data?"

https://www.copyright.gov/circs/circ14.pdf


What else could it be?


An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".
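
To make the word-count example concrete, here is a minimal sketch of that kind of statistic. The function name `word_counts` and the sample sentence are made up for illustration; the sample is a stand-in, not text from any real book. The output is a table of numbers about the text, not the text itself.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count case-insensitive word occurrences in a piece of text."""
    # Lowercase first so "The" and "the" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical stand-in sample, not an excerpt from any real work.
sample = "The quick fox saw the dog. Then the dog saw the fox."
counts = word_counts(sample)
print(counts["the"])  # prints 4
```

Whether a model's weights are closer to this kind of summary or to a compressed copy is exactly what the rest of the thread disputes.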


Can it reproduce training data? Then it's not analysis but compression, lossy compression.


For most LLMs, with most works, no.

If you trained an LLM repeatedly on nothing but the text of LOTR until it could reproduce the books verbatim, and then tried to sell copies of that LLM, then I agree that would be blatant copyright infringement, yes.


The industry is banking on Authors Guild v. Google serving as precedent, on the theory that a model is functionally transformative enough to count as a completely new work.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

I think they have about a coin flip of a chance that it passes muster in the courts.


I don't know what the legal answer will be, but I believe it should be considered distribution. A model is basically a highly lossy and extremely compressed copy of its training data, available as a content-addressable database. To anthropomorphize, the model is trying to perfectly replicate its training set; its brain just isn't big enough to do so.


It really should be.


Of course not.

I listened to other people's music and learned some of their songs before writing my own music; that doesn't mean my songs are distribution of theirs.

I read other people's books and short stories and news articles before writing my own; that doesn't mean my writing is distribution of theirs.


What if I play your song at just the right speed with just the right EQ and get an exact reproduction of some of the songs you claim to have written? Because we can get large excerpts of exact copies of short- and long-form content, as the New York Times clearly demonstrated in its research on chatbots and its own content.


Vstream looks super cool. Can we also use it to create subscriptions that bind to React hooks on the front end? I think PlanetScale could easily deliver subscriptions as good as or better than Firebase's. All we need is React and Next.js SDKs to get started :-)


Supabase does real-time subscriptions really well!

And it has great guides for use with React and Next.js.


This would be such a nice addition to apps deployed on fly.io


Author here. Funny you mentioned that. We worked with the Fly.io team to create a guide for exactly that: https://fly.io/docs/app-guides/planetscale/

(I swear we didn't plan this response.)


Very cool; will try it out. Thanks!



I think the PlanetScale folks have blogged well about how schema migration tools work, and their traffic-splitting and rewind abilities are very nice.

https://docs.planetscale.com/learn/how-online-schema-change-...

https://planetscale.com/blog/its-fine-rewind-revert-a-migrat...


Intelligently understanding the architecture and appropriately asserting Prometheus metrics using a knowledge graph.



Simple and easy interface

