It’s nothing like photocopying a book. It is very, very similar to the analogy given above, of someone learning the information and profiting from it. For the AI model to “learn” the information during training, it takes apart the information one piece of a word at a time, and reorganises it for quick access. Information is categorised by metadata like topic, source, date, etc.; there are approximately 1536 “tags”, so to speak, which OpenAI’s ChatGPT uses for categorising what it learns.
Copyright of words has the order of those words as an integral part of the legal standard, and the standards for what infringes are actually pretty strict (https://fairuse.stanford.edu/2003/09/09/copyright_protection_for_short/). Training an AI is definitively transformative work which does not retain the order of the words in the finished product, merely a weighted likelihood of what word fragment will come next in a given context, so it’s protected under Fair Use.
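To make the “weighted likelihood of what word fragment will come next” idea concrete, here is a minimal toy sketch. It is not how ChatGPT is actually implemented: it assumes whole words instead of sub-word fragments, looks only at the single previous word as context, and uses plain counting instead of a neural network. The function name `train_bigram` and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Return {previous_token: {next_token: probability}} from a token list."""
    # Count how often each token follows each other token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

    # Turn the raw counts into weighted likelihoods that sum to 1 per context.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {tok: n / total for tok, n in followers.items()}
    return model


# Hypothetical training text, split on whitespace (real models use sub-word tokens).
tokens = "the cat sat on the mat and the cat ran".split()
model = train_bigram(tokens)

# What the "trained" toy model holds for the context "the":
# a table of next-word weights.
print(model["the"])
```

In this toy, the trained object is a table of next-word weights per context. A real model conditions on a much longer context and learns its weights by gradient descent, so treat this only as an illustration of the general idea.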
I don’t think it’s that simple. Like I said, it’s a paradigm shift. It doesn’t fit into existing laws well. My point is that what we consider fair use now, such as a human summarizing a book or movie, is based on the limited abilities of humans. When you have AI with limitless abilities, that will change things. The same rules and considerations may have to be rethought.
Au contraire, it is that simple, and it is covered by existing law just fine in the very specific case we’re talking about, which is whether training a model is “transformative work” by the definition in IP law. It is. The law looks very specifically at the facts of the case, not hand-waving masquerading as an argument.
You are making this technology out to be something it isn’t; there’s no mystery to how AI works, and it does not have “limitless abilities”. In fact, it is very limited, but that isn’t relevant. What the law considers “fair use” isn’t based on human ability at all; it’s based on how completely the work is reproduced and the context the original work is being used in. You clearly have access to the internet, so you can verify the standards required to show breach of copyright yourself if you don’t believe me.