The difference is that when the robot reads that book, it maintains a verbatim copy of that book as part of its training material indefinitely and can reference and re-reference that material infinitely. That is not how it works when a human reads a book.
However, that is how it works when a human memorizes a copyrighted work. If I memorize a poem, I may then reference it from my memory without further need for the original text before me. If I am an actor and learn my lines for a play, I commit them to my memory.
Which is not an infringement.
The infringement happens if the human performs or publishes that work; e.g. reciting that copyrighted poem or play from memory before an audience; writing that work down from memory and publishing it; etc., without a copyright license for that performance or republication.
I suggest merely applying the same standard: infringement doesn’t happen when a work is read, indexed, scanned, etc.; it does happen if that work is then recited.
For instance, ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so. (Try it! It will answer questions about the text, but it will freeze up if asked to recite it; evidently because it has a filter against reciting copyrighted material.)
No, the reason ChatGPT can’t recite the text of Harry Potter verbatim is that it doesn’t actually “contain” it. It learned from it, but it doesn’t “remember” it word-for-word. There is no filter against reciting copyrighted material. Try asking it to recite a scene from a Shakespearean play, for example - that’s out of copyright and ChatGPT was almost certainly trained on it. It may be able to quote some famous lines because it’s been overfit like crazy on them (“To be or not to be” is probably everywhere on the Internet), but that’s not the same as reproducing a verbatim chunk of the play.
I’ve actually experimented with this myself on my local machine. I took one of the smaller open-source models and gave it additional training on 20 megabytes of My Little Pony fanfiction. The AI knew a lot about the fanfic afterward, but it was clearly just picking up tidbits of general knowledge rather than “remembering” the whole thing.
ChatGPT currently knows the text of the Harry Potter novels, but it does not recite them when asked to do so.
I tried that several weeks ago while discussing some details of the Harry Potter world with ChatGPT, and it was able to directly quote several passages to me to support its points (we were talking about house elf magic and I asked it to quote a paragraph). I checked against a dead-tree copy of the book and it had exactly reproduced the paragraph as published.
This may have changed with their updates since then, and it may not be able to quote passages reliably, but it is (or was) able to do so on a couple of occasions.
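For anyone who wants to repeat this kind of check without eyeballing a paperback, here is a minimal sketch of how I would measure it. It uses only Python's standard `difflib`. The function names `longest_verbatim_run` and `verbatim_fraction` are made up for illustration, and the idea of scoring by the longest shared run is just one reasonable choice, not an established test for memorization.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(model_output: str, reference: str) -> str:
    """Return the longest substring that appears verbatim in both texts."""
    # autojunk=False keeps difflib from ignoring very common characters
    # (such as spaces) when the reference text is long.
    matcher = SequenceMatcher(None, model_output, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(model_output), 0, len(reference))
    return model_output[match.a:match.a + match.size]


def verbatim_fraction(model_output: str, reference: str) -> float:
    """Fraction of the model output covered by its longest verbatim run."""
    if not model_output:
        return 0.0
    return len(longest_verbatim_run(model_output, reference)) / len(model_output)


if __name__ == "__main__":
    # Placeholder strings, not real model output or book text.
    reference = "the quick brown fox jumps over the lazy dog"
    output = "I recall: quick brown fox jumps, or so"
    print(repr(longest_verbatim_run(output, reference)))
    print(round(verbatim_fraction(output, reference), 2))
```

A fraction near 1.0 would suggest the reply is essentially one quoted chunk, while a low value with only short runs looks more like paraphrase plus a few famous phrases. This is fine for paragraph-sized text; comparing against a whole novel this way would be slow.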