I’m glad I hadn’t committed to a topic for today’s Well, I Didn’t Know That! post in last week’s Sunday Summary. A topical piece of news has come up this week that I would like to discuss – the lawsuit multiple authors are filing against ChatGPT owner, OpenAI.
On Tuesday, the Authors Guild filed a lawsuit on behalf of high-profile authors including Jodi Picoult, David Baldacci and George R.R. Martin. The filing claims that OpenAI infringed the authors’ copyright by using their books to train its language model without permission. BBC News states that copies of the books are alleged to have been accessed through e-book repositories and fed into the language modelling technology.
On the other hand, OpenAI states that its analysis of the data falls within fair use and does not infringe authors’ rights.
Whilst OpenAI is not disclosing the exact works used to train its model, this New York Times article shares how the company has admitted to using copyrighted material in its large language model.
Based on the comprehensive summaries ChatGPT can produce (including details about books and minor characters that are not available in the likes of online reviews), as well as the tool’s ability to mimic the writing styles of the authors filing against OpenAI, the evidence doesn’t look good.
George R.R. Martin has proven to be a particular victim of the technology. CBS News reports that programmers have used it to write their own versions of the final two books in his A Song of Ice and Fire series (which have since been posted online), as well as to derive a prequel novel.
Taking a look at copyright law, copyright includes rights of the author pertaining to the reproduction of existing material and, more interestingly in relation to this lawsuit, rights concerning derivative works. If OpenAI’s large language model is using what it has learned from copyrighted material to produce derivative work, as it seems to be doing, then surely this is an infringement of copyright. The question here is really where the line is drawn with technology.
The other thing to consider here is plagiarism. If somebody (a human) were to steal an author’s work and try to pass it off as their own, authors would definitely have a legal leg to stand on in a lawsuit. However, as the offender in this case is, or involves, a fairly new piece of technology, what are the rules? If authors decide to use the likes of ChatGPT to create or augment works they later publish, could they unwittingly find themselves in a copyright battle if the technology derives any of that content from other authors or sources?
As artificial intelligence and learning technology are so new, there is currently very little legislation in place concerning them. These lines will be drawn over time by lawsuits such as this one. However, whilst we may not have a definitive line to draw right now, we can consider the position from a moral perspective. Is it right that an incredibly advanced piece of technology is using the written works of creators in order to learn from, impersonate and/or copy them? Personally, I don’t think so. The fact that companies like Amazon have restricted authors to publishing no more than three Kindle books a day highlights how easily AI-generated content could flood the market.
What do you think about AI and its potential use of books in its model without author permission? Do you agree with the authors filing a lawsuit against OpenAI, or is the technology something we should embrace?