Navigating copyright in the age of AI

 

Ms Victoria Caplan
Head of Research & Learning Support,
HKUST Library,
HKUST

Mr Christopher Chan
University Librarian,
HKBU

SUMMARY KEYWORDS
Copyright, AI, generative AI, computer-generated works, text and data mining, intellectual property, Hong Kong ordinance, legal advice, public consultation, AI guidelines, scholarly publishers, citation styles, AI licencing agreement tracker, visual artists, copyright protection.

 

Christopher Chan 00:07
Welcome to this conversation around navigating copyright in the age of AI. My name is Chris Chan, and I'm the university librarian here at HKBU, and it's a real thrill to be able to talk about this. I mean, copyright is something that librarians get asked about all the time. So, I hope we can provide some insights here and explore this topic. I'm very pleased to be joined by my colleague Victoria Caplan from the University of Science and Technology. Victoria, would you like to introduce yourself?

Victoria Caplan 00:43
Yes, my name is Victoria Caplan, and I'm the head of research and learning support at HKUST, where I've worked for over 30 years, and I've been "Miss Copyright" in the library there since 2002 or something like that, a long time.

Christopher Chan 01:01
So, you're the perfect person, really, to have this conversation.

Victoria Caplan 01:04
I don't know about that, but let's just see how we move forward. Okay!

Christopher Chan 01:09
Now, before we dive into things, I did want to note that we are not lawyers, and this podcast is not legal advice. So that's just to protect ourselves.

Victoria Caplan 01:17
Yes, this does not constitute legal advice.

Christopher Chan 01:19
Exactly. So, first of all, I thought we'd start by just exploring, really, the foundations of “What is copyright”. Let's not make assumptions about our audience or even, you know, ourselves here.

Victoria Caplan 01:32
So, copyright is a monopoly ownership of the concrete expression of somebody's mind. So, you can copyright a picture, you can copyright any form of writing, broadcasts, all those sorts of things, but it requires concrete expression. You cannot copyright an idea. So, when I teach, one of my usual examples is a photo I took at Chinese New Year many years ago, and it has some flowers and one of those pomelos, and it's a very colourful one. I called it "Chinese New Year in Hong Kong". And I explain that I cannot copyright the concept of Chinese New Year in Hong Kong, but I own that photo. And then, because it's a right, authors and creators have the right to their own work, and they can sell it, they can rent it, which is a form of licencing, or they can give it away. And so, everybody who writes, takes a picture, or does any form of recording is a creator. So many of our listeners might not think of themselves as information or knowledge creators, but they are.

Christopher Chan 02:57
One additional point there is that with copyright, you don't need to register it. That's something that people often misunderstand: as soon as you take that photo or write that script or whatever, you own the copyright.

Victoria Caplan 03:10
Absolutely. And the other thing to also note is that copyright law follows jurisdiction, so therefore the copyright ordinance and copyright law in Hong Kong govern things in Hong Kong, and that's different from, say, Macau or Mainland China or Australia or Germany or any other jurisdiction on the planet. However, there are certain commonalities established by the Berne Convention. But let's not go too deep, so we'll back away from that. Okay, so hopefully that gives people an idea about copyright.

Christopher Chan 03:48
Great. Thanks Victoria. I think that gives us a brilliant foundation to get more into the topical area of copyright and computer-generated works, because obviously this is where we're seeing more and more questions around copyrights. So, do we want to talk about the situation in Hong Kong and how this is being sort of considered, bearing in mind that it's still very early days?

Victoria Caplan 04:20
Oh, absolutely, it is early days. Now in Hong Kong, the ordinance does allow the copyright of non-human authored works. It allows the copyright of computer-generated works. And that's been in the ordinance for a while. Whereas in, for example, the United States, only humans are allowed to copyright something. So, with the famous selfie made by the monkeys, the photographer, nobody has copyright to that. But we're in Hong Kong, and therefore the rules of Hong Kong matter.

Christopher Chan 04:59
Can I just talk about that, Victoria, because some listeners may not be familiar with the monkey selfie copyright dispute. If you're not, it was a really interesting case from, I think, almost 15 years ago now, where a wildlife photographer set up a camera in the jungles of, I think, Indonesia, and some monkeys actually came up. He had set it up on a tripod, and they pressed the button and took photos of themselves. It was a selfie of a monkey, and the photographer was licencing this, was selling it, but he was sort of sued by PETA, I think, People for the Ethical Treatment of Animals, who said, “Hey, under US copyright law, there needs to be some human authorship for copyrighted works.” And the courts did agree that he could not own the copyright in those photos because there was no human authorship involved. Now, it's a little bit more complicated than that, but it's a really great illustration of the differences in the law, because, as Tori said, in Hong Kong it is in the ordinance. And I think I've written down the exact wording here on the author of computer-generated works: if you have made the arrangements necessary for the creation of the work, then you can be recognised as the author.

Victoria Caplan 06:29
Yes, and so that then leads us to where, last summer, the Intellectual Property Department and, I think, the Commerce and Economic Development Bureau held a public consultation on copyright and generative AI, so that really raised the issue higher. And I was a member of a group of librarians who were trying to figure out how to respond to this, and before that, I hadn't been aware of that section of the ordinance, because the copyright ordinance has many, many, many sections. But the owner is the person who made the necessary arrangements, and who is that? That is very unclear, because at this point in Hong Kong, nothing pertaining to AI-generated content has made it to the courts, or at least as far as I know. And so, I mean, I could imagine that, let's say, somebody working at a university that has a subscription to Midjourney creates an image that goes viral, and they want to sell it on T-shirts. And so, who made the necessary arrangements? The person who prompted Midjourney? The university that paid for the subscription to Midjourney? Midjourney itself? I don't know. Thankfully, I'm not a lawyer, and this is a completely made-up idea. But I mean, when dealing with law, one often tries to imagine things. It's kind of like being a novelist or short story writer, I suppose. So at this point, who the person who made the necessary arrangements is isn't quite clear. But the other thing that was the real thrust of the queries was about using copyrighted material to train AI and to do something which is called text and data mining, or as we call it in library land, TDM. Many different database providers, for example LexisNexis, a very famous database provider, will licence you to do text and data mining with their content if you want to. And it's not cheap, okay?
And right now, again, in the United States, which is a separate jurisdiction, the New York Times and some other big owners of news and other material are suing OpenAI for having text-and-data-mined their material to train ChatGPT and other things. So just last month, in February, the HKSAR government, our government, released its response to the public consultation, saying that they intend to amend the copyright ordinance to allow text and data mining in some circumstances without the people running the large language model necessarily having to seek permission. I mean, it will, of course, be setting limits. But what I also found quite interesting was that it won't just be for nonprofit or educational use. It could also be for commercial use.

Christopher Chan 10:08
And the government, in that paper, does say specifically that one of their goals is to kind of help the industry as well as research. So yeah, that's kind of late-breaking news that the government does support this TDM exception, and I think we are following many other jurisdictions in adding this. I think the UK and Singapore already have this type of exemption.

Victoria Caplan 10:39
Yes, so more will be revealed as time goes on. So great.

Christopher Chan 10:45
Up to this point, we've kind of looked from the perspective of someone wanting to create a generative AI tool and to use copyrighted materials as training data. But what advice would we give to anyone, really, who wants to incorporate such AI-generated materials into their own work?

Victoria Caplan 11:08
Well, I think that first of all, we have to think of them as a tool and as one of many different tools, and just as in any endeavour, to then use the right tool for the right purpose. There's the old saying that if you have a hammer, everything looks like a nail. So if you have a Gen AI tool, you know, you're running around looking for things to use it with. So therefore, I think that it's important to think about when you can use it and when you cannot. And if, for example, you're using a Gen AI tool to start brainstorming something, well, that's good. But then after you've done that, you know, take a look at it, and then maybe turn off all your devices and go for a walk and think and try to play that way. So I think that we want to make sure that we're using the right tools for the right purpose. And on my library's website, we have a lot of different guides that compare different tools for different purposes, and so that also could be useful to think about when you want to use, for example, a tool that is very good for exploring a topic, like Perplexity, except that sometimes Perplexity will generate imaginary...

Christopher Chan 12:42
Or hallucinated?

Victoria Caplan 12:43
Well, the thing is, hallucination implies a mind behind it. It's just false. It generates wrong things, misinformation; it generates citations that do not exist, okay, sometimes. So in that case, in addition to Perplexity, you might want to use one of the research assistants which are available in a lot of the different Hong Kong library catalogues and search tools, where they are running their Gen AI across the content that the library already subscribes to and owns, articles and books. And so then you'll know that, okay, these five articles or books that are listed are from our collections, for better or for worse. And then still, again, remind ourselves that, okay, these might be good, but maybe this excellent article, well, gosh, it was very relevant, but it was published 17 years ago. And so, I think, always think.

Christopher Chan 13:45
Yeah, that's a really good point. We still need critical thinking. Oh, absolutely. We can't just rely on these tools to do the thinking for us, right?

Victoria Caplan 13:52
So that's one thing. The other is, well, okay, I have to confess, I'm really, really terrible at using things like Midjourney and all those different image generators. It's scary. What I create is absolutely terrifying. So instead, what I tend to do is go to Wikimedia Commons, or I'll go to Flickr, and I'll have an idea in my mind, and then I will search and find things that look like they are useful, and then I'll use those and cite them properly. And so then I don't have to think about how to make a cute bunny. I can find somebody else's cute bunny that they've uploaded to Wikimedia Commons. And that's right, that's lovely. And so, also, open educational resources. Let's say that you are looking for a textbook, then that might also be useful. Oh, you want some poetry? Well, then maybe go to some public domain poetry. So there's so much out there that was created by humans that, of course, the GenAI tools are wonderful, but we shouldn't, in our excitement to use these shiny new toys, forget that we still have a lot. I mean, think about Woody and Buzz Lightyear: there's room for both Woody and Buzz Lightyear, absolutely. Okay. And then the other thing is also to remember that whatever you create, with or without generative AI, is still subject to the intellectual property policies of your own institution, in addition to the law. Or, if you are a student, after you graduate, if you're creating things in the course of your work, it's also important to understand what your relationship with your employer is if you are, say, a photographer or a composer. So in many ways, I don't want to say that GenAI is just kind of restating the same questions that we've often had to deal with.

Christopher Chan 16:04
Yeah, I think that's a really important thing to bear in mind, that there are alternatives, of course, to using generative AI content. But I think given the way things are going, we'll definitely be using generative AI content more and more. So we certainly want to be using that content in a really responsible way, especially in our research and when students do their assignments. So I thought it would be a good idea to highlight some of the policies of the major scholarly publishers. Of course, journal publishers and book publishers have been reacting to the availability of these tools, and there's a lot of variety, so you need to check with your specific publisher. But one that I wanted to highlight is Wiley, which is a major journal publisher. They've just updated their AI guidelines, and they make a really interesting distinction. So they actually have a table on their website where they say, you know, for example, in the area of drafting and editing, if you use a generative AI tool to actually draft entire sections of your manuscript, you must disclose that. That's pretty obvious. But if you just use it for spell checking and grammar, then it's not needed, which, again, seems like common sense. But for some of the guidance that publishers are giving, there's still some work to do. So, for example, for literature synthesis and analysis, they say things like, okay, if you use it for synthesising and analysing your literature review, you need to disclose that use. But for, and I quote, "basic literature searches", it's not needed to disclose that, but I don't know how you would define a basic literature search. So it's still not entirely clear, but I think it is still useful guidance on, you know, how much you're using AI in your work. Do you have anything to share?

Victoria Caplan 18:42
Well, one thing that I'd like to share, to remind ourselves, is that, yes, indeed, the different publishers and different scholarly societies have their own standards, in addition to declarations. The American Psychological Association, APA; the IEEE, the Institute of Electrical and Electronics Engineers; and the ACS, the American Chemical Society: they all have their own citation styles for which they're well known, and the ways that they state that you need to cite when you're using generative AI are different from each other. And so that is just one more thing. And then the other thing, too, is that they also differ in how you cite an AI-generated text versus an AI-generated image. So that's something else librarians have to keep up with as well, and then share that with the students and teachers.

Christopher Chan 19:37
So, just as we've done for decades, we can help you with your citation questions.

Victoria Caplan 19:43
Absolutely, it just goes on and on. But then also, it's interesting that you mentioned that Wiley has a table, and it's very explicit, whereas, for example, Cambridge University Press is more general. But one thing that they all seem to state very clearly is that a certain degree of use must be acknowledged, and they also emphasise that the author is responsible for whatever they're presenting, that they're the final word on it. And so therefore, I think that it's really very important for people who are using these tools, especially if they hope to publish, to be very clear that they know what they're doing. Because it's hard enough when you're writing just to remember something that you read and what page it was on, and keeping track of it that way, but then needing to also keep track of, like, five different prompts that you've done in the last seven minutes and which one you actually used exactly, it can be very, very complicated and tricky. So, I think that being aware is important, and it will continue to evolve. Another thing that is interesting is that some people who have published later find out that their publisher is entering into agreements with generative AI companies to use the material that they previously published to train AI. And some people are concerned about that, because if they published with a particular publisher a couple of years ago, when Gen AI really wasn't a thing, they might not have planned to have it be used to train. So ITHAKA, an important group in scholarly publishing, now has a generative AI licencing agreement tracker. If you're interested in publishing, you can see, for an academic publisher, what Taylor & Francis is doing versus what Wiley is doing and so forth. So that's quite interesting. And then that also leads us into the issue of protecting your own work.
Yes, okay, so I was just talking about the AI licencing agreement tracker, and that leads us to the idea of, what if we don't want our own work to be used to train an AI model? Many visual artists especially are very concerned about this, because these image-generating tools are so good now that you can say, create me an image in the style of so-and-so, and it'll be very close to that original artist, and that is destroying their livelihoods. So, the University of Chicago has created software called Glaze, G-L-A-Z-E, like the glaze for ceramics, that will repel the different crawlers of the web, or web scrapers, that would try to do that. So that's been deployed. And then also if, for example, you have a blog, or you like to post your things, there are different robots.txt rules that you can deploy on your website to try to keep away the scrapers and the machines. So, it's very important to balance, because on the one hand, you want people to use your material. You want to be open. But on the other hand, you, especially visual artists, don't want to get into a situation where you are exploited, right? And that gets back to copyright: a creator has the right to their own work, and it shouldn't be stolen.
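
As a rough illustration of the robots.txt approach mentioned above, here is a minimal sketch in Python. It assumes a site owner wants to turn away two AI-related crawlers while leaving the site open to everyone else. The user-agent names `GPTBot` (OpenAI) and `CCBot` (Common Crawl) are ones those organisations have published; the rules themselves, the helper `is_allowed`, and the `example.org` URL are illustrative assumptions, not a recommended policy. Note that robots.txt is a voluntary convention: it only keeps away crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a personal blog (illustrative only).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/blog/post-1"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

To use rules like these, the text in `ROBOTS_TXT` would be saved as a file named robots.txt at the root of the website. The check in `is_allowed` only shows how a well-behaved crawler would interpret them.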

Christopher Chan 24:00
Absolutely. Well, thanks so much, Victoria. This has been great. I hope it's been useful for our listeners as well. In the show notes, there should be links to all of the various guides and things that we mentioned. So, I hope listeners find that useful too. And one common question is, well, how can I educate my students about this? So, I'd like to say you should ask your librarian for a workshop. I have myself delivered quite a few copyright workshops over the years, and we're happy to cover the basics of all we've talked about here to really equip our students to navigate the evolving copyright landscape, especially in the context of these generative AI tools. So, with that, thank you very much, and thank you for listening.

Transcribed by https://otter.ai