Monday, December 04, 2023

Consultation on Copyright in the Age of Generative Artificial Intelligence: my response

This is my response to Industry Canada's Consultation on Copyright in the Age of Generative Artificial Intelligence. The deadline for responses has been extended to Jan. 15, 2024.

Technical Evidence

Question*: How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work, or organization? 

 My response:

In libraries, machine learning AI in the form of recommender systems ranking results by relevance (like Netflix) is in widespread use. Generative AI is in earlier stages of exploration and/or implementation in libraries and information management as a means of further automating and enriching information resource description and classification. On the other hand, the tendency of popular AI tools such as ChatGPT to invent content is raising concerns about spread of mis- and disinformation, complicating the work of ensuring that the public has access to high quality, accurate information. In academia, AI is in early stages of use for the purposes of accelerating research. AI raises both interest and concern with respect to pedagogy. Noteworthy examples of emerging types of applications include language learning supports for students, brainstorming, and automated translation, noting that results to date are best considered as early drafts. 

 

Text and Data Mining

 

Questions:  

 

If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?

 

Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems? 

 

My response:

 

TDM for discovery purposes should be legal across all kinds of materials (e.g. to find songs, films, novels, and stories of interest, not for AGI training). To facilitate the advances AI is making possible in scientific and non-commercial research, TDM for training AGI should be legal for these purposes, following the UK / Switzerland example. One recommended change in copyright law to facilitate AI advances in Canada is to eliminate Section 41 (Technological Protection Measures and Rights Management Information) from the Copyright Act. This section prohibits circumvention even for purposes that are legal under the Act, while it is unnecessary for purposes that are already illegal under the Act. AI developers should be required to track and disclose materials used for training purposes. Legislation to this effect at this time would encourage development of efficient automated tracking processes at an early stage in AI development.


Authorship and ownership of works created by AI

  

Questions:

 

Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

 

Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how? 

 

My response:


Rapid growth of AI-generated content demonstrates that concerns about authorship and ownership are not a significant impediment. A Google search for “Amazon ChatGPT self-publishing” retrieves over 21 million results, with how-to books and publishing services at the top of the list. ChatGPT can produce a story “in the style of” a human author such as Margaret Atwood in seconds. This rapid growth raises two types of concerns: 1) for human creators, whose works and identity can easily be used in AI training to create new works that compete with the original creator, and 2) for increasing production and distribution of mis/disinformation when a tool like ChatGPT (described by AI experts as having a tendency to “hallucinate”) is used to create nonfiction works without the oversight of human experts.

 

Comments and suggestions

 

My response: 

 

Achieving the potential benefits of AI requires TDM exemptions for scientific and non-commercial research following the UK / Switzerland example and elimination of Section 41 (Technological Protection Measures and Rights Management Information) of the Copyright Act. Most potential benefits of AI do not involve the use of others’ copyrighted material – for example, companies and individuals using AI to automate or build on their own work. Encouraging AI users to make use of the copyrighted work of human creators raises two concerns: 1) the possibility of training AI using the work and identity of a human creator to capitalize on their identity and compete with them in the marketplace, and 2) the possibility of increasing creation of mis/disinformation in the case of non-fiction works. Concern about identity misuse through AI extends beyond traditional copyrighted works, for example to the use of images of individuals in pornographic works without their knowledge or consent.

 

* The consultation includes more questions - I am only including the questions here that I chose to respond to. 

 

 

 

 

Friday, October 27, 2023

AI and copyright: submission to the U.S. Copyright Office Artificial Intelligence Study

This comment was submitted Oct. 26, 2023 to the U.S. Copyright Office's Artificial Intelligence Study

Copyright laws internationally do not provide human creators with sufficient moral and related rights (identity and publicity) in the context of artificial intelligence. Significant work is needed at the national and international levels to meet what I would argue is a minimal ethical standard, and this should not be rushed. In the meantime, the remedy that I recommend is to limit copyright on works produced with substantial AI involvement to AI works trained on material in the public domain. For example, AI artists are creating new works based on freely available images from the Mars rover; such works do not raise the ethical questions that are the focus of this submission. To illustrate the problem when works are based on contemporary human creators, note that current AI tools such as Stable Diffusion (images) and ChatGPT allow anyone to create new works "in the style of" a particular creator, and it is clear that this is occurring without any attempt to obtain permission from the creator. With ChatGPT, anyone can quickly verify this by asking ChatGPT to write a story "in the style of" any well-known author. The works of Canadian artists that were not publicly available have been found in a service using Stable Diffusion, along with a tool to create new works "in the style of" these artists. If this is done with the intent of publishing the results, I argue that this is an example of identity theft or fraud, particularly (but not exclusively) if the downstream creation is published with the name of the original creator with commercial benefit to the downstream creator. There are potential reputational and economic harms to original creators. I argue that the potential harms to living human creators far outweigh any benefit from permitting AI to use their works and identity until legal protections can be put into place.
As context for this comment, I would like to note that I am excited about the potential of AI to achieve more rapid advances in science, medicine, and everyday productivity in our workplaces and homes. I am an Associate Professor at the University of Ottawa's School of Information Studies, submitting as an individual, and long-time advocate for open access to scholarly publications, a topic on which I have contributed to prior U.S. government publications. Thank you for the opportunity to participate in this consultation.

Thursday, October 26, 2023

In the style of...AI, identity & reality

In brief

Artificial intelligence (AI) is already being used not only to draw from prior works to create new works, but also to create new works "in the style of" an existing creator. This should be raising questions about the rights of living creators as well as whether we want to live in a world where we can no longer distinguish what is real from what is not. These are urgent questions, from my perspective, in the context of the need to regulate AI with respect to creative works. While there are many reasons to rush to implement AI to help solve real-world problems (like developing vaccines to cope with new viruses, other medical and scientific advances, and using AI to increase productivity), I argue that there is no compelling reason to continue to permit people to use AI to usurp the identity of living creators, and plenty of reason to stop this practice. Even when works are out of copyright, there are potential dangers as well as benefits from allowing a proliferation of copies that are variations of original important works - and this practice may be counter to other current trends such as acknowledging traditional knowledge and countering the practice of cultural appropriation.

 Details

In La Presse, Péloquin (2022) reports finding a number of works by Canadian and Québec authors in the AI tool Stable Diffusion, works that were not publicly available, posted there without the knowledge of the authors. Users of Stable Diffusion can create new works "in the style of" living artists. Artists interviewed by Péloquin expressed concern about the quality of these works (part of author moral rights in copyright) in addition to concern about the works being used for commercial sales.

As of today (Oct. 26, 2023) ChatGPT takes about a minute to write a short story in response to this prompt: "Please write a short story in the style of Margaret Atwood" (or Stephen King). Readers, I encourage you to try this experiment for yourself - but ignore and delete the results, as I have.

From my perspective, these examples of current practice raise questions with profound implications for society:

  1. Should human creators be able to claim exclusive rights to their identity and style? This goes beyond author moral rights as covered in existing copyright laws, and is a type of moral right with economic implications. I argue that we urgently need to protect human creators of all kinds in the race to create and implement AI regulation. 
  2. Is this "feature" of AI likely to contribute to a postmodern dystopia where it becomes increasingly difficult to distinguish what is real from what is not, accurate information from misinformation? One example of such a postmodern theory is the hyper-reality of simulacra (copies without originals) described by Baudrillard (1995). Is this necessarily desirable even for works that are not under copyright? With the aid of AI, it is not hard to imagine the world being so inundated with the latest works "in the style of Plato" that it becomes harder to locate and verify the authenticity of the original. Is unleashing this aspect of AI compatible with contemporary trends to address traditional knowledge, including cultural expressions, and resistance against cultural appropriation? I don't have the answers, but I think these questions merit serious, thoughtful consideration and are reason to limit the "creativity" of AI in this respect.

As an author, I do not wish to allow anyone to use AI to create works "in the style of" Heather Morrison (or IJPE), particularly not for sharing for commercial or non-commercial reasons. IJPE is open access, free for anyone to read and share as is for noncommercial purposes, but otherwise All Rights Reserved.

Side note: apparently ChatGPT does have some limits (perhaps due to pressure from copyright owners?). In response to the prompt, "Please write a prequel to Margaret Atwood's A Handmaids' Tale", the ChatGPT response is "I'm sorry, but I can't provide verbatim excerpts from copyrighted texts or create derivative works based on copyrighted materials. However, I can offer you a brief summary or discussion of potential themes and ideas for a prequel to Margaret Atwood's "The Handmaid's Tale." Let me know how you'd like to proceed". 

References

Baudrillard, J. (1995). Simulacra and simulation. University of Michigan Press.

Péloquin, T. (2022, October 10). L’art de copier sans payer. La Presse. https://www.lapresse.ca/actualites/2022-10-10/intelligence-artificielle/l-art-de-copier-sans-payer.php#
 Comments? Please send via e-mail to my work e-mail address which can be found here, and let me know if you would like your comment posted on this blog.


Housekeeping: change in focus from OA to AI

Since its inception, the focus of The Imaginary Journal of Poetic Economics (IJPE) has been on open access while my original intention was always the broader topic of poetic economics (radical rethinking of how we can make use of the resources available to us to create a better world). As you may have noticed, my most recent IJPE post on OA is dated 2020. For a more complete set of my research on OA over the past decade, see Sustaining the Knowledge Commons (completed in 2022). As of October 2023, the focus of IJPE, reflecting my current research, is shifting towards information policy and in particular policy-related aspects of artificial intelligence (AI).

Thursday, October 01, 2020

Dramatic Growth of Open Access September 30, 2020

While many aspects of our lives and activities have slowed down during the COVID pandemic, this has not been the case with open access! The OA initiatives tracked through this series continue to show strong growth on an annual and quarterly basis. Important milestones are being reached, and others will be coming soon.

Highlights

The Directory of Open Access Journals now lists over 15,000 fully open access, peer reviewed journals, having added 379 journals (> 4 per day) in the past quarter, and now provides searching for over 5 million articles at the article level. 

 A PubMed search for "cancer" limited to literature from the past 5 years now links to full-text for over 50% of the articles.

The Bielefeld Academic Search Engine now cross-searches over 8,000 repositories and will soon surpass the milestone of a quarter billion documents.

Anyone worried about running out of cultural materials during the pandemic will be relieved to note that the Internet Archive has exceeded a milestone of 6 million movies in addition to over 27 million texts (plus audio, concerts, TV, collections, webpages, and software).

Analysis of quarterly and annual growth for 39 indicators from 10 services reflecting open access publishing and archiving (Internet Archive, Bielefeld Academic Search Engine, Directory of Open Access Books, bioRxiv, PubMedCentral, PubMed, SCOAP3, Directory of Open Access Journals, RePEc and arXiv) demonstrates ongoing robust growth beyond the baseline growth of scholarly journals and articles of 3 - 3.5% per year. Growth rates for these indicators ranged from 4% - 100% (doubling). 26 indicators had a growth rate of over 10%, 15 had a growth rate of over 20%, and 6 had a growth rate of over 40%. The full list can be found in this table.

Thank you to everyone in the open access movement for continuing the hard work that makes this growth possible.

The open data edition is available here:   

Morrison, Heather, 2020, "Dramatic Growth of Open Access Sept. 30, 2020", https://doi.org/10.5683/SP2/AVBOW6, Scholars Portal Dataverse, V2 

This post is part of the Dramatic Growth of Open Access Series.  

Cite as: Morrison, H. (2020). Dramatic Growth of Open Access September 30, 2020. The Imaginary Journal of Poetic Economics https://poeticeconomics.blogspot.com/2020/10/dramatic-growth-of-open-access.html


Friday, January 03, 2020

Dramatic Growth of Open Access 2019

 2019 was another great year for open access! Of the 57 macro-level global OA indicators included in The Dramatic Growth of Open Access, 50 (88%) have growth rates that are higher than the long-term trend of background growth of scholarly journals and articles of 3 - 3.5% (Price, 1963; Mabe & Amin, 2001). More than half had growth rates of 10% or more, approximately triple the background growth rate, and 13 (nearly a quarter) had growth rates of over 20%.

Newer services have an advantage when growth rates are measured by percentage, and this is reflected in the over 20% 2019 growth category. The number of books in the Directory of Open Access Books tops the growth chart by nearly doubling (98% growth); bioRxiv follows with 74% growth. A few services showed remarkable growth on top of already substantial numbers. As usual, Internet Archive stands out with a 68% increase in audio recordings, a 58% increase in collections, and a 48% increase in software. The number of articles searchable through DOAJ grew by over 900,000 in 2019 (25% growth). OpenDOAR is taking off in Asia, the Americas, Africa, and overall, with more than 20% growth in each of these categories, and SCOAP3 also grew by more than 20%.


The only area indicating some cause for concern is PubMedCentral. Overall growth of free full-text from PubMed is robust: a keyword search for "cancer" yields about 7% - 10% more free full-text than a year ago. However, there was a slight decrease in the number of journals contributing to PMC with "all articles open access", a drop of 138 journals or a 9% decrease. I have double-checked, and the 2018 and 2019 PMC journal lists have been posted in the dataverse in case anyone else would like to check (method: sort the "deposit status" column and delete all Predecessor and No New Content journals, then sort the "Open Access" column and count the number of journals that say "All"). The number of journals submitting NIH portfolio articles only grew by 1. Could this be backtracking on the part of publishers or perhaps technical work underway at NIH?
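For anyone who would rather script the check than sort a spreadsheet, the counting method described above might look roughly like the following Python sketch. The column names "Deposit status" and "Open Access" and the values "Predecessor", "No New Content" and "All" are taken from the method as described; the exact spelling of the headers in the real PMC journal list, and the sample rows used here, are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample rows for illustration; the real PMC journal list
# has more columns and its header spelling may differ.
SAMPLE = """Journal title,Deposit status,Open Access
Journal A,,All
Journal B,Predecessor,All
Journal C,No New Content,All
Journal D,,Some
Journal E,,All
"""

# Journals with these deposit statuses are removed before counting.
EXCLUDED_STATUSES = {"Predecessor", "No New Content"}


def count_all_oa_journals(csv_file):
    """Count journals marked "All" in the Open Access column,
    skipping Predecessor and No New Content journals."""
    count = 0
    for row in csv.DictReader(csv_file):
        if row["Deposit status"].strip() in EXCLUDED_STATUSES:
            continue
        if row["Open Access"].strip() == "All":
            count += 1
    return count


sample_count = count_all_oa_journals(io.StringIO(SAMPLE))
print(sample_count)
```

Running the same function on the 2018 and 2019 lists (opened with `open(path, newline="")`) and subtracting the two counts would give the year-over-year change.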

Full data is available in excel and csv format from: Morrison, Heather, 2020, "Dramatic Growth of Open Access Dec. 31, 2019", https://doi.org/10.5683/SP2/CHLOKU, Scholars Portal Dataverse, V1

References

Price, D. J. de S. (1963). Little science, big science. New York: Columbia University Press.
Mabe, M., & Amin, M. (2001). Growth dynamics of scholarly and scientific journals. Scientometrics, 51(1), 147–162.
This post is part of the Dramatic Growth of Open Access Series. It will be cross-posted to Sustaining the Knowledge Commons.
 Cite as:  Morrison, H. (2019). Dramatic Growth of Open Access 2019. The Imaginary Journal of Poetic Economics  https://poeticeconomics.blogspot.com/2020/01/dramatic-growth-of-open-access-2019.html

Tuesday, October 01, 2019

Dramatic Growth of Open Access October 1, 2019 dataset available

The October 1, 2019 dataset for the Dramatic Growth of Open Access is now available at: Morrison, Heather, 2019, "Dramatic Growth of Open Access October 1, 2019", https://doi.org/10.5683/SP2/EZQ1OK, Scholars Portal Dataverse, V1


The dataset is in excel format and is easy to manipulate to create custom growth charts or to calculate growth rates for particular services. For example, the number of texts (books) in the Internet Archive as of October 1, 2019 is 21,521,063, up from 21,070,269 on June 30. That's growth of over 450,000 free books in just one quarter! If you divide this amount by 92 (the number of days in one quarter), that's a growth rate of about 4,900 books per day (or close to 5,000 books per day).
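The arithmetic in this example can be reproduced in a few lines of Python. This is only a sketch using the two Internet Archive figures quoted above and the 92-day quarter used in the post; the function name is made up for illustration.

```python
def quarterly_growth(start_total, end_total, days=92):
    """Return (numeric growth, growth per day, percent growth) between two counts."""
    growth = end_total - start_total
    per_day = growth / days
    percent = growth / start_total * 100
    return growth, per_day, percent


# Internet Archive texts: June 30, 2019 and October 1, 2019 (figures from the post)
growth, per_day, percent = quarterly_growth(21_070_269, 21_521_063)

print(f"Growth in the quarter: {growth:,} texts")
print(f"Growth per day: {per_day:,.0f} texts")
print(f"Quarterly growth rate: {percent:.1f}%")
```

The same function can be pointed at any other pair of totals in the dataset to compare services.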

If anyone is using this data in a creative way and would like to share with others, please let me know in the comments or via e-mail. 

Monday, December 31, 2018

2018: best year yet for net growth of open access

The March 31, 2019 full data is available for download here

Highlights: this edition of the Dramatic Growth of Open Access features charts that illustrate that 2018 showed the strongest growth to date for open access by number of documents searchable through BASE, PubMedCentral, arXiv, DOAJ, texts added to Internet Archive, and journals added to DOAJ.


A Bielefeld Academic Search Engine (BASE) search encompasses over 19 million more items at the end of 2018 - about 60% or 11.4 million are open access. This brings the total documents searchable through BASE to close to 140 million (about 84 million open access)


PubMedCentral added 600,000 items in 2018, and surpassed a milestone of 5 million items this year (now 5.2 million items).











arXiv added 140,000 items in 2018, bringing the total close to 1.5 million items.











The DOAJ article search grew by more than 800,000 articles in 2018, bringing the total number of articles searchable through DOAJ to about 3.6 million.









2018 was also the best year to date for DOAJ net journal growth. 1,707 journals were added for a current total of over 12,000 journals. Negative growth in 2016 illustrates the impact of the DOAJ weeding / re-application process.








4.5 million more texts are available through Internet Archive, bringing the total close to 20 million.










The following table provides data on total number of items as of December 31, 2018, growth in 2018 by number and percentage, in descending order by growth in percent. In interpreting percentage growth, consider total and numeric growth. bioRxiv more than doubled in size this year, indicating a fairly new but healthy and rapidly growing service; but this reflects growth of about 20 thousand documents, a small fraction of the 600,000 items added by PMC for a 13% growth rate.

2018 growth (percent) | Indicator | 2018 total | 2018 growth (number)
110% | bioRxiv # articles | 39,570 | 20,748
74% | Internet Archive software | 346,320 | 147,320
39% | SCOAP3 # articles | 25,163 | 7,121
30% | Internet Archive texts | 19,570,789 | 4,570,789
30% | DOAJ searchable articles | 3,624,154 | 832,453
29% | Internet Archive audio (recordings) | 4,909,271 | 1,109,271
28% | DOAB # books | 13,253 | 2,938
25% | Internet Archive collections | 389,778 | 76,778
24% | Internet Archive videos (movies) | 4,701,129 | 901,129
21% | DOAJ journals searchable at article level | 9,479 | 1,670
16% | PubMed keyword search: cancer - last year - free fulltext | 65,766 | 9,154
16% | DOAJ # journals | 12,434 | 1,707
16% | BASE # documents | 139,476,029 | 19,092,606
16% | Internet Archive television | 1,733,000 | 233,000
15% | DOAB # publishers | 285 | 38
14% | PMC journals some articles OA | 758 | 94
13% | PMC # items | 5,200,000 | 600,000
13% | RePEc books | 39,086 | 4,449
12% | RePEc journal articles | 1,785,335 | 193,994
12% | PubMed keyword search: cancer - last 2 years - free fulltext | 153,875 | 16,026
11% | BASE # content providers | 6,732 | 694
11% | Internet Archive webpages (in billions) | 345 | 35
11% | RePEc online (fulltext) (downloadable as of March 2012) | 2,528,831 | 249,692
11% | PubMed keyword search: cancer - last 5 years - free fulltext | 391,691 | 37,230
10% | arXiv http://arxiv.org/ | 1,482,864 | 140,139
10% | OpenDOAR http://www.opendoar.org/ # repositories | 3,799 | 335
9% | RePEc chapters | 51,278 | 4,360
9% | PMC journals selected articles | 4,908 | 414
8% | RePEc working papers | 858,360 | 64,235
8% | Total Policies (ROARMAP) | 960 | 71
8% | PubMed keyword search: cancer - free fulltext | 1,027,541 | 75,655
7% | PMC journals immediate free access | 1,964 | 132
7% | DOAJ # countries | 129 | 8
7% | PubMed keyword search: cancer - last year - all results | 184,024 | 11,341
6% | PMC journals deposit all articles | 2,217 | 124
6% | Elektronische Zeitschriftenbibliothek - Electronic Journals Library # journals that can be read free of charge | 62,681 | 3,441
5% | PubMed keyword search: cancer - last 5 years - all results | 839,960 | 43,565
5% | PMC journals actively participating | 2,578 | 132
5% | PubMed keyword search: cancer - all results | 3,784,638 | 192,126
5% | PubMed keyword search: cancer - last 2 years - all results | 357,370 | 17,970
4% | RePEc software components | 4,206 | 178
4% | Internet Archive live music (concerts) | 192,534 | 7,534
3% | PMC journals all articles OA | 1,529 | 51
3% | ROAR # repositories | 4,735 | 138
2% | PMC journals NIH portfolio | 335 | 6
-12% | Internet Archive images | 3,247,253 | -452,747
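
For readers who want to check or extend the table, the percentages appear to be the 2018 numeric growth divided by the total at the start of the year (the 2018 total minus the 2018 growth). That formula is my reading of the figures rather than something stated in the dataset, so treat this Python sketch as an assumption; it does reproduce the rows tested below.

```python
def percent_growth(end_total, growth):
    """Percent growth relative to the start-of-year total (assumed formula)."""
    start_total = end_total - growth
    return growth / start_total * 100


# (indicator, 2018 total, 2018 growth) copied from the table
rows = [
    ("bioRxiv # articles", 39_570, 20_748),
    ("DOAJ # journals", 12_434, 1_707),
    ("PMC # items", 5_200_000, 600_000),
    ("Internet Archive images", 3_247_253, -452_747),
]

for name, total, growth in rows:
    print(f"{percent_growth(total, growth):.0f}% {name}")
```

This also makes the caution about interpreting percentages concrete: bioRxiv's 110% comes from a start-of-year base of under 19,000 articles, while PMC's 13% comes from a base of 4.6 million items.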


Full data can be downloaded from the Dramatic Growth of Open Access dataverse: https://hdl.handle.net/10864/10660. This post is part of the Dramatic Growth of Open Access series. From 2004 - June 30, 2018 the series was posted on a quarterly basis. As of September 30, 2018, I continue to gather data quarterly but plan to release the series less frequently, most likely on an annual basis.