Tuesday, October 09, 2012

CC-BY: the wrong goal for open access, and neither necessary nor sufficient for data and text-mining

A question that I see as important but missing from the discussion: is CC-BY even an appropriate goal for open access at all? This is separate from the question of whether it should be a short-term or long-term priority.

I argue that CC-BY is NOT an appropriate goal for open access. There are many reasons for this argument, too many for one post, so I'll start off by challenging the assumption that CC-BY is what is needed for data and text mining.

1. CC-BY is not necessary for data and text-mining. Internet search engines such as Google, and social media companies, do extensive data and text-mining, and they do not limit themselves to CC-BY material. This is true even in the EU, so the EU's support for copyright of data does not prevent it. To illustrate: if data and text-mining were not permissible without CC-BY, then Google would have to shut down immediately.

2. CC-BY is not sufficient for data and text-mining. The Creative Commons licenses are designed as a means for creators to waive rights that they would otherwise have under copyright; they do not place any obligations on the Licensor. There is nothing to stop a creator from applying a CC-BY license to a locked-down PDF with extra DRM designed to prevent data and text-mining.

One reason these questions deserve greater attention and analysis is funders' policies requiring CC-BY. If authors and their publishers adopt CC-BY through coercion rather than choice, actual practice may differ considerably from earlier open access initiatives, in which use of this license was voluntary.

This argument leaves aside the question of whether allowing ubiquitous data and text-mining is actually beneficial for scholarship. My perspective is that this is unknown, and that it is premature to prescribe data and text-mining for all scholarly works until the question has been explored more fully. As one counter-example, consider that allowing data-mining and remix of health information can compromise privacy.

This is one of the topics that I begin to address in my draft dissertation, Freedom for Scholarship in the Internet Age. The defence draft is available for download from here:

See chapter 4 on open access and chapter 8, conclusions. These arguments are not meant to be exhaustive. Rather, they illustrate how the societal trend that I call irrational rationality has the potential to make things worse for scholars and scholarly communication in the transition to open access.


Heather G. Morrison
Open Access Advocate / Opponent of CC-BY Coercion
The Imaginary Journal of Poetic Economics
This was posted today to the GOAL Open Access List.