On 18-Mar-12, at 5:07 AM, David Prosser wrote:
Say I wanted to data mine 10,000 articles. I'm at a university, but I am co-funded by a pharmaceutical company and there is a possibility that the research that I'm doing may result in a new drug discovery, which that company will want to take to market. The 10,000 articles are all 'open access', but they are under CC-BY-NC-SA licenses. What mechanism is there by which I can contact all 10,000 authors and gain permission for my research?
First, before I comment on intellectual property issues, I would like to point out that the concept of "intellectual property" is a relatively recent invention, and one that arguably should be challenged. For details, see the second chapter of my draft thesis; from here, search for: The invention of “intellectual property”: enclosure of knowledge. A disclaimer: I am a scholar whose work intersects with intellectual property issues, but I am not a copyright lawyer or expert. Given that the arguably fictional "intellectual property" is legally nonfiction throughout most of the world, the following are some reflections arising from David's example.
Copyright covers the expression of ideas, not the ideas themselves. If a researcher employed by a pharmaceutical firm were to read 10,000 articles and this research resulted in an idea for a new drug, the pharmaceutical firm would not need to seek permission from any of the authors of the articles in order to apply for a patent. Text mining is merely an automated form of reading, so again, there is no need to seek permission from authors to apply for a patent. The World Intellectual Property Organization (WIPO) provides a brief overview of intellectual property that explains the various forms well. In brief, there are about five forms of intellectual property, many of which actually have opposing expectations. A patent is a public declaration of rights to use an idea or procedure, so openness is appropriate. Patent law is designed to protect rights to private profit. Trade secret law is also designed to protect private property; in this case, however, the protection is achieved through secret, private means rather than a public, open process.
The question of whether copyright permissions are, or should be, necessary for data or text mining is an important issue to address when considering libre open access (which includes broader re-use rights, in contrast to free-to-read gratis open access). I argue that no special copyright-related permissions are necessary. As evidence, here is a quick illustration:
Try a Google search for: "To pursue, within the limits of the STM Association's aims and objectives, the highest possible level of international protection of copyright works and of the services of publishers in making these works available" and it should be quite easy to find the Introduction to Copyright & Legal Affairs of the International Association of Scientific, Technical and Medical Publishers (STM): http://www.stm-assoc.org/copyright-legal-introduction/ There is nothing on the STM website to indicate that special rights have been granted for text mining. STM is certainly not naive or neutral about intellectual property rights; the founding reason for the existence of STM is protection of IP. Yet clearly Google, a commercial company, is crawling this site and returning results. There is nothing the slightest bit exceptional about this example. This is how the World Wide Web works! If anyone wants to post things on the web but not make them available for crawling, it is up to the website owner to opt out by indicating that they do not want their site crawled.
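To make the opt-out concrete: one widely used convention (an assumption here, since no specific mechanism is named above) is the Robots Exclusion Protocol, in which a site publishes a robots.txt file that well-behaved crawlers consult before fetching pages. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleMiner", and the example.org URLs are all hypothetical and do not describe any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It asks all crawlers to stay out of /private/ and allows everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleMiner" is a made-up crawler name.
    for url in (
        "https://example.org/articles/1.html",
        "https://example.org/private/draft.html",
    ):
        print(url, may_crawl(ROBOTS_TXT, "ExampleMiner", url))
```

Note that this convention is voluntary: it signals the site owner's wishes to crawlers rather than technically preventing access, which is consistent with the point that the default on the open web is that posted content can be crawled.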
Some subscription-based scholarly publishers do not allow text or data mining of their databases. It seems likely that they are interpreting the multiple downloads often involved as pirating of their copyrighted content. That is, the refusal to allow text or data mining rests on an interpretation of the activity as a violation of copyright, or on fear that the publisher cannot allow text or data mining while simultaneously preventing copyright violation, and not on text or data mining actually violating copyright. If publishers' products contain DRM preventing text or data mining, that is a different matter. Legal protection for the publishers in this instance involves DMCA-style laws and contract law, not copyright law. Within the context of library subscriptions, data and text mining can be included in contracts. Here is the relevant text from the BC Electronic Library Network model license: 3.1.11 "DATA and TEXT MINING. Members and Authorized Users may conduct research employing data or text mining of the Licensed Materials". This language is not original to BC ELN, but rather was developed based on research on other model licenses, including those of JISC, CRKN, and OCUL. In the real world, copying this kind of work with informal permission but without attribution is actually the norm, as we all want to work towards standards and avoid re-inventing the wheel.
What is needed to provide for data and text mining, I argue, is not changes to copyright but rather content made available in formats that are easily crawled for these purposes, such as xhtml rather than locked-down PDFs, and made openly available over the World Wide Web.
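As a rough illustration of why well-formed XHTML is convenient for mining, the sketch below parses a tiny XHTML document with Python's standard xml.etree.ElementTree module and counts the words in its body. The sample document and its sentences are invented for this example, and the word-splitting rule (lowercase runs of the letters a to z) is a deliberately crude assumption, not a recommended text-mining method.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample document; a real article would be fetched from the open web.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample article</title></head>
  <body>
    <p>Aspirin inhibits platelet aggregation.</p>
    <p>Aspirin may also reduce inflammation.</p>
  </body>
</html>"""

# Standard XHTML namespace, needed to locate elements by name.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def term_counts(xhtml: str) -> Counter:
    """Count lowercase alphabetic words in the <body> of an XHTML document."""
    root = ET.fromstring(xhtml)
    body = root.find("x:body", XHTML_NS)
    if body is None:
        return Counter()
    # itertext() yields the text of every descendant element in document order.
    text = " ".join(body.itertext())
    return Counter(re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    print(term_counts(SAMPLE_XHTML).most_common(3))
```

Because XHTML is well-formed XML, a standard parser can recover the text and structure directly; a locked-down PDF offers no comparably simple route.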
I understand that Europe (as a whole, or just some countries) may have some odd laws that would prohibit text and data mining. This may help to explain why people are trying to use copyright law as a means of ensuring permissions for text and data mining. I would like to know more about this; if anyone can provide details, links, etc., that would be most helpful for all of us to really understand the issues.
My first response to David Prosser's question, challenging the underlying assumption that increasing corporatization of the university is acceptable, can be found here.
Discussion is welcome.