“Our machines are starting to speak a different language now, one that even the best coders can't fully understand. […] With machine learning, the engineer never knows precisely how the computer accomplishes its tasks. The neural network's operations are largely opaque and inscrutable. It is, in other words, a black box. […] If in the old view programmers were like gods, authoring the laws that govern computer systems, now they're like parents or dog trainers. And as any parent or dog owner can tell you, that is a much more mysterious relationship to find yourself in.”
Data ownership considerations
Discussions around proprietary rights in data are as complex and diverse as the phenomenon of data itself. Can data be owned at all? Who owns it? And what is the scope of the ownership monopoly? What legal mechanism would be best suited to protect data ownership? What implications might personal data protection and competition/antitrust laws have for a data ownership regime?
It may be considered prejudicial to the public interest to grant a monopoly over control of data to a specific entity or individual, thereby depriving (i) users of free access to ‘building blocks’ of knowledge and innovation, (ii) data itself of the chance to be enriched and improved, and ultimately (iii) society of ‘the progress of science and useful arts’ to which it is committed. Against this view, one can also argue that protecting data producers' interests and creating an economic incentive for data production are equally justifiable goals.
The debate largely turns on how data is categorised. On one side is machine-generated or industrial data, that is, data produced by (usually wholly automated) measurements. On the other is processed data that results from skilled analysis, such as inferences about a person’s online behaviour.
Copyright law is clear: no protection is available for raw facts or discoveries, only for an original expression thereof and only to that extent. However, Teresa Scassa argues that the concept of fact should not be conflated with data, and that data should be offered more robust protection than facts:
“[An] often overlooked feature of data is their non-neutrality […], [d]ata inherently reflects choices – choices about which data to collect (or to exclude) and what tools or parameters will be used in their collection. […] These choices reflect the human agency present in the creation of data.”
Notably, in a few instances the courts have concluded that certain data might overcome the threshold of copyrightability. In New York Mercantile Exchange, the US Court of Appeals for the Second Circuit determined that settlement prices continuously generated by a computer algorithm were sufficiently creative, but failed to attract copyright protection due to the merger doctrine. More recently, a similar conclusion was reached by the US District Court in BanxCorp v Costco.
To address the shortcomings of copyright law, it has become a prevalent business practice to seek contractual protection of data as confidential information. To avoid any risk, most commercial agreements label data as proprietary to the data originator and then go on to specify that any proprietary materials are confidential information, and hence subject to use and dissemination restrictions. In contrast to copyright, confidentiality provides strong protection for data in both scope and duration. However, this legal vehicle may not serve all types of data equally well.
The European Union has demonstrated a preference for a sui generis route for data protection. The Database Directive introduced a new database right, the effectiveness of which, however, continues to be questioned.
In 2017, the European Commission published a working document and subsequently a position paper on the proposal for a new right in machine-generated and non-personal data that is not arranged into a database: the so-called data producer’s right. It would encompass:
“the exclusive right to utilise certain data, including the right to licence its usage. This would include a set of rights enforceable against any party independent of contractual relations thus preventing further use of data by third parties who have no right to use the data, including the right to claim damages for unauthorised access to and use of data.”
Professor Bernt Hugenholtz is less optimistic about how such a right might fit into the IPR realm:
“[I]ntroducing such an all-encompassing property right in data would seriously compromise the system of intellectual property law that currently exists in Europe. It would also contravene fundamental freedoms enshrined in the European Convention on Human Rights and the EU Charter, distort freedom of competition and freedom of services in the EU, restrict scientific freedoms and generally undercut the promise of big data for European economy and society. In sum, it would be a very bad idea.”
“There’s no organization in the world that could manage eleven million lines of code generated over eight months. There’s no possible way that we could understand how machine learning systems work, given that their architects can’t understand them because all they do is they throw neural nets together in a cookbook, and stuff comes out. There’s no way that all these projects could manage that reduction of fragmentation that you’re talking about unless they could all see one another’s code and understand what the common denominators were in their technical approaches. We’d lose our minds if we didn’t have open source.”
Based on the statements here, that data is better than oil, I wonder if data is a Giffen Good? These are unicorns of economics - commodity goods (i.e., not luxury goods) where consumption rises as the price rises. There aren't very many (or any?) perfect examples of Giffen Goods, but in this case, the value of data increases as the amount of data increases. It seems to fit....
You mean "FiveThirtyEight", not "ThirtyFiveEight"