With its new open data licensing framework, announced on Tuesday, the Linux Foundation has created legal frameworks for sharing raw, unorganised data, in the hope of tempting generous companies, nonprofits, government agencies and researchers to share theirs.
But one expert says the licences’ current ambiguity makes them risky, and others are concerned about licensing compatibility issues.
Mike Dolan, the Linux Foundation’s VP of strategic programs – who helped draft the licences – told El Reg that individuals or organisations working on machine learning, traffic flow or other data-heavy systems could gain a lot from sharing, for example by improving algorithms and increasing the resources available to them.
But today (excluding sensitive data covered by law), you either keep your raw data a trade secret or release it with no IP restrictions, said Estelle Derclaye, an IP lawyer at the University of Nottingham. There are already comprehensive licence agreements for sharing and attributing data organised in a database (such as CC-BY, the Open Data Commons Open Database License, or the Open Data Commons Attribution License).
When Derclaye reviewed one of the two new licence agreements at The Reg’s request, she told us: “I wouldn’t want to sign it.”
Why a new licence?
Dolan said the aim was “to ensure that data providers and users had clarity about their ability to curate, use, and share” in order to enable “the creation of open, collaborative data communities”. Drafting began during the third quarter of 2016 because of a perceived gap in one-shop licence agreements.
He gave the example of training a drone to fly autonomously – what if a dataset didn’t include any examples of trees, a user trained their drone on the data, and the drone crashed into one? Whose fault would that be?
One licence agreement requires that changes to data be shared….