The Open Algorithms Paradigm Proposes Better Insights into User Behavior
The Open Algorithms (OPAL) paradigm aims to address the increasing need for individuals and organizations to share data in a privacy-preserving manner. Participants in the ecosystem will be able to obtain better insights through the collective sharing of algorithms, secured through a trust network.
Our digital world is fuelled by the massive volume of data generated by billions of digital devices, cameras, credit cards, and satellites. The collection and storage of digital information is now a key part of every service and business. This mass datafication covers the physical world (weather, climate, the biosphere), human activity (DNA, vital signs, consumption, credit history), and social connectivity (communications, economies, mobility). Data literacy is now a fundamental human capability as much as a professional skill set. It is, in fact, an enabler and marker of human agency.
Yet little of this invaluable data is being used to its full potential for improving people’s lives. The ability to engage in society through and with data is essential for designing better public policies, interventions, and products. This is because local capacities and connections, together with the technological, political, ethical, and legal frameworks being shaped today, will govern the collection, control, and use of big data for social progress.
A growing trust deficit
But a hydra-like monster seems to have been born of these trillions of bytes of data and their management. Today, serious concerns about privacy, fair use, and biased analysis are emerging. The problem lies with data being siloed within organizational boundaries: sharing raw data with parties outside an organization remains unattainable because of regulatory constraints or business risks.
Private companies do not realize the public good their data is capable of delivering, including how they could themselves benefit from opening up some of their data and help grow economies, prevent epidemics, and so on. Commercial, ethical, and legal considerations also discourage them from opening their data further. At the same time, not all data should be open. Personal data collected through social media, mobile phones, sensors, and connected devices, which together paint quite an accurate picture of any human life, needs to be protected and secured. The debate revolves around four major issues:
01 Privacy is inadequately addressed
Rapid technological change and the commercialization of personal data are undermining end-user confidence and trust. Current technologies and laws fall short of what a functional digital economy requires. And because the risks and liabilities exceed the economic returns, personal privacy concerns remain inadequately addressed.
02 Algorithms operating as black boxes
Algorithms can be extremely complex and opaque, yet they are useful in bettering day-to-day life by simplifying the complexities of human life. The concern that algorithms operate as black boxes that can embed and entrench bias and discrimination has gained ground, feeding into people’s demand for greater control over the use of their data.
03 Challenges in the identity and access management space
Identity is tied to specific services, leading to an unmanageable proliferation of user accounts on the Internet. Add to this the massive duplication of data across numerous service providers. The result is a user who has little knowledge of what data is collected, where it is held, or how it is used. With little or no control over other uses of their data, users’ trust in data holders has diminished further.
04 Misalignment of incentives
Customer-facing service providers have access only to poor-quality user data. It is typically obtained from data aggregators, who in turn collate an incomplete picture of the user through various backchannel means. The result is a high cost to service providers for new customer on-boarding, and weak predictive capabilities.
The OPAL Project solution
To address the complex challenges of data access, enter the Open Algorithms (OPAL) paradigm: a collaborative project developed by a number of partner organizations, including the MIT Media Lab, Data-Pop Alliance, Imperial College London, and the World Economic Forum, with support from Agence Française de Développement and the World Bank.
OPAL seeks to unleash the power of big data held by private companies to solve public problems, by developing a platform that does so in a privacy-preserving, commercially sensible, stable, scalable, and sustainable manner.
OPAL’s core will consist of an open technology platform and open algorithms running directly on the servers of partner companies, behind their firewalls. Accessible via an API, the platform will provide access to statistical information derived from anonymized, secured, and formatted data.
By sending the code to the data, rather than the other way around, OPAL seeks to address these challenges, spur dialogue, and develop data services based on greater trust between all parties involved in the development of algorithms: users, data providers, analysts, and private corporations. And instead of static or fixed attributes being exchanged, algorithms for specific datasets are shared, and these must be vetted to be fair, free from bias, and privacy-preserving.
How OPAL will operate
The key concepts and principles underlying the open algorithms paradigm are:
01 Moving the algorithm to the data
Instead of pulling raw data into a centralized location for processing, the algorithms should be sent to the data repositories and executed there.
02 Raw data must never leave its repository
Raw data must never be exported from its repository, and must always be under the control of its owner.
03 Vetted algorithms & safe answers
Algorithms must be vetted to be ‘safe’ from bias, discrimination, privacy violations, and other unintended consequences. The data owner/provider must ensure that the published algorithms have been thoroughly analyzed for safety and preservation of privacy.
05 Trust Networks
In a group-based information-sharing configuration, referred to as the ‘Trust Network for Data Sharing Federation’, algorithms must be vetted collectively by the trust network members.
06 Consent for algorithm execution
Data repositories that hold subject data must obtain explicit consent from the subject when this data is to be included in a given algorithm execution. Consent should be unambiguous and retractable.
07 Decentralized Data Architectures
By leaving raw data in its repository, the OPAL paradigm allows for a decentralized architecture for data stores. Such architectures, based on standardized interfaces/APIs, should treat personal data stores as legitimate end-points and should apply regardless of the size of the data set.
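To make these principles concrete, here is a minimal Python sketch of a repository that keeps its raw records private, runs only algorithms the data owner has published as vetted, includes only subjects who have consented to that algorithm, and returns just the aggregate result. Every name in it (Repository, publish, give_consent, revoke_consent, execute) is hypothetical and not taken from any published OPAL API, and the minimum group size is only an assumed stand-in for whatever ‘safe answer’ checks a real deployment would apply.

```python
from statistics import mean
from typing import Callable, Dict, List, Set

# All names below are hypothetical; none come from a published OPAL API.


class Repository:
    """Holds raw records privately and runs only vetted algorithms on them."""

    def __init__(self, records: Dict[str, float], min_group_size: int = 3):
        self._records = dict(records)  # subject id -> raw value, never returned
        self._vetted: Dict[str, Callable[[List[float]], float]] = {}
        self._consents: Dict[str, Set[str]] = {}
        # Assumed stand-in for a "safe answer" rule: refuse very small groups.
        self._min_group_size = min_group_size

    def publish(self, name: str, fn: Callable[[List[float]], float]) -> str:
        """Register an algorithm that the data owner has vetted."""
        self._vetted[name] = fn
        return name

    def give_consent(self, subject: str, algo_id: str) -> None:
        """Record a subject's consent for one specific algorithm."""
        self._consents.setdefault(algo_id, set()).add(subject)

    def revoke_consent(self, subject: str, algo_id: str) -> None:
        """Consent is retractable: drop the subject from future executions."""
        self._consents.get(algo_id, set()).discard(subject)

    def execute(self, algo_id: str) -> float:
        """Run a vetted algorithm over consenting subjects; return only the result."""
        if algo_id not in self._vetted:
            raise PermissionError("algorithm has not been vetted")
        allowed = self._consents.get(algo_id, set())
        values = [v for s, v in self._records.items() if s in allowed]
        if len(values) < self._min_group_size:
            raise ValueError("too few consenting subjects for a safe answer")
        return self._vetted[algo_id](values)


repo = Repository({"alice": 10.0, "bob": 20.0, "carol": 30.0, "dave": 40.0})
algo_id = repo.publish("mean_value_v1", mean)
for subject in ("alice", "bob", "carol"):
    repo.give_consent(subject, algo_id)
answer = repo.execute(algo_id)  # mean over the three consenting subjects only
```

The caller only ever sees the value returned by execute. A request to run an unpublished algorithm is refused, and a subject who revokes consent drops out of later executions.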
Future directions
Interest in, and application of, AI and machine learning (ML) techniques to obtain better insights into data for various use cases is accelerating, and fairness and accountability have become central concerns in ML. With the focus on ensuring non-discrimination, transparency, and understandability, better decision-making should follow.
For the OPAL paradigm, another possible area of interest is ‘Distributed Machine Learning’. The principle of leaving raw data in its repository points to deployment architectures based on distributed data stores and distributed computation. The natural counterpart to this architecture is the use of machine learning techniques in a distributed manner to improve performance.
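As an illustration of what training over distributed data stores could look like, the Python sketch below fits a one-parameter linear model (y = w * x) across two repositories. The article does not name a specific technique; gradient aggregation is used here purely as an assumed example, and the function names and toy data are hypothetical. Each repository computes a gradient over its own records, and only that summed gradient and a record count are passed to the coordinator.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (x, y) pair held inside one repository


def local_gradient(records: List[Record], w: float) -> Tuple[float, int]:
    """Runs inside a repository: summed squared-error gradient and record count."""
    grad = sum(2.0 * (w * x - y) * x for x, y in records)
    return grad, len(records)


def train(repositories: List[List[Record]], steps: int = 200, lr: float = 0.01) -> float:
    """Coordinator loop: combines per-repository gradients, never raw records."""
    w = 0.0
    for _ in range(steps):
        results = [local_gradient(records, w) for records in repositories]
        total_grad = sum(g for g, _ in results)
        total_n = sum(n for _, n in results)
        w -= lr * total_grad / total_n  # gradient step on the pooled mean squared error
    return w


# Toy data generated from y = 2x, split across two hypothetical repositories.
repo_a = [(1.0, 2.0), (2.0, 4.0)]
repo_b = [(3.0, 6.0), (4.0, 8.0)]
w = train([repo_a, repo_b])
```

On this toy data the learned weight converges towards 2, the same value a centralized fit over the pooled records would give.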
Algorithms, once vetted as safe to run against a given data set, can also be expressed as smart contracts for a given blockchain system or distributed ledger platform. Here, a smart contract is defined as the combination of executable code and legal prose, digitally signed and distributed on the P2P nodes of a blockchain system.
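The packaging step in that definition can be sketched in Python: the algorithm’s code and its legal prose are serialized together, and a tag is computed over the combined bytes so that any change to either part is detectable. This is an illustration under stated assumptions, not a blockchain implementation. The function names are hypothetical, and HMAC-SHA256 with a shared key is used only as a standard-library stand-in for the asymmetric digital signature a real system would use.

```python
import hashlib
import hmac
import json


def bundle(code: str, legal_prose: str) -> bytes:
    """Serialize executable code and legal prose together, deterministically."""
    payload = {"code": code, "legal_prose": legal_prose}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def sign(payload: bytes, key: bytes) -> str:
    """Compute a tag over the bundle.

    Assumption: HMAC-SHA256 stands in for a real digital signature scheme.
    """
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, key: bytes, tag: str) -> bool:
    """Check that the bundle is unchanged since it was signed."""
    return hmac.compare_digest(sign(payload, key), tag)


key = b"demo-key-not-for-production"
contract = bundle(
    "def run(values): return sum(values) / len(values)",
    "This algorithm may be run only on records whose subjects have consented.",
)
tag = sign(contract, key)

# Altering the code while keeping the legal prose invalidates the tag.
tampered = bundle(
    "def run(values): return values",
    "This algorithm may be run only on records whose subjects have consented.",
)
is_valid = verify(contract, key, tag)
is_tampered_valid = verify(tampered, key, tag)
```

Binding the code and the prose under a single tag means neither can be swapped out independently once the bundle has been distributed.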