While most organizations are aware of the new European legislation, the General Data Protection Regulation (GDPR), few have thought through how this framework will change the practice of Data Science.
The General Data Protection Regulation will govern which personal data can be used commercially, and how. Effective May 25th, 2018, the legislation will apply to all public and private organizations that process the personal data of individuals in the European Union, regardless of where those organizations are based. Fines for non-compliance can reach €20 million or four percent of the organization’s total worldwide annual turnover, whichever is greater. Let’s explore why GDPR should command your full attention, what the legislation is all about, how it will impact the practice of data science, and what you can do to turn this “threat” into an opportunity.
Data is no longer a simple by-product of business processes; it is the fuel of modern economies. Where data was traditionally collected to mirror the realities of our organizations and markets, today it is increasingly leveraged to magnify the tiny details of how individuals purchase and consume products and services. As a result, organizations use information technologies to build platforms that reveal, capture, and analyze consumer experiences. The value of the resulting data depends less on how precisely it describes the relationships between consumers, goods, and services than on its use in building scenarios that predict and influence human behavior.
This constant pursuit of data profoundly affects individuals’ privacy and confidentiality: hardly anything an individual does, says, or even thinks is safe from public scrutiny. Europe’s General Data Protection Regulation explicitly recognizes these dangers in attempting to regulate the commercial use of personal and sensitive data. Personal data is defined as any information relating to an identified or identifiable individual, whether it identifies the person directly (a name) or indirectly (an identification number, location data, an online identifier). Sensitive data includes information about an individual’s health, religious beliefs, ethnic origin, political opinions, or sexual orientation that could be used to discriminate between individuals. Finally, a special provision protects children: when an online service relies on consent, the personal data of a child under 16 (member states may lower this age to as low as 13) cannot be processed without parental consent.
GDPR introduces a European bill of digital rights. Data subjects (the people whose data is processed) will now have the right to know what personal data concerning them is being collected, where, and for what purpose. They will have the right to be forgotten, by requesting that organizations erase their personal data and cease processing or further disseminating it. Finally, they may obtain the personal data they have provided to an organization in a ‘structured, commonly used and machine-readable format’ and transfer it to another organization (data portability).
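To make the portability and erasure rights concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory store in which every record is tagged with the `subject_id` of the person it concerns; the names `RECORDS`, `export_subject_data`, and `erase_subject_data` are illustrative only. A real system would also have to reach backups, logs, and copies held by subcontractors.

```python
import json

# Hypothetical in-memory store: each record is tagged with the data subject it concerns.
RECORDS = [
    {"subject_id": "u1", "type": "profile", "email": "ana@example.com", "country": "FR"},
    {"subject_id": "u2", "type": "profile", "email": "ben@example.com", "country": "DE"},
    {"subject_id": "u1", "type": "order", "order_id": 1001, "amount_eur": 42.5},
]


def export_subject_data(records, subject_id):
    """Return every record about one subject as structured, machine-readable JSON."""
    mine = [r for r in records if r["subject_id"] == subject_id]
    return json.dumps({"subject_id": subject_id, "records": mine}, indent=2, sort_keys=True)


def erase_subject_data(records, subject_id):
    """Return a copy of the store with every record about the subject removed."""
    return [r for r in records if r["subject_id"] != subject_id]


if __name__ == "__main__":
    print(export_subject_data(RECORDS, "u1"))
    remaining = erase_subject_data(RECORDS, "u1")
    print(len(remaining))  # 1
```

JSON is used here simply because it is structured, widely supported, and machine-readable; the regulation itself does not mandate a particular format.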
Corporations, small businesses, national governments, and territorial administrations will be required to institute processes and internal record-keeping to ensure compliance with these new regulations. These organizations, whether data controllers or data processors, will be required to implement privacy by design, the principle that data protection should be built into the very core of their information systems. Organizations will be required to collect only the data strictly necessary for the purpose at hand (data minimization), and to limit access to personal data to the people who need it for processing. Finally, organizations must notify the supervisory authority within 72 hours of becoming aware of a personal data breach that poses a risk to the “rights and freedoms” of individuals, and must inform the affected individuals without undue delay when that risk is high.
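Data minimization and restricted access can be enforced in code by declaring up front which fields each processing purpose genuinely needs and refusing everything else. The sketch below assumes a hypothetical purpose-to-fields allowlist (`ALLOWED_FIELDS`) and a hypothetical `minimize` helper; the purposes and field names are illustrative and are not drawn from the regulation.

```python
# Hypothetical allowlist: each declared processing purpose maps to the only fields it may see.
ALLOWED_FIELDS = {
    "shipping": {"name", "street", "postcode", "city"},
    "billing": {"name", "iban"},
    "analytics": {"postcode", "order_total"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields required for the declared purpose; reject undeclared purposes."""
    if purpose not in ALLOWED_FIELDS:
        raise PermissionError(f"no declared processing purpose named {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {field: value for field, value in record.items() if field in allowed}


if __name__ == "__main__":
    customer = {
        "name": "Ana",
        "street": "1 rue de l'Exemple",
        "postcode": "64000",
        "city": "Pau",
        "iban": "FR00TEST",
        "birth_date": "1990-01-01",
        "order_total": 42.5,
    }
    print(minimize(customer, "analytics"))  # {'postcode': '64000', 'order_total': 42.5}
```

Because the analytics purpose never receives `name` or `birth_date`, a downstream model cannot accidentally use them, and adding a new use of the data forces someone to declare it explicitly.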
In the next five months, businesses serving European citizens will need an action plan to meet the minimum requirements for GDPR compliance. This begins with designating a project leader, or Data Protection Officer, to oversee the data protection strategy and its implementation. His or her project team will need to identify and analyze what personal data is currently being captured, stored, and processed. They will need to understand how different organizational stakeholders process that data, either within the organization or through third-party subcontractors. They will need to propose the measures and means needed to meet the legislative requirements. Finally, they will need to define and implement the tasks and processes necessary to ensure compliance.
At first glance, the vision of data privacy set out in GDPR appears diametrically opposed to a data scientist’s fundamental responsibilities of acquiring new data sources and developing new use scenarios. Concretely, the legislation will affect the practice of Data Science in three areas: it limits data processing and consumer profiling; it imposes a “right to an explanation” when organizations use automated decision-making to evaluate credit applications, recruitment, or insurance; and it holds organizations accountable for bias and discrimination in automated decisions.
Nonetheless, the practice of Data Science will benefit in the long run from these constraints. Organizations will need to encourage data science processes based on robust anonymization of the data. Data scientists will need to take steps to prevent indirect bias from proxy variables (such as a postcode that stands in for ethnic origin), multicollinearity, and other causes, in order to limit discriminatory outcomes. Finally, data scientists will need to concern themselves with data lineage, documenting the flow of data through every processing step from source to target.
Lee Schlenker is a Professor at ESC Pau, and a Principal in the Business Analytics Institute http://baieurope.com. His LinkedIn profile can be viewed at www.linkedin.com/in/leeschlenker. You can follow him on Twitter at https://twitter.com/DSign4Analytics.