At our latest Hot Stacks event with Algolia, we talked about GDPR and site reliability. Watch the full event here. Our next one is June 26th on modernizing legacy applications. Check our Eventbrite page to stay updated and see past events.
GDPR has been a hot topic lately, and it officially goes into effect tomorrow, May 25, 2018. But what does it actually mean for developers, especially those of us who have side projects or are thinking about launching a startup? To find out, we talked to people from Optimizely, Twilio, NodeSource, and Kickbox who have become experts on the subject.
If you’d like to search or read the law yourself, Algolia released an awesome tool for that.
What is GDPR?
GDPR replaces and builds on the EU’s 1995 Data Protection Directive, which set guidelines for how companies should process user data. Peter Shafton, of Twilio, notes that it gives you control over your data, including “the right to be forgotten”:
“It talks about how companies handle that data. Is it encrypted at rest? Where does it go? Do we have the right to use your data for processing purposes? Can we analyze your data to do other things?”
Joy Scharmen, Director of DevOps at Optimizely, notes that besides the right to be forgotten, GDPR also grants consumers the right to know what data is being collected about them and the right to be notified in a timely manner if there is a breach of that data. She says that at Optimizely, it’s something they’ve had to think very carefully about, since they are processing a large volume of click data and other events.
According to Lance Stone, Senior Compliance Analyst at Kickbox, GDPR is “the most significant data privacy law that has ever come about”. He says that while other frameworks and guidelines have existed in the past, they were fairly “lackadaisical” in their approach and enforcement. For tech companies, the implications are significant, since many are collecting data in the course of providing their services. If the product you make for another company involves storing their customer data, you are responsible for it as well.
What changes do companies have to make as a result of GDPR?
Chris Lea, of NodeSource, says that engineers need to start adhering to “least-access” principles, meaning that any service you’re developing should access only the minimum amount of user data it needs to function. He advises that engineers also need to carefully consider how to silo that data so that if a customer requests access to it, you know where it all lives. And if they want to delete it right now, you have to be able to do that.
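To make the least-access idea concrete, here is a minimal Python sketch, assuming a hypothetical setup where each internal service declares up front which user fields it is allowed to read. The service names, field names, and `fetch_for_service` helper are all illustrative, not something NodeSource described; in production this check would more likely live in database permissions or an API layer than in application code.

```python
# Hypothetical user record; field names are illustrative only.
USER_RECORD = {
    "user_id": "u-123",
    "email": "ada@example.com",
    "ip_address": "203.0.113.7",
    "billing_address": "1 Example St",
    "plan": "pro",
}

# Each (hypothetical) service declares the only fields it may read.
SERVICE_SCOPES = {
    "billing": {"user_id", "billing_address", "plan"},
    "newsletter": {"user_id", "email"},
}


class AccessDenied(Exception):
    """Raised when a service asks for fields outside its declared scope."""


def fetch_for_service(service: str, record: dict, fields: set) -> dict:
    """Return only the requested fields, and only if the service's scope allows them."""
    allowed = SERVICE_SCOPES.get(service, set())  # unknown services get nothing
    denied = fields - allowed
    if denied:
        raise AccessDenied(f"{service} may not read {sorted(denied)}")
    return {name: record[name] for name in fields if name in record}


# Usage: the newsletter service can read an email address but not an IP address.
print(fetch_for_service("newsletter", USER_RECORD, {"email"}))
```

Declaring scopes per service also doubles as documentation of where each kind of user data flows, which helps with Chris’s second point about knowing where it all lives.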
Twilio is a company that by design processes huge amounts of personal information. Their product requires them to know who the participants in a phone call are, and sometimes to record that audio if a customer has requested it. They have similar features for video and text as well. Shafton says the biggest piece of work has been implementing the ability to easily delete all of that data on request.
Ensuring that data is encrypted at rest has also been a key piece of GDPR compliance for Twilio. Once data gets to their servers, it’s encrypted, and each customer has their own encryption key, so that if a breach does occur, that key can’t be used to decrypt other customers’ data. They’ve also established “team-based access control”. If you’re not on the messaging team, for example, you don’t have access to those servers. And when customer data records are accessed, there are audit logs so they can see who accessed which records and at what time.
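Team-based access control with audit logging can be sketched roughly like this. It is a minimal illustration, not Twilio’s implementation: the team names, membership table, `STORE` contents, and `read_record` helper are hypothetical, and a real audit log would be written to append-only storage that the people being audited can’t edit.

```python
from datetime import datetime, timezone

# Hypothetical mapping of teams to the data domains they may touch.
TEAM_ACCESS = {
    "messaging": {"messages"},
    "voice": {"recordings"},
}

# Hypothetical membership table: user -> team.
TEAM_MEMBERS = {"alice": "messaging", "bob": "voice"}

# Hypothetical customer data, grouped by domain.
STORE = {
    "messages": {"m1": "See you at 5"},
    "recordings": {"r1": "<audio bytes>"},
}

# In-memory stand-in for an append-only audit log.
audit_log: list = []


def read_record(user: str, domain: str, record_id: str) -> str:
    """Return a record only if the user's team owns its domain; log every attempt."""
    team = TEAM_MEMBERS.get(user)
    allowed = team is not None and domain in TEAM_ACCESS.get(team, set())
    # Who, what, and when are recorded whether or not access is granted.
    audit_log.append({
        "user": user,
        "domain": domain,
        "record_id": record_id,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"{user} is not on a team with access to {domain}")
    return STORE[domain][record_id]


print(read_record("alice", "messages", "m1"))  # allowed: alice is on messaging
try:
    read_record("bob", "messages", "m1")  # denied: bob is on voice
except PermissionError as err:
    print(err)
print(len(audit_log))
```

Logging denied attempts as well as successful ones is a deliberate choice here: failed access attempts are often the most interesting entries when investigating a possible breach.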
At Optimizely, Joy says much of her work has involved figuring out how much of the process to automate. Customers need a way to submit requests to access or delete their data, so the team has had to decide where those requests go and, once they arrive, how to actually handle them:
“We have our core datastore for our app that has account metadata, and then we also have our event infrastructure, which is a giant HBase cluster that is storing click data and other information about experiments and personalization data. Those have varying levels of personally identifiable information in them.”
“You could run a Hadoop job across the cluster and find maybe most of it, but you need to make sure that you're finding all of it. Then we need to make sure that it's deleted. Then we need to make sure that it's actually deleted.”
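Joy’s three steps (find all of it, delete it, make sure it’s actually deleted) map onto a simple loop over every datastore that might hold a user’s records. The sketch below is hypothetical: `InMemoryStore` stands in for something like an app database or an HBase table, and `erase_user` is an illustrative name. Re-querying after deletion only confirms the primary copy is gone; backups, replicas, and caches need their own handling, and in HBase a delete writes a tombstone marker, with the underlying data physically removed only at a later major compaction.

```python
from typing import Protocol


class DataStore(Protocol):
    """Anything that can look up and delete a user's records."""
    name: str

    def find(self, user_id: str) -> list[str]: ...
    def delete(self, keys: list[str]) -> None: ...


class InMemoryStore:
    """Hypothetical stand-in for an app database or an event cluster."""

    def __init__(self, name: str, rows: dict[str, str]):
        self.name = name
        self.rows = rows  # record key -> owning user_id

    def find(self, user_id: str) -> list[str]:
        return [key for key, owner in self.rows.items() if owner == user_id]

    def delete(self, keys: list[str]) -> None:
        for key in keys:
            self.rows.pop(key, None)


def erase_user(user_id: str, stores: list[DataStore]) -> dict[str, int]:
    """Find, delete, then re-query every store to confirm nothing is left."""
    report = {}
    for store in stores:
        keys = store.find(user_id)       # 1. find all of it
        store.delete(keys)               # 2. delete it
        leftover = store.find(user_id)   # 3. make sure it's actually gone
        if leftover:
            raise RuntimeError(f"{store.name}: {len(leftover)} records survived deletion")
        report[store.name] = len(keys)
    return report


app_db = InMemoryStore("app_db", {"acct:1": "u1", "acct:2": "u2"})
events = InMemoryStore("events", {"e1": "u1", "e2": "u1", "e3": "u2"})
print(erase_user("u1", [app_db, events]))
```

The per-store counts in the returned report are the kind of record you might keep to show that a deletion request was actually carried out.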
Since Kickbox is an email verification company, all of the data they process is handled on behalf of their customers. This means they’ve had to work perhaps even harder to ensure absolute compliance. Lance reveals that some of it was actually pretty easy. File uploads, for example, are sent to S3, encrypted at rest, and deleted after 90 days. Other things, like caches and log data, took more thought. Yes, even log data may contain personally identifiable information under GDPR:
“Under GDPR, ‘personal data’ is anything that is even partly identified to a natural born person. Things like IP address, geolocation, don't seem very important, but you actually have to protect that data just like you would a social security number.”
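One common way to reduce the personal data sitting in logs is to truncate IP addresses before they’re written. Here’s a minimal Python sketch using the standard `ipaddress` module, assuming plain-text log lines and IPv4 only (IPv6 would need its own pattern). The `scrub_log_line` name is hypothetical, and zeroing the last octet lowers identifiability but may not count as full anonymization under GDPR, so treat it as one mitigation rather than a compliance guarantee.

```python
import ipaddress
import re

# Loose pattern for IPv4-looking tokens; each match is validated before changing it.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _truncate(match: re.Match) -> str:
    text = match.group(0)
    try:
        addr = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return text  # e.g. "999.1.2.3" is not a valid address; leave it alone
    # Keep only the /24 network (last octet zeroed) for coarse analytics.
    return str(ipaddress.IPv4Network(f"{addr}/24", strict=False).network_address)


def scrub_log_line(line: str) -> str:
    """Replace every valid IPv4 address in a log line with its /24 network address."""
    return IPV4_CANDIDATE.sub(_truncate, line)


print(scrub_log_line("GET /login 200 client=203.0.113.77"))
```

Scrubbing at write time, rather than cleaning up old logs later, means the full addresses never reach disk or your log aggregation service in the first place.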
Do side projects or small startups need to be GDPR compliant?
If you’re collecting any type of data on people in the EU, the answer is yes. The good news, though, is that there are special provisions for smaller companies. If you have fewer than 250 employees, you don’t need to hire a compliance person, keep detailed audit logs, or fill out a ton of forms and reports. You do need a privacy policy that discloses what data you’re collecting and how you’ll use it, and you need to give anyone signing up for your service a way to consent to data collection. It can just be a checkbox, but it can’t be checked by default.
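For the consent checkbox, the key detail is that the default must be unchecked. The sketch below models that with a form field defaulting to `False` and refuses to proceed without an affirmative choice. It also stores which privacy policy version the user agreed to and when, since GDPR expects you to be able to demonstrate that consent was given; the `SignupForm`, `ConsentRecord`, and `register` names, and the version string format, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SignupForm:
    email: str
    # Consent has to be an affirmative act, so the checkbox starts unchecked.
    consent_to_data_collection: bool = False


@dataclass
class ConsentRecord:
    email: str
    policy_version: str  # which privacy policy text the user agreed to
    granted_at: datetime


def register(form: SignupForm, policy_version: str) -> ConsentRecord:
    """Refuse to sign someone up unless they actively ticked the consent box."""
    if not form.consent_to_data_collection:
        raise ValueError("Consent was not given; don't collect this user's data.")
    return ConsentRecord(form.email, policy_version, datetime.now(timezone.utc))


record = register(SignupForm("ada@example.com", consent_to_data_collection=True), "2018-05-25")
print(record.policy_version)
```

Tying consent to a policy version means that if you later change what you collect, you can tell which users agreed to the old terms and may need to be asked again.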
Lance also points out that, while this is not legal advice, if you’re making a concerted effort toward GDPR compliance, you’re probably fine for now. The data protection authorities (DPAs) have discretion to rule as they see fit, like a judge in a trial. However, “If you are egregiously abusing your data subjects' trust and storing data insecurely, in plain text, in a notepad, on your desktop, then you're probably going to get the hammer.”
His advice is to start now. He strongly suggests having a “point person” for your GDPR efforts. Someone needs to coordinate with engineers, executives, marketers, etc., and rally everyone together in an “all-hands” approach.
Chris, from NodeSource, also points out that if you’re a small startup using one of the big cloud infrastructure providers like AWS, Azure, or Google Cloud Platform, those providers typically offer a lot of functionality to support GDPR compliance:
“It's very easy to store things encrypted at rest if you're using AWS. It's very easy to set up audit logs. If you're not learning how to use those tools along with the typical cloud services, you're probably hurting yourself in this regard.”
Will GDPR give bigger companies an advantage over smaller ones?
The overwhelming consensus was no. Peter Shafton, of Twilio, thinks it’s actually the reverse:
“The big tech companies are in for a lot of trouble here. It's hard to mobilize their workforce to get these things done, and they often don't have access or even visibility into where all the data hides. I was speaking to Autodesk a year ago about their GDPR efforts, and they mentioned that they have software written 37 years ago: stuff that was sitting in Access databases, written in Pascal, that no engineers even knew how to read anymore. So they were sort of struggling. If they got an email address from a marketing campaign 30 years ago, and someone now says 'Delete me,' where is it stored?”
A small company with maybe just one or two databases should find it relatively easy to become compliant. Tech giants that need to isolate individual records from terabytes of data in an HDFS cluster will have a harder time.
If you use a third-party service, like an analytics tool, are you still responsible for that data?
GDPR outlines the duties of what is referred to as a “sub-processor”:
“The initial processor shall remain fully liable to the controller for the performance of that other processor's obligations”
If you’re using some third-party tool to process customer data, you are responsible for ensuring they handle the data in a GDPR compliant way. As Peter points out, “Just because you hand the data off doesn't relieve you of that responsibility.”
This can even be something like a file upload utility. At Kickbox, they use a tool like this to allow customers to upload contact lists. Lance says that you are responsible for what that tool does with the data just as if you were processing it yourself:
“So if they do something egregiously bad, your customer has every right to sue you, and they will.”
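A lightweight starting point is simply keeping an inventory of the third-party services that touch personal data and flagging any without a data processing agreement (DPA) on file. This is a hypothetical sketch: the `Vendor` fields, the `vendors_needing_review` helper, and the vendor names are illustrative, and a real review would look at far more than one checkbox per vendor.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    handles_personal_data: bool
    dpa_signed: bool  # is a data processing agreement on file?


def vendors_needing_review(vendors: list[Vendor]) -> list[str]:
    """Names of vendors that touch personal data but have no signed DPA."""
    return [v.name for v in vendors if v.handles_personal_data and not v.dpa_signed]


vendors = [
    Vendor("analytics-tool", handles_personal_data=True, dpa_signed=False),
    Vendor("file-uploader", handles_personal_data=True, dpa_signed=True),
    Vendor("status-page", handles_personal_data=False, dpa_signed=False),
]
print(vendors_needing_review(vendors))
```

Even a list this simple answers the first question you’ll face if a vendor has a breach: which of your tools had your customers’ data in the first place.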
It’s extremely important to ensure that all of the tools and services you use are GDPR compliant if you want to avoid lawsuits and fines. The good news is that we’ll be releasing an easy way to check this with StackShare soon, so stay tuned!
For more information on GDPR, check out this excellent post by Bozho.