Consumer websites need to be very careful about data deletion. There's a risk of an account being hacked and deleted without the owner's consent.

The GDPR includes a right to erasure and California's CCPA has a right to delete. These are good laws; they allow an individual to require that a company delete all the personal data it holds on them. However, this right also carries a risk. What if someone unauthorized requests the deletion? Proper deletion cannot be undone; in theory even backups should be deleted.

One solution is to delay the deletion and make every effort to contact the user before it's done. Some users might interpret the delay as the company acting poorly, but I think it's an important protection against accidental or malicious deletion. Facebook has had a reasonable system for this for many years now; when you delete an account you have 30 days to change your mind. As a side effect some Facebook users keep their accounts in a perpetual state of almost-deletion, the super-logoff. It's even better if the user's data is hidden while the account is in the delete-pending state.
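
Here's a minimal sketch of what that delete-pending state could look like, in Python. Everything in it (the `Account` class, the function names, the 30-day `GRACE_PERIOD`) is my own invention for illustration; I have no idea how Facebook or anyone else actually implements this. The final purge is just a flag here, standing in for the real irreversible erasure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed grace period, mirroring the 30-day window described above.
GRACE_PERIOD = timedelta(days=30)


@dataclass
class Account:
    """Hypothetical account record; not any real service's schema."""
    username: str
    deletion_requested_at: Optional[datetime] = None
    purged: bool = False

    @property
    def is_visible(self) -> bool:
        # Data is hidden as soon as deletion is pending, not just after purge.
        return not self.purged and self.deletion_requested_at is None


def request_deletion(account: Account, now: datetime) -> None:
    """Start the countdown. A real system would also notify the owner here."""
    if account.purged:
        raise ValueError("account is already purged")
    if account.deletion_requested_at is None:
        account.deletion_requested_at = now


def cancel_deletion(account: Account, now: datetime) -> bool:
    """Undo a pending deletion; only works inside the grace period."""
    if account.purged or account.deletion_requested_at is None:
        return False
    if now - account.deletion_requested_at >= GRACE_PERIOD:
        return False
    account.deletion_requested_at = None
    return True


def purge_if_due(account: Account, now: datetime) -> bool:
    """Run periodically; purges only once the grace period has elapsed."""
    if account.purged or account.deletion_requested_at is None:
        return False
    if now - account.deletion_requested_at < GRACE_PERIOD:
        return False
    account.purged = True  # stand-in for irreversible erasure, backups included
    return True
```

The point of the design is that nothing irreversible happens in `request_deletion`. The only destructive step is `purge_if_due`, and it refuses to run until the owner has had the full window to notice and cancel.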

I don't know the legal niceties of whether a company can inject a delay. The GDPR language talks about "without undue delay", which seems to leave room for a safety net. CCPA is explicit about businesses having 45 or 90 days to "respond to a request to delete".

This whole post is motivated by my Goodreads disaster. One explanation for what happened is that someone hijacked my Goodreads account and then deleted it to hide their tracks. At first I was outraged my data could ever be deleted. But Goodreads would be correct to do that in response to a valid request for deletion. And it looks like Goodreads deletes immediately and irrevocably. (I'm not certain.) If they'd put in a 30-day delay I would have noticed in time. Speculating about this scenario made me realize that instantaneous deletion is a dangerous feature for any product.

tech
  2022-03-26 15:07 Z

After my Goodreads disaster I went and got dumps from every cloud service I care about that I could think to try. 13 in total, Twitter and Facebook and others. I'm impressed with the results.

The best of the data exports comes from Google Takeout. They were a pioneer in making a proper product out of data export and the Google Data Liberation Front did a lot of activism both within Google and externally to sell the idea. It's not an obvious thing for a company to do; letting customers download all their data opens the door to competitors. But it's the decent and right thing to do and it allows your power users to do complex things without much support.

Data export is also increasingly the legally required thing to do. The GDPR enshrines a right to data portability in the law governing businesses in the EU. California's CCPA also has a data access right. It's a little weaker than the GDPR's, but a lot of sites seem to just offer their GDPR export to everyone, or at least to Californians. These are excellent regulations; they protect consumers and enable competition. They do put a regulatory burden on the companies implementing them, but it's not too huge and the technical infrastructure has other uses too. (Imagine, Goodreads could have backups of user data!)

One thing I hadn't appreciated is how hard it is to build something to use the data. Recreating a product like Goodreads or Gmail is a lot of work! In practice the exports seem most useful when some other commercial service is designed to import them. There's not a big ecosystem of open source tools to work with export data. Some of the data exports I got are pretty rough, low-level dumps in CSV or JSON format. Then again Twitter has a whole working live webapp; you can browse and search nicely formatted tweets right from the files.

My Google data is the most valuable to me; I wrote up notes on what I found in my 67GB export. It's impressive; Takeout covers some 50+ Google products, many of which have done a thoughtful job making their exports another designed product feature. Not only did the Data Liberation Front get the company to export the data but they created an infrastructure and culture of supporting and improving those exports. It's a good thing.

techgood
  2022-03-07 21:58 Z