After my Goodreads disaster I went and got data dumps from every cloud service I care about that I could think to try: 13 in total, including Twitter and Facebook. I'm impressed with the results.

The best of the data exports comes from Google Takeout. They were a pioneer in making a proper product out of data export and the Google Data Liberation Front did a lot of activism both within Google and externally to sell the idea. It's not an obvious thing for a company to do; letting customers download all their data opens the door to competitors. But it's the decent and right thing to do and it allows your power users to do complex things without much support.

Data export is also increasingly the legally required thing to do. The GDPR enshrines a right to data portability in the law governing businesses in the EU. California's CCPA also has a data access right. It's a little weaker than GDPR's but a lot of sites seem to just offer their GDPR-style export to everyone, or at least to Californians. These are excellent regulations; they protect consumers and enable competition. They do put a regulatory burden on the companies implementing them but it's not too huge and the technical infrastructure has other uses too. (Imagine, Goodreads could have backups of user data!)

One thing I hadn't appreciated is how hard it is to build something to use the data. Recreating a product like Goodreads or Gmail is a lot of work! In practice the exports seem most useful when some other commercial service is designed to import them. There's not a big ecosystem of open source tools to work with export data. Some of the data exports I got are pretty rough, low-level dumps in CSV or JSON format. Then again, Twitter's export includes a whole working webapp; you can browse and search nicely formatted tweets right from the files.
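
To give a sense of what working with one of the rougher CSV dumps involves, here's a minimal Python sketch. The sample rows and the column names (`Title`, `Author`, `Date Read`) are made up for illustration; they are an assumption, not the real format of any particular service's export.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a CSV export file.
# The column names here are hypothetical, not any service's real schema.
SAMPLE_EXPORT = """Title,Author,Date Read
Book A,Author One,2020-05-01
Book B,Author Two,2021-01-15
Book C,Author One,
Book D,Author Three,2021-11-30
"""

def books_read_per_year(csv_text):
    """Count rows per year of the (hypothetical) 'Date Read' column.

    Rows with a blank date are skipped, since raw dumps often have gaps.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        date_read = (row.get("Date Read") or "").strip()
        if date_read:
            # Assumes ISO-style dates, so the first four characters are the year.
            counts[date_read[:4]] += 1
    return dict(counts)

if __name__ == "__main__":
    print(books_read_per_year(SAMPLE_EXPORT))
```

Even a toy like this has to guess at date formats and cope with missing fields, which is a small taste of why rebuilding a real product on top of an export is so much work.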

My Google data is the most valuable to me; I wrote up notes on what I found in my 67GB export. It's impressive; Takeout covers some 50+ Google products, many of which have done a thoughtful job making their exports another designed product feature. Not only did the Data Liberation Front get the company to export the data but they created an infrastructure and culture of supporting and improving those exports. It's a good thing.

  2022-03-07 21:58 Z