With the constantly falling threshold for gathering, processing and storing data points, ever more bits and pieces of information are translated into bytes and stored away on the never-ending hard drives of the so-called "cloud".
Undeniably, there is great potential in data. However, the question needs to be asked: How much data is too much data?
From abundance to obesity
The term "data obesity" (along with the equally brilliant "digital fitness") was presented in a recent article by Owen Thomas. Underlining the general importance of data for the solution of problems, he also asks critically:
But what if the abundance of data, too, is part of the problem?
Gathering data has become so cheap and easy that it is now common practice to collect a bit extra rather than too little. As with eating twice the healthy amount just because food is cheap, little thought is given to the consequences (forms of "obesity", in both instances).
Now, obviously and thanks to smart database systems, the impact of data obesity on a system should not be as severe as physical obesity in a human being. It is mainly dead data, stored away just because its storage costs are marginal.
Yet, there are good reasons to consider a data collection "diet" - beyond the plain issue of storage and handling.
MAD - the "Minimum Actionable Dataset"
Thomas refers to a suggestion by Chris Stacy, who sets out from the thought that the voracious hunger for data has a serious impact on people's lives:
By this point, all thinking humans have come to accept that there are ethical problems with the current ungoverned way businesses, governments and organizations capture, store, use and monetize data.
Without condemning data and its storage in general, Stacy presents the concept of the "Minimum Actionable Dataset" - a philosophy of designing systems around a critical assessment of the minimum amount of data required to provide a service, and of not collecting any excess information:
When defining the dataset your application, site or service is going to capture, store and use - start with the minimum set of data needed to make critical product and business decisions. Only add additional data when it is required to make these decisions.
Stacy's arguments in favour of the MAD approach include not only his concern over future regulation but also the hope that it could cure apprehensive individuals' lost trust in technology.
From an indie tech advocate's point of view, however, the strongest argument would be that it puts people back in control of their own information by collecting only what is absolutely necessary for the task a service promises to the user.
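As a rough illustration of what "collecting only what is absolutely necessary" could look like in practice, here is a minimal Python sketch. The newsletter service, the `REQUIRED_FIELDS` allowlist and the `minimize` helper are all hypothetical names invented for this example; they do not come from Stacy's proposal.

```python
# Hypothetical allowlist: the only fields this imaginary newsletter
# service needs in order to deliver what it promises.
REQUIRED_FIELDS = {"email", "language"}


def minimize(submitted: dict) -> dict:
    """Return only the allowlisted fields; everything else is dropped."""
    return {key: value for key, value in submitted.items() if key in REQUIRED_FIELDS}


signup = {
    "email": "ada@example.org",
    "language": "en",
    "birthday": "1990-01-01",  # not needed to send a newsletter
    "location": "Brighton",    # not needed either
}

# Only the minimized record would ever reach storage.
stored = minimize(signup)
print(stored)
```

The point of the design is that excess fields are discarded at the point of capture, so they never exist in storage to be analysed later.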
Metadata is data, too
Thomas Marzano, Head of Brand Design at Philips, found himself on the defensive (as highlighted in a note by Jeremy Keith) at July's Indie Tech Summit in Brighton when he presented what could be viewed as a big corporation's attempt at implementing MAD: a health-monitoring product that keeps only processed metadata rather than the sensors' raw data.
Today we can't know what we will be able to learn from that raw data in ten years' time. [...] You are not able to go back to the raw data to extract something you weren't able to extract before.
The decision to reduce the amount of stored data was presented as protecting users from future analyses they may not have opted in to. The critique from the panel was that metadata is data as well, and its storage just as questionable. The exchange points to the most obvious obstacle to reducing the amount of stored information: doing so challenges existing data-based business models.
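To make the trade-off concrete, the following Python sketch reduces a raw series of readings to a small summary and then drops the raw series. It is not a description of the Philips product; the sample values, the `summarize` function and the choice of summary statistics are assumptions made purely for illustration.

```python
from statistics import mean


def summarize(raw_samples: list[int]) -> dict:
    """Reduce raw readings to a small derived summary."""
    if not raw_samples:
        raise ValueError("no samples to summarize")
    return {
        "count": len(raw_samples),
        "min": min(raw_samples),
        "max": max(raw_samples),
        "mean": round(mean(raw_samples), 1),
    }


raw_heart_rate = [62, 64, 61, 90, 88, 65]  # made-up readings
summary = summarize(raw_heart_rate)
del raw_heart_rate  # the raw series is discarded; only the summary is kept
print(summary)
```

Once the raw series is gone, nothing beyond these four numbers can be extracted later. The summary is still data about a person, though, which is exactly the panel's objection.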
Data policy as part of the design
Finding new ways to build concepts and business models around this vision - to combine the value users gain from data independence with (business) value for service providers - is one of the great frontiers in technology development today.
In his MAD proposal, Stacy goes as far as to call for an all-new role in design processes:
Most companies do not have a Head of Privacy or a Head of Data Policy. [...] their role isn't to "protect the user" or "keep us in legal compliance" as much as it is to "defend the minimum threshold."
In other words... their job in this process is to be the one who challenges every new data point captured and requires a justification for why the business and the product couldn't be run effectively without that data.
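One way to picture this "justify every data point" rule inside a codebase is a small registry that refuses fields without a stated reason. The sketch below is only an assumption about how such a gate might look; the `DataPolicy` class and its methods are hypothetical and not part of Stacy's proposal.

```python
from dataclasses import dataclass, field


@dataclass
class DataPolicy:
    """Registry of fields a service may capture, each with a justification."""

    approved: dict[str, str] = field(default_factory=dict)

    def approve(self, name: str, justification: str) -> None:
        # Refuse any data point that nobody can justify.
        if not justification.strip():
            raise ValueError(f"no justification given for {name!r}")
        self.approved[name] = justification

    def check(self, record: dict) -> None:
        # Fail loudly if a record carries a field that was never approved.
        extra = set(record) - set(self.approved)
        if extra:
            raise ValueError(f"unapproved fields: {sorted(extra)}")


policy = DataPolicy()
policy.approve("email", "needed to deliver the newsletter")
policy.check({"email": "ada@example.org"})  # passes silently
```

A check like this cannot judge whether a justification is any good; that remains the job of the person defending the minimum threshold. It only makes the justification a required step.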
In the fight to reverse the trend of excessive and uncontrolled storage of personal data and to put its human owners into focus, a discussion of "data obesity" and approaches like MAD should be part of any design process involving user data. So should considerations of alternative distributed solutions where users store personal data solely under their own control.